Prosecution Insights
Last updated: April 19, 2026
Application No. 18/066,713

AUDIO INTERFERENCE CANCELLATION

Non-Final OA §103
Filed
Dec 15, 2022
Examiner
LEE, EUNICE SOMIN
Art Unit
2656
Tech Center
2600 — Communications
Assignee
Comcast Cable Communications LLC
OA Round
3 (Non-Final)
Grant Probability: 89% (Favorable)
OA Rounds: 3-4
To Grant: 2y 10m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 89% — above average (24 granted / 27 resolved; +26.9% vs TC avg)
Interview Lift: +27.3% (resolved cases with interview)
Avg Prosecution: 2y 10m (20 currently pending)
Career History: 47 total applications across all art units

Statute-Specific Performance

§101: 18.7% (-21.3% vs TC avg)
§103: 53.0% (+13.0% vs TC avg)
§102: 7.3% (-32.7% vs TC avg)
§112: 2.7% (-37.3% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 27 resolved cases

Office Action

§103
DETAILED ACTION

This communication is responsive to the applicant’s amendment dated January 21, 2026. Claims 1, 3-9, 11-17 and 19-20 are pending and have been examined. Claims 1, 9 and 17 are independent. Claims 2, 10 and 18 are cancelled.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed on January 21, 2026 has been entered.

Response to Amendment

Applicant amended Claims 1, 3-5, 7-9, 11-13, 15-17, and 19-20.

Response to Arguments

Applicant has provided the following argument (see remarks page 11): [image of Applicant’s remarks omitted]

In Reply, Applicant’s characterization of Defraene in view of Pan is not accurate. “Attenuation of audio”/“not attenuating audio” across different beamforming zones (first beamforming zone, second beamforming zone, etc.), different frequency bands (first frequency band, second frequency band, etc.), and different power levels (power level of the first audio, power level of the second audio, etc.) is known from the prior art, would have been obvious to a person having ordinary skill in the art, and amounts to the normal teachings of Defraene in view of Pan.
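The per-band, per-zone attenuation decision discussed throughout this reply can be sketched in ordinary code. The sketch below is illustrative only and not part of the record; the function name, zone/band labels, and power values are all hypothetical, and real power levels would come from the beamformer outputs:

```python
def attenuation_decisions(band_power):
    """band_power maps zone -> {band: power of the audio associated with
    that zone's source, as measured in that frequency band}.

    Returns {(zone, band): True if audio received from `zone` in `band`
    should be attenuated, i.e. some other zone's source dominates there}."""
    zones = list(band_power)
    bands = {b for z in zones for b in band_power[z]}
    decisions = {}
    for zone in zones:
        for band in bands:
            own = band_power[zone].get(band, 0.0)
            rival = max((band_power[z].get(band, 0.0) for z in zones if z != zone),
                        default=0.0)
            # Claimed logic: do not attenuate the band in which this zone's
            # source is the stronger; attenuate it where another source is.
            decisions[(zone, band)] = own < rival
    return decisions

# First source dominates band1; second source dominates band2.
d = attenuation_decisions({
    "zone1": {"band1": 0.9, "band2": 0.2},
    "zone2": {"band1": 0.3, "band2": 0.8},
})
# zone1's audio is kept in band1 and attenuated in band2; vice versa for zone2.
```

The same comparison is simply repeated for every (zone, band) pair, which is the structure of the claim limitations addressed below.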
The combination teaches "determining that a power level of first audio, received by one or more microphones of the keyword detection device and associated with the first audio source, is greater than a power level of second audio, received by the one or more microphones of the keyword detection device and associated with the second audio source, in a first frequency band associated with the first beamforming zone and the second beamforming zone, determining not to attenuate audio received from the first beamforming zone and within the first frequency band." Defraene discloses “tracking different talkers (i.e., the claimed “first audio”/“first audio source”/“first talker”, “second audio”/“second audio source”/“second talker”), in different frequency bands (i.e., the claimed “first frequency band”, “second frequency band”) positioned in different angular directions (i.e., the claimed “first beamforming zone”, “second beamforming zone”)” (Defraene, Par. 0090). Defraene discloses “power (i.e., the claimed “power level” of “first audio”/“second audio”) that satisfies a predetermined threshold (i.e., the claimed “is greater than a power level”)” (Defraene, Par. 0090). According to Specification Par. 20 of the instant Application, "attenuation of audio" is based on a "beamformer" performing "filtering" that "makes use of the fact" that different audio (first audio, second audio, etc.) "tend to arrive at the device from different directions". Defraene teaches the fundamental elements of the Applicant’s Specification Par. 20: a "beamformer" performing "filtering" such that "contributions from the desired angular direction" can be “cancelled”/“not cancelled”/attenuated/“not attenuated” via filtering (Defraene, Par. 0053). Pan also teaches the fundamental elements of the Applicant’s Specification Par. 20: Pan explicitly teaches causing “attenuating audio signals” by “beamforming” performing “filtering” (Pan, Col. 5:32-45), and Pan also teaches determining not to attenuate audio: “amplify audio signals (i.e., the claimed “determining not to attenuate audio”) from directions other than the direction of an audio source (i.e., the claimed “audio” received from the “first beamforming zone”/“second beamforming zone”).” Col. 9:22-27; Figure 7B, “Direction 1”, “Direction 2”, etc. (i.e., the claimed “zone”: “Zone 1”, “Zone 2”, etc.; Figure 4 of the instant Application), in order to mitigate “noise from non-desired directions” and advance the capabilities of “speech recognition combined with natural language understanding processing” systems (Pan, Col. 1:18-19). Taken as a whole, the combination teaches “determining that a power level of first audio, received by one or more microphones of the keyword detection device and associated with the first audio source, is greater than a power level of second audio, received by the one or more microphones of the keyword detection device and associated with the second audio source, in a first frequency band associated with the first beamforming zone and the second beamforming zone, determining not to attenuate audio received from the first beamforming zone and within the first frequency band”: determining that a power level of first audio, received by one or more microphones of the keyword detection device and associated with the first audio source (Defraene, “Different local minimum speech-leakage-estimation-powers (i.e., the claimed “power level of (first, second) audio”) can correspond (i.e., the claimed “associated”) to speech signals from different talkers (i.e., the claimed “associated with the (first, second) audio source”), either positioned in different angular directions (i.e., the claimed “beamforming zones”) or talking in different frequency bands (i.e., the claimed “(first, second) frequency band”) because the different talkers have voices in different pitch registers.
In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions.” Par. 0089; Defraene, “determine the error-power-signal and the noise-reference power-signal based on the selected subset of frequency bins (i.e., the claimed “(first, second) power level”).” Par. 0134; Defraene, “Signal processors of the present disclosure can be relevant to many multi-microphone (i.e., the claimed “one or more microphones”) speech enhancement and interference cancellation tasks, e.g. noise cancellation, dereverberation, echo cancellation and source localization. The possible applications of signal processors of the present disclosure include multi-microphone (i.e., the claimed “one or more microphones”) voice communication systems, front-ends for automatic speech recognition systems (ASR) (i.e., the claimed “keyword detection device”), and hearing assistive devices.” Par. 0123, is greater than a power level of second audio, (Defraene, “each speech leakage measure that has a speech leakage-estimation-power (i.e., the claimed “power level of second audio”) that satisfies a predetermined threshold (i.e., the claimed “is greater than a power level of second audio”),” Par. 0090; according to Wikipedia, “satisfies” indicates “meets or exceeds” (i.e., the claimed “greater than”)); received by the one or more microphones of the keyword detection device and associated with the second audio source, (Defraene, “Signal processors of the present disclosure can be relevant to many multi-microphone (i.e., the claimed “one or more microphones”) speech enhancement and interference cancellation tasks, e.g. noise cancellation, dereverberation, echo cancellation and source localization. The possible applications of signal processors of the present disclosure include multi-microphone (i.e., the claimed “one or more microphones”) voice communication systems, front-ends for automatic speech recognition systems (ASR) (i.e., the claimed “keyword detection device”), and hearing assistive devices.” Par. 0123; Defraene, “second talker (i.e., the claimed “second audio associated with the second audio source”),” Par. 0063) in a first frequency band associated with the first beamforming zone (Defraene, “The first-speech signal can be based on a first speech-reference signal and a first noise-reference signal provided by a first beamforming module focusing a beam into a first angular direction (i.e., the claimed “first beamforming zone”). The first beamforming-module can process the first-frequency sub-band signals (i.e., the claimed “first frequency band”)”, Par. 0064) and the second beamforming zone (Defraene, “It will be appreciated that tracking based on frequency band may be combined with tracking based on using different angular directions (i.e., the claimed “first beamforming zone and the second beamforming zone”) in the same signal processor.” Par. 0064), determining not to attenuate audio received from the first beamforming zone and within the first frequency band. (Defraene, “Conversely, for a beam focusing into an undesired direction (i.e., the claimed “audio received from the first beamforming zone”), the speech leakage into a noise reference signal is expected to be high (i.e., the claimed “determining not to attenuate audio”).” Par. 0048; Pan, “attenuating audio signals that originate from other directions” by “beamforming” performing “filtering” (Pan, Col.
5:32-45); “The device 110 may also operate an adaptive noise canceller (ANC) unit 460 to amplify audio signals (i.e., the claimed “determining not to attenuate audio”) from directions other than the direction of an audio source (i.e., the claimed “audio received from the first beamforming zone”).” Col. 9:22-27; Figure 7B, “Direction 1”, “Direction 2”, etc. (i.e., the claimed “zone”: “Zone 1”, “Zone 2”, etc.; Figure 4 of the instant Application); Defraene, “It will be appreciated that tracking based on frequency band (i.e., the claimed “first frequency band”) may be combined with tracking based on using different angular directions (i.e., the claimed “first beamforming zone”/“second beamforming zone”) in the same signal processor.” Par. 0064)

Applicant has provided the following argument (see remarks page 11): [image of Applicant’s remarks omitted]

In Reply, Applicant’s characterization of Defraene in view of Pan is not accurate. Regarding “determining not to attenuate audio received from the first beamforming zone and within the first frequency band”, please refer to the above reply. Taken as a whole, the combination teaches “determining not to attenuate audio received from the first beamforming zone and within the first frequency band” as detailed in the above reply.

Applicant has provided the following argument (see remarks page 11): [image of Applicant’s remarks omitted]

In Reply, Applicant’s characterization of Defraene in view of Pan is not accurate. The combination teaches “determining not to attenuate audio”. According to Specification Par. 20 of the instant Application, “not to attenuate”/"attenuation of audio" is based on a "beamformer" performing "filtering" that "makes use of the fact" that different audio (first audio, second audio, etc.) "tend to arrive at the device from different directions". Defraene teaches the fundamental elements of the Applicant’s Specification Par. 20: a "beamformer" performing "filtering" such that "contributions from the desired angular direction" can be “cancelled”/attenuated/“not cancelled”/“not attenuated” via filtering (Defraene, Par. 0026, Par. 0053). Pan also teaches the fundamental elements of the Applicant’s Specification Par. 20: Pan teaches causing “attenuating audio signals” by “beamforming” performing “filtering” (Pan, Col. 5:32-45) and Pan also teaches determining not to attenuate audio: “amplify audio signals (i.e., the claimed “determining not to attenuate audio”) from directions other than the direction of an audio source (i.e., the claimed “audio” received from the “first beamforming zone”/“second beamforming zone”).” Col. 9:22-27; Figure 7B, “Direction 1”, “Direction 2”, etc. (i.e., the claimed “zone”: “Zone 1”, “Zone 2”, etc.; Figure 4 of the instant Application), in order to mitigate “noise from non-desired directions” and advance the capabilities of “speech recognition combined with natural language understanding processing” systems (Pan, Col. 1:18-19). Taken as a whole, the combination teaches “determining not to attenuate audio”.

Applicant has provided the following argument (see remarks page 12): [image of Applicant’s remarks omitted]

In Reply, Applicant’s characterization of Defraene in view of Pan is not accurate. Regarding “based on determining that a power level of first audio, received by one or more microphones of the keyword detection device and associated with the first audio source, is greater than a power level of second audio, received by the one or more microphones of the keyword detection device and associated with the second audio source, in a first frequency band associated with the first beamforming zone and the second beamforming zone,” please refer to the above reply.
Taken as a whole, the combination teaches “based on determining that a power level of first audio, received by one or more microphones of the keyword detection device and associated with the first audio source, is greater than a power level of second audio, received by the one or more microphones of the keyword detection device and associated with the second audio source, in a first frequency band associated with the first beamforming zone and the second beamforming zone” as detailed in the above reply.

Applicant has provided the following argument (see remarks page 12): [image of Applicant’s remarks omitted]

In Reply, Applicant’s characterization of Defraene in view of Pan is not accurate. Regarding "a power level of first audio, received by one or more microphones of the keyword detection device and associated with the first audio source is greater than a power level of second audio, received by the one or more microphones of the keyword detection device and associated with the second audio source, in a first frequency band associated with the first beamforming zone and the second beamforming zone," please refer to the above reply. Taken as a whole, the combination teaches "a power level of first audio, received by one or more microphones of the keyword detection device and associated with the first audio source is greater than a power level of second audio, received by the one or more microphones of the keyword detection device and associated with the second audio source, in a first frequency band associated with the first beamforming zone and the second beamforming zone," as detailed in the above reply.

Applicant has provided the following argument (see remarks page 12): [image of Applicant’s remarks omitted]

In Reply, Applicant’s characterization of Defraene in view of Pan is not accurate.
“Attenuation of audio”/“not attenuating audio” across different beamforming zones (first beamforming zone, second beamforming zone, etc.), different frequency bands (first frequency band, second frequency band, etc.), and different power levels (power level of the first audio, power level of the second audio, etc.) is known from the prior art, would have been obvious to a person having ordinary skill in the art, and amounts to the normal teachings of Defraene in view of Pan. The combination teaches "based on determining that the power level of the second audio is greater than the power level of the first audio in a second frequency band associated with the first beamforming zone and the second beamforming zone, causing attenuation of audio received from the first beamforming zone and within the second frequency band." Defraene discloses “tracking different talkers (i.e., the claimed “first audio”/“first audio source”/“first talker”, “second audio”/“second audio source”/“second talker”), in different frequency bands (i.e., the claimed “first frequency band”, “second frequency band”) positioned in different angular directions (i.e., the claimed “first beamforming zone”, “second beamforming zone”)” (Defraene, Par. 0090). Defraene discloses “power (i.e., the claimed “power level” of “first audio”/“second audio”) that satisfies a predetermined threshold (i.e., the claimed “is greater than a power level”)” (Defraene, Par. 0090). According to Specification Par. 20 of the instant Application, "attenuation of audio" is based on a "beamformer" performing "filtering" that "makes use of the fact" that different audio (first audio, second audio, etc.) "tend to arrive at the device from different directions". Defraene teaches the fundamental elements of the Applicant’s Specification Par. 20: a "beamformer" performing "filtering" such that "contributions from the desired angular direction" can be “cancelled”/attenuated via filtering (Defraene, Par. 0053).
Pan also teaches the fundamental elements of the Applicant’s Specification Par. 20: Pan explicitly teaches causing “attenuating audio signals” by “beamforming” performing “filtering” (Pan, Col. 5:32-45) and Pan also teaches determining not to attenuate audio: “amplify audio signals (i.e., the claimed “determining not to attenuate audio”) from directions other than the direction of an audio source (i.e., the claimed “audio” received from the “first beamforming zone”/“second beamforming zone”).” Col. 9:22-27; Figure 7B, “Direction 1”, “Direction 2”, etc. (i.e., the claimed “zone”: “Zone 1”, “Zone 2”, etc.; Figure 4 of the instant Application), in order to mitigate “noise from non-desired directions” and advance the capabilities of “speech recognition combined with natural language understanding processing” systems (Pan, Col. 1:18-19). Taken as a whole, the combination teaches "based on determining that the power level of the second audio is greater than the power level of the first audio in a second frequency band associated with the first beamforming zone and the second beamforming zone, causing attenuation of audio received from the first beamforming zone and within the second frequency band": "based on determining that the power level of the second audio (Defraene, “Different local minimum speech-leakage-estimation-powers (i.e., the claimed “power level of (first, second) audio”) can correspond (i.e., the claimed “associated”) to speech signals from different talkers (i.e., the claimed “associated with the (first, second) audio source”), either positioned in different angular directions (i.e., the claimed “beamforming zones”) or talking in different frequency bands (i.e., the claimed “(first, second) frequency band”) because the different talkers have voices in different pitch registers. In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions.” Par. 0089; Defraene, “determine the error-power-signal and the noise-reference power-signal based on the selected subset of frequency bins (i.e., the claimed “(first, second) power level”).” Par. 0134; is greater than the power level of the first audio (Defraene, “each speech leakage measure that has a speech leakage-estimation-power (i.e., the claimed “power level of first audio”) that satisfies a predetermined threshold (i.e., the claimed “is greater than a power level of first audio”),” Par. 0090; according to Wikipedia, “satisfies” indicates “meets or exceeds” (i.e., the claimed “greater than”)); in a second frequency band (Defraene, “…the second frequency range (i.e., the claimed “second frequency band”) can be chosen to match a frequency range (i.e., the claimed “frequency band”) of a second talker (i.e., the claimed “second audio associated with the second audio source”),” Par. 0063) associated with the first beamforming zone and the second beamforming zone (Defraene, “It will be appreciated that tracking based on frequency band (i.e., the claimed “second frequency band”) may be combined with tracking based on using different angular directions (i.e., the claimed “first beamforming zone and the second beamforming zone”) in the same signal processor.” Par. 0064), causing attenuation of audio (Pan, “attenuating audio signals that originate from other directions” by “beamforming” performing “filtering” (Pan, Col. 5:32-45)); received from the first beamforming zone and within the second frequency band" (Defraene, “It will be appreciated that tracking based on frequency band (i.e., the claimed “second frequency band”) may be combined with tracking based on using different angular directions (i.e., the claimed “first beamforming zone and within the second beamforming zone”) in the same signal processor.” Par.
0064; Defraene, “In one or more embodiments, the plurality of beamforming-modules (i.e., “beamformer”) may each comprise a noise-canceller block configured to: adaptively filter the respective noise-reference-signal to provide a respective filtered-noise-signal; and subtract the filtered-noise-signal (i.e., subtraction attenuates signal) from the respective speech-reference-signal to provide the respective beamformer output signal.” Par. 0026; Defraene, “The second beamforming module can process the second-frequency-sub-band signals (i.e., the claimed “second frequency band”). In such cases, the first angular direction (i.e., the claimed “first beamforming zone”) may or may not be different than the second angular direction (i.e., the claimed “second beamforming zone”).” Par. 0064)

Applicant has provided the following argument (see remarks pages 12-13): [images of Applicant’s remarks omitted]

In Reply, Applicant’s characterization of Defraene in view of Pan is not accurate. The combination teaches “causing attenuation of audio”. According to Specification Par. 20 of the instant Application, "attenuation of audio" is based on a "beamformer" performing "filtering" that "makes use of the fact" that different audio (first audio, second audio, etc.) "tend to arrive at the device from different directions". Defraene teaches the fundamental elements of the Applicant’s Specification Par. 20: a "beamformer" performing "filtering" such that "contributions from the desired angular direction" can be “cancelled”/attenuated via filtering (Defraene, Par. 0053). Pan also teaches the fundamental elements of the Applicant’s Specification Par. 20: Pan explicitly teaches causing “attenuating audio signals” by “beamforming” performing “filtering” (Pan, Col. 5:32-45) and Pan also teaches determining not to attenuate audio: “amplify audio signals (i.e., the claimed “determining not to attenuate audio”) from directions other than the direction of an audio source (i.e., the claimed “audio” received from the “first beamforming zone”/“second beamforming zone”).” Col. 9:22-27; Figure 7B, “Direction 1”, “Direction 2”, etc. (i.e., the claimed “zone”: “Zone 1”, “Zone 2”, etc.; Figure 4 of the instant Application), in order to mitigate “noise from non-desired directions” and advance the capabilities of “speech recognition combined with natural language understanding processing” systems (Pan, Col. 1:18-19). Taken as a whole, the combination teaches “causing attenuation of audio”.

Applicant has provided the following argument (see remarks page 13): [image of Applicant’s remarks omitted]

In Reply, Applicant’s characterization of Defraene in view of Pan is not accurate. Regarding “causing attenuation of audio received from the first beamforming zone and within the second frequency band” and “based on determining that the power level of the second audio is greater than the power level of the first audio in a second frequency band associated with the first beamforming zone and the second beamforming zone”, please refer to the above reply. Taken as a whole, the combination teaches “causing attenuation of audio received from the first beamforming zone and within the second frequency band” and “based on determining that the power level of the second audio is greater than the power level of the first audio in a second frequency band associated with the first beamforming zone and the second beamforming zone” as detailed in the above reply.

Applicant has provided the following argument (see remarks page 13): [image of Applicant’s remarks omitted]

In Reply, Applicant’s characterization of Defraene in view of Pan is not accurate.
Regarding “the power level of the second audio [associated with the second source]” being “greater than the power level of the first audio [associated with the first source] in a second frequency band associated with the first beamforming zone and the second beamforming zone”/“the power level of the second audio is greater than the power level of the first audio in a second frequency band associated with the first beamforming zone and the second beamforming zone”, please refer to the above reply. Taken as a whole, the combination teaches “the power level of the second audio [associated with the second source]” being “greater than the power level of the first audio [associated with the first source] in a second frequency band associated with the first beamforming zone and the second beamforming zone”/“the power level of the second audio is greater than the power level of the first audio in a second frequency band associated with the first beamforming zone and the second beamforming zone” as detailed in the above reply.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-9, 11-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Defraene et al., (U.S. Patent Application Publication 2018/0359560), hereinafter referred to as Defraene, in view of Pan et al., (U.S. Patent 11,483,646), hereinafter referred to as Pan, Chatlani et al., (U.S.
Patent 11,425,494), hereinafter referred to as Chatlani, Tetelbaum et al., (U.S. Patent Application Publication 2014/0335917), hereinafter referred to as Tetelbaum, and Zhang et al., (U.S. Patent Application Publication 11,404,073), hereinafter referred to as Zhang.

Regarding Claims 1, 9 and 17, Defraene teaches: 1. A method comprising, 9. A computer-readable medium storing instructions that, when executed, cause, and 17. A device comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the device to: [Defraene, “In other examples, the set of instructions/methods illustrated herein and data and instructions associated therewith are stored in respective storage devices, which are implemented as one or more non-transient machine or computer-readable or computer-usable storage media or mediums.” Par. 0130; “Such instructions are loaded for execution on a processor (such as one or more CPUs).” Par. 0129] determining, by a keyword detection device, [Defraene, “Signal processors of the present disclosure can be relevant to many multi-microphone speech enhancement and interference cancellation tasks, e.g. noise cancellation, dereverberation, echo cancellation and source localization. The possible applications of signal processors of the present disclosure include multi-microphone voice communication systems, front-ends for automatic speech recognition systems (ASR) (i.e., the claimed “keyword detection device”), and hearing assistive devices.” Par. 0123; “In the context of speech enhancement, multi-microphone acoustic beamforming systems (i.e., the claimed “keyword detection device”) can be used for performing interference cancellation, by exploiting spatial information of a desired speech signal and an undesired interference signal.” Par.
0040] a first beamforming zone associated with a location of a first audio source, [Defraene, “The first-speech signal can be based on a first speech-reference signal and a first noise-reference signal provided by a first beamforming module focusing a beam into a first angular direction (i.e., the claimed “first beamforming zone”). The first beamforming-module can process the first-frequency sub-band signals (i.e., the claimed “first frequency band”)”, Par. 0064; “The first-speech-signal can be based on a first speech-reference-signal and a first noise-reference-signal provided by a first beamforming-module (i.e., the claimed “key word detection device”) focusing a beam into a first angular direction (i.e., the claimed “zone”) (Referring to the Specification of the instant Application [0045] indicates that beamformer may “point in a different direction” and may be “associated with a beamforming zone”/position/location). The first beamforming-module can process the first-frequency-sub-band-signals. Similarly, the second-speech-signal can be based on a second speech reference- signal and a second noise-reference-signal provided by a second beamforming-module focusing a beam into a second angular direction. The second beamforming module can process the second-frequency-sub-band-signals. In such cases, the first angular direction may or may not be different than the second angular direction. In this way, the signal processor 200 can independently track speech signals from two different talkers, who may or may not be located in different positions, and provide a output signal that includes noise cancelled representations of both different speech signals. The output signal can be provided as either a single signal, or as multiple sub-signals as described above.” Par. 
0064] wherein the first beamforming zone is associated with a first portion of an area proximate the keyword detection device [Defraene, “The first-speech signal can be based on a first speech-reference signal and a first noise-reference signal provided by a first beamforming module focusing a beam into a first angular direction (i.e., the claimed “first beamforming zone”). The first beamforming-module can process the first-frequency sub-band signals (i.e., the claimed “first frequency band”)”, Par. 0064; “In the context of speech enhancement, multi-microphone acoustic beamforming systems (i.e., the claimed “key word detection device”) can be used for performing interference cancellation, by exploiting spatial (i.e., the claimed “proximate”) information of a desired speech signal and an undesired interference signal.” Par. 0040; According to Vocabulary.com, “proximate” is “short distance”. Defraene teaches beamforming calculates distance/”proximate” microphone of the keyword detection device”: “The beamforming module 300…distance Dmic is small (i.e., the claimed “proximate”),” Par. 0067] determining that a power level of first audio, received by one or more microphones of the keyword detection device and associated with the first audio source, [Defraene, “Different local minimum speech-leakage-estimation-powers (i.e, the claimed “power level of first audio can correspond (i.e., the claimed “associate”) to speech signals from different talkers (i.e, the claimed “associated with the first audio source, either positioned in different angular directions (i.e., the claimed “beamforming zones”) or talking in different frequency bands (i.e, (first, second) frequency band) because the different talkers have voices in different pitch registers. In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions.” Par. 
0089; Defraene, “determine the error-power-signal (i.e., the claimed “power level of first audio”) and the noise-reference power-signal (i.e., the claimed “power level of first audio”) based on the selected subset of frequency bins.” Par. 0134; Defraene, “Signal processors of the present disclosure can be relevant to many multi-microphone (i.e., the claimed “one or more microphones”) speech enhancement and interference cancellation tasks, e.g. noise cancellation, dereverberation, echo cancellation and source localization. The possible applications of signal processors of the present disclosure include multi-microphone (i.e., the claimed “one or more microphones”) voice communication systems, front-ends for automatic speech recognition systems (ASR) (i.e., the claimed “keyword detection device”), and hearing assistive devices.” Par. 0123] is greater than a power level of second audio, [Defraene, “each speech leakage measure that has a speech leakage-estimation-power (i.e., the claimed “power level of second audio”) that satisfies a predetermined threshold (i.e., the claimed “is greater than a power level of second audio”),” Par. 0090; According to Wikipedia, “satisfies” indicates “meets or exceeds” (i.e., the claimed “greater than”).] received by the one or more microphones of the keyword detection device and associated with the second audio source, [Defraene, “Signal processors of the present disclosure can be relevant to many multi-microphone (i.e., the claimed “one or more microphones”) speech enhancement and interference cancellation tasks, e.g. noise cancellation, dereverberation, echo cancellation and source localization. The possible applications of signal processors of the present disclosure include multi-microphone (i.e., the claimed “one or more microphones”) voice communication systems, front-ends for automatic speech recognition systems (ASR) (i.e., the claimed “keyword detection device”), and hearing assistive devices.” Par. 
0123; Defraene, “second talker (i.e., the claimed “second audio associated with the second audio source”),” Par. 0063] in a first frequency band associated with the first beamforming zone and the second beamforming zone, [Defraene, “The first-speech signal can be based on a first speech-reference signal and a first noise-reference signal provided by a first beamforming module focusing a beam into a first angular direction (i.e., the claimed “first beamforming zone”). The first beamforming-module can process the first-frequency sub-band signals (i.e., the claimed “first frequency band”)”, Par. 0064; “It will be appreciated that tracking based on frequency band (i.e., the claimed “first frequency band”) may be combined with tracking based on using different angular directions (i.e., the claimed “first beamforming zone and the second beamforming zone”) in the same signal processor.” Par. 0064] determining not to attenuate audio [According to Specification Par. 20 of the instant Application, “not to attenuate”/ "attenuation of audio" is based on "beamformer" performing "filtering" that "makes use of the fact” different audio (first audio, second audio, etc.) “tend to arrive at the device from different directions". Defraene teaches the fundamental elements in the Applicant’s Specification Par. 20: "beamformer" performing "filtering" that makes use of "contributions from the desired angular direction" can be “cancelled”/“not cancelled”/ attenuated/not attenuated via filtering (Defraene, Par. 0053): “In one or more embodiments, the plurality of beamforming-modules (i.e., “beamformer”) may each comprise a noise-canceller block configured to: adaptively filter the respective noise-reference-signal to provide a respective filtered-noise-signal; and subtract the filtered-noise-signal (i.e., subtraction attenuates signal) from the respective speech-reference-signal to provide the respective beamformer output signal.” Par. 
0026; “In each respective noise-canceller block 228, the respective noise-reference-signal 226 v.sub.i(n) is adaptively cancelled (i.e., “attenuate”) from the respective speech-reference-signal 224 s.sub.i(n), to provide respective beamformer output signals 230 ŝ.sub.i(n), which can collectively be described as beamformer-signalling. There is no specific requirement for the filter structure or design procedure for either the fixed beamformers 220 or the adaptive noise cancellers 228. As discussed above, each of the fixed beamformers 220 can steer a constructive beam in a respective desired angular direction, while the associated adaptive noise canceller 228 can cancel (i.e., “attenuate”) contributions from the desired angular direction”. Par. 0053; “Conversely, for a beam focusing into an undesired direction (i.e., the claimed “audio received from the first beamforming zone”), the speech leakage into a noise reference signal is expected to be high (i.e., the claimed “not to attenuate audio”).” Par. 0048] received from the first beamforming zone and within the first frequency band; and [Defraene, “It will be appreciated that tracking based on frequency band (i.e., the claimed “first frequency band”) may be combined with tracking based on using different angular directions (i.e., the claimed “first beamforming zone”) in the same signal processor.” Par. 0064] based on determining that the power level of the second audio [Defraene, “Different local minimum speech-leakage-estimation-powers (i.e, the claimed “power level of second audio) can correspond (i.e., the claimed “associate”) to speech signals from different talkers (i.e, the claimed “associated with the second audio source), either positioned in different angular directions (i.e., the claimed “beamforming zones”) or talking in different frequency bands (i.e, (first, second) frequency band) because the different talkers have voices in different pitch registers. 
In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions.” Par. 0089; Defraene, “determine the error-power-signal (i.e., the claimed “power level of the second audio”) and the noise-reference power-signal (i.e., the claimed “power level of the second audio”) based on the selected subset of frequency bins.” Par. 0134] is greater than the power level of the first audio [Defraene, “each speech leakage measure that has a speech leakage-estimation-power (i.e., the claimed “power level of first audio”) that satisfies a predetermined threshold (i.e., the claimed “is greater than a power level of first audio”),” Par. 0090; According to Wikipedia, “satisfies” indicates “meets or exceeds” (i.e., the claimed “greater than”).] in a second frequency band [Defraene, “…the second frequency range (i.e., the claimed “second frequency band”) can be chosen to match a frequency range (i.e., the claimed “frequency band”) of a second talker (i.e., the claimed “second audio associated with the second audio source”),” Par. 0063] associated with the first beamforming zone and the second beamforming zone, [Defraene, “It will be appreciated that tracking based on frequency band (i.e., the claimed “second frequency band”) may be combined with tracking based on using different angular directions (i.e., the claimed “first beamforming zone and the second beamforming zone”) in the same signal processor.” Par. 0064] causing attenuation of audio [According to Specification Par. 20 of the instant Application, "attenuation of audio" is based on "beamformer" performing "filtering" that "makes use of the fact” different audio (first audio, second audio, etc.) “tend to arrive at the device from different directions". Defraene teaches the fundamental elements in the Applicant’s Specification Par. 
20: "beamformer" performing "filtering" that makes use of "contributions from the desired angular direction" can be “cancelled”/attenuated via filtering (Defraene, Par. 0053):: “In one or more embodiments, the plurality of beamforming-modules (i.e., “beamformer”) may each comprise a noise-canceller block configured to: adaptively filter the respective noise-reference-signal to provide a respective filtered-noise-signal; and subtract the filtered-noise-signal (i.e., subtraction attenuates signal) from the respective speech-reference-signal to provide the respective beamformer output signal.” Par. 0026; “In each respective noise-canceller block 228, the respective noise-reference-signal 226 v.sub.i(n) is adaptively cancelled (i.e., “attenuate”) from the respective speech-reference-signal 224 s.sub.i(n), to provide respective beamformer output signals 230 ŝ.sub.i(n), which can collectively be described as beamformer-signalling. There is no specific requirement for the filter structure or design procedure for either the fixed beamformers 220 or the adaptive noise cancellers 228. As discussed above, each of the fixed beamformers 220 can steer a constructive beam in a respective desired angular direction, while the associated adaptive noise canceller 228 can cancel (i.e., “attenuate”) contributions from the desired angular direction”. Par. 0053] received from the first beamforming zone and within the second frequency band [Defraene, “It will be appreciated that tracking based on frequency band (i.e., the claimed “second frequency band) may be combined with tracking based on using different angular directions (i.e., the claimed “first beamforming zone and within the second beamforming zone”) in the same signal processor.” Par. 0064; Defraene,“The second beamforming module can process the second-frequency-sub band signals (i.e., the claimed “second frequency band”). 
In such cases, the first angular direction (i.e., the claimed “first beamforming zone”) may or may not be different than the second angular direction (i.e., the claimed “second beamforming zone”).” Par. 0064] Defraene fails to explicitly teach “proximate,” determining a first beamforming zone associated with a location of a first audio source, and determining a second beamforming zone associated with a location of a second audio source. While Defraene teaches the concepts of greater than and not attenuating, Defraene does not expressly recite the terms “greater than” and “not attenuate.” However, Pan teaches: determining, by a keyword detection device, a first beamforming zone associated with a location of a first audio source; [Pan, Figure 7B, “Direction 1”, “Direction 2”, etc. (i.e., the claimed “zone”: “Zone 1”, “Zone 2”, etc. Figure 4 of the instant Application); “Certain devices capable of capturing speech for speech processing may operate using a microphone array comprising multiple microphones (i.e., the claimed “keyword detection device”), where beamforming techniques may be used to isolate desired audio including speech.” Col. 2:35-40; “The combination of speech recognition and natural language understanding processing (i.e., the claimed “keyword detection”) techniques is commonly referred to as speech processing.” Col. 1:21-24; “For example, if audio data corresponding to a user's speech is first detected and/or is most strongly detected by microphone 502g (i.e., the claimed audio source), the device may determine that the user is located in a location in direction 7 (i.e., the claimed “zone”).” Col. 8:51-54; “For example, direction 1 (i.e., the claimed “zone”) is associated with microphone 502a (i.e., the claimed audio source), direction 2 is associated with microphone 502b, and so on.” Col. 8:18-20] determining a second beamforming zone associated with a location of a second audio source; [Pan, Figure 7B, “Direction 1”, “Direction 2”, etc. 
(i.e, the claimed “zone”: “Zone 1”, “Zone 2”, etc. Figure 4 of the instant Application); “For example, if audio data corresponding to a user's speech is first detected and/or is most strongly detected by microphone 502g (i.e., the claimed audio source), the device may determine that the user is located in a location in direction 7 (i.e., the claimed “zone”).” Col. 8: 51-54; “For example, direction 1 is associated with microphone 502a, direction 2 (i.e., the claimed “zone”) is associated with microphone 502b (i.e., the claimed audio source), and so on.” Col. 8:18-20] based on determining that the power level of the second audio is greater than the power level of the first audio [Pan, “Based on the DI (directivity index) values, the system 100 may determine an average DI value within a desired frequency range for the first direction of interest (i.e., determining that the power level of the second audio is greater than the power level of the first audio).” Col. 34: 21-28; “For example, the system 100 may identify the best average DI (directivity index) values (e.g., power values) for the first direction of interest. (i.e., determining that the power level of the second audio is greater than the power level of the first audio)”, Col. 34:51-53] determining not to attenuate audio [According to Specification Par. 20 of the instant Application, “not to attenuate”/ "attenuation of audio" is based on "beamformer" performing "filtering" that "makes use of the fact” different audio (first audio, second audio, etc.) “tend to arrive at the device from different directions". Pan teaches the fundamental elements in the Applicant’s Specification Par. 20: Pan explicitly teaches causing “attenuating audio signals” by “beamforming” performing “filtering” (Pan, Col. 
5:32- 45) and Pan also teaches determining not to attenuate audio: “amplify audio signals (i.e., the claimed “determining not to attenuate audio”) from directions other than the direction of an audio source (i.e., the claimed “audio” received from the “first beamforming zone”/ “second beamforming zone”).” Col. 9:22-27; Figure 7B, “Direction 1”, “Direction 2”, etc (i.e, the claimed “zone”: “Zone 1”, “Zone 2”, etc. Figure 4 of the instant Application) in order to mitigate “noise from non-desired directions” and advance capabilities of “speech recognition combined with natural language understanding processing” systems (Pan, Col. 1:18-19). Pan, “attenuating audio signals that originate from other directions” by “beamforming” performing “filtering” (Pan, Col. 5:32- 45); “The device 110 may also operate an adaptive noise canceller (ANC) unit 460 to amplify audio signals (i.e., the claimed “determining not to attenuate audio”) from directions other than the direction of an audio source (i.e., the claimed “audio received from the first beamforming zone”).” Col. 9:22-27; Figure 7B, “Direction 1”, “Direction 2”, etc. (i.e, the claimed “zone”: “Zone 1”, “Zone 2”, etc. Figure 4 of the instant Application] causing attenuation of audio [Pan, “attenuating audio signals that originate from other directions” by “beamforming” performing “filtering”, Col. 5:32- 45; According to Specification Par. 20 of the instant Application, "attenuation of audio" is based on "beamformer" performing "filtering" that "makes use of the fact” different audio (first audio, second audio, etc.) “tend to arrive at the device from different directions". Pan teaches the fundamental elements in the Applicant’s Specification Par. 20: Pan explicitly teaches causing “attenuating audio signals” by “beamforming” performing “filtering” (Pan, Col. 
5:32-45) and Pan also teaches determining not to attenuate audio: “amplify audio signals (i.e., the claimed “determining not to attenuate audio”) from directions other than the direction of an audio source (i.e., the claimed “audio” received from the “first beamforming zone”/ “second beamforming zone”).” Col. 9:22-27; Figure 7B, “Direction 1”, “Direction 2”, etc. (i.e., the claimed “zone”: “Zone 1”, “Zone 2”, etc. Figure 4 of the instant Application) in order to mitigate “noise from non-desired directions” and advance capabilities of “speech recognition combined with natural language understanding processing” systems (Pan, Col. 1:18-19). Pan, “attenuating audio signals that originate from other directions” by “beamforming” performing “filtering” (Pan, Col. 5:32-45); “The device 110 may also operate an adaptive noise canceller (ANC) unit 460 to amplify audio signals (i.e., the claimed “determining not to attenuate audio”) from directions other than the direction of an audio source (i.e., the claimed “audio received from the first beamforming zone”).” Col. 9:22-27; Figure 7B, “Direction 1”, “Direction 2”, etc. (i.e., the claimed “zone”: “Zone 1”, “Zone 2”, etc. Figure 4 of the instant Application)] The combination fails to explicitly teach “proximate.” However, Chatlani teaches: wherein the first beamforming zone is associated with a first portion of an area proximate the keyword detection device [Chatlani, “proximate the device 100 (i.e., the claimed “keyword detection device”),” Col. 12:36] While the combination teaches the concepts of greater than and not attenuating, the combination does not expressly recite the terms “greater than” and “not attenuating.” 
However, Tetelbaum expressly states: determining that a power level of first audio, received by one or more microphones of the keyword detection device and associated with the first audio source, is greater than a power level of second audio, [Tetelbaum, “first audio (i.e., the claimed “first audio associated with the first audio source”) beam power level being greater than a product of a second audio power level” Par. 0014; “plurality of microphones affixed to a telephone device (i.e., the claimed “one or more microphones of the keyword detection device”),” Par. 0015] While the combination teaches the concept of not attenuating, the combination does not expressly recite the term “not attenuating.” However, Zhang expressly states: determining not to attenuate audio [Zhang, “without attenuating (i.e., the claimed “not to attenuate”)”, Col. 3:64] Defraene, Pan, Chatlani, Tetelbaum and Zhang pertain to speech recognition systems and are analogous to the instant invention. Accordingly, it would have been obvious to one of ordinary skill in the speech recognition systems art to modify Defraene’s teachings that a "beamformer" performing "filtering" allows "contributions from the desired angular direction" to be “cancelled”/attenuated or not cancelled/“not attenuated” via filtering (Defraene, Par. 0026, Par. 0053) with the explicit teachings of “detecting by microphone 502g (i.e., the claimed audio source), the device may determine that the user is located in a location in direction 7 (i.e., the claimed “zone”)” (Pan, Col. 8:51-54) taught by Pan, the explicit teachings of “proximate” (Chatlani, Col. 12:36) taught by Chatlani, the explicit teachings of “greater than” (Tetelbaum, Par. 0014) taught by Tetelbaum, and the explicit teachings of “without attenuating” (i.e., the claimed “not to attenuate”) (Zhang, Col. 
3:64) taught by Zhang in order to mitigate “noise from non-desired directions” and advance capabilities of “speech recognition combined with natural language understanding processing” systems (Pan, Col. 1:18-19), “isolate audio from one or more particular directions” (Chatlani, Col. 2:12-14), improve “beamforming technology” (Tetelbaum, Par. 0003), and “identify unique sources of audible noise” (Zhang, Col. 2:44-45). Further, Pan also explicitly teaches causing “attenuating audio signals” (Pan, Col. 5:32-45). Moreover, “attenuation of audio”/“not attenuating audio” with different beamforming zones (first beamforming zone, second beamforming zone, etc.), different frequency bands (second frequency band, first frequency band, etc.), and different power levels (power level of the first audio, power level of the second audio, etc.) is known from the prior art, is obvious to person(s) having ordinary skill in the art, is straightforward, and amounts to the normal teachings of Defraene in view of Pan, Chatlani, Tetelbaum and Zhang. Regarding Claims 3, 11 and 19, Defraene in view of Pan, Chatlani, Tetelbaum and Zhang has been discussed above. The combination further teaches: wherein determining the first beamforming zone associated with the location of the first audio source comprises determining that the location of the first audio source is located in the first portion of the area surrounding the keyword detection device, and [Defraene, see mapping applied to claim 1; Pan, see mapping applied to claim 1; Chatlani, see mapping applied to claim 1; Tetelbaum, see mapping applied to claim 1; Zhang, see mapping applied to claim 1; Pan clearly shows Figure 3A “202b” located in the center of an area surrounding “202b”; Figure 7B, “Direction 1”, “Direction 2”, etc. (i.e., the claimed “zone”: “Zone 1”, “Zone 2”, etc. 
Figure 4 of the instant Application); “For example, if audio data corresponding to a user's speech is first detected and/or is most strongly detected by microphone 502g, the device may determine that the user is located in a location in direction 7.” Col. 8: 51-54; “For example, direction 1 (i.e., the claimed “first beamforming zone associated with the location of the first audio source”) is associated with microphone 502a (i.e., the claimed “first audio source”), direction 2 is associated with microphone 502b, and so on.” Col. 8:18-20; Certain devices capable of capturing speech for speech processing may operate using a microphone array comprising multiple microphones (i.e., the claimed “keyword detection device”), where beamforming techniques may be used to isolate desired audio including speech.” Col. 2:35-40; “The combination of speech recognition and natural language understanding processing (i.e., the claimed “keyword detection”) techniques is commonly referred to as speech processing.” Col. 1:21-24] wherein determining the second beamforming zone associated with the location of the second audio source comprises determining that the location of the second audio source is located in the second portion of the area proximate the keyword detection device. [Defraene, see mapping applied to claim 1; Pan, see mapping applied to claim 1; Chatlani, see mapping applied to claim 1; Tetelbaum, see mapping applied to claim 1; Zhang, see mapping applied to claim 1; Pan clearly shows Figure 3A “202b” located in the center of an area surrounding “202b”; Figure 7B, “Direction 1”, “Direction 2”, etc. (i.e, the claimed “zone”: “Zone 1”, “Zone 2”, etc. Figure 4 of the instant Application); “For example, if audio data corresponding to a user's speech is first detected and/or is most strongly detected by microphone 502g, the device may determine that the user is located in a location in direction 7.” Col. 
8: 51-54; “For example, direction 1 is associated with microphone 502a, direction 2 (i.e., the claimed “second beamforming zone associated with the location of the second audio source”) is associated with microphone 502b, and so on.” Col. 8:18-20; Certain devices capable of capturing speech for speech processing may operate using a microphone array comprising multiple microphones (i.e., the claimed “keyword detection device”), where beamforming techniques may be used to isolate desired audio including speech.” Col. 2:35-40; “The combination of speech recognition and natural language understanding processing (i.e., the claimed “keyword detection”) techniques is commonly referred to as speech processing.” Col. 1:21-24] Regarding Claims 4 and 12, Defraene in view of Pan, Chatlani, Tetelbaum and Zhang has been discussed above. The combination further teaches: wherein the first beamforming zone is associated with a first measured output power level in the first frequency band and the second beamforming zone is associated with a second measured output power level in the first frequency band, and [Defraene, see mapping applied to claim 1; Pan, see mapping applied to claim 1; Chatlani, see mapping applied to claim 1; Tetelbaum, see mapping applied to claim 1; Zhang, see mapping applied to claim 1; Defraene, “Different local minimum speech-leakage-estimation-powers can correspond to speech signals from different talkers, either positioned in different angular directions or talking in different frequency bands because the different talkers have voices in different pitch registers. In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions (i.e., the claimed “beamforming zone”).” Par. 
0090; “Alternatively, the frequency bin selection can be based on a pitch estimate representative of a pitch of a speech-component of the plurality of microphone-signals, where only powers at pitch harmonic frequencies are selected.” Par. 0085; “It will be appreciated that this approach can be extended straightforwardly to a speech leakage estimation where multiple frequency bands are considered independently, and the speech leakage feature is computed - as per the above described method - for each of these frequency bands separately.” Par. 0088-0089; “The first-speech-signal can be based on a first speech-reference-signal and a first noise-reference-signal provided by a first beamforming-module focusing a beam into a first angular direction (i.e., the claimed “first beamforming zone”) (Referring to the Specification of the instant Application [0045] indicates that beamformer may “point in a different direction” and may be “associated with a beamforming zone”/position/location). The first beamforming-module can process the first-frequency-sub-band-signals (i.e., the claimed “first frequency band”). Similarly, the second-speech-signal can be based on a second speech reference- signal and a second noise-reference-signal provided by a second beamforming-module focusing a beam into a second angular direction (i.e., the claimed “second beamforming zone”). The second beamforming module can process the second-frequency-sub-band-signals. In such cases, the first angular direction may or may not be different than the second angular direction. In this way, the signal processor 200 can independently track speech signals from two different talkers, who may or may not be located in different positions, and provide a output signal (i.e., the claimed “audio output”) that includes noise cancelled (i.e., the claimed “attenuation”) representations of both different speech signals. 
The output signal can be provided as either a single signal, or as multiple sub-signals as described above.” Par. 0064; “determine a selected-beamforming-module that is associated with the lowest speech-leakage-estimation-power” Par. 0134; “determine the error-power-signal and the noise-reference power-signal based on the selected subset of frequency bins (i.e., the claimed “(first, second) output power in the (first, second) frequency band”).” Par. 0134; “Different local minimum speech-leakage-estimation-powers (i.e., the claimed “(first, second) output power”) can correspond to speech signals from different talkers, either positioned in different angular directions (i.e., the claimed “beamforming zones”) or talking in different frequency bands (i.e., (first, second) frequency band) because the different talkers have voices in different pitch registers. In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions.” Par. 0089] wherein determining that the power level of the first audio is greater than the power level of the second audio in the first frequency band comprises determining that the first measured output power level is greater than the second measured output power level. [Defraene, see mapping applied to claim 1; Pan, see mapping applied to claim 1; Chatlani, see mapping applied to claim 1; Tetelbaum, see mapping applied to claim 1; Zhang, see mapping applied to claim 1; Defraene, “each speech leakage measure that has a speech leakage-estimation-power (i.e., the claimed “power level of second audio”) that satisfies a predetermined threshold (i.e., the claimed “is greater than a power level of second audio”),” Par. 0090; According to Wikipedia, “satisfies” indicates “meets or exceeds” (i.e., the claimed “greater than”). 
Pan, “Based on the DI (directivity index) values, the system 100 may determine an average DI value within a desired frequency range for the first direction of interest (i.e., determine that the first audio associated with the first audio source dominates the first frequency band).” Col. 34: 21-28; “For example, the system 100 may identify the best average DI (directivity index) values (e.g., power values) for the first direction of interest. (i.e., determine that the first audio associated with the first audio source dominates the first frequency band)”, Col. 34:51-53; “The first-speech-signal (i.e., the claimed “first audio associated with the first audio source”) can be based on a first speech-reference-signal and a first noise-reference-signal provided by a first beamforming-module focusing a beam into a first angular direction. The first beamforming-module can process the first-frequency-sub-band-signals (i.e., the claimed “first frequency band”).” Par. 0064; “In some examples the output-signal 216 may be a linear combination of the first-speech-signal and the second-speech-signal. The first-speech-signal can be based on a first-frequency-sub band-signal representative of a first filtered representation of the input-signalling, the first filtered representation spanning a first frequency range. The second-speech-signal can be based on a second-frequency-sub-band-signal representative of a second filtered representation of the input-signalling, the second filtered representation spanning a second frequency range.” Par. 0062; Pan, “…determining the best power values (i.e., the claimed “power level of the first audio is greater than the power level of the second audio”), and/or determining the virtual microphones 804 and/or filter coefficient value(s) to associate with each physical micro-phone 802 for each direction of interest, the disclosure is not limited thereto.” Col. 22:27-31; “FIG. 
9B illustrates an example of an average DI (directivity index) chart 920 that indicates average power values (e.g., average of power values within a frequency range) corresponding to look direction,” Col. 59-61; “Instead, the power values may be determined using any technique known to one of skill in the art, such as determining a white noise gain (WNG) value or the like.” Col. 28:1-3; Pan teaches determining first measured output power level is greater than the second measured output power level: “The system 100 may select (1222) a first direction of interest and may determine (1224) a best pair of virtual microphones for the first direction of interest. For example, system 100 may determine the best power values associated with the first direction (i.e., the claimed “power level of the first audio is greater than the power level of the second audio”) of interest and identify the pair of virtual microphones corresponding to the best power values. In some examples, there may be multiple power values that are similar to each other, and the system 100 may select a pair of virtual microphones based on other considerations and/or criteria in addition to the power values in a specific direction of interest, such as power values across multiple directions of interest or the like. For example, a first pair of virtual microphones may perform well across a wide range of look directions (e.g., have high power values from 0 degrees to 100 degrees), whereas a second pair of virtual microphones may perform extremely well in a narrow range of look directions (e.g., have high power values from 0 degrees to 30 degrees) but have weak performance in other directions (e.g., have low power values from 30 degrees to 100 degrees). 
Thus, instead of selecting the second pair of virtual microphones from 0 degrees to 30 degrees (e.g., as the second pair outperforms the first pair within this range) and selecting the first pair of virtual microphones from 30 degrees to 100 degrees, the system 100 may instead select the first pair of virtual microphones from 0 degrees to 100 degrees (e.g., despite the first pair of virtual microphones not having the highest power values between 0-30 degrees).” Col. 31:35-61] Regarding Claims 5 and 13, Defraene in view of Pan, Chatlani, Tetelbaum and Zhang has been discussed above. The combination further teaches: wherein the first beamforming zone is associated with a first measured output power level in the second frequency band and the second beamforming zone is associated with a second measured output power level in the second frequency band, and [Claim is directed to repeating the subject matter another audio is greater than another audio (first audio greater than second audio, second audio greater than first audio, etc.). However, repeating steps known from prior art with different audio (first audio greater than second audio, second audio greater than first audio, etc.) is straightforward, amounts to the normal use of the teachings of Defraene in view of Pan, Chatlani, Tetelbaum and Zhang and are rejected under similar rationale; Defraene, see mapping applied to claims 1, 4; Pan, see mapping applied to claims 1, 4; Chatlani, see mapping applied to claims 1, 4; Tetelbaum, see mapping applied to claims 1, 4; Zhang, see mapping applied to claims 1, 4; Defraene, “Different local minimum speech-leakage-estimation-powers can correspond to speech signals from different talkers, either positioned in different angular directions or talking in different frequency bands because the different talkers have voices in different pitch registers. 
In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions (i.e., the claimed “beamforming zone”).” Par. 0090; “Alternatively, the frequency bin selection can be based on a pitch estimate representative of a pitch of a speech-component of the plurality of microphone-signals, where only powers at pitch harmonic frequencies are selected.” Par. 0085; “It will be appreciated that this approach can be extended straightforwardly to a speech leakage estimation where multiple frequency bands are considered independently, and the speech leakage feature is computed - as per the above described method - for each of these frequency bands separately.” Par. 0088-0089; “The first-speech-signal can be based on a first speech-reference-signal and a first noise-reference-signal provided by a first beamforming-module focusing a beam into a first angular direction (i.e., the claimed “first beamforming zone”) (Referring to the Specification of the instant Application [0045] indicates that beamformer may “point in a different direction” and may be “associated with a beamforming zone”/position/location). The first beamforming-module can process the first-frequency-sub-band-signals (i.e., the claimed “first frequency band”). Similarly, the second-speech-signal can be based on a second speech-reference-signal and a second noise-reference-signal provided by a second beamforming-module focusing a beam into a second angular direction (i.e., the claimed “second beamforming zone”). The second beamforming module can process the second-frequency-sub-band-signals. In such cases, the first angular direction may or may not be different than the second angular direction. 
In this way, the signal processor 200 can independently track speech signals from two different talkers, who may or may not be located in different positions, and provide a output signal (i.e., the claimed “audio output”) that includes noise cancelled (i.e., the claimed “attenuation”) representations of both different speech signals. The output signal can be provided as either a single signal, or as multiple sub-signals as described above.” Par. 0064; “determine a selected-beamforming-module that is associated with the lowest speech-leakage-estimation-power” Par. 0134; “determine the error-power-signal and the noise-reference power-signal based on the selected subset of frequency bins (i.e, the claimed “(first, second) output power in the (first, second) frequency band)”.” Par. 0134; “Different local minimum speech-leakage-estimation-powers (i.e, the claimed “(first, second) output power) can correspond to speech signals from different talkers, either positioned in different angular directions (i.e., the claimed “beamforming zones”) or talking in different frequency bands (i.e, (first, second) frequency band) because the different talkers have voices in different pitch registers. In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions.” Par. 0089] wherein determining that the power level of the second audio is greater than the power level of the first audio in the second frequency band comprises determining that the second measured output level is greater than the first measured output level. 
[Defraene, see mapping applied to claim 1; Pan, see mapping applied to claim 1; Chatlani, see mapping applied to claims 1, 4; Tetelbaum, see mapping applied to claims 1, 4; Zhang, see mapping applied to claims 1, 4; Defraene, “each speech leakage measure that has a speech leakage-estimation-power (i.e., the claimed “power level of second audio”) that satisfies a predetermined threshold (i.e., the claimed “is greater than a power level of second audio”),” Par. 0090; According to Wikipedia, “satisfies” indicates “meets or exceeds” (i.e., the claimed “greater than”). “Similarly, the second-speech-signal (i.e., the claimed “second audio”) can be based on a second speech-reference-signal and a second noise-reference-signal provided by a second beamforming-module focusing a beam into a second angular direction (i.e., the claimed “second beamforming zone”). The second beamforming module can process the second-frequency-sub-band-signals (i.e., the claimed “second frequency band”).” Par. 0064; “In some examples the output-signal 216 may be a linear combination of the first-speech-signal and the second-speech-signal. The first-speech-signal can be based on a first-frequency-sub band-signal representative of a first filtered representation of the input-signalling, the first filtered representation spanning a first frequency range. The second-speech-signal can be based on a second-frequency-sub-band-signal representative of a second filtered representation of the input-signalling, the second filtered representation spanning a second frequency range.” Par. 0062; Pan, “…determining the best power values (i.e., the claimed “power level of the second audio is greater than the power level of the first audio”), and/or determining the virtual microphones 804 and/or filter coefficient value(s) to associate with each physical microphone 802 for each direction of interest, the disclosure is not limited thereto.” Col. 22:27-31; “FIG. 
9B illustrates an example of an average DI (directivity index) chart 920 that indicates average power values (e.g., average of power values within a frequency range) corresponding to look direction,” Col. 59-61; “Instead, the power values may be determined using any technique known to one of skill in the art, such as determining a white noise gain (WNG) value or the like.” Col. 28:1-3; Pan teaches in Figure 12 conceptual steps to determine the second measured output power level is greater than the first measured output power level: “1216 Select desired frequency range(s) (i.e., the claimed “second frequency band)”, “1218 Determine average power values for individual beam angles within desired frequency range(s) (i.e, the claimed “second measured output level” and “first measured output level”), “1222 Select first direction of interest (i.e., the claimed “beamforming zone”)”, “1224 Determine best (i.e., claimed “determine the second measured output power level is greater than the first measured output power level”) the pair of virtual microphones for first direction of interest” (i.e., the claimed “beamforming zone”); “Similarly, the device 110 may select the second audio data as a second target signal and the first audio data and the third audio data as second reference signals,” Col. 17: 30-33] Regarding Claims 6 and 14, Defraene in view of Pan, Chatlani, Tetelbaum and Zhang has been discussed above. The combination further teaches: wherein the first audio source comprises a human and the first audio comprises a voice command spoken by the human. 
[Defraene, see mapping applied to claim 1; Pan, see mapping applied to claim 1; Chatlani, see mapping applied to claim 1; Tetelbaum, see mapping applied to claim 1; Zhang, see mapping applied to claim 1; Pan, see mapping applied to claim 1; Defraene, “Signal processors of the present disclosure can be used for improving human-to-machine interaction for mobile and smart home applications through noise reduction, echo cancellation and dereverberation.” Par. 0124; “The beam selection module 600 has a speech activity detector 602 that is configured to detect presence of a speech component in a plurality of microphone-signals (not shown), such as when the microphone signals contain speech signals (i.e., the claimed “audio source”/ “voice command”) from a talker (i.e., the claimed “spoken by a human”). Par. 0091; “Different local minimum speech-leakage-estimation-powers can correspond to speech signals (i.e., the claimed “audio source”/ “voice command”) from different talkers (i.e., the claimed “spoken by a human”), either positioned in different angular directions or talking in different frequency bands because the different talkers have voices in different pitch registers. In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions.” Par. 0089; “The first-speech-signal (i.e., the claimed “first audio source”) can be based on a first speech-reference-signal and a first noise-reference-signal provided by a first beamforming-module focusing a beam into a first angular direction. The first beamforming-module can process the first-frequency-sub-band-signals.” Par. 0064] Regarding Claims 7 and 15, Defraene in view of Pan, Chatlani, Tetelbaum and Zhang has been discussed above. The combination further teaches: causing processing of the audio received from the first beamforming zone and within the first frequency band to determine the voice command. 
[Defraene, see mapping applied to claim 1; Pan, see mapping applied to claim 1; Chatlani, see mapping applied to claim 1; Tetelbaum, see mapping applied to claim 1; Zhang, see mapping applied to claim 1; Defraene, “The first-speech-signal can be based on a first speech-reference-signal and a first noise-reference-signal provided by a first beamforming-module focusing a beam into a first angular direction (the claimed “first beamforming zone”) (Referring to the Specification of the instant Application [0045] indicates that beamformer may “point in a different direction” and may be “associated with a beamforming zone”/position/location). The first beamforming-module can process the first-frequency-sub-band-signals (i.e., the claimed “first frequency band”). Similarly, the second-speech-signal can be based on a second speech reference- signal and a second noise-reference-signal provided by a second beamforming-module focusing a beam into a second angular direction. The second beamforming module can process the second-frequency-sub-band-signals. In such cases, the first angular direction may or may not be different than the second angular direction. In this way, the signal processor 200 can independently track speech signals from two different talkers (i.e., the claimed “determine the voice command”), who may or may not be located in different positions, and provide a output signal (i.e., the claimed “audio output”) that includes noise cancelled representations of both different speech signals. The output signal can be provided as either a single signal, or as multiple sub-signals as described above.” Par. 0064] Regarding Claims 8 and 16, Defraene in view of Pan, Chatlani, Tetelbaum and Zhang has been discussed above. The combination further teaches: further comprising retaining the audio in the first frequency band based on determining not to attenuate the audio from the first beamforming zone and within the first frequency band. 
[Defraene, see mapping applied to claim 1; Pan, see mapping applied to claim 1; Chatlani, see mapping applied to claim 1; Tetelbaum, see mapping applied to claim 1; Zhang, see mapping applied to claim 1; “In the context of speech enhancement, multi-microphone acoustic beamforming systems can be used for performing interference cancellation, by exploiting spatial information (i.e., the claimed “zone”) of a desired speech signal (i.e., retaining the desired audio) and an undesired interference signal.” Par. 0040; “The first-speech-signal (i.e., the claimed “first audio”) can be based on a first speech-reference-signal and a first noise-reference-signal provided by a first beamforming-module focusing a beam into a first angular direction (the claimed “first beamforming zone”) (Referring to the Specification of the instant Application [0045] indicates that beamformer may “point in a different direction” and may be “associated with a beamforming zone”/position/location). The first beamforming-module can process the first-frequency-sub-band-signals (i.e., the claimed “first frequency band”).” Par. 0064; “For example, the error-power-signal 532 Pe(k) and/or the noise reference-power-signal 536 Pvf(k) may be computed in the frequency domain, retaining only a particular selected subset of frequency bins in the power computation. This frequency bin selection can be based on a speech activity detection (i.e., the claimed “retaining the first audio in the first frequency band”).” Par. 0085; “This bandpass filtering can be advantageous in finding correlations in the relevant frequency band where speech signals can be dominant.” Par. 0078; Pan, “For example, if audio data corresponding to a user's speech is first detected and/or is most strongly detected by microphone 502g, the device may determine that the user is located in a location in direction 7 (for example, the claimed “first beamforming zone”). 
Using a FBF unit or other such component, the device may isolate audio coming from direction 7 (for example, the claimed “first beamforming zone”) using techniques known to the art and/or explained herein. Thus, as shown in FIG. 4B, the device 110 may boost audio (i.e., the claimed “not to attenuate audio”/ “retaining the first audio”) coming from direction 7 (for example, the claimed “first beamforming zone”), thus increasing the amplitude of audio data corresponding to speech from user 301 relative to other audio captured from other directions. In this manner, noise from diffuse sources that is coming from all the other directions will be dampened relative to the desired audio (i.e., for example, the claimed “retaining the first audio”) (e.g., speech from user 301) coming from direction 7 (for example, the claimed “first beamforming zone”).”] Regarding Claim 20, Defraene in view of Pan, Chatlani, Tetelbaum and Zhang has been discussed above. The combination further teaches: wherein the first beamforming zone is associated with a first measured output power level in the first frequency band and the second beamforming zone is associated with a second measured output power level in the first frequency band, [Defraene, see mapping applied to claims 1, 4, 5; Pan, see mapping applied to claims 1, 4, 5; Chatlani, see mapping applied to claim 1, 4, 5; Tetelbaum, see mapping applied to claim 1, 4, 5; Zhang, see mapping applied to claim 1, 4, 5; Defraene, “Different local minimum speech-leakage-estimation-powers can correspond to speech signals from different talkers, either positioned in different angular directions or talking in different frequency bands because the different talkers have voices in different pitch registers. In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions (i.e., the claimed “beamforming zone”).” Par. 
0090; “Alternatively, the frequency bin selection can be based on a pitch estimate representative of a pitch of a speech-component of the plurality of microphone-signals, where only powers at pitch harmonic frequencies are selected.” Par. 0085; “It will be appreciated that this approach can be extended straightforwardly to a speech leakage estimation where multiple frequency bands are considered independently, and the speech leakage feature is computed - as per the above described method - for each of these frequency bands separately.” Par. 0088-0089; “The first-speech-signal can be based on a first speech-reference-signal and a first noise-reference-signal provided by a first beamforming-module focusing a beam into a first angular direction (i.e., the claimed “first beamforming zone”) (Referring to the Specification of the instant Application [0045] indicates that beamformer may “point in a different direction” and may be “associated with a beamforming zone”/position/location). The first beamforming-module can process the first-frequency-sub-band-signals (i.e., the claimed “first frequency band”). Similarly, the second-speech-signal can be based on a second speech-reference-signal and a second noise-reference-signal provided by a second beamforming-module focusing a beam into a second angular direction (i.e., the claimed “second beamforming zone”). The second beamforming module can process the second-frequency-sub-band-signals. In such cases, the first angular direction may or may not be different than the second angular direction. In this way, the signal processor 200 can independently track speech signals from two different talkers, who may or may not be located in different positions, and provide an output signal (i.e., the claimed “audio output”) that includes noise cancelled (i.e., the claimed “attenuation”) representations of both different speech signals. 
The output signal can be provided as either a single signal, or as multiple sub-signals as described above.” Par. 0064; “determine a selected-beamforming-module that is associated with the lowest speech-leakage-estimation-power” Par. 0134; “determine the error-power-signal and the noise-reference power-signal based on the selected subset of frequency bins (i.e, the claimed “(first, second) output power in the (first, second) frequency band).” Par. 0134; “Different local minimum speech-leakage-estimation-powers (i.e, the claimed “(first, second) output power) can correspond to speech signals from different talkers, either positioned in different angular directions (i.e., the claimed “beamforming zones”) or talking in different frequency bands (i.e, (first, second) frequency band) because the different talkers have voices in different pitch registers. In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions.” Par. 0089; wherein the instructions that, when executed by the one or more processors, cause the device to determine that the power level of the first audio is greater than the power level of the second audio in the first frequency band cause the device to determine that the first measured output power level in the first frequency band is greater than the second measured output power level in the first frequency band, [Defraene, see mapping applied to claims 1, 4, 5; Pan, see mapping applied to claims 1, 4, 5; Chatlani, see mapping applied to claim 1, 4, 5; Tetelbaum, see mapping applied to claim 1, 4, 5; Zhang, see mapping applied to claim 1, 4, 5; Defraene, “In other examples, the set of instructions/methods illustrated herein and data and instructions associated there with are stored in respective storage devices, which are implemented as one or more non-transient machine or computer-readable or computer-usable storage media or mediums.” Par. 
0130; “Such instructions are loaded for execution on a processor (such as one or more CPUs).” Par. 0129; Pan, “In operation, the device 110 may include computer-readable and computer executable instructions that reside on the device, as will be discussed further below.” Col. 46:26-29; “The system 100 may apply (1326) the filter coefficient values to input audio data from the physical microphones and may determine (1328) power values and an average power value within a desired frequency range.” Col. 34:14-17; “Based on the DI (directivity index) values, the system 100 may determine an average DI value within a desired frequency range for the first direction of interest (i.e., determine that the first audio associated with the first audio source dominates the first frequency band).” Col. 34: 21-28; “For example, the system 100 may identify the best average DI (directivity index) values (e.g., power values) for the first direction of interest. (i.e., determine that the first audio associated with the first audio source dominates the first frequency band)”, Col. 34:51-53; “…determining the best power values, and/or determining the virtual microphones 804 and/or filter coefficient value(s) to associate with each physical micro-phone 802 for each direction of interest, the disclosure is not limited thereto.” Col. 22:27-31; “FIG. 9B illustrates an example of an average DI (directivity index) chart 920 that indicates average power values (e.g., average of power values within a frequency range) corresponding to look direction,” Col. 59-61; “Instead, the power values may be determined using any technique known to one of skill in the art, such as determining a white noise gain (WNG) value or the like.” Col. 
28:1-3; Pan teaches determining first measured output power level is greater than the second measured output power level: “The system 100 may select (1222) a first direction of interest and may determine (1224) a best pair of virtual microphones for the first direction of interest. For example, system 100 may determine the best power values associated with the first direction of interest and identify the pair of virtual microphones corresponding to the best power values. In some examples, there may be multiple power values that are similar to each other, and the system 100 may select a pair of virtual microphones based on other considerations and/or criteria in addition to the power values in a specific direction of interest, such as power values across multiple directions of interest or the like. For example, a first pair of virtual microphones may perform well across a wide range of look directions (e.g., have high power values from 0 degrees to 100 degrees), whereas a second pair of virtual microphones may perform extremely well in a narrow range of look directions (e.g., have high power values from 0 degrees to 30 degrees) but have weak performance in other directions (e.g., have low power values from 30 degrees to 100 degrees). Thus, instead of selecting the second pair of virtual microphones from 0 degrees to 30 degrees (e.g., as the second pair outperforms the first pair within this range) and selecting the first pair of virtual microphones from 30 degrees to 100 degrees, the system 100 may instead select the first pair of virtual microphones from 0 degrees to 100 degrees (e.g., despite the first pair of virtual microphones not having the highest power values between 0-30 degrees).” Col. 
31:35-61] wherein the first beamforming zone is associated with a first measured output power level in the second frequency band and the second beamforming zone is associated with a second measured output power level in the second frequency band, and [Defraene, see mapping applied to claims 1, 4, 5; Pan, see mapping applied to claims 1, 4, 5; Chatlani, see mapping applied to claim 1, 4, 5; Tetelbaum, see mapping applied to claim 1, 4, 5; Zhang, see mapping applied to claim 1, 4, 5; Defraene, “Different local minimum speech-leakage-estimation-powers can correspond to speech signals from different talkers, either positioned in different angular directions or talking in different frequency bands because the different talkers have voices in different pitch registers. In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions (i.e., the claimed “beamforming zone”).” Par. 0090; “Alternatively, the frequency bin selection can be based on a pitch estimate representative of a pitch of a speech-component of the plurality of microphone-signals, where only powers at pitch harmonic frequencies are selected.” Par. 0085; “It will be appreciated that this approach can be extended straightforwardly to a speech leakage estimation where multiple frequency bands are considered independently, and the speech leakage feature is computed - as per the above described method - for each of these frequency bands separately.” Par. 
0088-0089; “The first-speech-signal can be based on a first speech-reference-signal and a first noise-reference-signal provided by a first beamforming-module focusing a beam into a first angular direction (i.e., the claimed “first beamforming zone”) (Referring to the Specification of the instant Application [0045] indicates that beamformer may “point in a different direction” and may be “associated with a beamforming zone”/position/location). The first beamforming-module can process the first-frequency-sub-band-signals (i.e., the claimed “first frequency band”). Similarly, the second-speech-signal can be based on a second speech reference- signal and a second noise-reference-signal provided by a second beamforming-module focusing a beam into a second angular direction (i.e., the claimed “second beamforming zone”). The second beamforming module can process the second-frequency-sub-band-signals. In such cases, the first angular direction may or may not be different than the second angular direction. In this way, the signal processor 200 can independently track speech signals from two different talkers, who may or may not be located in different positions, and provide a output signal (i.e., the claimed “audio output”) that includes noise cancelled (i.e., the claimed “attenuation”) representations of both different speech signals. The output signal can be provided as either a single signal, or as multiple sub-signals as described above.” Par. 0064; “determine a selected-beamforming-module that is associated with the lowest speech-leakage-estimation-power” Par. 0134; “determine the error-power-signal and the noise-reference power-signal based on the selected subset of frequency bins (i.e, the claimed “(first, second) output power in the (first, second) frequency band).” Par. 
0134; “Different local minimum speech-leakage-estimation-powers (i.e., the claimed “(first, second) output power”) can correspond to speech signals from different talkers, either positioned in different angular directions (i.e., the claimed “beamforming zones”) or talking in different frequency bands (i.e., the claimed “(first, second) frequency band”) because the different talkers have voices in different pitch registers. In this way, signal processors of the present disclosure can track different talkers, in different frequency bands, or positioned in different angular directions.” Par. 0089] wherein the instructions that cause the device to determine that the power level of the second audio is greater than the power level of the first audio in the second frequency band cause the device to determine that the second measured output level in the second frequency band is greater than the first measured output level in the second frequency band. [The claim is directed to repeating the subject matter that one audio is greater than another audio (first audio greater than second audio, second audio greater than first audio, etc.). However, repeating steps known from prior art with different audio (first audio greater than second audio, second audio greater than first audio, etc.) is straightforward, amounts to the normal use of the teachings of Defraene in view of Pan, Chatlani, Tetelbaum and Zhang, and is rejected under similar rationale; Defraene, see mapping applied to claims 1, 4, 5; Pan, see mapping applied to claims 1, 4, 5; Chatlani, see mapping applied to claim 1, 4, 5; Tetelbaum, see mapping applied to claim 1, 4, 5; Zhang, see mapping applied to claim 1, 4, 5; Pan, “In operation, the device 110 may include computer-readable and computer executable instructions that reside on the device, as will be discussed further below.” Col. 
46:26-29; “The system 100 may apply (1326) the filter coefficient values to input audio data from the physical microphones and may determine (1328) power values and an average power value within a desired frequency range.” Col. 34:14-17; “Based on the DI (directivity index) values, the system 100 may determine an average DI value within a desired frequency range (i.e., second frequency band) for the (i.e, the claimed “second”) direction of interest. The system 100 may determine (1330) if there is an additional direction of interest and, if so, may loop to step 1322 and repeat steps 1322-1328 for the additional direction of interest.” Col. 34:21-28; “In some examples, there may be multiple power values that are similar to each other, and the system 100 may select the best power values (i.e., second audio associated with the second audio source dominates the second frequency band) based on other considerations and/or criteria,” Col.34:64-67; Defraene, Similarly, the second-speech-signal (i.e., the claimed “second audio associated with the second audio source”) can be based on a second speech reference- signal and a second noise-reference-signal provided by a second beamforming-module focusing a beam into a second angular direction (i.e., the claimed “second beamforming zone”). The second beamforming module can process the second-frequency-sub-band-signals (i.e., the claimed “second frequency band”).” Par. 0064; “In some examples the output-signal 216 may be a linear combination of the first-speech-signal and the second-speech-signal. The first-speech-signal can be based on a first-frequency-sub band-signal representative of a first filtered representation of the input-signalling, the first filtered representation spanning a first frequency range. 
The second-speech-signal can be based on a second-frequency-sub-band-signal representative of a second filtered representation of the input-signalling, the second filtered representation spanning a second frequency range (i.e., second audio associated with the second audio source dominates the second frequency band).” Par. 0062] Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Leblang, et al., (U.S. Patent 11,862,168) teaches “attenuation of audio”/ “not attenuating audio” with different beamforming zones, different frequency bands, different power levels. Dickens, et al., (U.S. Patent Application Publication 2023/0319190) teaches “attenuation of audio”/ “not attenuating audio” with different beamforming zones, different frequency bands, different power levels. Rosenwein, et al., (U.S. Patent Application Publication 2022/0312128) teaches “attenuation of audio”/ “not attenuating audio” with different beamforming zones, different frequency bands, different power levels. Elliot, et al., (U.S. Patent 11,416,210) teaches “attenuation of audio”/ “not attenuating audio” with different beamforming zones, different frequency bands, different power levels. Raghavan et al., (U.S. Patent Application Publication 2020/0235791) teaches beamforming devices for audio interference. Marquez et al. (U.S. Patent Application Publication 2018/0226065) teaches beamforming devices for audio interference. Ayrapetian et al. (U.S. Patent 10,522,167) teaches audio noise cancellation using deep neural networks. Any inquiry concerning this communication or earlier communications from the examiner should be directed to EUNICE LEE whose telephone number is 571-272-1886. The examiner can normally be reached M-F 8:00 AM - 5:00 PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. 
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /EUNICE LEE/Examiner, Art Unit 2656 /BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656
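The technique the rejection maps throughout (select a frequency band, measure output power per beamforming zone within that band, and treat the zone with the greater measured power as the dominant source for that band) can be sketched as follows. This is a minimal illustration of the comparison as characterized in the Office Action; every name here is hypothetical and none of the code comes from Defraene, Pan, or the application's claims.

```python
# Hypothetical sketch: per-band power comparison across beamforming zones.

def average_band_power(power_spectrum, band):
    """Average the per-frequency power values that fall inside `band` (lo, hi in Hz)."""
    lo, hi = band
    values = [p for f, p in power_spectrum.items() if lo <= f <= hi]
    return sum(values) / len(values) if values else 0.0

def select_dominant_zone(zone_spectra, band):
    """Return the zone whose measured output power is greatest within `band`."""
    return max(zone_spectra, key=lambda zone: average_band_power(zone_spectra[zone], band))

# Two beamforming zones with measured power per frequency bin (Hz -> power).
zones = {
    "zone_1": {100: 0.9, 200: 0.8, 1000: 0.1},  # strong in the low band
    "zone_2": {100: 0.2, 200: 0.1, 1000: 0.7},  # strong in the high band
}

assert select_dominant_zone(zones, (50, 500)) == "zone_1"    # "first frequency band"
assert select_dominant_zone(zones, (600, 2000)) == "zone_2"  # "second frequency band"
```

In this sketch, audio from a non-dominant zone within a band would be the candidate for attenuation, while the dominant zone's audio is retained; the real systems cited compute these powers with beamformer outputs (e.g., directivity-index or white-noise-gain values) rather than a simple dictionary average.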

Prosecution Timeline

Dec 15, 2022
Application Filed
Mar 13, 2025
Non-Final Rejection — §103
Jun 20, 2025
Response Filed
Aug 20, 2025
Final Rejection — §103
Oct 22, 2025
Response after Non-Final Action
Nov 24, 2025
Notice of Allowance
Nov 24, 2025
Response after Non-Final Action
Dec 09, 2025
Response after Non-Final Action
Jan 21, 2026
Request for Continued Examination
Jan 28, 2026
Response after Non-Final Action
Mar 09, 2026
Examiner Interview (Telephonic)
Mar 12, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603078
GENERATING SPEECH DATA USING ARTIFICIAL INTELLIGENCE TECHNIQUES
2y 5m to grant Granted Apr 14, 2026
Patent 12597365
AUTOMATIC TRANSLATION BETWEEN SIGN LANGUAGE AND SPOKEN LANGUAGE
2y 5m to grant Granted Apr 07, 2026
Patent 12585876
METHOD OF TRAINING POS TAGGING MODEL, COMPUTER-READABLE RECORDING MEDIUM AND POS TAGGING METHOD
2y 5m to grant Granted Mar 24, 2026
Patent 12579385
EMBEDDED TRANSLATE, SUMMARIZE, AND AUTO READ
2y 5m to grant Granted Mar 17, 2026
Patent 12566928
READABILITY BASED CONFIDENCE SCORE FOR LARGE LANGUAGE MODELS
2y 5m to grant Granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
89%
Grant Probability
99%
With Interview (+27.3%)
2y 10m
Median Time to Grant
High
PTA Risk
Based on 27 resolved cases by this examiner. Grant probability derived from career allow rate.
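The headline numbers above follow from simple ratios over the stated counts. A quick check, assuming the displayed 89% is just 24 granted out of 27 resolved (rounded) and that "vs TC avg" is an additive percentage-point difference; both readings are assumptions about how the tool derives its figures:

```python
# Verify the displayed career allow rate from the stated grant counts.
granted, resolved = 24, 27
allow_rate_pct = 100 * granted / resolved  # 88.888...
assert round(allow_rate_pct) == 89

# If "+26.9% vs TC avg" is percentage points, the implied Tech Center average is:
tc_avg_pct = allow_rate_pct - 26.9
print(round(tc_avg_pct, 1))  # prints 62.0
```

The "99% with interview" figure does not reduce to a single obvious formula from these counts, so it is presumably modeled rather than a raw ratio.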
