Prosecution Insights
Last updated: April 19, 2026
Application No. 18/327,418

DEEP AHS: A DEEP LEARNING APPROACH TO ACOUSTIC HOWLING SUPPRESSION

Non-Final OA (§103, §DP)

Filed: Jun 01, 2023
Examiner: CASTILLO-TORRES, KEISHA Y
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Tencent America LLC
OA Round: 3 (Non-Final)

Grant Probability: 74% (Favorable)
OA Rounds: 3-4
To Grant: 3y 0m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 74% (above average; 80 granted / 108 resolved; +12.1% vs TC avg)
Interview Lift: +30.5% (strong) among resolved cases with interview
Typical Timeline: 3y 0m avg prosecution; 32 applications currently pending
Career History: 140 total applications across all art units

Statute-Specific Performance

§101: 26.2% (-13.8% vs TC avg)
§103: 42.9% (+2.9% vs TC avg)
§102: 15.1% (-24.9% vs TC avg)
§112: 8.8% (-31.2% vs TC avg)

Tech Center averages are estimates. Based on career data from 108 resolved cases.
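The percentages above are straightforward ratios of the career counts shown in the Examiner Intelligence panel. As a sanity check, the arithmetic can be reproduced as follows (the function name is illustrative, not part of any analytics API):

```python
def allow_rate_pct(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage of resolved cases."""
    return round(100.0 * granted / resolved, 1)

# 80 granted out of 108 resolved cases -> roughly the 74% shown above
career_rate = allow_rate_pct(80, 108)            # 74.1

# The "+12.1% vs TC avg" delta implies a Tech Center average near 62%
implied_tc_avg = round(career_rate - 12.1, 1)    # 62.0
```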

Office Action

§103 §DP
DETAILED ACTION

This communication is in response to the Amendments and Arguments filed on 02/02/2026. Claims 3 and 13 have been canceled by the Applicant. Claims 1-2, 4-12, and 14-20 are pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/26/2026 has been entered.

Response to Arguments and Amendments

Amendments to the claims by the Applicant have been considered and addressed below. With respect to the Double Patenting and 35 U.S.C. § 103 rejections, the Applicant provides several arguments, to which the Examiner responds below.

Double Patenting rejection(s): Arguments on pages 9-10 of the Remarks filed on 02/02/2026

Examiner's Response to Arguments: Applicant's arguments with respect to the Double Patenting rejection(s), including the request that they be held in abeyance until all pending claims are found otherwise allowable, have been fully considered and acknowledged. For more details, please refer to the updated Double Patenting rejections below.

35 U.S.C. § 103 rejection(s): Arguments on pages 10-12 of the Remarks filed on 02/02/2026

Examiner's Response to Arguments: Applicant's arguments with respect to claims 1, 11, and 20 under 35 U.S.C. § 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Zhang et al. ("Hybrid AHS: A hybrid of Kalman filter and deep learning for acoustic howling suppression," arXiv preprint arXiv:2305.02583 (2023), https://doi.org/10.48550/arXiv.2305.02583) and further in view of Ge et al. (CN 112669868 A). For more details, please refer to the updated 35 U.S.C. § 103 rejections for claims 1, 11, and 20 (as well as all dependent claims), below.

Claim Objections

Claim 4 is objected to because of the following informality: "according to claim 3" should now read: according to claim [[3]] 1. Appropriate correction is required.

Claim 14 is objected to because of the following informality: "according to claim 13" should now read: according to claim [[13]] 11. Appropriate correction is required.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.
For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1-2, 4-12, and 14-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 9-12, and 16 of U.S. Patent No. 12548585 B2 in view of Zhang et al. ("Hybrid AHS: A hybrid of Kalman filter and deep learning for acoustic howling suppression," arXiv preprint arXiv:2305.02583 (2023), https://doi.org/10.48550/arXiv.2305.02583). The claims of the issued patent are similar in scope to those of the instant application. However, the claims of the issued patent (U.S. Patent No. 12548585 B2) do not explicitly teach but, as will be mapped further below, Zhang et al. does teach: wherein the neural-network based AHS model is trained based on: teacher-forced learning so that a target AHS signal replaces another AHS signal output from the neural-network based AHS model before the neural-network based AHS model performs subsequent computations; and a loss function comprising a combination of a scale-invariance signal-to-distortion ratio (SI-SDR) in time domain and mean absolute error (MAE) of spectrum magnitude in frequency domain scaled by a factor of 10,000.

Claims 1-2, 4-12, and 14-20 are also rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 9-11, and 19-20 of U.S. Patent No. 12507005 B2 in view of Zhang et al. The claims of the issued patent are similar in scope to those of the instant application. However, the claims of the issued patent (U.S. Patent No. 12507005 B2) do not explicitly teach but, as will be mapped further below, Zhang et al. does teach: wherein the neural-network based AHS model is trained based on: teacher-forced learning so that a target AHS signal replaces another AHS signal output from the neural-network based AHS model before the neural-network based AHS model performs subsequent computations; and a loss function comprising a combination of a scale-invariance signal-to-distortion ratio (SI-SDR) in time domain and mean absolute error (MAE) of spectrum magnitude in frequency domain scaled by a factor of 10,000.

Please see below for pertinent mappings of the instant application in comparison to the issued patents. Table 1 shows the overall claim mapping comparing equivalence between claims of the instant application and the issued patents. Table 2 shows the limitations of claims 1-2 of the instant application and the corresponding claims of the issued patent(s), respectively, wherein the underlined portions indicate the main differences between the instant application and the issued patents.

Table 1: Overall claim mapping comparing Instant Application and Issued Patent.
Instant Application | Issued Patent US 12548585 B2 | Issued Patent US 12507005 B2
1*                  | Combination of 1*, 2-3       | Combination of 1*, 10
11*                 | Combination of 9*, 10-12     | 11*
20*                 | 16*                          | 20*
2, 12               | Has no equivalent            | 9, 19
4, 14               | Has no equivalent            | Has no equivalent
5, 15               | Has no equivalent            | Has no equivalent
6, 16               | Has no equivalent            | Has no equivalent
7, 17               | Has no equivalent            | Has no equivalent
8, 18               | Has no equivalent            | Has no equivalent
9, 19               | Has no equivalent            | Has no equivalent
10                  | Has no equivalent            | Has no equivalent

Note: * denotes an independent claim

Table 2: Claim mapping (comparing each of the limitations)

Instant Application, Claim 1:
1. A method of acoustic howling suppression (AHS), the method performed by at least one processor and comprising: receiving an audio signal obtained from a microphone; inputting the audio signal into a neural-network based AHS model, wherein the neural-network based AHS model is trained using a training audio signal; and outputting an AHS signal from the neural-network based AHS model in which AHS is applied to the audio signal, wherein the AHS signal is a version of the audio signal in which acoustic howling noise of the audio signal is suppressed and target audio of the audio signal is sustained,

Issued Patent US 12548585 B2, Claim 1:
1. A method of hybrid acoustic howling suppression based on a frequency filter model and a deep neural network, the method being executed by at least one processor, the method comprising: receiving a speech signal, the speech signal including target speech, feedback, and noise; inputting the speech signal into a trained hybrid neural-network based howling suppression model, wherein the trained hybrid neural-network based howling suppression model is trained using training speech signal and pre-processed acoustic feedback from a first frequency filter model; and generating an enhanced speech signal with suppressed howling as an output of the trained hybrid neural-network based howling suppression model, wherein the enhanced speech signal is used to update parameters of the first frequency filter model.

Issued Patent US 12507005 B2, Claim 1:
1. A method of acoustic howling suppression (AHS), the method performed by at least one processor and comprising: receiving an audio signal obtained from a microphone; inputting the audio signal into a neural-network based AHS model; training the neural-network based AHS model based on input signals which are recursively generated from the audio signal during training of the AHS model; and outputting an AHS signal from the neural-network based AHS model in which AHS is applied to the audio signal, wherein the AHS signal is a version of the audio signal in which acoustic howling noise of the audio signal is suppressed and target audio of the audio signal is sustained.

Instant Application, Claim 1 (cont'd):
wherein the neural-network based AHS model is trained based on: teacher-forced learning so that a target AHS signal replaces another AHS signal output from the neural-network based AHS model before the neural-network based AHS model performs subsequent computations; and

Issued Patent US 12548585 B2, Claim 2:
2. The method of claim 1, wherein training the hybrid neural-network based howling suppression model comprises: generating a teacher speech signal, the teacher speech signal comprising a modified microphone signal, wherein the modified microphone signal comprises a target speech signal, a training noise signal, and a one-time playback signal, wherein the one-time playback signal is based on the target speech signal, and wherein the one-time playback signal replaces feedback in an initial microphone signal; and training the hybrid neural-network based howling suppression model for speech separation using the teacher speech signal and the pre-processed acoustic feedback from the first frequency filter model.

Issued Patent US 12507005 B2, Claim 9:
9. The method according to claim 8, wherein training the neural-network based AHS model further comprises a second output of the neural-network based AHS model, being output from the neural-network based AHS model depending on the at least one of the input signals, being fed back to the Kalman filter, combined with at least a next frame of the audio signal, and fed back to the neural-network based AHS model.

Instant Application, Claim 1 (cont'd):
a loss function comprising a combination of a scale-invariance signal-to-distortion ratio (SI-SDR) in time domain and mean absolute error (MAE) of spectrum magnitude in frequency domain scaled by a factor of 10,000.

Issued Patent US 12548585 B2, Claim 3:
3. The method of claim 2, wherein training the hybrid neural-network based howling suppression model for speech separation is based on a combined loss function, the combined loss function comprising a first component based on scale-invariance signal-to-distortion ratio and a second component based on a mean absolute error of spectrum magnitude in a frequency domain.

Issued Patent US 12507005 B2, Claim 10:
10. The method according to claim 1, wherein training the neural-network based AHS model comprises updating the neural-network based AHS model using an utterance-level mean absolute error (MAE) of real and imaginary spectrograms as loss function.

Instant Application, Claim 2:
2. (previously presented): The method according to claim 1, further comprising looping the AHS signal output from the neural-network based AHS model back as an added input to the neural-network based AHS model in addition to a subsequent audio signal obtained from the microphone.

Issued Patent US 12507005 B2, Claim 9:
9. The method according to claim 8, wherein training the neural-network based AHS model further comprises a second output of the neural-network based AHS model, being output from the neural-network based AHS model depending on the at least one of the input signals, being fed back to the Kalman filter, combined with at least a next frame of the audio signal, and fed back to the neural-network based AHS model.

Note: Main differences between instant application and issued patent are underlined.

As to independent claims 1, 11, and 20, the claims of the issued patents (U.S. Patent No. 12548585 B2 / U.S. Patent No. 12507005 B2) do not explicitly teach but Zhang et al. does teach: wherein the neural-network based AHS model is trained (see Fig. 1(a-b) and ¶ 2.1. Acoustic howling section on page 1, ¶ 3.3. Inputs and feature extraction on page 2, and ¶ 3.1 Problem formulation on page 2 citations as in limitations above.
More specifically: ¶ 3.1 Problem formulation on page 2: “…This proposed approach is based on the assumption that the Hybrid AHS model, once properly trained, can attenuate interferences and transmit only the target speech to the loudspeaker…”) based on:

teacher-forced learning so that a target AHS signal replaces another AHS signal output from the neural-network based AHS model before the neural-network based AHS model performs subsequent computations (see Fig. 1(a-b) and ¶ 2.1. Acoustic howling section on page 1, ¶ 3.3. Inputs and feature extraction on page 2, and ¶ 3.1 Problem formulation on page 2 citations as in limitations above. More specifically: ¶ 3.1 Problem formulation on page 2: “…To address this challenge, we follow the approach of Deep AHS [14] and adopt the teacher-forcing training strategy to formulate AHS as a speech separation problem during model training. This proposed approach is based on the assumption that the Hybrid AHS model, once properly trained, can attenuate interferences and transmit only the target speech to the loudspeaker. Consequently, the actual output ŝ(t) in Figure 1(b) can be replaced with the ideal target (teacher signal) s(t) during model training, and the recursively defined microphone signal in equation (2) is converted into a mixture of target signal, background noise, and an one-time playback signal determined by s(t)…”); and

a loss function comprising a combination of a scale-invariance signal-to-distortion ratio (SI-SDR) in time domain and mean absolute error (MAE) of spectrum magnitude in frequency domain scaled by a factor of 10,000 (see 3.5. Loss functions on page 3: “We utilize a combination of scale-invariance signal-to-distortion ratio (SI-SDR) [23] in the time domain and mean absolute error (MAE) of spectrum magnitude in the frequency domain for model training: Loss = −SI-SDR(ŝ, s) + λ · MAE(|Ŝ|, |S|) (6), where λ is set to 10000 to balance the value range of these two losses.”).
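For orientation only, the training recipe quoted above (teacher forcing plus the combined SI-SDR/MAE loss with λ = 10000) can be sketched in a few lines. This is a hedged illustration, not code from Zhang et al.; the feedback path, the STFT framing, and all function names are assumptions made for this sketch.

```python
import numpy as np

def teacher_forced_mic(s, noise, feedback_ir, gain=1.0):
    """Teacher forcing: the model's recursive output is replaced by the ideal
    target s(t), so the recursively defined microphone signal collapses into
    target + noise + a one-time playback of s(t) (assumed linear feedback path)."""
    playback = gain * np.convolve(s, feedback_ir)[: len(s)]
    return s + noise + playback

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio (time domain, in dB)."""
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    residual = est - target
    return 10.0 * np.log10((np.dot(target, target) + eps) /
                           (np.dot(residual, residual) + eps))

def combined_loss(est, ref, lam=10_000.0, n_fft=512):
    """Loss = -SI-SDR(est, ref) + lam * MAE(|EST|, |REF|), per the quoted Eq. (6);
    lam balances the value ranges of the two terms."""
    mae = np.mean(np.abs(np.abs(np.fft.rfft(est, n_fft)) -
                         np.abs(np.fft.rfft(ref, n_fft))))
    return -si_sdr(est, ref) + lam * mae
```

With a perfect estimate the spectral MAE vanishes and the SI-SDR term dominates, so the loss is strongly negative; any leakage or distortion raises both terms.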
U.S. Patent No. 12548585 B2 / U.S. Patent No. 12507005 B2 in view of Zhang et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor of audio/speech processing (i.e., acoustic howling suppression). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified U.S. Patent No. 12548585 B2 / U.S. Patent No. 12507005 B2 to incorporate the teachings of Zhang et al., wherein the neural-network based AHS model is trained based on: teacher-forced learning so that a target AHS signal replaces another AHS signal output from the neural-network based AHS model before the neural-network based AHS model performs subsequent computations; and a loss function comprising a combination of a scale-invariance signal-to-distortion ratio (SI-SDR) in time domain and mean absolute error (MAE) of spectrum magnitude in frequency domain scaled by a factor of 10,000, which provides the benefit of enhanced suppression performance and speech quality for nonlinear AHS in comparison to baseline techniques (6. Conclusion of Zhang et al.).

Regarding claims 2 and 12, which depend on claims 1 and 11, they are rejected under the same combination as applied to claims 1 and 11 above (U.S. Patent No. 12548585 B2 / U.S. Patent No. 12507005 B2 in view of Zhang et al.) and as presented in the 35 U.S.C. 103 rejection(s) below.

Regarding claims 4 and 14, which depend on claims 3 and 13 (i.e., these should depend on claims 1 and 11; please refer to the claim objections above), they are rejected under the same combination as applied to claims 1 and 11 above and as presented in the 35 U.S.C. 103 rejection(s) below.

Regarding claims 5 and 15, which depend on claims 4 and 14, they are rejected under the same combination as applied to claims 4 and 14 above and as presented in the 35 U.S.C. 103 rejection(s) below.

Regarding claims 6 and 16, which depend on claims 4 and 14, they are rejected under the same combination as applied to claims 4 and 14 above and as presented in the 35 U.S.C. 103 rejection(s) below.

Regarding claims 7 and 17, which depend on claims 6 and 16, they are rejected under the same combination as applied to claims 6 and 16 above and as presented in the 35 U.S.C. 103 rejection(s) below.

Regarding claims 8 and 18, which depend on claims 7 and 17, they are rejected under the same combination as applied to claims 7 and 17 above and as presented in the 35 U.S.C. 103 rejection(s) below.

Regarding claims 9 and 19, which depend on claims 8 and 18, they are rejected under the same combination as applied to claims 8 and 18 above and as presented in the 35 U.S.C. 103 rejection(s) below.

Regarding claim 10, which depends on claim 9, it is rejected under the same combination as applied to claims 9 and 19 above and as presented in the 35 U.S.C. 103 rejection(s) below.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 11-12, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. ("Hybrid AHS: A hybrid of Kalman filter and deep learning for acoustic howling suppression," arXiv preprint arXiv:2305.02583 (2023), https://doi.org/10.48550/arXiv.2305.02583) and further in view of Ge et al. (CN 112669868 A).

As to independent claim 1, Zhang et al. teaches:

1. (currently amended): A method of acoustic howling suppression (AHS) (see ¶ 4 under the section 1. Introduction on page 1: “The proposed method, called Hybrid AHS, combines two approaches to address acoustic howling:…”),

receiving an audio signal obtained from a microphone (see Fig. 1(a-b) and ¶ 2.1. Acoustic howling section on page 1: “A typical single-channel acoustic amplification system is shown in Figure 1(a). It consists of a microphone and a loudspeaker where the target speech is picked up by the microphone as s(t),…”);

inputting the audio signal into a neural-network based AHS model (see Fig. 1(a-b) and ¶ 2.1. Acoustic howling section on page 1 citations as in the limitations above, and further ¶ 3.3. Inputs and feature extraction on page 2: “The DNN module, illustrated in Figure 2, accepts a preprocessed signal using the Kalman filter e and an ideal microphone signal…”),

wherein the neural-network based AHS model is trained using a training audio signal (see Fig. 1(a-b), ¶ 2.1. Acoustic howling section on page 1, and ¶ 3.3. Inputs and feature extraction on page 2 citations as in the limitations above, and further ¶ 3.1 Problem formulation on page 2: “…To address this challenge, we follow the approach of Deep AHS [14] and adopt the teacher-forcing training strategy to formulate AHS as a speech separation problem during model training. This proposed approach is based on the assumption that the Hybrid AHS model, once properly trained, can attenuate interferences and transmit only the target speech to the loudspeaker. Consequently, the actual output ŝ(t) in Figure 1(b) can be replaced with the ideal target (teacher signal) s(t) during model training, and the recursively defined microphone signal in equation (2) is converted into a mixture of target signal, background noise, and an one-time playback signal determined by s(t)…”); and

outputting an AHS signal from the neural-network based AHS model in which AHS is applied to the audio signal (see Fig. 1(a-b), ¶ 2.1. Acoustic howling section on page 1, ¶ 3.3. Inputs and feature extraction on page 2, and the ¶ 3.1 Problem formulation quotation as in the limitations above),

wherein the AHS signal is a version of the audio signal in which acoustic howling noise of the audio signal is suppressed and target audio of the audio signal is sustained (see the same citations and quotations as in the limitations above),

wherein the neural-network based AHS model is trained (see the same citations and quotations as in the limitations above) based on:

teacher-forced learning so that a target AHS signal replaces another AHS signal output from the neural-network based AHS model before the neural-network based AHS model performs subsequent computations (see the ¶ 3.1 Problem formulation quotation as in the limitations above); and

a loss function comprising a combination of a scale-invariance signal-to-distortion ratio (SI-SDR) in time domain and mean absolute error (MAE) of spectrum magnitude in frequency domain scaled by a factor of 10,000 (see 3.5. Loss functions on page 3: “We utilize a combination of scale-invariance signal-to-distortion ratio (SI-SDR) [23] in the time domain and mean absolute error (MAE) of spectrum magnitude in the frequency domain for model training: Loss = −SI-SDR(ŝ, s) + λ · MAE(|Ŝ|, |S|) (6), where λ is set to 10000 to balance the value range of these two losses.”).

However, Zhang et al. does not explicitly teach, but Ge et al. does teach: the method performed by at least one processor (see ¶ 5 of page 3: “a voice howling suppression device based on a command scheduling system is characterized by comprising a processor;…”).

Zhang et al. and Ge et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor of speech/audio processing and enhancement (i.e., howling suppression). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhang et al. to incorporate the teachings of Ge et al. of the method performed by at least one processor, which provides the benefit of lower calculation complexity, reduced hardware requirements, and improved practicability of voice squeal suppression (¶ 11 of page 3 of Ge et al.).

As to independent claim 11, Zhang et al. in combination with Ge et al. teach the limitations as in claim 1, above. Ge et al. further teaches: 11.
An apparatus for video coding (see ¶ 5 of page 3: “a voice howling suppression device based on a command scheduling system is characterized by comprising a processor, a memory and a computer program which is stored on the memory and can run on the processor…”), the apparatus comprising: at least one memory configured to store computer program code (see ¶ 5 of page 3 citation as in limitation above.); at least one processor configured to access the computer program code and operate as instructed by the computer program code (see ¶ 5 of page 3 citation as in limitation above.), the computer program code including: [the limitations as taught by Zhang et al. in combination with Ge et al. as in claim 1, above.] Examiner notes that language associated to the "video coding" in the preamble was not given any weight and suggests it is changed to "acoustic howling suppression (AHS)". Zhang et al. and Ge et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in speech/audio processing and enhancement (i.e., howling suppression). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhang et al. to incorporate the teachings of Ge et al. of an apparatus comprising: at least one memory configured to store computer program code and at least one processor configured to access the computer program code and operate as instructed by the computer program code which provides the benefit of calculation complexity being lower, the hardware requirement being reduced, and the practicability of voice squeal suppression being improved. (¶ 11 of page 3 of Ge et al.). As to independent claim 20, Zhang et al. in combination with Ge et al. teach the limitations as in claim 1, above. Ge et al. further teaches: 20. 
A non-transitory computer readable medium storing a program causing a computer (see ¶ 8-9 of page 4: “Based on the same inventive concept, in addition, the present invention further provides a storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the voice howling suppression method based on the command scheduling system. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, recording medium, U.S. disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution media, and the like. It should be noted that the computer-readable medium may contain any suitable combination of elements that may be modified in accordance with the requirements of statutory and patent practice in the jurisdiction, for example, in some jurisdictions, computer-readable media may not contain electrical carrier signals or telecommunications signals in accordance with statutory and patent practice.”) to: [perform the limitations as taught by Zhang et al. in combination with Ge et al. as in claim 1, above.] Zhang et al. and Ge et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in speech/audio processing and enhancement (i.e., howling suppression). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhang et al. to incorporate the teachings of Ge et al. 
of A non-transitory computer readable medium storing a program causing a computer, which provides the benefits of lower computational complexity, reduced hardware requirements, and improved practicability of voice squeal suppression (¶ 11 of page 3 of Ge et al.).

Regarding claims 2 and 12, Zhang et al. in combination with Ge et al. teach all of the limitations as in claims 1 and 11, above. Zhang et al. further teaches: 2 and 12. The method/apparatus according to claims 1 and 11, further comprising looping the AHS signal output from the neural-network based AHS model back as an added input to the neural-network based AHS model in addition to a subsequent audio signal obtained from the microphone (see Fig. 1(a-b) and ¶ 2.1. Acoustic howling section on page 1, ¶ 3.3. Inputs and feature extraction on page 2, and ¶ 3.1. Problem formulation on page 2 citations as in limitations above. More specifically: ¶ 3.1. Problem formulation on page 2: “Suppressing howling is best achieved by incorporating the AHS method within the acoustic loop considering the recursive nature of howling…” and further ¶ 2 of 4.2. Evaluation metrics: “In streaming inference, we insert the deep learning module into the acoustic loop and generate the enhanced signal recursively. This manner of evaluation considers the potential re-entry of leakage/distortion in the close acoustic loop and evaluates the proposed method’s real-time howling suppression performance [14]...”).

Regarding claims 4 and 14, Zhang et al. in combination with Ge et al. teach all of the limitations as in claims 3 and 13, above. Zhang et al. further teaches: 4 and 14. The method/apparatus according to claims 3 and 13, wherein the neural-network based AHS model comprises a first gated recurrent unit (GRU) layer configured to apply an estimate to the audio signal (see Fig. 2 and ¶ 3.4. Network structure: “The DNN module is implemented using a self-attentive recurrent neural network (SARNN).
The neural network is composed of three main parts. The first part comprises a gated recurrent unit (GRU) layer with 257 hidden units and two 1D convolution layers. These layers estimate two complex-valued filters which are applied on the input signals using deep filtering [22] to obtain intermediate outputs, denoted as ~Y and ~E.”).

Regarding claims 5 and 15, Zhang et al. in combination with Ge et al. teach all of the limitations as in claims 4 and 14, above. Zhang et al. further teaches: 5 and 15. The method/apparatus according to claims 4 and 14, wherein the first GRU layer comprises 257 hidden units and two one-dimensional (1D) convolution layers (see Fig. 2 and ¶ 3.4. Network structure: “The DNN module is implemented using a self-attentive recurrent neural network (SARNN). The neural network is composed of three main parts. The first part comprises a gated recurrent unit (GRU) layer with 257 hidden units and two 1D convolution layers. These layers estimate two complex-valued filters which are applied on the input signals using deep filtering [22] to obtain intermediate outputs, denoted as ~Y and ~E.”).

Regarding claims 6 and 16, Zhang et al. in combination with Ge et al. teach all of the limitations as in claims 4 and 14, above. Zhang et al. further teaches: 6 and 16. The method/apparatus according to claims 4 and 14, wherein the neural-network based AHS model further comprises a second GRU layer configured to receive an output of the first GRU layer and to generate a covariance matrix of the acoustic howling noise and the target audio (see Fig. 2 and ¶ 3.4. Network structure: “…The neural network is composed of three main parts. The first part comprises a gated recurrent unit (GRU) layer with 257 hidden units and two 1D convolution layers. These layers estimate two complex-valued filters which are applied on the input signals using deep filtering [22] to obtain intermediate outputs, denoted as ~Y and ~E.
The motivation behind obtaining these intermediate outputs is that they can be used as learnt nonlinear reference signals [16, 17] and provide more information for howling suppression. Later, the LPS of these intermediate signals are concatenated with the fused feature and then used as inputs for another GRU layer. We regard Y, ~Y, and ~E as three-channel inputs and employ two 1D convolution layers for each input channel to estimate the playback/noise and target speech components in it. The corresponding covariance matrices of playback/noise ^NN and target speech ^SS are calculated and concatenated as the input to the third part, SARNN.”).

Regarding claims 7 and 17, Zhang et al. in combination with Ge et al. teach all of the limitations as in claims 6 and 16, above. Zhang et al. further teaches: 7 and 17. The method/apparatus according to claims 6 and 16, wherein the second GRU layer is configured to receive both the audio signal and the output from the first GRU layer (see Fig. 2 (input signals: Y and E; output from first GRU layer: ~Y, and ~E; received at the second GRU layer (within the Self-attentive RNN) Y, ~Y, and ~E) and ¶ 3.4. Network structure: “…Later, the LPS of these intermediate signals are concatenated with the fused feature and then used as inputs for another GRU layer. We regard Y, ~Y, and ~E as three-channel inputs and employ two 1D convolution layers for each input channel to estimate the playback/noise and target speech components in it. The corresponding covariance matrices of playback/noise ^NN and target speech ^SS are calculated and concatenated as the input to the third part, SARNN. The SARNN part employs two linear layers, two multi-head self-attention (MHSA), a GRU, and residual connections to estimate a three-channel enhancement filter. The enhanced signal ^S is then obtained through multi-channel deep filtering. Finally, an inverse STFT (iSTFT) is used to get waveform ^s.”).
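The closed-loop, recursive inference quoted above for claims 2 and 12 ("we insert the deep learning module into the acoustic loop and generate the enhanced signal recursively") amounts to feeding each enhanced frame back as an added model input alongside the next microphone frame. A minimal sketch of that feedback loop, where ahs_model is a hypothetical stand-in for the trained DNN (not the paper's actual network):

```python
import numpy as np

def ahs_model(mic_frame, prev_output):
    """Hypothetical stand-in for the neural-network based AHS model:
    it merely attenuates the mic frame using the looped-back output
    as a reference signal. A real model would be a trained DNN."""
    return mic_frame - 0.5 * prev_output

def streaming_ahs(mic_frames):
    """Recursive streaming inference: the enhanced output of each
    frame re-enters the model together with the next mic frame."""
    prev = np.zeros_like(mic_frames[0])
    enhanced = []
    for frame in mic_frames:
        out = ahs_model(frame, prev)
        enhanced.append(out)
        prev = out  # feedback path: output loops back into the model
    return np.stack(enhanced)
```

This mirrors why the evaluation "considers the potential re-entry of leakage/distortion": any residual in one frame's output influences all later frames through the loop.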
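The "deep filtering" referenced in the quoted network description (complex-valued filters applied to the input signals to obtain the intermediate outputs ~Y and ~E) replaces a scalar mask with a small complex filter per time-frequency bin. A sketch assuming a temporal-only filter support over past frames; the paper's exact support is not specified in the quoted text:

```python
import numpy as np

def deep_filter(Y, H):
    """Apply complex-valued deep filtering to a spectrogram.
    Y: (T, F) complex STFT; H: (L, T, F) complex filter taps.
    Each output bin is a weighted sum of the current and L-1
    previous frames at the same frequency bin."""
    L, T, F = H.shape
    out = np.zeros((T, F), dtype=complex)
    for l in range(L):
        shifted = np.zeros_like(Y)
        shifted[l:] = Y[:T - l] if l else Y  # Y delayed by l frames
        out += H[l] * shifted
    return out
```

In the network, the taps H would be estimated per frame by the GRU and 1D convolution layers rather than fixed in advance.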
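The covariance matrices of playback/noise and target speech mentioned above are second-order statistics of the form x x^H computed across input channels. A simplified sketch, assuming instantaneous (non-averaged) per-bin covariances over the three-channel input; the paper's exact averaging scheme is not given in the quote:

```python
import numpy as np

def channel_covariance(X):
    """Per time-frequency channel covariance.
    X: (C, T, F) complex multi-channel spectrogram.
    Returns (T, F, C, C) outer products x x^H, the kind of
    second-order feature concatenated as input to the SARNN stage."""
    v = np.moveaxis(X, 0, -1)  # (T, F, C): channel vector per bin
    return v[..., :, None] * np.conj(v)[..., None, :]
```

Each resulting C-by-C matrix is Hermitian, with channel powers on the diagonal and cross-channel correlations off-diagonal.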
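The front end cited from ¶ 3.3 of Zhang et al. (16 kHz input, 32 ms frames, 16 ms shift, 512-point STFT, normalized log-power spectra) can be sketched as below. Note the 512-point STFT yields 257 frequency bins, matching the 257 hidden units of the first GRU layer. The Hann window and global mean/variance normalization are assumptions not stated in the quoted text:

```python
import numpy as np

def stft_lps(x, sr=16000, frame_ms=32, shift_ms=16, n_fft=512):
    """Frame a 16 kHz signal into 32 ms windows with a 16 ms shift,
    take a 512-point STFT, and compute normalized log-power spectra."""
    frame = sr * frame_ms // 1000            # 512 samples
    shift = sr * shift_ms // 1000            # 256 samples
    n_frames = 1 + (len(x) - frame) // shift
    win = np.hanning(frame)
    frames = np.stack([x[i * shift: i * shift + frame] * win
                       for i in range(n_frames)])
    Y = np.fft.rfft(frames, n=n_fft)         # (T, 257) complex bins
    lps = np.log(np.abs(Y) ** 2 + 1e-12)
    lps = (lps - lps.mean()) / (lps.std() + 1e-12)  # normalization
    return Y, lps
```

The complex output Y corresponds to the frequency-domain inputs the quoted passage denotes Y and E (one per signal path), while lps is one of the concatenated input features.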
Regarding claims 8 and 18, Zhang et al. in combination with Ge et al. teach all of the limitations as in claims 7 and 17, above. Zhang et al. further teaches: 8 and 18. The method/apparatus according to claims 7 and 17, wherein the neural-network based AHS model further comprises an enhancement filter estimation layer comprising a self-attentive recurrent neural network (RNN) configured to provide a speech enhancement filter to an input channel of the audio signal (see Fig. 2 (input signals: Y and E; output from first GRU layer: ~Y, and ~E; received at the second GRU layer (within the Self-attentive RNN) Y, ~Y, and ~E) and ¶ 3.4. Network structure citations as in claims 7 and 17, above. More specifically: ¶ 3.4. Network structure: “The DNN module is implemented using a self-attentive recurrent neural network (SARNN)… The SARNN part employs two linear layers, two multi-head self-attention (MHSA), a GRU, and residual connections to estimate a three-channel enhancement filter. The enhanced signal ^S is then obtained through multi-channel deep filtering. Finally, an inverse STFT (iSTFT) is used to get waveform ^s.”).

Regarding claims 9 and 19, Zhang et al. in combination with Ge et al. teach all of the limitations as in claims 8 and 18, above. Zhang et al. further teaches: 9 and 19. The method/apparatus according to claims 8 and 18, wherein inputs to the first GRU layer comprise the audio signal, a normalized log-power spectra (LPS) of the audio signal, a temporal correlation of the audio signal, a frequency correlation of the audio signal, and a channel covariance of the audio signal (see Fig. 2 and ¶ 3.3. Inputs and feature extraction on pages 2-3: “…The input signals, which are sampled at 16 kHz, are split into frames of 32 ms and a frame shift of 16 ms. A 512-point STFT is then performed on each frame, resulting in the frequency domain inputs, Y and E.
Besides the normalized log-power spectra (LPS), we extract the correlation matrix across time frames and frequency bins of the input signals to capture the signals’ temporal and frequency dependency. These features help in differentiating between howling and tonal components. Channel covariance of input signals (Y and E) is calculated as another input feature to account for cross-correlation between them. A concatenation of these features is used for model training with a linear layer for feature fusion.”).

Regarding claim 10, Zhang et al. in combination with Ge et al. teach all of the limitations as in claim 9, above. Zhang et al. further teaches: 10. The method according to claim 9, wherein the inputs to the first GRU layer comprise a concatenation of the temporal correlation of the audio signal, the frequency correlation of the audio signal, and the channel covariance of the audio signal (see Fig. 2 and ¶ 3.3. Inputs and feature extraction on pages 2-3 citations as in claim 9, above. More specifically: “… Besides the normalized log-power spectra (LPS), we extract the correlation matrix across time frames and frequency bins of the input signals to capture the signals’ temporal and frequency dependency. These features help in differentiating between howling and tonal components. Channel covariance of input signals (Y and E) is calculated as another input feature to account for cross-correlation between them. A concatenation of these features is used for model training with a linear layer for feature fusion.”).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Regarding speech/audio signal processing (pertinent to claims 1, 11, and 20):

Braun et al. ("A consolidated view of loss functions for supervised deep learning-based speech enhancement." 2021 44th International Conference on Telecommunications and Signal Processing (TSP). IEEE, 2021.
arXiv:2009.12286) (associated with neural network related to SI-SDR and MAE – Table 2, abstract, 1. Introduction, 5.2. Results and discussion).

Zhong et al. (Zhong, M.; Tan, Y.; Li, J.; Zhang, H.; Yu, S. Cattle Number Estimation on Smart Pasture Based on Multi-Scale Information Fusion. Mathematics 2022, 10, 3856. https://doi.org/10.3390/math10203856) (associated with MAI scaling – Table 5, ¶ 3 of section 4.2.3. Ablation Experiments).

Yu et al. (“NeuralEcho: A self-attentive recurrent neural network for unified acoustic echo suppression and speech enhancement.” arXiv preprint arXiv:2205.10401 (2022)) (associated with neural network based AHS – Fig. 1, ¶ 2 of section 2.1. NeuralEcho Model, ¶ 2.3. AGC Branch).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Keisha Y Castillo-Torres whose telephone number is (571)272-3975. The examiner can normally be reached Monday - Friday, 9:00 am - 4:00 pm (EST).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Keisha Y. Castillo-Torres/
Examiner, Art Unit 2659

Prosecution Timeline

Jun 01, 2023
Application Filed
Jun 12, 2025
Non-Final Rejection — §103, §DP
Sep 16, 2025
Response Filed
Nov 21, 2025
Final Rejection — §103, §DP
Jan 23, 2026
Interview Requested
Jan 29, 2026
Applicant Interview (Telephonic)
Jan 30, 2026
Examiner Interview Summary
Feb 02, 2026
Response after Non-Final Action
Feb 26, 2026
Request for Continued Examination
Feb 27, 2026
Response after Non-Final Action
Mar 19, 2026
Non-Final Rejection — §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12573402
GENERATING AND/OR UTILIZING UNINTENTIONAL MEMORIZATION MEASURE(S) FOR AUTOMATIC SPEECH RECOGNITION MODEL(S)
2y 5m to grant Granted Mar 10, 2026
Patent 12536989
Language-agnostic Multilingual Modeling Using Effective Script Normalization
2y 5m to grant Granted Jan 27, 2026
Patent 12531050
VOICE DATA CREATION DEVICE
2y 5m to grant Granted Jan 20, 2026
Patent 12499332
TRANSLATING TEXT USING GENERATED VISUAL REPRESENTATIONS AND ARTIFICIAL INTELLIGENCE
2y 5m to grant Granted Dec 16, 2025
Patent 12488180
SYSTEMS AND METHODS FOR GENERATING DIALOG TREES
2y 5m to grant Granted Dec 02, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
74%
Grant Probability
99%
With Interview (+30.5%)
3y 0m
Median Time to Grant
High
PTA Risk
Based on 108 resolved cases by this examiner. Grant probability derived from career allow rate.
