Prosecution Insights
Last updated: April 19, 2026
Application No. 18/654,728

AUDIO PROCESSING DEVICE AND METHOD FOR SUPPRESSING NOISE

Status: Non-Final OA (§103)
Filed: May 03, 2024
Examiner: KY, KEVIN
Art Unit: 2671
Tech Center: 2600 (Communications)
Assignee: Iris Audio Technologies Limited
OA Round: 1 (Non-Final)

Grant Probability: 76% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 76% (above average; 420 granted / 549 resolved; +14.5% vs TC avg)
Interview Lift: +25.3% among resolved cases with an interview
Typical Timeline: 2y 6m average prosecution; 33 applications currently pending
Career History: 582 total applications across all art units

Statute-Specific Performance

§101: 17.6% (-22.4% vs TC avg)
§102: 20.8% (-19.2% vs TC avg)
§103: 46.5% (+6.5% vs TC avg)
§112: 9.9% (-30.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 549 resolved cases.
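As a quick check on the arithmetic behind these figures, the headline rates follow directly from the raw case counts (a minimal sketch; the variable names are illustrative and the dashboard's exact rounding is an assumption):

```python
# Recompute the headline examiner statistics from the raw case counts.
granted = 420    # applications allowed by this examiner
resolved = 549   # all resolved outcomes (allowed + abandoned)

career_allow_rate = granted / resolved            # ~0.765, shown as 76%
implied_tc_average = career_allow_rate - 0.145    # "+14.5% vs TC avg" implies ~62%

print(f"Career allow rate: {career_allow_rate:.1%}")
print(f"Implied TC average: {implied_tc_average:.1%}")
```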

Office Action

§103
DETAILED ACTION

Election/Restrictions

Applicant's election without traverse of Group 1, Claims 1-7 and 12-20, in the reply filed on 1/14/2026 is acknowledged. Claims 8-11 are withdrawn from further consideration pursuant to 37 CFR 1.142(b) as being drawn to a nonelected invention, there being no allowable generic or linking claim.

Claim Interpretation

This application includes one or more claim limitations that do not use the word "means," but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: "receiving unit," "transform unit," "noise suppression unit," and "inverse transform unit" in claim 1.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. Referring to the specification as filed:

- The receiving unit corresponds to Fig. 1 & ¶23-24, receiving unit 110, "e.g. using an input microphone."
- The transform unit corresponds to Fig. 1 & ¶25-27, transform unit 120 ("the transform unit 120 is configured to use a short time Fourier transform, STFT, to transform a window of the input audio signal 10"), thus reciting algorithmic structure.
- The noise suppression unit corresponds to Fig. 1 & ¶28-35 ("noise suppression network 130 including an encoder module 131, a gated recurrent unit (GRU) network 132 and a decoder module 133"), thus reciting algorithmic structure.
- The inverse transform unit corresponds to Fig. 1 & ¶47, inverse transform unit 140 ("inverse transform unit 140 is configured to use an inverse short time Fourier transform, ISTFT"), thus reciting algorithmic structure.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 5, 12, 16 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Braun et al (NPL: "Towards Efficient Models for Real-Time Deep Noise Suppression") in view of Eskimez et al (US 20230116052).

Regarding claim 1, Braun discloses an audio processing device for suppressing noise in an audio signal, comprising: a receiving unit configured to receive an input audio signal (pg. 1, Fig. 1, input noisy audio, e.g. from microphones: "a vast amount of noise types, and varying microphone signal levels"); a transform unit configured to generate an input spectrogram based on the input audio signal (pg. 1-2, Fig. 1, STFT, which generates a spectrogram); a noise suppression network configured to process the input spectrogram (pg. 1-2, Fig. 1, suppression filter; pg. 1, §2, Enhancement System and Training Objective: "We use spectral suppression-based enhancement systems"; "The network predicts a real-valued, time-varying suppression gain per time-frequency bin, that is applied to the complex input spectrum, and transformed back to time-domain as shown in Fig. 1 in the upper branch."); a gated recurrent unit, GRU, network including a plurality of GRU cells (pg. 2, §3.1, NSnet2: "The network proposed in [11], referred to as NSnet2, consists only of fully connected (FC) and gated recurrent unit (GRU) [18] layers in the format FC-GRU-GRU-FC-FC-FC"; see also Fig. 2 architecture); and an inverse transform unit configured to generate an output audio signal based on an output spectrogram from the noise suppression network (pg. 1-2, Fig. 1, iSTFT outputs an enhanced audio).

Braun fails to specifically teach, but Eskimez teaches, an encoder module including a plurality of complex convolutional layers and a decoder module including a plurality of complex deconvolutional layers (¶44: "A pDCCRN approach uses a U-Net architecture with encoder and decoder blocks and two complex LSTM layers in-between. Each block contains complex 2-D convolutional layers followed by complex batch normalization").

Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of an encoder module including a plurality of complex convolutional layers and a decoder module including a plurality of complex deconvolutional layers from Eskimez into the audio processing device as disclosed by Braun. The motivation for doing this is to improve speech enhancement models.

Regarding claim 5, the combination of Braun and Eskimez discloses the audio processing device of claim 1, wherein the transform unit is configured to use a short time Fourier transform, STFT, to transform a window of the input audio signal (Braun pg. 2, Fig. 1, STFT, which generates a spectrogram), and the inverse transform unit is configured to use an inverse short time Fourier transform, ISTFT (Braun pg. 1, Fig. 1, iSTFT outputs an enhanced audio).

Regarding claims 12 and 16 (drawn to a method): the proposed combination of Braun and Eskimez, explained in the rejection of device claims 1 and 5, renders obvious the steps of the method of claims 12 and 16 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claims 1 and 5 are equally applicable to claims 12 and 16.

Regarding claim 20 (drawn to a CRM): the proposed combination of Braun and Eskimez, explained in the rejection of device claim 1, renders obvious the steps of the computer-readable medium of claim 20 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claim 1 are equally applicable to claim 20.

Claim(s) 2-3 and 13-14 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Braun and Eskimez as applied to claims 1 and 12 above, and further in view of Cho et al (NPL: "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation").

Regarding claim 2, the combination of Braun and Eskimez discloses the audio processing device of claim 1, but fails to teach, where Cho teaches, wherein each GRU cell is configured to preserve a long-term memory of a respective cell state (pg. 3, §2.3, Hidden Unit that Adaptively Remembers and Forgets: "the update gate controls how much information from the previous hidden state will carry over to the current hidden state"). Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein each GRU cell is configured to preserve a long-term memory of a respective cell state from Cho into the audio processing device as disclosed by the combination of Braun and Eskimez. The motivation for doing this is to improve the performance of encoder-decoder neural networks.

Regarding claim 3, the combination of Braun, Eskimez, and Cho discloses the audio processing device of claim 2, wherein each GRU cell includes an update gate configured to update the cell state with a new candidate state, and a reset gate configured to keep or discard the previous cell state (Cho pg. 3, §2.3: reset gate rj and update gate zj). Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein each GRU cell includes an update gate configured to update the cell state with a new candidate state, and a reset gate configured to keep or discard the previous cell state from Cho into the audio processing device as disclosed by the combination of Braun and Eskimez. The motivation for doing this is to improve the performance of encoder-decoder neural networks.

Regarding claims 13 and 14 (drawn to a method): the proposed combination of Braun, Eskimez, and Cho, explained in the rejection of device claims 2-3, renders obvious the steps of the method of claims 13 and 14 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claims 2-3 are equally applicable to claims 13 and 14.

Claim(s) 4 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Braun and Eskimez as applied to claims 1 and 12 above, and further in view of Helwani et al (US Patent 11875810 B1).

Regarding claim 4, the combination of Braun and Eskimez discloses the audio processing device of claim 1, but fails to teach, where Helwani teaches, wherein inputs and outputs of each GRU cell are complex values (col 9 lines 65-67 to col 10 lines 1-5: "The transformed reference signal and microphone output may both be provided as input to a recurrent complex-valued neural network (RCNN) 325 in the depicted embodiment, which may for example comprise some number of gated recurrent units (also referred to as GRU cells) of the kind depicted in FIG. 5."; col 10 lines 65-60: "the input signals received at the RCNNs may be represented as complex numbers"; col 11 lines 10-20: "The output of the STFT may be 257 complex sub-bands in such an implementation. The RCNN may comprise multiple GRU layers with input of dimensionality 2*257 (e.g., 257 elements each from the reference and the microphone output in the case of the NLEH), and the output may comprise 257 dimensions"). Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein inputs and outputs of each GRU cell are complex values from Helwani into the audio processing device as disclosed by the combination of Braun and Eskimez. The motivation for doing this is to improve the quality of audio transmitted.

Regarding claim 15 (drawn to a method): the proposed combination of Braun, Eskimez, and Helwani, explained in the rejection of device claim 4, renders obvious the steps of the method of claim 15 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claim 4 are equally applicable to claim 15.

Claim(s) 6 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Braun and Eskimez as applied to claims 5 and 16 above, and further in view of Pau et al (US 20140198995).

Regarding claim 6, the combination of Braun and Eskimez discloses the audio processing device of claim 5, but fails to teach, where Pau teaches, wherein the transform unit includes a buffer and is configured to overlap adjacent windows according to the buffer size (¶101: "a peak finder 890 (possibly concurrently with other operations involving other buffers or the transform module) may then identify keypoint candidates in the current block and store them in a queue 801"; ¶102: "produce an overlap between adjacent blocks in order to avoid loss of cross-block keypoints; this may be obtained, e.g., by means of a set of overlap buffers 880"; ¶103: "this case a set of overlap buffers 821 may be provided, with the proviso that these may have different characteristics (e.g., size) with respect to the buffers at 880"). Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein the transform unit includes a buffer and is configured to overlap adjacent windows according to the buffer size from Pau into the audio processing device as disclosed by the combination of Braun and Eskimez. The motivation for doing this is to improve data processing.

Regarding claim 17 (drawn to a method): the proposed combination of Braun, Eskimez, and Pau, explained in the rejection of device claim 6, renders obvious the steps of the method of claim 17 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claim 6 are equally applicable to claim 17.

Claim(s) 7 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Braun and Eskimez as applied to claims 1 and 12 above, and further in view of Berchin et al (US Patent 7542815 B1).

Regarding claim 7, the combination of Braun and Eskimez discloses the audio processing device of claim 1, but fails to teach, where Berchin teaches, an immersive audio processing unit comprising a polyphase infinite impulse response, IIR, filter (col 5 lines 54-59: "various implementations thereof, including, but not limited to, direct computation using the defining equations, linear-algebra/matrix operations, convolution using FIR or IIR filter structures, polyphase filterbanks") configured to process the output audio signal and generate an immersive audio signal (col 8 lines 30-45: "requires four inverse-transform operations to return to the time-domain instead of three, but allows access to both the common-inphase and common-quadrature time-domain data"; "Applications in which access to common-quadrature and common-inphase data is useful include, but are not limited to, stereo signals that incorporate matrix-encoded surround material"). Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of an immersive audio processing unit comprising a polyphase infinite impulse response, IIR, filter configured to process the output audio signal and generate an immersive audio signal from Berchin into the audio processing device as disclosed by the combination of Braun and Eskimez. The motivation for doing this is to improve digital audio signal processing for transforming sound.

Regarding claim 18 (drawn to a method): the proposed combination of Braun, Eskimez, and Berchin, explained in the rejection of device claim 7, renders obvious the steps of the method of claim 18 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claim 7 are equally applicable to claim 18.

Claim(s) 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Braun and Eskimez as applied to claim 1 above, and further in view of Yoshioka et al (US 20200349949).

Regarding claim 19, the combination of Braun and Eskimez discloses the audio processing device of claim 1, wherein the noise suppression network has been trained using operations comprising: receiving training data including a plurality of clean audio samples and a plurality of environmental noise samples (Braun pg. 3, §4.1, Dataset: "We use a large-scale synthetic training set and test on real recordings to ensure generalization of our results to real-world signals. The training set uses 544 h of high mean opinion score (MOS) rated speech recordings from the LibriVox corpus, 247 h noise recordings from Audioset, Freesound, internal noise recordings and 1 h of colored stationary noise."); generating training samples by merging at least one clean audio sample and at least one environmental noise sample (Braun pg. 3, §4.1: "While already reverberant speech files are mixed with noise as is, non-reverberant speech files were augmented with acoustic impulse responses randomly drawn from a set of 7000 measured and simulated responses from several public and internal databases"; "The reverberant speech and noise is mixed with a signal-to-noise ratio (SNR) drawn from a Gaussian distribution with N(5, 10) dB"; see Fig. 4, training data generation pipeline: reverberant speech is used as is, non-reverberant speech is augmented with RIRs, and the training targets are created using shaped RIRs); and processing the training samples using the noise suppression network (Braun pg. 1, §2, Enhancement System and Training Objective: "We use spectral suppression-based enhancement systems due to their robust generalization, logical interpretation and control, and easier integration with existing speech processing algorithms"; pg. 3, §4.1, Dataset).

The combination of Braun and Eskimez fails to teach, where Yoshioka teaches, calculating a value for a permutation invariant loss function based on the output audio signal of the audio processing device and the clean audio sample corresponding to each training sample (¶80: "After getting two channel output from the mask reconstructor, permutation invariant training objective function was applied between the reconstructed mask and the clean reference, where the Euclidean distance of each permutation pair of output and clean reference are measured first, and then minimum distance and corresponding permutation is selected to update the neural network"); and updating one or more parameters of the noise suppression network to reduce the value of the permutation invariant loss function (¶71: "Acoustic beamforming, or simply beamforming, is a technique to enhance target speech by reducing unwanted sounds such as background noise from multi-channel audio signals"; ¶80: "minimum distance and corresponding permutation is selected to update the neural network").

Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of calculating a value for a permutation invariant loss function based on the output audio signal of the audio processing device and the clean audio sample corresponding to each training sample, and updating one or more parameters of the noise suppression network to reduce the value of the permutation invariant loss function, from Yoshioka into the audio processing device as disclosed by the combination of Braun and Eskimez. The motivation for doing this is to improve the accuracy of downstream speech processing, such as speech recognition and speaker diarization.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN KY, whose telephone number is (571) 272-7648. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Vincent Rudolph, can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KEVIN KY/
Primary Examiner, Art Unit 2671
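The enhancement pipeline at the heart of the claim 1 rejection (window the signal, take an STFT, apply a network-predicted per-bin suppression gain, then inverse-transform and overlap-add) can be sketched in a few lines. This is an illustrative toy, not Braun's NSnet2: the naive DFT, the 8-sample Hann window, and the `gain_fn` callback standing in for a trained FC-GRU stack are all assumptions made for brevity.

```python
import cmath
import math

def dft(frame):
    """Naive DFT of one window (a real system would use an FFT)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(spectrum):
    """Inverse DFT; returns real-valued time-domain samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def suppress(signal, gain_fn, win=8):
    """STFT analysis -> per-bin suppression gain -> ISTFT with 50% overlap-add."""
    hop = win // 2
    # Periodic Hann window: adjacent windows sum to exactly 1 at 50% overlap,
    # so overlap-add reconstructs the input wherever the gain is 1.
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / win) for n in range(win)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - win + 1, hop):
        frame = [signal[start + n] * w[n] for n in range(win)]
        spectrum = dft(frame)                                # analysis
        gains = gain_fn(spectrum)                            # "network" output
        enhanced = [g * s for g, s in zip(gains, spectrum)]  # apply the mask
        for n, v in enumerate(idft(enhanced)):               # synthesis + OLA
            out[start + n] += v
    return out
```

Passing an all-ones gain reproduces the input exactly away from the signal edges, a convenient sanity check before substituting gains predicted by a trained network.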

Prosecution Timeline

May 03, 2024
Application Filed
Mar 19, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597158: POSE ESTIMATION (2y 5m to grant; granted Apr 07, 2026)
Patent 12597291: IMAGE ANALYSIS FOR PERSONAL INTERACTION (2y 5m to grant; granted Apr 07, 2026)
Patent 12586393: KNOWLEDGE-DRIVEN SCENE PRIORS FOR SEMANTIC AUDIO-VISUAL EMBODIED NAVIGATION (2y 5m to grant; granted Mar 24, 2026)
Patent 12586559: METHOD AND APPARATUS FOR GENERATING SPEECH OUTPUTS IN A VEHICLE (2y 5m to grant; granted Mar 24, 2026)
Patent 12579382: NATURAL LANGUAGE GENERATION USING KNOWLEDGE GRAPH INCORPORATING TEXTUAL SUMMARIES (2y 5m to grant; granted Mar 17, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 76% (99% with interview; +25.3% lift)
Median Time to Grant: 2y 6m
PTA Risk: Low
Based on 549 resolved cases by this examiner. Grant probability is derived from the career allow rate.
