Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
2. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
3. Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tan et al., “A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement” (hereinafter Tan), in view of Kim et al., “Multi-Domain Processing via Hybrid Denoising Networks for Speech Enhancement” (hereinafter Kim).
Regarding Claim 1:
Tan discloses a computer-implemented method comprising:
receiving audio data including a known noisy acoustic signal, the known noisy acoustic signal including a known clean acoustic signal and at least one known additive noise (Tan: Sections 2.3, 3.1 discloses constructing noisy audio which is clean audio plus additive noise, and uses the noisy mixture as input and uses the clean speech as known ground truth during supervised training. This is exactly the claimed “known noisy acoustic signal including known clean and known additive noise”);
transforming the audio data into frequency-domain data (Tan: Section 2.3 discloses that it transforms the time-domain waveform into frequency domain STFT data);
and training a convolutional neural network (Tan: Section 1 discloses using a convolutional network trained directly on STFT frequency-domain data),
the convolutional neural network outputs (Tan: Section 2.1 discloses outputting the clean magnitude directly).
Tan does not explicitly disclose including at least one gated linear unit (GLU) [in its convolutional neural network];
the convolutional neural network outputting a frequency multiplicative mask that is to be multiplied with the frequency-domain data to estimate the known clean signal.
However, Kim discloses including at least one gated linear unit (GLU) (Kim: “Model Architecture” discloses what is commonly understood in the art as a GLU: a mechanism that operates on an input by performing two independent linear transformations, specifically two pieces of convolutional output, which it combines with a gating mechanism, typically applying a sigmoid to one of the two. The general form is GLU(x) = A(x) ⊗ σ(B(x)). Kim does exactly this under “B Model Architecture,” where it discloses that the operation ⊗ means element-wise multiplication and that the layer preceding this operation uses a sigmoid activation function);
The convolutional neural network outputting a frequency multiplicative mask that is to be multiplied with the frequency-domain data to estimate the known clean signal (Kim: Section 2.2 discloses that the CNN outputs a ratio mask, which is multiplied with the frequency bins of the STFT to produce estimates of the clean audio signal; see also Fig. 3).
Tan and Kim are combinable because they are in the same field of endeavor: supervised speech enhancement using convolutional neural networks. Tan discloses a network architecture that takes frequency-domain data as input and is trained on known ground-truth clean and noisy signals in order to denoise the original signal. Kim discloses a GLU block designed for spectrogram-based CNN speech enhancement, using spectrograms/frequency data. Replacing a standard convolutional block with a gated convolutional block, and applying its output as a mask, would be a simple substitution within Tan’s architecture. In fact, Tan notes previous applications that take this approach in its own introduction; it merely does not use this specific approach in its own model architecture. The suggestion/motivation for applying Kim’s approach is that “the spectrogram approach (U-Net) successfully removes high frequency noise,” as disclosed in Section 3.2 of Kim.
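As an illustrative aside (not part of either reference), the GLU operation discussed above, GLU(x) = A(x) ⊗ σ(B(x)), can be sketched in a few lines of numpy; this sketch uses hypothetical dense linear transforms in place of Kim’s convolutions, and the weight names are assumptions for illustration only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(x, W_a, W_b):
    # GLU(x) = A(x) * sigmoid(B(x)): two independent linear
    # transforms of the same input, one gating the other.
    a = x @ W_a            # linear path A(x)
    b = x @ W_b            # gate path B(x)
    return a * sigmoid(b)  # element-wise product (the "⊗" gating)

# Toy input: a vector of 4 frequency-bin features.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W_a = rng.standard_normal((4, 4))
W_b = rng.standard_normal((4, 4))
out = glu(x, W_a, W_b)
```

Because the sigmoid output lies in (0, 1), each element of the gate path partially or completely blocks the corresponding element of the linear path, consistent with the claimed gating block.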
Regarding Claim 2:
The combination of Tan and Kim further discloses the computer-implemented method of claim 1, further comprising: constructing the convolutional neural network, including a plurality of neurons, the plurality of neurons arranged in a plurality of layers including at least one hidden layer, the plurality of layers including a layer including the GLU component, and the plurality of neurons being connected by a plurality of connections (Tan: Section 2.1 discloses an encoder with five convolutional layers and a decoder with five deconvolutional layers, and also shows hidden layers, e.g., two stacked LSTM layers (Section 2.2); Kim discloses a U-Net block where the preceding layer uses a sigmoid and ⊗ means element-wise multiplication. Examiner Interpretation {Tan discloses the layered CNN structure with neurons and connections; Kim provides the GLU layer embedded within a CNN layer}).
Tan and Kim are combinable for the same reasons set forth above with respect to claim 1.
Regarding Claim 3:
The combination of Tan and Kim further discloses the computer-implemented method of claim 2 wherein the layer including the GLU component includes a convolutional block that is configured to calculate a first convolutional output and a second convolutional output, the first convolutional output and the second convolutional output calculated based on the frequency-domain data, and a gating block that uses the first convolutional output to partially or completely block the second convolutional output (Kim: “B Model Architecture” discloses two convolutional paths, one producing a linear output and one producing a sigmoid gate, which are combined in the GLU calculation, both clearly derived from the frequency data).
Regarding Claim 4:
The combination of Tan and Kim further discloses the computer-implemented method of claim 3 wherein a logistic function, including a sigmoid function, receives the first convolutional output and outputs a weight, and the gating block performs an element-wise multiplication with the second convolutional output and the weight (Kim: “B Model Architecture” discloses two convolutional paths, one producing a linear output and one producing a sigmoid gate, which are combined in the GLU calculation).
Regarding Claim 5:
The combination of Tan and Kim further discloses the computer-implemented method of claim 3 wherein the convolutional block is configured to zero-pad at least a portion of the frequency-domain data (Tan: Section 2.3 encoder-decoder uses padding on convolution operations).
Regarding Claim 6:
The combination of Tan and Kim further discloses the computer-implemented method of claim 2 wherein the at least one hidden layer of the convolutional neural network includes at least one long short-term memory layer (Tan: Section 2.2 discloses that there are stacked LSTM layers).
Regarding Claim 7:
The combination of Tan and Kim further discloses the computer-implemented method wherein a first layer of the plurality of layers is configured to encode frequencies in the frequency-domain data into a lower-dimension feature space, and a second layer of the plurality of layers is configured to decode the feature space to a higher dimension and output the frequency multiplicative mask (Tan: Section 2.1 discloses reducing the frequency dimension, “we halve the frequency dimension size”; Kim: Section 2.2 discloses learning an ideal ratio mask by multiplying the estimated mask with the noisy spectrogram, i.e., decoding into the higher dimension and outputting the multiplicative mask).
Regarding Claim 8:
The combination of Tan and Kim further discloses the computer-implemented method of claim 1, further comprising providing the trained convolutional neural network to a wearable or portable audio device wherein the audio device is capable of receiving real-time audio data, transforming the real-time audio data into real-time frequency-domain data, outputting a real-time frequency multiplicative mask using the trained convolutional neural network and the real-time audio data, and applying the real-time frequency multiplicative mask to the real-time frequency-domain data (Tan: Abstract and Introduction disclose hearing aids and similar devices, which are portable audio devices, and repeatedly cite them as a use case for real-time audio processing).
Regarding Claim 9:
The combination of Tan and Kim further discloses the computer-implemented method of claim 1 wherein the audio data includes a plurality of frames, wherein the transforming the audio data into frequency-domain data further includes calculating spectral features for a plurality of frequency bins based on the plurality of frames (Tan: Section 2.3 discloses taking STFT frames as input, wherein the frequency bins are always computed at specific time snapshots and there are a plurality of such snapshots, i.e., it calculates spectral features).
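As an illustrative aside (not drawn from Tan), the claimed framing and per-bin spectral-feature computation can be sketched with a minimal numpy STFT; the frame length, hop size, and Hann window here are hypothetical choices for illustration:

```python
import numpy as np

def stft_magnitudes(signal, frame_len=256, hop=128):
    # Split the waveform into overlapping windowed frames, then
    # compute magnitude spectra: one spectral feature per frequency
    # bin, for each frame (time snapshot).
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)  # frequency bins per frame
    return np.abs(spectra)                 # magnitude spectral features

# Toy waveform: a 440 Hz tone sampled at 8 kHz for one second.
t = np.linspace(0, 1, 8000, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t)
feats = stft_magnitudes(sig)  # shape: (n_frames, frame_len // 2 + 1)
```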
Regarding Claim 10:
The combination of Tan and Kim further discloses the computer-implemented method according to claim 1, further comprising receiving a test data set, the test data set including audio data with unseen noise, and evaluating the trained convolutional neural network using the received test data set (Tan: Section 3.1 discloses untrained/unseen noises which is used for testing).
Regarding Claim 11:
The combination of Tan and Kim further discloses the computer-implemented method of claim 1 wherein the frequency multiplicative mask is at least one of a complex ratio mask or an ideal ratio mask (Kim: Section 2.2, as previously explained, aims to learn an ideal ratio mask).
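As an illustrative aside (not drawn from Kim), applying a multiplicative ratio mask amounts to an element-wise product between the mask and the noisy spectrogram; the toy values below are hypothetical:

```python
import numpy as np

# Noisy magnitude spectrogram: 2 frames x 2 frequency bins (toy values).
noisy_mag = np.array([[2.0, 4.0],
                      [6.0, 8.0]])
# Ratio mask with values in [0, 1]: 1 keeps a bin, 0 suppresses it.
mask = np.array([[0.5, 0.25],
                 [1.0, 0.0]])
# Element-wise multiplication yields the estimated clean magnitudes.
clean_est = mask * noisy_mag  # [[1. 1.] [6. 0.]]
```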
Regarding Claim 12:
The combination of Tan and Kim further discloses the computer-implemented method of claim 1 wherein the audio data is synthetic audio data with a known noisy acoustic signal and at least one of a known clean acoustic signal or a known additive noise (Tan: Section 2.3 discloses constructing artificial mixtures (synthetic noisy data) where clean speech and noise are known).
Regarding Claim 13:
The combination of Tan and Kim further discloses the computer-implemented method of claim 1 wherein the known noisy acoustic signal is a known noisy speech signal and the known clean acoustic signal is a known clean speech signal (Tan: Section 2.3 explicitly trains on known noisy signal and known clean speech with the goal of clean speech as the training target).
Regarding Claim 14:
Claim 14 has been analyzed with regard to claim 1 (see rejection above) and
is rejected for the same reasons of obviousness used above.
It is noted that Tan discloses experimental computer implemented tests of the trained model that necessarily include a processing device with memory to store instructions at least at Section 3.1.
Regarding Claim 15:
Claim 15 has been analyzed with regard to claim 2 (see rejection above) and
is rejected for the same reasons of obviousness used above.
Regarding Claim 16:
Claim 16 has been analyzed with regard to claim 3 (see rejection above) and
is rejected for the same reasons of obviousness used above.
Regarding Claim 17:
Claim 17 has been analyzed with regard to claim 4 (see rejection above) and
is rejected for the same reasons of obviousness used above.
Regarding Claim 18:
Claim 18 has been analyzed with regard to claim 6 (see rejection above) and
is rejected for the same reasons of obviousness used above.
Regarding Claim 19:
Claim 19 has been analyzed with regard to claim 7 (see rejection above) and
is rejected for the same reasons of obviousness used above.
Regarding Claim 20:
Claim 20 has been analyzed with regard to claim 1 (see rejection above) and
is rejected for the same reasons of obviousness used above.
It is noted that Tan discloses experimental computer implemented tests of the trained model that necessarily include a processing device with memory to store instructions at least at Section 3.1.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IAN SCOTT MCLEAN whose telephone number is (703)756-4599. The examiner can normally be reached "Monday - Friday 8:00-5:00 EST, off Every 2nd Friday".
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan can be reached at (571) 272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/IAN SCOTT MCLEAN/Examiner, Art Unit 2654
/HAI PHAN/Supervisory Patent Examiner, Art Unit 2654