Last updated: May 29, 2026
Application No. 18/685,656
DETECTING ENVIRONMENTAL NOISE IN USER-GENERATED CONTENT

Final Rejection §103
Filed: Feb 22, 2024
Priority: Aug 26, 2021, CN PCT/CN2021/114746 +4 more
Examiner: BRINEY III, WALTER F
Art Unit: 2692
Tech Center: 2600 (Communications)
Assignee: Dolby Laboratories Licensing Corporation
OA Round: 2 (Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 65% grant rate with a +5.2% interview lift. Since an interview has already been tried, the recommendation is a written response with narrowed claims based on precedent claim evolution patterns.
Based on 544 resolved cases, 2023–2026
Examiner Intelligence

BRINEY III, WALTER F
Grants 65% — above average
Career Allowance Rate
356 granted / 544 resolved
+3.4% vs TC avg
Moderate +5% lift
+5.2%
Interview Lift
resolved cases with interview
Typical timeline
3y 0m
Avg Prosecution
35 currently pending
Career history
603
Total Applications
across all art units
Statute-Specific Performance

§101: 0.8% (-39.2% vs TC avg)
§103: 76.4% (+36.4% vs TC avg)
§102: 7.7% (-32.3% vs TC avg)
§112: 9.3% (-30.7% vs TC avg)
Based on career data from 544 resolved cases
Office Action

§103
Detailed Action
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. See 35 U.S.C. § 100 (note).
The Examiner previously considered this Application in a Non-Final Office Action (03 December 2025). Claims 1–10, 13 and 18–20 were rejected. Claims 11, 12 and 14–17 were objected to for containing allowable subject matter while depending on a rejected base claim. In response, Applicant has amended the claims, including cancelling claim 3 and adding new claims 21 and 22.
Art Rejections
Obviousness
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 4, 6–8, 10, 13 and 18–22 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of US Patent Application Publication 2023/0336912 (effectively filed 04 December 2015 as PCT/US2015/064139) (“Benattar”); Miyake et al., Noise Detection and Classification in Speech Signals with Boosting, 14th Workshop on Statistical Signal Processing IEEE 778 (01 August 2007) (“Miyake”); Theodoros Giannakopoulos, Intro to Audio Analysis: Recognizing Sounds Using Machine Learning, https://hackernoon.com/intro-to-audio-analysis-recognizing-sounds-using-machine-learning-qy2r3ufl (12 September 2020) (last accessed 24 April 2026) (“Giannakopoulos”) and Theoretical Computer Science Stack Exchange, Dimensionality Reduction in Machine Learning, https://cstheory.stackexchange.com/questions/31228/dimensionality-reduction-in-machine-learning (April 2015) (last accessed 24 April 2026) (“TCSSE”).
Claims 5 and 9 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Benattar; Miyake; Giannakopoulos; TCSSE and Nikonas Simou, Towards Blind Quality Assessment of Concert Audio Recordings Using Deep Neural Networks, IEEE Int’l Conf. on Acoustics, Speech and Signal Processing 3477 (04-08 May 2020) (“Simou”).
Claim 1 is drawn to “a computer-implemented method of audio processing.” The following table illustrates the correspondence between the claimed method and the Benattar reference.
Claim 1
The Benattar Reference
“1. A computer-implemented method of audio processing, the method comprising:
The Benattar reference similarly describes a computerized method performed by a computer system to provide user-customized audio processing. Benattar at Abs., ¶¶ 3, 69–80, 157. Benattar’s method and system process audio by distinguishing between voice and ambient noise and distinguishing between multiple classes of ambient noise and processing the classes differently. Id. at ¶¶ 88, 89, 97, 109–111.
“receiving an audio signal;
Benattar’s method and system receive audio from multiple sources, including prerecorded and live audio. Id. at ¶¶ 94, 97, 294, 295, FIG.4.
“calculating a first confidence score of the audio signal using a first machine learning model trained to classify an audio signal as non-noise or noise;
“when the first confidence score indicates a presence of non-noise: generating a processed audio signal by processing the audio signal according to a first audio processing process;
“when the first confidence score indicates a presence of noise: calculating a second confidence score of the audio signal using a second machine learning model trained to distinguish between noise of a first type and noise of a second type;
“when the second confidence score indicates a presence of noise of the first type: generating the processed audio signal by processing the audio signal according to a second audio processing process; and
“when the second confidence score indicates a presence of noise of the second type: generating the processed audio signal by processing the audio signal according to the first audio processing process,
These claim limitations require a two-step classification of received audio. A first machine learning model is trained to split audio into non-noise or noise. A second machine learning model is trained to split noise into a first and second type of noise (e.g., undesirable noise and desirable noise). The first type of noise is processed in a second audio processing process (e.g., a noise cancellation process). Non-noise and the second type of noise are processed in a first audio processing process (e.g., an audio enhancement process).
Benattar similarly describes classifying the received audio signals in various ways. For example, Benattar’s method and system include recognizing desired speech/music and non-speech/music ambient noise. Id. at ¶¶ 100, 110, 124, 126, 127, 177. The ambient noise may be further classified as desired or undesired noise. Id. at ¶¶ 97, 109–110, 120, 126. Undesired noise is cancelled using a noise cancelling process. Id. at ¶¶ 94, 97, 105. Audio classified as speech and desired ambient noise, however, is processed separately in order to highlight the audio, rather than to cancel it. Id.
While Benattar distinguishes between at least three classes of audio—namely, non-noise (speech), a first type of undesirable noise and a second type of desirable noise—Benattar simply does not describe the classification process as a two-step process as claimed. Benattar’s limited description on classification also excludes a description of the use of trained machine learning models.
“wherein the audio signal comprises a plurality of samples, wherein the plurality of samples is arranged into a plurality of frames;
“wherein the first confidence score is calculated on a short clip-by-short clip basis;
“wherein the second confidence score is calculated on a clip-by-clip basis; and
“wherein a given short clip and a given clip each include a number of frames of the audio signal, wherein the given short clip includes fewer frames than the given clip.”
Benattar also does not describe the use of two different clip lengths (where each clip includes at least one frame of samples) to generate first and second confidence scores.

Table 1
The table above shows that the Benattar reference describes a method that corresponds closely to the claimed method. While similar, the Benattar reference does not anticipate the claimed method because it does not describe the claimed use of trained machine learning models to classify audio. Benattar does not describe the claimed two-step detection and classification of sounds. And Benattar does not describe the use of two different clip lengths (where each clip includes at least one frame of samples) to generate first and second confidence scores.
Use of a Two-Step Process for Identification and Classification of Audio
The differences between the claimed invention and the Benattar reference are such that the invention as a whole would have been obvious to one of ordinary skill in the art at the time this Application was effectively filed. As shown in the table above, Benattar describes a method and system that classifies received audio in order to process the audio according to user preferences. The received audio may include some combination of speech, desired ambient noises and undesired ambient noises. Benattar does not detail the algorithms used to classify the received audio and neither describes the claimed use of trained machine learning models nor the claimed two-step classification process.
The Miyake reference describes a noise detection and classification method and system. Miyake at Abs., § 2, FIG.2. One of ordinary skill would have recognized that the Miyake reference is relevant to the Benattar reference since Miyake provides detailed teachings on how to classify received audio signals, including the detection of speech and different classes of noise. Miyake teaches the use of a trained machine learning model called AdaBoost, which is a binary classifier (i.e., it splits an input into two classes). Id. To segregate an audio signal into three classes, Miyake teaches staging two AdaBoost models in series to create a two-step classification process that is similar to the claimed classification process. Id. at §§ 2, 4. In particular, Miyake describes extracting features from an input audio signal. Id. at §§ 2, 7.1. A first AdaBoost model analyzes the features to segregate speech-only frames from noise-containing frames. Id. at § 2. A second AdaBoost model analyzes the noise-containing frames to segregate frames containing both speech and noise from frames containing only noise. Id.
Read in the context of Benattar, the Miyake reference would have reasonably taught and suggested distinguishing among Benattar’s three classes of sounds with a two-step classification process similar to the one taught by Miyake. For example, using two trained, AdaBoost models, a first model would operate on received audio to separate speech from noise and a second model would operate on the first model’s noise output to separate desired noise from undesired noise.
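The two-step routing discussed here can be illustrated with a minimal Python sketch. It is not taken from Benattar, Miyake or the Application. The function name `route`, the 0.5 threshold and the convention that the second score rises with the likelihood of the first (undesired) noise type are all assumptions made for illustration.

```python
from typing import Callable, Sequence

# A scorer maps a feature vector to a confidence in [0, 1] (hypothetical type).
Scorer = Callable[[Sequence[float]], float]


def route(features: Sequence[float],
          noise_model: Scorer,
          type_model: Scorer,
          threshold: float = 0.5) -> str:
    """Pick a processing path with a two-step classification.

    The threshold and score directions are illustrative assumptions.
    """
    # Step 1: non-noise versus noise.
    if noise_model(features) < threshold:
        return "first_process"  # non-noise, e.g. enhancement
    # Step 2: runs only when step 1 reports noise.
    if type_model(features) >= threshold:
        return "second_process"  # noise of the first type, e.g. cancellation
    return "first_process"  # noise of the second type


if __name__ == "__main__":
    clip = [0.0]  # placeholder feature vector
    print(route(clip, lambda f: 0.9, lambda f: 0.9))
```

The second model is never consulted for non-noise input, which is the resource-saving property the rejection attributes to a staged design.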
Use of Different Clip Lengths for Identification and Classification
The combination of Benattar and Miyake would have reasonably produced a system including a trained set of AdaBoost models. A first stage of models would calculate a first confidence score to identify the presence/absence of wanted/unwanted sounds. A second stage of models would further calculate a second confidence score to characterize any unwanted sounds into types to facilitate their removal. Following the teachings of Miyake concerning AdaBoost models, the first stage and second stage would operate by processing a clip, or set of frames of digital samples. See Miyake at § 5 (describing classifying based on a single frame and smoothing operation over a set of frames). In particular, Miyake samples 200 ms worth of audio at 16 kHz and splits the samples into 20 ms windows overlapped by 10 ms. Miyake does not describe, however, the idea of using a different number of frames (i.e., different clip lengths) in the first stage compared to the number of frames used in the second stage. Notably, Miyake describes the goal of performing the first step with a single model to reduce the amount of processing resources required, while the second step uses multiple models trained on different classes of noise.
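As a worked illustration of the framing figures cited above (200 ms of audio, 16 kHz sampling, 20 ms windows overlapped by 10 ms), the following Python sketch counts the complete windows that fit in the clip. It assumes no padding and that a 10 ms overlap means a 10 ms hop between window starts. The helper name `frame_count` is hypothetical.

```python
def frame_count(duration_ms: int, sample_rate_hz: int,
                window_ms: int, hop_ms: int) -> int:
    """Count complete analysis windows in a clip (no padding assumed)."""
    total = duration_ms * sample_rate_hz // 1000   # samples in the clip
    window = window_ms * sample_rate_hz // 1000    # samples per window
    hop = hop_ms * sample_rate_hz // 1000          # samples between window starts
    if total < window:
        return 0
    return (total - window) // hop + 1


if __name__ == "__main__":
    # 3200 samples, 320-sample windows, 160-sample hop
    print(frame_count(200, 16000, 20, 10))  # prints 19
```

Under these assumptions a 200 ms clip yields 19 overlapping frames, so any clip length is naturally expressed as a frame count.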
The Giannakopoulos reference relates to Benattar and Miyake since it is drawn to the field of audio processing through machine learning. Like Miyake, the Giannakopoulos reference addresses feature extraction through sampling and windowing of audio into a set of windows, or frames. Giannakopoulos at pp. 2–6. Giannakopoulos further teaches the idea of clipping, or segmenting, the audio into manageable chunks to reduce dimensionality and the processing requirements needed for analyzing input audio. Id. For example, Giannakopoulos describes splitting a larger audio sample of 2.5 seconds into 1-second segments that are further split into windows of 50 ms. Id. Giannakopoulos further teaches configuring the set of windows, or frames, and the length of audio used based on the purpose of the analysis, including segmentation (e.g., identification of speech and noise) and classification (e.g., classification of noise into one of various types). Id. If one desires to better identify discrete events, a shorter window should be used. Id. If one desires better spectral resolution, longer frames should be used. Id.
Further, the TCSSE reference teaches the importance of dimensionality reduction in machine learning. TCSSE at pp. 1–2. The goal is to reduce processing overhead by reducing the amount of data analyzed at a given stage with a tradeoff between resource requirements and precision. Id. As noted in the TCSSE reference, the question as to how much data reduction to apply to a given problem is always data dependent, suggesting routine experimentation to determine the effective amount of reduction that can be sought while maintaining a desired amount of accuracy, or precision, in results. Id.
Based on the foregoing teachings, aimed at reducing the number of resources used in machine learning, one of ordinary skill would have readily recognized that it is important to determine the degree of dimensionality reduction possible in implementing Miyake’s two-step analysis for identification and classification of audio. One of ordinary skill would have weighed the amount of audio needed to be processed at each step against the goals of each step. In particular, given the different goals of segmentation and classification respectively presented by Miyake’s two steps, one of ordinary skill would have reasonably addressed each step differently. On the one hand, since Miyake seeks to reduce the number of resources used in the first step, and since the first step is concerned with segmenting audio into relatively different categories (i.e., speech and noise), one of ordinary skill would have beneficially experimented with configuring the first step of segmenting audio to use fewer frames of audio in order to reflect features that are unique to a single audio event, such as speech or a burst of noise in speech. And on the other hand, since the second step is only invoked in noise cases, it may use more resources. Further, since the second step is concerned with differentiating noise into multiple different categories, one of ordinary skill would have beneficially experimented with configuring the second step to use more audio, or more frames, to more accurately or precisely characterize the type of audio event detected by the first step. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
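The distinction between a short clip for the first step and a longer clip for the second step can be pictured with a small Python sketch. The frame counts (5 and 50) and the helper `trailing_clip` are hypothetical. Neither the references nor the claim as quoted fix specific lengths, only that the short clip has fewer frames than the clip.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

SHORT_CLIP_FRAMES = 5   # hypothetical length for the first (noise/non-noise) step
CLIP_FRAMES = 50        # hypothetical length for the second (noise-type) step


def trailing_clip(frames: Sequence[T], end: int, length: int) -> List[T]:
    """Return up to `length` frames ending at index `end`, inclusive."""
    start = max(0, end - length + 1)
    return list(frames[start:end + 1])


if __name__ == "__main__":
    frames = list(range(100))  # stand-in for 100 frames of audio features
    short = trailing_clip(frames, 99, SHORT_CLIP_FRAMES)
    clip = trailing_clip(frames, 99, CLIP_FRAMES)
    print(len(short), len(clip))
```

Both windows end at the same current frame. The first step sees only recent history, while the second step sees a longer stretch of the same signal.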
Claim 2 depends on claim 1, and further requires the following:
“further comprising: outputting, by a loudspeaker, the processed audio signal as sound.”
Benattar outputs processed audio as sound to a user using a pair of headphones 305. Benattar at ¶¶ 286, 326, 330, FIGs.3, 10, 11. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
Claim 4 depends on claim 1, and further requires the following:
“wherein the first audio processing process comprises audio processing other than noise reduction; and
“wherein the second audio processing process comprises noise reduction.”
Similarly, Benattar describes enhancing (e.g., amplifying and equalizing, beamforming) speech and desired ambient noises while cancelling undesired ambient noises. Benattar at ¶¶ 97, 127–149, 160, 174, 176, 194. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
Claim 5 depends on claim 1, and further requires the following:
“wherein the noise of the first type corresponds to user-generated content (UGC) noise,
“wherein the noise of the second type corresponds to professionally-generated content (PGC) noise,
“wherein PGC is audio content that has been created professionally, and wherein UGC is audio content that has been created other than professionally.”
Benattar describes providing a set of user profiles to specify the types of ambient noises to allow. For example, in a concert setting, a received audio signal may contain a mix of audio in the environment. Benattar at ¶ 97. One of ordinary skill would have immediately recognized that ambient audio at a concert includes audio from the stage and audience sounds (e.g., cheering, singing along, applause). In the context of the Benattar-Miyake combination, this teaching suggests training a second AdaBoost classifier model to discriminate between user-generated content, such as audience noise, and professionally-generated content, such as performer produced noise.
Additionally, the Simou reference teaches and suggests training classification models to segregate user-generated content/recordings (UGC/R) from professional content/recording (PGC/R). Simou at Abs., § 1. Simou’s additional teachings on the subject would have suggested training Miyake’s AdaBoost classifier models to distinguish between UGC or PGC as another set of desired/undesired sounds for user customized processing according to Benattar. The models would then be used in Miyake’s AdaBoost classifiers to distinguish UGC from PGC. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos, the TCSSE and the Simou references makes obvious all limitations of the claim.
Claim 6 depends on claim 1, and further requires the following:
“wherein the first machine learning model has been trained offline using positive training data and negative training data,
“wherein the positive training data includes training data corresponding to the noise of the first type and training data corresponding to the noise of the second type, and
“wherein the negative training data includes non-noise training data.”
The Miyake reference teaches and suggests training a first AdaBoost classifier’s model with both positive and negative training data in order to learn how to distinguish between speech and noise. Miyake at FIG.3. One of ordinary skill would have understood that because the first AdaBoost classifier distinguishes between clean speech on one hand and both noise and noisy speech on the other hand, the positive training data would include both noise and noisy speech while the negative training data would include clean speech. Applied to Benattar, this teaching would suggest providing positive training data that corresponds to two types of noise (e.g., desired ambient noise and undesired ambient noise) and negative training data that corresponds to non-noise (e.g., speech or music). For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
Claim 7 depends on claim 1, and further requires the following:
“wherein calculating the first confidence score comprises:
“extracting a first plurality of features from the audio signal;
“classifying the audio signal by inputting the first plurality of features into the first machine learning model; and
“calculating a noise confidence score based on a result of classifying the audio signal.”
The Miyake reference teaches and suggests generating a confidence score $f(x)$ by extracting features from an audio signal, inputting and analyzing the features in a model to generate a series of classifications $h(x)$ and calculating $f(x)$ by weighting and summing each classification. Miyake at § 3, FIG.3. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
Claim 8 depends on claim 7, and further requires the following:
“wherein a first plurality of features is extracted from a short clip that includes a current frame and a plurality of history frames,
“wherein the noise confidence score of the current frame results from inputting the first plurality of features of the short clip into the first machine learning model.”
Likewise, Miyake calculates a confidence score $f(x)$ based on features from a current time and features from earlier frames. Miyake at § 3, FIG.3. In particular, $f(x)$ is derived from a weighted hypothesis $h_t(x)$, where the weight $\alpha$ is a function of error $\epsilon_t$ that is a function of distribution $w_t$. Id. Distribution $w$ is updated to $w_{t+1}$ for each frame based on inputs $x_t$. Id. Thus, a confidence score $f(x)_{t+1}$ is a function of $x(t)$ and $x(t+1)$. Id. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
Claim 9 depends on claim 7, and further requires the following:
“the method further comprising:
“calculating noise confidence scores for a plurality of frames in a clip; and
“calculating a noise confidence score for the clip as a weighted combination of the noise confidence scores for the plurality of frames.”
The Miyake reference teaches and suggests a classification process that operates on a frame-by-frame basis. Miyake at §§ 2, 3. Miyake does not teach or suggest processing the frames as clips containing multiple frames.
The Simou reference, however, extends the idea of real-time classification to the classification of clips in order to gauge quality. Simou at Abs., § 3. In Simou, a plurality of frames are scored. Id. at § 3.3. Simou then classifies an audio clip by aggregating each frame’s score to produce a clip score. Id.
Read in light of Benattar, this would have reasonably suggested classifying prerecorded audio in a holistic (i.e., clip-based) way rather than on a real-time, frame-by-frame basis. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos, the TCSSE and the Simou references makes obvious all limitations of the claim.
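The clip-level aggregation recited in claim 9 (a weighted combination of per-frame scores) can be sketched in Python as follows. The normalized weighted mean used here is an assumption chosen for illustration. The Office Action does not state which aggregation Simou uses, and the function name `clip_score` is hypothetical.

```python
from typing import Optional, Sequence


def clip_score(frame_scores: Sequence[float],
               weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-frame confidence scores into one clip score.

    Uses a normalized weighted mean (an illustrative assumption).
    """
    if not frame_scores:
        raise ValueError("at least one frame score is required")
    if weights is None:
        weights = [1.0] * len(frame_scores)  # default: plain average
    if len(weights) != len(frame_scores):
        raise ValueError("weights and frame_scores must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, frame_scores)) / total


if __name__ == "__main__":
    # Later frames weighted more heavily than earlier ones.
    print(clip_score([0.0, 1.0], [1.0, 3.0]))  # prints 0.75
```

Normalizing by the weight total keeps the clip score in the same range as the frame scores, so one threshold can serve both.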
Claim 10 depends on claim 7, and further requires the following:
“wherein calculating the noise confidence score comprises:
“combining a plurality of outputs of a plurality of weak learners into a weighted sum; and
“converting the weighted sum into the noise confidence score using an inverse exponential function.”
Miyake teaches and suggests the use of an AdaBoost classifier that produces a noise confidence score $f(x)$ from a plurality of weighted weak learner scores $h(x)$ and a logarithmic (i.e., inverse exponential) function $\alpha$. Miyake at § 3, FIG.3. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
Claim 13 depends on claim 7, and further requires the following:
“wherein the first plurality of features includes one or more of a plurality of temporal features, a plurality of spectral features, a plurality of temporal-frequency features, and a first plurality of statistics, and/or
“wherein the first plurality of statistics comprises one or more of a mean and a standard deviation, where the mean is calculated based on one or more of the first plurality of features and the standard deviation is calculated based on one or more of the first plurality of features.”
This claim presents a SuperGuide grouping (“one or more of a [1], [2] and [3]”). The grouping here is construed in the disjunctive sense because one of ordinary skill would have understood that the group elements are possible alternative features for characterizing an input signal that could be used together or alone. Notably, the Specification does not distinguish between the use of one element of the group or two or more elements of the group.
The Miyake reference teaches and suggests extracting a plurality of log-mel power spectrum features, which are spectral features, and Mel-frequency cepstral coefficients, which are temporal-frequency features, as claimed. Miyake at § 2. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
Claim 18 depends on claim 1, and further requires the following:
“wherein the second machine learning model has been trained offline using positive training data and negative training data,
“wherein the positive training data includes training data corresponding to the noise of the second type, and
“wherein the negative training data includes training data corresponding to the noise of the first type.”
The Miyake reference teaches and suggests training a second AdaBoost classifier’s model with both positive and negative training data in order to learn how to distinguish between noise and noisy speech. Miyake at FIG.3. One of ordinary skill would have understood that because the second AdaBoost classifier distinguishes between noise and noisy speech, the positive training data would include noise and the negative training data would include noisy speech. Applied to Benattar, this teaching would suggest providing positive training data that corresponds to desired noise and negative training data that corresponds to undesired noise. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
Claim 19 depends on claim 1, and further requires the following:
“A non-transitory computer readable medium storing a computer program that, when executed by a processor, controls an apparatus to execute processing including the method of claim 1.”
Claim 20 depends on claim 1, and further requires the following:
“An apparatus for audio processing, the apparatus comprising: a processor, wherein the processor is configured to control the apparatus to execute processing including the method of claim 1.”
Claims 19 and 20 are treated together. Benattar’s method is implemented by a computer with a medium that stores instructions executed by a processor. Benattar at ¶¶ 157, 204–206, 286, 422, FIG.2. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
Claim 21 depends on claim 1, and further requires the following:
“wherein the first confidence score and the second confidence score are calculated in real time.”
The Benattar reference describes configuring its system to operate in real time. Applicant’s Spec. at ¶ 30 characterizes real time as processing on a clip-by-clip basis. This indicates that processing needs to occur in roughly 1 second, which would equate to about one clip of audio according to at least one example provided in Applicant’s Specification. While the Miyake reference does not specifically address the time of operation, as noted in the obviousness rejection of claim 1, a common goal of the Miyake, Giannakopoulos and the TCSSE references is to reduce dimensionality (e.g., amount of audio considered at each stage) and processing requirements (e.g., number of models operated at each step) in machine learning as much as possible. Further, Benattar sets a goal of providing real-time processing. See Benattar at ¶ 435. These findings would have reasonably suggested embodying the prior art system with the necessary known processing resources and reducing dimensionality (e.g., amount of audio processed) as much as possible to produce the fastest result possible to produce a real-time system. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
Claim 22 depends on claim 1, and further requires the following:
“wherein a given clip includes a plurality of short clips, wherein each of the plurality of short clips is overlapping with at least one other of the plurality of short clips.”
The Miyake reference similarly suggests sampling 200 ms of audio, windowing it with 20-ms windows that overlap by 10 ms. Miyake at § 7.1. In that case, a full clip of 200 ms would include multiple short, overlapping clips as claimed. For the foregoing reasons, the combination of the Benattar, the Miyake, the Giannakopoulos and the TCSSE references makes obvious all limitations of the claim.
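The windowing arithmetic the Examiner relies on can be checked with a short sketch. It assumes only the figures quoted above (a 200 ms clip, 20 ms windows, 10 ms overlap); the function name `frame_bounds` and the choice to work in milliseconds rather than samples are illustrative assumptions, not details taken from Miyake.

```python
def frame_bounds(clip_ms=200, win_ms=20, overlap_ms=10):
    """Return (start_ms, end_ms) for each short window inside one clip.

    Hypothetical helper: the defaults mirror the numbers quoted from
    Miyake at § 7.1; everything else is an illustrative assumption.
    """
    hop_ms = win_ms - overlap_ms  # each window starts 10 ms after the last
    if hop_ms <= 0:
        raise ValueError("overlap must be smaller than the window length")
    bounds = []
    start = 0
    # Keep only windows that fit entirely inside the clip.
    while start + win_ms <= clip_ms:
        bounds.append((start, start + win_ms))
        start += hop_ms
    return bounds


frames = frame_bounds()
print(len(frames))   # 19
print(frames[:3])    # [(0, 20), (10, 30), (20, 40)]
```

Under these assumptions a single 200 ms clip yields 19 short windows, and each one shares 10 ms with its neighbor, which is the structure the Examiner maps onto the "plurality of short clips" language.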
Summary
Claims 1, 2, 4–10, 13 and 18–22 are rejected under at least one of 35 U.S.C. §§ 102 and 103 as being unpatentable over the cited prior art. In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
Allowable Subject Matter
Claims 11, 12 and 14–17 are objected to for reciting allowable subject matter while depending on a rejected base claim. The claims would be allowable if rewritten in independent form including all limitations of their base claim and any and all intervening claims.
Claim 11 depends on claim 7, and further requires the following:
“wherein calculating the first confidence score further comprises:
“calculating an average root mean square gain of the audio signal,
“wherein calculating the noise confidence score comprises calculating the noise confidence score based on the result of classifying the audio signal and the average root mean square gain of the audio signal.”
The Miyake reference does not teach or suggest the calculation of an average root mean square gain of the audio signal and calculating a confidence score based on a classification and the average root mean square gain. For the foregoing reasons, claim 11 and its dependent claim 12 are allowable over the cited prior art.
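For orientation, one plausible reading of "average root mean square gain" is the mean of per-frame RMS levels. The sketch below is an assumption-laden illustration of that reading, not the method defined in Applicant's Specification or in any cited reference; the names `frame_rms`, `average_rms` and `to_db`, the frame sizes, and the test signal are all hypothetical.

```python
import math


def frame_rms(frame):
    """Root mean square of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def average_rms(samples, frame_len, hop_len):
    """Mean of per-frame RMS values (one possible reading of the claim term)."""
    if len(samples) < frame_len:
        raise ValueError("signal is shorter than one frame")
    starts = range(0, len(samples) - frame_len + 1, hop_len)
    values = [frame_rms(samples[s:s + frame_len]) for s in starts]
    return sum(values) / len(values)


def to_db(level, floor=1e-12):
    """Convert a linear level to decibels, guarding against log(0)."""
    return 20.0 * math.log10(max(level, floor))


# Hypothetical test signal: a square wave alternating between +0.5 and -0.5.
signal = [0.5, -0.5] * 800
avg = average_rms(signal, frame_len=320, hop_len=160)
print(avg)                   # 0.5
print(round(to_db(avg), 2))  # -6.02
```

How such a value would be combined with a classification result to form the noise confidence score is left open here, since the Office action does not describe it.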
Claim 14 depends on claim 7, and further requires the following:
“further comprising calculating a weight based on the noise confidence score,
“wherein calculating the second confidence score comprises:
“extracting a second plurality of features from the audio signal,
“wherein the second plurality of features is extracted over a longer time period than the first plurality of features is extracted;
“calculating a second plurality of statistics based on the second plurality of features,
“wherein the second plurality of statistics is weighted according to the weight;
“classifying the audio signal by inputting the second plurality of features and the second plurality of statistics into the second machine learning model; and
“calculating the second confidence score based on a result of classifying the audio signal.”
Miyake teaches and suggests calculating a weight $a$ based on the error $e_t$ of a confidence score $h_t$. Miyake does not teach or suggest extracting a second plurality of features over a longer time period than the first plurality of features, calculating a second plurality of statistics based on the second plurality of features, weighting the statistics according to a weight based on a noise confidence score, classifying the audio signal by inputting the second plurality of features and statistics into a second machine learning model and calculating a second confidence score based on the classifying result. For the foregoing reasons, claim 14 and its dependent claims 15–17 are allowable over the cited prior art.
Response to Applicant’s Arguments
Applicant’s Reply (03 March 2026) has substantively amended all the claims. This Office action has been updated accordingly.
Applicant’s Reply at 9–11 further includes comments pertaining to the rejections included in the Non-Final Rejection (03 December 2025). Those comments have been considered, but are moot in light of the new grounds of rejection presented in this Office action.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 C.F.R. § 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 C.F.R. § 1.17(a)) pursuant to 37 C.F.R. § 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
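The reply windows described above can be worked through as date arithmetic. The sketch below assumes the mailing date of Apr 28, 2026 shown in the prosecution timeline for this action; the helper `add_months` is hypothetical. It deliberately ignores weekend and holiday roll-over and the advisory-action adjustment described in the paragraph above, so the printed dates are a rough orientation rather than a docketing result.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Add calendar months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Assumed mailing date, taken from the timeline entry for this Final Rejection.
mailing = date(2026, 4, 28)

two_month_mark = add_months(mailing, 2)  # first-reply window tied to the advisory action rule
shortened = add_months(mailing, 3)       # end of the shortened statutory period
outer_limit = add_months(mailing, 6)     # latest possible date, with extensions

print(two_month_mark)  # 2026-06-28
print(shortened)       # 2026-07-28
print(outer_limit)     # 2026-10-28
```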
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WALTER F BRINEY III whose telephone number is (571)272-7513. The examiner can normally be reached M-F 8 am-4:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carolyn Edwards, can be reached at 571-270-7136. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Walter F Briney III/

Walter F Briney III
Primary Examiner
Art Unit 2692

4/24/2026

1 SuperGuide Corp. v. DirecTV Enters., Inc., 358 F.3d 870 (Fed. Cir. 2004).
Prosecution Timeline

Feb 22, 2024: Application Filed
Dec 03, 2025: Non-Final Rejection mailed (§103)
Feb 16, 2026: Interview Requested
Feb 23, 2026: Examiner Interview Summary
Feb 23, 2026: Applicant Interview (Telephonic)
Mar 03, 2026: Response Filed
Apr 28, 2026: Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/208,589
Patent 12641386
Loudspeakers
2y 11m to grant Granted May 26, 2026
18/472,152
Patent 12640131
DISPLAY APPARATUS
2y 8m to grant Granted May 26, 2026
18/400,758
Patent 12634634
SYSTEMS AND METHODS FOR STABILIZING A PLAYBACK DEVICE
2y 4m to grant Granted May 19, 2026
18/839,522
Patent 12634632
SPEAKERS AND METHODS
1y 9m to grant Granted May 19, 2026
18/106,714
Patent 12620402
HEARING DEVICE WITH ACCELERATION-BASED BEAMFORMING
3y 2m to grant Granted May 05, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 65%
With Interview (+5.2%): 71%
Median Time to Grant: 3y 0m (~9m remaining)
PTA Risk: Moderate
Based on 544 resolved cases by this examiner. Grant probability derived from career allowance rate.