Prosecution Insights
Last updated: April 19, 2026
Application No. 18/376,486

Apparatus, Methods and Computer Programs for Audio Signal Enhancement Using a Dataset

Final Rejection — §102, §103
Filed: Oct 04, 2023
Examiner: ROBERTS, SHAUN A
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Nokia Technologies Oy
OA Round: 2 (Final)
Grant Probability: 76% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 10m
With Interview: 86%

Examiner Intelligence

Grants 76% — above average
Career Allow Rate: 76% (491 granted / 647 resolved; +13.9% vs TC avg)
Interview Lift: +10.3% (moderate, ~+10% lift for resolved cases with interview)
Typical Timeline: 2y 10m average prosecution; 31 applications currently pending
Career History: 678 total applications across all art units
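The headline probabilities appear to be simple functions of the career counts shown above (the page's own note says grant probability is derived from the career allow rate). A quick check of that arithmetic, with illustrative variable names:

```python
# Back-of-the-envelope check of the dashboard's headline figures.
# Counts are taken from the Examiner Intelligence panel; the additive
# treatment of the interview lift is an assumption.
granted, resolved = 491, 647   # examiner's career counts
interview_lift = 10.3          # percentage-point lift for cases with an interview

allow_rate = 100 * granted / resolved        # career allow rate, in percent
with_interview = allow_rate + interview_lift

print(round(allow_rate), round(with_interview))  # → 76 86
```

Both rounded values match the dashboard's 76% and 86% figures, which supports reading the interview lift as additive percentage points.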

Statute-Specific Performance

§101: 7.6% (-32.4% vs TC avg)
§103: 49.2% (+9.2% vs TC avg)
§102: 29.5% (-10.5% vs TC avg)
§112: 3.5% (-36.5% vs TC avg)
Tech Center average is an estimate • Based on career data from 647 resolved cases
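One detail worth noting: every per-statute delta back-solves to the same Tech Center baseline, suggesting the "TC average estimate" is a single flat value rather than a per-statute figure. A quick reconstruction (the figures come from the table above; interpreting the baseline this way is an assumption, and the exact metric behind each percentage is not specified on the page):

```python
# Back-solve the implied Tech Center baseline from each statute's delta.
examiner_rate = {"101": 7.6, "103": 49.2, "102": 29.5, "112": 3.5}    # examiner %
delta_vs_tc   = {"101": -32.4, "103": 9.2, "102": -10.5, "112": -36.5} # vs TC avg

implied_tc_avg = {s: round(examiner_rate[s] - delta_vs_tc[s], 1)
                  for s in examiner_rate}
print(implied_tc_avg)  # every statute back-solves to 40.0
```

Since all four statutes imply the same 40.0% baseline, the comparison line is best read as a single Tech-Center-wide estimate.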

Office Action

Grounds: §102, §103
DETAILED ACTION

1. This action is responsive to remarks filed 11/11/2025.

Notice of Pre-AIA or AIA Status

2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

3. Claims 1, 4, 6-7 have been amended. The drawings have been amended and are accepted.

Response to Arguments

4. Applicant's arguments filed have been fully considered but are not persuasive. Applicant argues that the cited prior art does not teach the limitations of claim 1, that the DNN is not for alternative real samples of the target use case, and that it does not disclose anything on trading off performance loss for a general model. Examiner respectfully disagrees.

Regarding claim 1, and specifically the first limitation, cited prior art Bui teaches:

Abstract: Methods and systems are provided for generating a customized speech recognition neural network system comprised of an adapted automatic speech recognition neural network and an adapted language model neural network. The automatic speech recognition neural network is first trained in a generic domain and then adapted to a target domain. The language model neural network is first trained in a generic domain and then adapted to a target domain. Such a customized speech recognition neural network system can be used to understand input vocal commands.

[0001] Oftentimes, it is desirable for computer programs (e.g., applications) to have verbal command-based functionality. A program capable of understanding and responding to verbal commands allows users to more easily interact with and use the program. Such verbal commands are often tied to particular features in the application. For instance, in an image editing application, a verbal command can be given to modify the saturation of an image. In this regard, speech recognition can be used as the basis for verbal command-based functionality.
Speech recognition can be implemented by first using automatic speech recognition to process speech (e.g., one or more utterances) and then using a language model to understand and/or interpret the processed speech.

[0005] Embodiments of the present disclosure are directed to a customizable speech recognition system capable of recognizing speech related to a specific domain. One method described herein for creating such a system is using a neural network(s). A neural network can initially be trained to perform automatic speech recognition. The neural network can further undergo domain adaptation to customize the network for a specific target domain. In particular, a customizable speech recognition neural network system can be trained for a generic domain and then adapted to a target domain of interest. This is advantageous because the customizable speech recognition system takes advantage of the large dataset of the generic domain to initially train the system for speech recognition and then uses the small custom dataset of the target domain to augment the system for speech recognition in a target domain.

[0021] Accordingly, embodiments of the present disclosure are directed to facilitating the creation of a customizable speech recognition system capable of accurately recognizing speech related to a specific domain. Advantageously, adapting a speech recognition system to a specific domain ensures that the system understands words, phrases, terms, etc. related to the domain.

The cited prior art reads on the limitations as currently claimed. The current claim language is broad enough to allow the art to read on the limitations. Applicant's arguments appear to be narrower than what is currently recited. For example, while Applicant presents arguments about the DNN, the claims make no mention of the DNN.
Further incorporating more specific language (specifically classifying (and/or limiting) "audibility of sounds", and language around the DNN and the performance loss trade-off or comparison) may help to advance prosecution. The additional independent and dependent claims are rejected based on the arguments presented above and the art rejections below.

Claim Rejections - 35 USC § 102

5. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

6. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

7. Claims 1-2, 4-6, 8-9, 13-16, 18, 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bui et al (2020/0327884).
Regarding claim 1 Bui et al (2020/0327884) teaches An apparatus (abstract; fig 10; para 0005: customizable speech recognition system), comprising: at least one processor (fig 10); and at least one memory storing instructions that, when executed with the at least one processor (fig 10), cause the apparatus at least to: enable access to a trained computer program wherein the trained computer program is configured for processing one or more audio signals to enhance audibility of sounds within the one or more audio signals and wherein the trained computer program is trained using a generic dataset ([0024] To train the automatic speech recognition neural network, an encoder-decoder architecture can be employed. Such an architecture can include an encoder, an attention unit, and a decoder. The encoder can be used to learn feature representations that capture correlations between sub-phonetic units and the output of the system. The attention unit can be used to estimate the relative importance of each feature in determining the correct output. The decoder can be used to construct an output using learned representations. [0025] The automatic speech recognition neural network can first be trained using a generic dataset. The generic dataset can be a large speech-based dataset. Such a large dataset contains enough data to train the neural network to be highly accurate for the generic domain and avoid any over-fitting. 
Training of the automatic speech recognition neural network using the generic dataset can continue until the network converges to a state where the output reaches a desired threshold level of accuracy.); obtain a dataset wherein the dataset comprises data samples with inputs and outputs for the computer program (0048: A target dataset can be a targeted speech-based dataset custom to a particular domain (e.g., application, program, etc.).; 0071); and update the trained computer program for processing one or more audio signals to enhance audibility of sounds within the one or more audio signals using the dataset wherein the update of the trained computer program comprises training the computer program using at least part of the dataset and evaluating the performance of the updated computer program for at least part of the dataset and for at least part of the generic dataset (5; 22; 25-26; 71; [0005] Embodiments of the present disclosure are directed to a customizable speech recognition system capable of recognizing speech related to a specific domain. One method described herein for creating such a system is using a neural network(s). A neural network can initially be trained to perform automatic speech recognition. The neural network can further undergo domain adaptation to customize the network for a specific target domain. In particular, a customizable speech recognition neural network system can be trained for a generic domain and then adapted to a target domain of interest. This is advantageous because the customizable speech recognition system takes advantage of the large dataset of the generic domain to initially train the system for speech recognition and then uses the small custom dataset of the target domain to augment the system for speech recognition in a target domain.; [0071] Upon reaching a desired threshold of accuracy in the generic domain, the automatic speech recognition neural network can undergo domain adaptation. 
Adapting the neural network updates the automatic speech recognition neural network trained in the generic domain and modifies the parameters/weights of the automatic speech recognition neural network for the target domain. Advantageously, adapting an automatic speech recognition neural network takes advantage of using a large generic dataset to obtain a neural network that is highly accurate at predicting characters or byte-pairs from input audio. This highly accurate neural network can then be tailored to the specific target domain of interest to ensure that the network understands words, phrases, terms, etc. related to the target domain.; 0078-0079: during adaptation …network can be updated for error; cross-entropy loss can be used to determine differences; [0120] At block 410, the parameters of the automatic speech recognition model can be adapted to the target domain. Adapting the automatic speech recognition model maintains the high accuracy at predicting byte-pairs from input audio learned from the generic domain while tailoring the model to the specific target domain of interest. In this way, adaptation ensures that the model understands words, phrases, terms, etc. related to the target domain.).

Regarding claim 2, Bui teaches: An apparatus as claimed in claim 1 wherein the dataset comprises at least a subset of data that is not comprised within the generic dataset (5); and no data that is comprised within the generic dataset (0005: generic domain; target domain; 25-26; 0118: target dataset (e.g. a small speech-based dataset for a target domain)).
Regarding claim 4, Bui teaches: An apparatus as claimed in claim 1, wherein the instructions, when executed with the at least one processor, cause the apparatus to trigger the obtaining of the dataset with one or more of: an input with an end-user, a request with an end-user device, a request with an end-user application, an expiry of a time period relating to the trained computer program, or an output of a similarity evaluation between the generic dataset and the dataset (0049: the datasets can be input into data store from a remote device, such as from a server or a user device).

Regarding claim 5, Bui teaches: An apparatus as claimed in claim 1, wherein the instructions, when executed with the at least one processor, cause the apparatus to obtain the dataset using one or more of: real world measurements; or simulators (0048: one or more datasets (generic and target); derived from audio recordings, etc).

Regarding claim 6, Bui teaches: An apparatus as claimed in claim 1, wherein the instructions, when executed with the at least one processor, cause the apparatus to perform: training the computer program using a first subset of the dataset; and evaluating the performance of the updated computer program using a second subset of the dataset, wherein the data of the first subset and the second subset are disjoint (0119: target dataset can be divided into a training set and a test set).

Regarding claim 8, Bui teaches: An apparatus as claimed in claim 1, wherein the updated trained computer program comprises an iterative process wherein respective iterations comprise the instructions, when executed with the at least one processor, causing the apparatus to perform evaluating the performance of the updated computer program for the at least part of the dataset and for the at least part of the generic dataset (0069; 0071; 78-79: network can be updated for error; cross-entropy loss; 113; 119: training set and test set).
Regarding claim 9, Bui teaches: An apparatus as claimed in claim 1, wherein the instructions, when executed with the at least one processor, cause the apparatus to evaluate the performance of the updated computer program for the at least part of the generic dataset with tracking a performance loss (0069: during initial training…the network can be updated for error; error in the network can be determined using, for example, cross-entropy loss; 0078: during the adaptation …the network can be updated for error; 0079: cross-entropy loss can be used to determine differences between the output from the network and a ground truth output).

Regarding claim 13, Bui teaches: An apparatus as claimed in claim 1, wherein the processing of the one or more audio signals comprises at least one of: acoustic echo cancellation; noise suppression; residual echo suppression; speech enhancement; speech dereverberation; wind noise reduction; or sound source separation ([0005] Embodiments of the present disclosure are directed to a customizable speech recognition system capable of recognizing speech related to a specific domain. One method described herein for creating such a system is using a neural network(s). A neural network can initially be trained to perform automatic speech recognition. The neural network can further undergo domain adaptation to customize the network for a specific target domain. In particular, a customizable speech recognition neural network system can be trained for a generic domain and then adapted to a target domain of interest.
This is advantageous because the customizable speech recognition system takes advantage of the large dataset of the generic domain to initially train the system for speech recognition and then uses the small custom dataset of the target domain to augment the system for speech recognition in a target domain.; [0071] Upon reaching a desired threshold of accuracy in the generic domain, the automatic speech recognition neural network can undergo domain adaptation. Adapting the neural network updates the automatic speech recognition neural network trained in the generic domain and modifies the parameters/weights of the automatic speech recognition neural network for the target domain. Advantageously, adapting an automatic speech recognition neural network takes advantage of using a large generic dataset to obtain a neural network that is highly accurate at predicting characters or byte-pairs from input audio. This highly accurate neural network can then be tailored to the specific target domain of interest to ensure that the network understands words, phrases, terms, etc. related to the target domain.; [0120] At block 410, the parameters of the automatic speech recognition model can be adapted to the target domain. Adapting the automatic speech recognition model maintains the high accuracy at predicting byte-pairs from input audio learned from the generic domain while tailoring the model to the specific target domain of interest. In this way, adaptation ensures that the model understands words, phrases, terms, etc. related to the target domain.).

Regarding claim 14, Bui teaches: An apparatus as claimed in claim 1, wherein the computer program comprises a machine learning model (0005).

Regarding claim 15, Bui teaches: An apparatus as claimed in claim 14, wherein the machine learning model comprises a neural network circuit (0005: system using a neural network).
Regarding claim 16, Bui teaches: A method, comprising: enabling access to a trained computer program wherein the trained computer program is configured for processing one or more audio signals to enhance audibility of sounds within the one or more audio signals and wherein the trained computer program is trained using a generic dataset; obtaining a dataset wherein the dataset comprises data samples with inputs and outputs for the computer program; and updating the trained computer program for processing one or more audio signals to enhance audibility of sounds within the one or more audio signals using the dataset wherein the updating of the trained computer program comprises training the computer program using at least part of the dataset and evaluating the performance of the updated computer program for at least part of the dataset and for at least part of the generic dataset. Claim recites limitations similar to claim 1 and is rejected for similar rationale and reasoning.

Regarding claim 18, Bui teaches: A non-transitory program storage device readable with an apparatus, tangibly embodying a program of instructions that when executed with the apparatus, cause the apparatus to perform at least: enabling access to a trained computer program wherein the trained computer program is configured for processing one or more audio signals to enhance audibility of sounds within the one or more audio signals and wherein the trained computer program is trained using a generic dataset; obtaining a dataset wherein the dataset comprises data samples with inputs and outputs for the computer program; and updating the trained computer program for processing one or more audio signals to enhance audibility of sounds within the one or more audio signals using the dataset wherein the updating of the trained computer program comprises training the computer program using at least part of the dataset and evaluating the performance of the updated computer program for at least part of the dataset and for at least part of the generic dataset. Claim recites limitations similar to claim 1 and is rejected for similar rationale and reasoning.

Regarding claim 20, Bui teaches: A method as claimed in claim 16, further comprising evaluating the performance of the updated computer program for the at least part of the generic dataset with tracking a performance loss. Claim recites limitations similar to claim 9 and is rejected for similar rationale and reasoning.

Claim Rejections - 35 USC § 103

8. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

9. Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Bui in view of Saxena et al (2019/0327271).
Regarding claim 7, Bui teaches: An apparatus as claimed in claim 1, wherein the instructions, when executed with the at least one processor, cause the apparatus to perform: training the computer program using a first subset of the dataset; and evaluating the performance of the updated computer program using a second subset of the dataset, {wherein the data of the first subset and the second subset are at least partly overlapping} (0119); but Bui does not specifically teach the bracketed limitation, where Saxena teaches wherein the data of the first subset and the second subset are at least partly overlapping (0021: supervised learning for each node and cluster in step (3) may be performed by the following method: (a) inputting a vector for each node or edge where the vector contains some or all of the attributes of the node or edge along with all cluster tags and other tags to a deep learning neural network; (b) training the deep learning neural network on test data created from the set of all nodes and edges in the graph; (c) testing the trained model using a dataset created from the set of all nodes and edges such that the test set has minimal or no overlap with training set; (d) predicting clusters and other attributes of any new node or edge being added to the graph based on the model resulting from step (c)). It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Saxena, presenting a reasonable expectation of success in still allowing for the training and testing of the network, and ultimately improving the network by adapting to incorporate the additional dataset. Bui already teaches training and evaluating using a first and second subset. Saxena teaches utilizing neural networks in processing data for computer (IT) infrastructure.
Thus, one could look to (the specific portion of) Saxena to further have datasets with some overlap while still allowing for the training and evaluation to take place using the multiple subsets for the network.

10. Claims 10 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Bui in view of Sun et al (2021/0064372).

Regarding claim 10, Bui does not specifically teach, but Sun teaches: An apparatus as claimed in claim 9, wherein the instructions, when executed with the at least one processor, cause the apparatus to perform using inference of the updated computer program to track the performance loss (0046: The back propagation computations may be a common back propagation process in deep neural network (DNN) training, except that the loss function is obtained as described above (e.g., based on the high and low precision inferences)). It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Sun to track loss using inference for an improved system, with reduced complexity and faster training. Bui already teaches determining/tracking loss for the network. Sun teaches low precision machine learning that incorporates inference, and one could look to Sun to further incorporate the inference in determining loss (as discussed in Bui) for reduced complexity and computational demand, allowing for quicker evaluation and training.

Claim 21 recites limitations similar to claim 10 and is rejected for similar rationale and reasoning.

11. Claims 11-12, 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Bui in view of Zhou et al (2018/0096678).

Regarding claim 11, Bui does not specifically teach, but Zhou teaches: An apparatus as claimed in claim 1, wherein the instructions, when executed with the at least one processor, cause the apparatus to obtain a balance parameter wherein the balance parameter indicates a level of impact on the performance of the updated computer program for the at least part of the generic dataset (parameter that compares the datasets; abstract; 54-55). It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Zhou for an improved system, to better evaluate performance for improved evaluation, determination, and training of the target dataset. Bui already teaches adapting the neural network model using the target dataset and optimization/determining loss. Zhou (also) teaches a general and domain-specific speech recognition, and comparing multiple speech recognition performance. Thus, one could look to Zhou to allow for the models to be compared to improve the operation of speech recognition systems (Zhou 0001) for better evaluation and determination of the appropriate models to use for customization.

Regarding claim 12, Bui and Zhou teach: An apparatus as claimed in claim 11 wherein the balance parameter indicates a level of performance of the updated computer program for the at least part of the dataset that is used to evaluate the performance of the updated computer program. Rejected for similar rationale and reasoning as claim 11.

Claim 22 recites limitations similar to claim 11 and is rejected for similar rationale and reasoning. Claim 23 recites limitations similar to claim 12 and is rejected for similar rationale and reasoning.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAUN A ROBERTS whose telephone number is (571)270-7541. The examiner can normally be reached Monday-Friday 9-5 EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached on 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAUN ROBERTS/
Primary Examiner, Art Unit 2655
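The update-and-evaluate loop recited in the independent claims (train on a generic dataset, update using an obtained dataset, then evaluate the updated program on parts of both datasets to track performance loss on the generic domain) can be sketched as follows. This is a hypothetical toy illustration using a linear model and gradient descent; it is not code from either application, and all names and data are invented:

```python
# Toy sketch of the claimed loop: generic training -> update with obtained
# dataset -> evaluate on parts of BOTH datasets to track performance loss.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for (input, desired output) pairs; real use would be audio frames.
Xg = rng.normal(size=(200, 4)); yg = Xg @ np.array([1.0, -2.0, 0.5, 0.0])  # generic dataset
Xt = rng.normal(size=(40, 4));  yt = Xt @ np.array([1.2, -1.8, 0.5, 0.3])  # obtained dataset

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def train(w, X, y, lr=0.05, steps=300):
    """Plain gradient descent on mean-squared error."""
    for _ in range(steps):
        w = w - lr * 2.0 * X.T @ (X @ w - y) / len(y)
    return w

w = train(np.zeros(4), Xg, yg)          # the "trained computer program"
base_generic = mse(w, Xg, yg)           # baseline performance on the generic domain

# Update using part of the obtained dataset (disjoint train/test split, cf. claim 6).
w = train(w, Xt[:30], yt[:30])

target_err = mse(w, Xt[30:], yt[30:])           # performance on the obtained dataset
generic_loss = mse(w, Xg, yg) - base_generic    # tracked performance loss (cf. claim 9)
```

Evaluating on the held-back generic portion is what exposes the trade-off the applicant argued: `generic_loss` grows as the model drifts toward the target domain, which is the quantity a balance parameter (claim 11) would regulate.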

Prosecution Timeline

Oct 04, 2023
Application Filed
Aug 12, 2025
Non-Final Rejection — §102, §103
Nov 11, 2025
Response Filed
Dec 12, 2025
Final Rejection — §102, §103 (current)
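Given the reply rules quoted in the final action (three-month shortened statutory period, six-month statutory maximum), the key dates for the Dec 12, 2025 mailing can be computed. The month arithmetic below is a simplification that assumes the day of month exists in the target month, and the resulting dates do not account for weekends, holidays, or advisory-action timing:

```python
# Compute the reply window for the Dec 12, 2025 final rejection.
from datetime import date

def add_months(d: date, n: int) -> date:
    """Naive month arithmetic; assumes the day exists in the target month."""
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, d.day)

mailed = date(2025, 12, 12)               # final rejection mail date
shortened_period = add_months(mailed, 3)  # reply due; extensions per 37 CFR 1.136(a)
statutory_cutoff = add_months(mailed, 6)  # absolute six-month limit

print(shortened_period, statutory_cutoff)  # → 2026-03-12 2026-06-12
```

Filing within two months of mailing (by Feb 12, 2026) preserves the advisory-action timing benefit described in the action.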

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586599
AUDIO SIGNAL PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM WITH MACHINE LEARNING AND FOR MICROPHONE MUTE STATE FEATURES IN A MULTI PERSON VOICE CALL
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12586568
SYNTHETICALLY GENERATING INNER SPEECH TRAINING DATA
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12573376
Dynamic Language and Command Recognition
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12562157
GENERATING TOPIC-SPECIFIC LANGUAGE MODELS
Granted Feb 24, 2026 (2y 5m to grant)
Patent 12555562
VOICE SYNTHESIS FROM DIFFUSION GENERATED SPECTROGRAMS FOR ACCESSIBILITY
Granted Feb 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 76%
With Interview: 86% (+10.3%)
Median Time to Grant: 2y 10m
PTA Risk: Moderate
Based on 647 resolved cases by this examiner. Grant probability derived from career allow rate.
