Prosecution Insights
Last updated: April 19, 2026
Application No. 18/062,815

AUDIO DEVICE WITH AUDIO QUALITY DETECTION AND RELATED METHODS

Status: Final Rejection (§103)
Filed: Dec 07, 2022
Examiner: ZHANG, LESHUI
Art Unit: 2695
Tech Center: 2600 — Communications
Assignee: GN Audio A/S
OA Round: 4 (Final)
Grant Probability: 78% (Favorable)
OA Rounds: 5-6
To Grant: 2y 10m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 78% (719 granted / 928 resolved), +15.5% vs TC avg (above average)
Interview Lift: +36.0%, strong (allow rate with vs. without an interview, across resolved cases with an interview)
Typical Timeline: 2y 10m average prosecution; 47 applications currently pending
Career History: 975 total applications across all art units
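
The tile figures above are internally consistent. A minimal sketch of the arithmetic, assuming the TC delta and the interview lift are simple percentage-point differences (the dashboard's exact definitions are not published, so treat this as a consistency check, not the tool's formula):

```python
# Consistency check for the examiner-intelligence tiles; the definitions of
# "vs TC avg" and "interview lift" are assumed, not taken from the tool.
granted, resolved = 719, 928
allow_rate = granted / resolved                     # ~0.775, shown as 78%

tc_delta = 0.155                                    # "+15.5% vs TC avg"
implied_tc_avg = allow_rate - tc_delta              # ~0.62 if the delta is in percentage points

with_interview = 0.99                               # "99% With Interview" (shown elsewhere on this page)
interview_lift = 0.36                               # "+36.0% Interview Lift"
implied_without_interview = with_interview - interview_lift   # ~0.63

print(f"career allow rate: {allow_rate:.1%}")                      # 77.5%
print(f"implied TC average: {implied_tc_avg:.1%}")                 # 62.0%
print(f"implied allow rate without interview: {implied_without_interview:.1%}")  # 63.0%
```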

Statute-Specific Performance

§101: 5.5% (-34.5% vs TC avg)
§103: 42.5% (+2.5% vs TC avg)
§102: 13.6% (-26.4% vs TC avg)
§112: 28.7% (-11.3% vs TC avg)
Tech Center averages are estimates • Based on career data from 928 resolved cases

Office Action (§103)
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office Action is in response to the claim amendment filed on December 22, 2025, in which claims 1, 14, and 20 were amended and claims 2 and 15 remain cancelled. In view of this communication, claims 1, 3-14, and 16-21 are currently pending.

The Office appreciates the explanation of the amendment and the analysis of the prior art; however, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993) and MPEP 2145. In the response to this Office Action, the Examiner respectfully requests that support be shown for language added to any original claims by amendment and for any new claims. That is, indicate support for newly added claim language by specifically pointing to the page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-4, 6-14, and 16-21 are rejected under 35 U.S.C. 103 as being unpatentable over Talwar et al. (US 20110125500 A1, hereinafter Talwar) in view of Assem et al. (US 20150156324 A1, hereinafter Assem).

Claim 1: Talwar teaches an audio device (title and abstract, ln 1-14, an ASR system in fig. 2) for speech quality detection (through the method in fig. 3, wherein distortion is identified from the received audio signal at step 310 in fig. 3), the audio device comprising an interface (including an acoustic interface 33 for digitizing the speech into acoustic data in fig. 2, para 30, 34; a communications bus 44 or an entertainment bus 46, including CAN, MOST, LIN, LAN, and other appropriate connections, para 15; a dual antenna 56, etc., para 17; and connections from module to module, e.g., a connection from the decoder/classifier 214 to the post-processor 216, etc., in fig. 2), a processor (a processor 52 in fig. 1, para 17), and a memory (memory 54 in fig. 1, para 17), wherein the audio device is configured to: obtain, via the interface (including the acoustic interface 33 in fig. 2), a microphone input signal from one microphone including a first microphone (microphone 32 in figs. 1 and 2); process the microphone input signal for provision of an output signal (via the pre-processor 212 and then the decoder/classifier, etc., taking an output from element 33 as input, with an output signal from the decoder/classifier 214 in fig. 2, including constructed sentences from recognized subwords, para 41); determine, using a non-intrusive quality detection model (including the post-processor 216, distortion models 221, etc., in fig. 2) stored in the memory (memory 54 to store speech recognition software and databases, para 30), one or more quality parameters (one or more distortion models 221 containing distortion-related acoustic features of various types of distortion, and confidence values, based on feedback from the post-processor 216 for training, para 49, and used with other aspects of the ASR system, para 42) including a first quality parameter (at least one distortion hypothesis identified by the post-processor 216 among the plurality of distortion hypotheses, para 51; or a highest likelihood score for the identified and selected subword, para 38; or feedback for the pre-processor 212 to update parameters via training for the pre-processor module 212, para 42) indicative of a speech quality associated with the output signal (the at least one distortion hypothesis identified as a particular distortion hypothesis with the highest ranking, para 51, where the recognized speech includes desired vocabulary while distortion includes undesirable ambient noise, transient noise, and/or electronics noise, para 46, and the identified at least one distortion hypothesis is used to improve speech signal processing, para 44, at step 330 in fig. 3, para 53, 59, and the distortion models 221 are used as an aid to classify signal distortion, inherently associated with the identified subwords and constructed sentences); control processing of the microphone input signal based on the first quality parameter (at least one of the speech decoder 214, the acoustic interface 33, the pre-processor 212, or the acoustic models 220 is modified or improved based on the identified particular distortion hypothesis outputted from the post-processor 216, para 53) associated with the output signal (feedback to the pre-processor 212 and distortion models 221, etc., associated with the output signal from the decoder/classifier 214 in fig. 2, through the post-processor 216, para 42); and transmit, via the interface, the output signal (e.g., as vehicle device or device function control, para 29, via the communications bus 44 in fig. 1, para 44, or voice dialing through a dual antenna 56, para 17, for the recognized and constructed sentences outputted from the decoder/classifier 214, para 41).

However, Talwar does not explicitly teach one or more microphones from which the disclosed microphone signal is obtained, and does not explicitly teach wherein the first quality parameter is a mean opinion score, wherein to control processing of the microphone input signal based on the first quality parameter comprises determining whether the mean opinion score satisfies a first criterion.

Assem teaches an analogous field of endeavor by disclosing an audio device (title and abstract, ln 1-13, and a system in fig. 1 including multiple computing devices 110 and a multi-party VoIP conference call system 140) for speech quality detection (through obtaining a mean opinion score MOS for providing an assessment of audio quality, para 12), comprising: an interface (VoIP managed by the multi-party VoIP conference call system 140 over a network 180 in fig. 1, para 39, and thus an interface of the element 140 to the network 180, including input and output interfaces, is inherent); a processor (including a processor of the computer, para 17, included in the multi-party VoIP conference call system 140, para 24); and a memory (ROM, EPROM, Flash memory, CD-ROM, etc., para 14), wherein the audio device is configured to obtain, via the interface (the circuit for the capture of at least audio signals, para 22), a microphone input signal (through a microphone of a smartphone, para 22) from one or more microphones (microphones of multiple computing devices 110, 120) including a first microphone (one microphone on the device 110); process the microphone input signal (the focus component 145 transcodes the received audio signals, including mixing, decoding, and then re-encoding for distribution, etc., para 26) for provision of an output signal (providing the distributed audio signal to the other parties in fig. 1); determine, using a quality detection model (QoE as perceived by the receiving party is detected and represented by a MOS through focus-effect coefficients and the E-Model R-Factor equation, para 12, and at step 230 in fig. 2) stored in the memory (as software stored in the memory discussed above), one or more quality parameters (including the R-value and focus-effect coefficients as input to a correction function that produces a corrected MOS, para 12) including a first quality parameter indicative of a speech quality associated with the output signal (including a MOS representing the QoE perceived by the receiving party, para 12); control processing of the microphone input signal based on the first quality parameter (modifying the received microphone signal based on the corrected mean opinion score MOS, para 7) associated with the output signal (the corrected MOS is measured after the transcoding processing, para 7); and transmit, via the interface, the output signal (distributing the transcoded signal to the other parties over the network 180 in fig. 1, para 7, 25), wherein the first quality parameter is a mean opinion score (the corrected mean opinion score MOS outputted by the QoE calculator 155 in fig. 1, para 30-33, and the QoE representing the MOS, para 43-44), wherein to control processing of the microphone input signal based on the first quality parameter comprises determining whether the mean opinion score satisfies a first criterion (determining whether the calculated MOS satisfies a threshold at step 235 in fig. 2, para 42-45), for the benefits of improving the speech quality in a real-time environment (real-time applications in voice over IP and conference environments, para 6, 8) and in a cost-saving manner (para 3).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the one or more microphones, and wherein the first quality parameter is the mean opinion score, wherein to control processing of the microphone input signal based on the first quality parameter comprises determining whether the mean opinion score satisfies the first criterion, as taught by Assem, to the microphone and the first quality parameter included in the one or more quality parameters in the audio device, as taught by Talwar, for the benefits discussed above.

Claim 14 recites a method that is essentially the same as the processing steps of the audio device recited in claim 1 and thus, claim 14 is rejected according to claim 1 above.
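
For context on the mean-opinion-score limitation discussed above: the record does not reproduce Assem's corrected-MOS correction function, but the reference's QoE Calculator 155 / E-Model R-Factor 160 blocks appear to build on the standard E-Model (ITU-T G.107) mapping from an R-factor to an estimated MOS. A minimal sketch of that standard mapping is shown below purely as an illustration of the kind of calculation at issue; it is not Assem's implementation, and the example R value is invented.

```python
def emodel_mos(r_factor: float) -> float:
    """Map an E-Model R-factor to an estimated mean opinion score (MOS).

    Standard ITU-T G.107 conversion, shown only to illustrate the kind of
    MOS calculation discussed in the rejection; not Assem's corrected MOS.
    """
    if r_factor <= 0:
        return 1.0          # worst possible quality
    if r_factor >= 100:
        return 4.5          # the E-Model caps the estimated MOS at 4.5
    return 1.0 + 0.035 * r_factor + r_factor * (r_factor - 60) * (100 - r_factor) * 7e-6


# Example: a moderately impaired narrowband VoIP call (R = 80, an assumed value).
print(round(emodel_mos(80.0), 2))   # ~4.02
```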
Claim 3: the combination of Talwar and Assem further teaches, according to claim 1 above, wherein to determine the one or more quality parameters comprises applying the non-intrusive quality detection model (Talwar, including the post-processor 216 and distortion models 221 in fig. 2 and the discussion in claim 1 above based on the output signal, and Assem, the QoE or MOS is based on the received microphone signal from the parties 110, 120 and the transcoded microphone signal, as discussed in claim 1 above) to a model input based on one or both of the output signal (Assem, the QoE calculator 155 based on the E-Model R-Factor equation 160 provides input to the QoE Corrector 165, which uses the correction function 170 in fig. 1, para 31-32) and the microphone input signal (Talwar, a part of the decoder/classifier 214 receives one or more distortion models to produce N-best hypotheses and associated parameter values, para 49, etc., as discussed in claim 1 above, and Assem, through the transcoding applied to the received microphone signals from the parties 110, 120).

Claim 4: the combination of Talwar and Assem further teaches, according to claim 1 above, wherein to determine the one or more quality parameters comprises determining an output quality parameter associated with the output signal (Talwar, e.g., the parameter values include confidence values in the N-best hypotheses, or one or more distortion models used by the decoder/classifier 214, para 49, and the sampling rate or amplification gain of the microphone signal, para 80, discussed above, and Assem, the corrected QoE via the QoE corrector 165 using the correction function 170 in fig. 1, para 31-32, based on the audio signal perceived by the receiving party, i.e., the output signal, para 12) and an input quality parameter associated with the microphone input signal (the QoE or MOS calculated by the QoE Calculator 155 and E-Model R-Factor 160 in fig. 1, based on the received signal from the parties 110, 120 in fig. 1), and wherein the first quality parameter is determined based on the output quality parameter and the input quality parameter (Assem, the overall QoE is calculated based on the QoE from the QoE Calculator 155 and the QoE corrector 165 in fig. 1, para 33).

Claim 6: the combination of Talwar and Assem further teaches, according to claim 1 above, wherein processing the microphone input signal for provision of an output signal comprises applying a noise suppression scheme (Talwar, adaptive filters, noise suppression algorithms, speech equalization, speech compensation, and speech enhancement are implemented in the acoustic interface 33 and/or the pre-processor 212, para 54, and Assem, modifying operating parameters to improve the MOS at step 255 and a different codec at step 240), and wherein to control processing of the microphone input signal based on the first quality parameter comprises controlling the noise suppression scheme based on the first quality parameter (Talwar, the feedback from the post-processor 216 is used to train adaptation parameters for the pre-processor module 212, para 42, and the pre-processor module 212 can implement the FIR filter, para 54, and Assem, step 255 to modify parameters and step 240 to change to a different codec in fig. 2).

Claim 7: the combination of Talwar and Assem further teaches, according to claim 1 above, wherein to process the microphone input signal for provision of an output signal comprises applying an echo cancellation scheme, and to control processing of the microphone input signal based on the first quality parameter comprises to control the echo cancellation scheme based on the first quality parameter (Talwar, including the adaptive filter of the acoustic interface 33 and/or the pre-processor 212 performing echo cancellation algorithms whose parameters are adjusted, para 54, controlled by the feedback from the post-processor 216, as discussed in claim 6 above, para 42).

Claim 8: the combination of Talwar and Assem further teaches, according to claim 1 above, wherein to determine the one or more quality parameters comprises determining a first score associated with a first feature of the output signal (Talwar, including the confidence values, para 49, e.g., a first best, i.e., highest, confidence value from the confidence values of an N-best list of distortion hypotheses, similar to the identification of subword or utterance hypotheses, para 51, and Assem, based on the E-Model R-Factor equation 160, para 30-31), wherein the first quality parameter is based on the first score (Talwar, the identified distortion hypothesis with the highest confidence value, para 51, and Assem, the corrected QoE is based on the QoE calculated over elements 155, 160 in fig. 1, para 31-32).

Claim 9: the combination of Talwar and Assem further teaches, according to claim 8 above, wherein to determine the one or more quality parameters comprises determining a second score associated with a second feature of the output signal (Talwar, the likelihood score for each observed feature vector of each subword, para 38, where the feature vectors include vocal pitch, energy profiles, spectral attributes, and/or cepstral coefficients obtained by performing an FFT of the frames, etc., para 35, and Assem, the corrected QoE through the QoE corrector 165 using the correction function 170 in fig. 1, para 31-32, 37).

Claim 10: the combination of Talwar and Assem further teaches, according to claims 1 and 8-9 above, wherein to determine the one or more quality parameters comprises: determining a third score associated with a third feature of the output signal (Talwar, the likelihood score for each observed feature vector of each subword, para 38, where the feature vectors include vocal pitch, energy profiles, spectral attributes, and/or cepstral coefficients, as a third feature, obtained by performing an FFT of the frames, etc., para 35, and Assem, the QoE value after the changed codec at steps 245-250), wherein the first quality parameter is based on the third score (Talwar, either averaging scores, where averaging is too low for any given distortion in the conventional approach, para 4, or the likelihood score for each observed feature vector of each subword, where those scores are used to reorder the N-best list of hypotheses, by which a highest likelihood score is selected among them, i.e., the speech quality is obtained according to the highest likelihood score based on the reordered N-best list of hypotheses, and the calculated MOS at step 245 or the MOS further calculated after modification of the operating parameters at step 255 in fig. 2, and Assem, the final QoE is based on the further calculation of the QoE or MOS after the modified codec at step 240 and/or the modified operating parameters at step 255 in fig. 2).

Claim 11: the combination of Talwar and Assem further teaches, according to claim 1 above, wherein to determine the one or more quality parameters comprises to determine a combined score associated with two or more of the first feature, the second feature, and the third feature, wherein the first quality parameter is based on the combined score (Talwar, per the discussion in claim 10 above, the likelihood score for each observed feature vector of each subword, para 38, where the feature vectors include vocal pitch as a second feature, energy profiles as a third feature, spectral attributes, and/or cepstral coefficients, i.e., more than four features associated with their likelihood scores, and the scores are combined to obtain the model having the highest likelihood score according to the reordered N-best list, para 38, and Assem, the combined QoE is based on the corrected QoE and the calculated QoE, by comparison with the QoE threshold stored in the focus-effect table 177 and database 175 in fig. 1, para 32, 34).

Claim 12: the combination of Talwar and Assem further teaches, according to claim 1 above, wherein the first quality parameter indicative of a speech quality associated with the output signal is determined based on the output signal (per the discussion in claim 1 above; Talwar, the distortion models 221 outputted from the post-processor 216 are based on the microphone signal from the microphone 32 via the pre-processor 212, decoder/classifier 214, etc., and Assem, based on the signal the receiving party perceives, i.e., the output signal from the element 140 in fig. 1, para 12, and the discussion in claim 1 above).

Claim 13: the combination of Talwar and Assem further teaches, according to claim 1 above, wherein the first quality parameter indicative of a speech quality associated with the output signal is based on the microphone input signal (Talwar, the output from the post-processor 216 about the distortion models 221 is indicative of speech quality associated with the output signal from the post-processor 216 and is based on the acoustic language sound received through the microphone 32, via the decoder/classifier 214, pre-processor 212, etc., in fig. 2, and Assem, the corrected QoE or MOS is based in particular on the focus-effect table 177 in fig. 1, para 37).

Claim 16: the combination of Talwar and Assem further teaches, according to claim 1 above, wherein the first quality parameter is indicative of one or more of speech distortion, noise attenuation, and echo annoyance (Talwar, adaptive filters, noise suppression algorithms, speech equalization, speech compensation, and speech enhancement are implemented in the acoustic interface 33 and/or the pre-processor 212, para 54, and Assem, QoE or quality of experience accounts for degradation effects, abstract, para 8).

Claim 17: the combination of Talwar and Assem further teaches, according to claims 1 and 3 above, wherein to determine the one or more quality parameters comprises: applying the non-intrusive quality detection model to a model input based on both of the output signal and the microphone input signal (Talwar, including the post-processor 216 and distortion models 221 in fig. 2 and the discussion in claim 3 above, and Assem, the QoE calculated by the QoE Calculator is based on the received party signals for each party individually, para 30, and the corrected QoE via the QoE Corrector 165 is based on the output signal perceived by the receiving party of the output audio signal, para 12, i.e., based on both the received microphone signal and the delivered, processed signals to the receiving parties), that is, to a model input based on both of the output signal (Talwar, as discussed above and in claim 1, and Assem, as discussed above, the received signal is provided to the Focus Component for transcoding, para 25-27) and the microphone input signal (Talwar, a part of the decoder/classifier 214 receives one or more distortion models to produce N-best hypotheses and associated parameter values, para 49, etc., as discussed in claim 1 above, and Assem, the microphone signals received by the multi-party VoIP Conference Call system 140 and provided by the parties 110, 120 in fig. 1).

Claim 18 has been analyzed and rejected according to claims 14 and 16 above. Claim 19 has been analyzed and rejected according to claims 14 and 17 above. Claim 20 has been analyzed and rejected according to claims 1, 3, 14, and 17 above. Claim 21 has been analyzed and rejected according to claims 20 and 16 above.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Talwar (above) in view of Assem (above) and Willett et al. (US 20200043468 A1, hereinafter Willett).

Claim 5: the combination of Talwar and Assem teaches all the elements of claim 5, according to claim 1 above, including the non-intrusive quality detection model (the discussion in claim 1 above), except a machine learning model comprising a trained neural network included in the non-intrusive quality detection model. Willett teaches an audio device (title and abstract, ln 1-14, an ASR system in fig. 2) for speech quality detection (through a trained model 214 that outputs estimated parameters so that more accurate text is outputted from the ASR 224 than without the estimated parameters, para 46), wherein the non-intrusive quality detection model is disclosed (model 214 with no metadata in fig. 2C, and thus a non-intrusive type of quality detection model, para 46) to comprise a machine learning model comprising a trained neural network (element 214 is a trained model, and the model 214 is a machine learning model, para 38, which can be one of a feedforward neural network, a unidirectional or bidirectional recurrent neural network, a convolutional neural network, or a support vector machine model, para 49), for the benefits of improving the quality of the speech recognition result (increasing the accuracy of the recognized speech, para 46, by accurately estimating parameters such as a 90% confidence level, para 72) with a simple implementation (para 3). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the non-intrusive quality detection model comprising the machine learning model comprising a trained neural network, as taught by Willett, to the non-intrusive quality detection model in the audio device, as taught by the combination of Talwar and Assem, for the benefits discussed above.

Response to Arguments

Applicant's arguments filed on December 22, 2025 have been fully considered but are moot in view of the new ground(s) of rejection necessitated by the applicant's amendment. The Office has thoroughly reviewed Applicant's arguments but firmly believes that the cited references reasonably and properly meet the claimed limitations.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG, whose telephone number is (571) 270-5589. The examiner can normally be reached Monday-Friday, 6:30 am - 4:00 pm EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Vivian Chin, can be reached at 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LESHUI ZHANG/
Primary Examiner, Art Unit 2695
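
For readers unfamiliar with the claim 5 limitation at issue ("a machine learning model comprising a trained neural network" inside the non-intrusive quality detection model), a minimal sketch of what such a model can look like in practice is shown below. The feature type, layer sizes, pooling, and the 1-5 MOS squashing are assumptions invented for illustration; this is not the model of the application or of any cited reference, and the network is untrained.

```python
# Illustrative sketch of a generic non-intrusive MOS estimator: a small
# feedforward network over per-frame spectral features of the degraded signal
# only (no clean reference), pooled to one score per clip. All dimensions and
# design choices are assumptions for the example.
import torch
import torch.nn as nn


class NonIntrusiveMosEstimator(nn.Module):
    def __init__(self, n_features: int = 40):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, n_frames, n_features), e.g. log-mel features per frame
        per_frame = self.net(frames).squeeze(-1)       # (batch, n_frames)
        clip_score = per_frame.mean(dim=-1)            # pool frames to one score per clip
        return 1.0 + 4.0 * torch.sigmoid(clip_score)   # squash into the 1-5 MOS range


model = NonIntrusiveMosEstimator()
dummy_features = torch.randn(2, 100, 40)               # 2 clips, 100 frames, 40 features
print(model(dummy_features).shape)                      # torch.Size([2])
```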

Prosecution Timeline

Dec 07, 2022
Application Filed
Dec 28, 2024
Non-Final Rejection — §103
Feb 19, 2025
Response Filed
May 08, 2025
Final Rejection — §103
Jul 25, 2025
Examiner Interview Summary
Jul 25, 2025
Applicant Interview (Telephonic)
Aug 14, 2025
Request for Continued Examination
Aug 18, 2025
Response after Non-Final Action
Sep 20, 2025
Non-Final Rejection — §103
Dec 22, 2025
Response Filed
Mar 24, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585677
AUTOMATED GENERATION OF IMPROVED LIST-TYPE ANSWERS IN QUESTION ANSWERING SYSTEMS
Granted Mar 24, 2026 • 2y 5m to grant
Patent 12572757
VIDEO PROCESSING METHOD, VIDEO PROCESSING APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM
Granted Mar 10, 2026 • 2y 5m to grant
Patent 12567423
SYSTEM AND METHODS FOR UPSAMPLING OF DECOMPRESSED SPEECH DATA USING A NEURAL NETWORK
Granted Mar 03, 2026 • 2y 5m to grant
Patent 12567424
METHOD AND DEVICE FOR MULTI-CHANNEL COMFORT NOISE INJECTION IN A DECODED SOUND SIGNAL
Granted Mar 03, 2026 • 2y 5m to grant
Patent 12561354
SYSTEMS AND METHODS FOR ITEM-SPECIFIC KEYWORD RECOMMENDATION
Granted Feb 24, 2026 • 2y 5m to grant
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 78%
With Interview: 99% (+36.0%)
Median Time to Grant: 2y 10m
PTA Risk: High
Based on 928 resolved cases by this examiner. Grant probability derived from career allow rate.
