Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy of priority Application No. CN202111025632.X, filed on 09/02/2021, has been filed in the parent application.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 05/08/2023, 03/25/2024, and 03/25/2025 have been considered by the examiner.
Drawings
The drawings submitted on 04/25/2023 have been considered by the examiner.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/23/2026 has been entered.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Caldwell (US 2019/0311722 A1) in view of Streit (US 2020/0044852 A1).
Regarding Claim 1, Caldwell teaches: A speech scoring method performed by an electronic device, comprising ([0009] In one embodiment, a user instructing a voice interaction device to authorize an electronic payment is requested to authenticate his or her identity using voice biometrics. The user may access the application on a smart phone or other user device and read the pass phrase presented to the user.): receiving speech information (input audio signal) and associated reference speech text (sample phrases or pass phrase that appears on the display screen for the user to read), wherein the reference speech text has a corresponding reference pronunciation (stored voice feature characteristics or voiceprint or voice sample) and the speech information represents an audio signal of a person (user's stored biometric voiceprint or known user voice feature characteristics or User voice feature characteristics) reading the associated reference speech text ([0051] During the enrollment process, the user 302 may provide identification, user account name, password, and other account information. In one embodiment, sample phrases are displayed on the user device 330, the user 302 repeats the phrases that appear and voice samples 306 are received by a microphone 312 of the voice authentication system 300. User voice feature characteristics are extracted from the received voice samples and stored in a memory or database 308. [0052] When the user 302 needs authentication (e.g., to access an electronic device, to initiate an online payment) the user 302 can run a voice authentication application on the user device 330. The user device 330 repeatedly plays an acoustic tone, series of chirps or other audio information that are unique for the acoustic code. At the same time, the user device 330 prompts the user 302 to read text that appears on the screen of the user device 330. [0053] The audible speech from the user 302 (voice 306) and the acoustic code 316 output from a speaker of the user device 330 are both received by the microphone 312 which generates an input audio signal 320.); performing noise reduction processing on the speech information based on a speech noise reduction model to obtain noise-reduced speech information ([0024] The audio input processing components 114 may estimate a direction and/or location of the user 102 based on the audio input signals received by the audio sensing component(s) 112 (e.g., an array of microphones) and process the audio input signals to enhance the target audio and suppress noise based on the estimated location. [0040] In various embodiments, the digital signal processor 224 may be operable to perform echo cancellation, noise cancellation, target signal enhancement, post-filtering, and other audio signal processing functions.); performing speech recognition on the noise-reduced speech information to recognize text in the noise-reduced speech information and acoustic features (voice feature characteristics or voiceprint) associated with the speech information ([0059] In step 410, a speech recognition engine identifies the pass phrase from the audio input signal(s), including a beginning frame and an ending frame. Next, feature vectors for the pass phrase are determined (step 412) and compared against feature vectors for the user's stored biometric voiceprint which was created during a training sequence to determine a degree of match (i.e. confidence score) (step 414).); and predicting (generating a confidence score indicating a likelihood that voice feature characteristics of the speech match known user voice feature characteristics) a pronunciation score (a confidence score indicating the strength of the match) for indicating pronunciation similarity (match) between the speech information and the reference pronunciation corresponding to the reference speech text based on the recognized text and the acoustic features ([0005] Voice processing components identify speech in the audio input signal, match voice feature characteristics of the speech with known user voice feature characteristics, and generate a confidence score indicating a likelihood that the user has been authenticated. [0009] The user may access the application on a smart phone or other user device and read the pass phrase presented to the user. [0052] At the same time, the user device 330 prompts the user 302 to read text that appears on the screen of the user device 330. [0054] The speech subbands 324 are passed to a speech recognition engine 340 to identify the presence of the voiced sample phrase, including identifying a speech segment comprising a sequence of audio frames. [0055] Voice authentication 350 is also performed by extracting feature characteristics from the pass phrase received from the speech subbands of the input audio signal and comparing the extracted feature characteristics with the user's voice features stored during enrollment and training. The output of the voice authentication 350 is a confidence score indicating the strength of the match.).
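For orientation only, the following is a minimal hypothetical sketch of the claim 1 chain as mapped above (noise reduction, then recognition with acoustic feature extraction, then a confidence-style comparison against enrolled data). The function names, the energy-gate noise reduction, the toy spectral features, and the cosine comparison are assumptions of this sketch; they are not drawn from the application or from Caldwell.

# Hypothetical illustration only: the names, the energy-gate "noise reduction",
# the toy spectral "features", and the cosine comparison are assumptions of
# this sketch, not taken from the application or from Caldwell.
import numpy as np

def reduce_noise(audio, frame=400):
    """Stand-in for a speech noise reduction model: zero frames whose RMS
    energy falls below an assumed noise floor."""
    out = np.asarray(audio, dtype=float).copy()
    for start in range(0, len(out), frame):
        seg = out[start:start + frame]
        if seg.size and np.sqrt(np.mean(seg ** 2)) < 0.01:
            out[start:start + frame] = 0.0
    return out

def recognize(audio):
    """Stand-in for speech recognition: returns (recognized text, acoustic
    features). A real system would run an ASR model; here the features are
    a toy magnitude spectrum and the text is a fixed placeholder."""
    features = np.abs(np.fft.rfft(audio, n=256))[:128]
    return "placeholder transcript", features

def voiceprint_confidence(features, enrolled):
    """Caldwell-style comparison of extracted features against an enrolled
    voiceprint, returning a cosine-similarity confidence score."""
    denom = np.linalg.norm(features) * np.linalg.norm(enrolled)
    return float(np.dot(features, enrolled) / denom) if denom else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.normal(scale=0.1, size=16000)   # mock one second of audio
    enrolled_voiceprint = rng.random(128)        # mock enrollment data
    text, feats = recognize(reduce_noise(speech))
    print(text, round(voiceprint_confidence(feats, enrolled_voiceprint), 3))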
Caldwell does not explicitly teach the underlined limitation: predicting a pronunciation score for indicating pronunciation similarity between the speech information and the reference pronunciation corresponding to the reference speech text based on (i) content differences between the recognized text and the associated reference speech text and (ii) pronunciation differences between the acoustic features of the audio signal and the reference pronunciation.
Streit teaches: authentication and verification of liveness of a person by assessing pronunciation similarity between the speech information and the reference pronunciation corresponding to the reference speech text based on (i) content differences between the recognized text and the associated reference speech text and (ii) pronunciation differences between the acoustic features of the audio signal and the reference pronunciation (identification on a private voice biometric) ([0121] In one example (Row 8), the random biometric instances include a set of random words selected for liveness validation in conjunction with voice based identification. [0122] According to one embodiment, an authentication system, assesses liveness by asking the user to read a few random words. For example, an authentication system can concurrently or simultaneously process the received voice signal through two algorithms (e.g., liveness algorithm and identity algorithm (e.g., by executing 904 of process 900), returning a result in less than one second. The first algorithm (e.g., liveness) performs a text to speech function to compare the pronounced text to the requested text (e.g., random words) to verify that the words were read correctly, and the second algorithm uses a prediction function (e.g., a prediction application programming interface (API)) to perform a one-to-many (1:N) identification on a private voice biometric to ensure that the input correctly identifies the expected person. [0124] In conjunction with liveness, the system compares the random text voice input and performs an identity assertion on the same input to ensure the voice that spoke the random words matches the user's identity.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Caldwell to include the teaching of Streit above in order to improve security and convenience in conjunction with verification of liveness: a text to speech function compares the pronounced text to the requested random text, and an identity assertion is performed on the same voice input to ensure the voice that spoke the random words matches the user's identity.
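To make the combined rationale concrete, a second hypothetical sketch follows; it blends (i) a content difference between recognized and reference text, in the spirit of Streit's pronounced-versus-requested text check, with (ii) a pronunciation difference between extracted acoustic features and a reference pronunciation, in the spirit of Caldwell's voiceprint comparison. The word-level edit distance, the cosine distance, and the equal weighting are assumptions made solely for illustration.

# Hypothetical sketch: the word-level edit distance, cosine distance, and
# 0.5/0.5 weighting are illustrative assumptions, not taken from the
# application, Caldwell, or Streit.
import numpy as np

def content_difference(recognized, reference):
    """Normalized word-level edit distance in [0, 1]; 0 means the texts match."""
    a, b = recognized.lower().split(), reference.lower().split()
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return d[len(a), len(b)] / max(len(a), len(b), 1)

def pronunciation_difference(features, reference):
    """Cosine distance between acoustic features and a reference pronunciation
    representation; 0 means the pronunciations align."""
    denom = np.linalg.norm(features) * np.linalg.norm(reference)
    return (1.0 - float(np.dot(features, reference) / denom)) if denom else 1.0

def pronunciation_score(recognized, reference_text, features, reference_pron,
                        w_content=0.5, w_pron=0.5):
    """Higher score means the reading is closer to the reference text and to
    the reference pronunciation."""
    return 1.0 - (w_content * content_difference(recognized, reference_text)
                  + w_pron * pronunciation_difference(features, reference_pron))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ref_pron = rng.random(128)                            # mock reference pronunciation
    feats = ref_pron + rng.normal(scale=0.05, size=128)   # mock "close" reading
    print(round(pronunciation_score("please pay now", "please pay now",
                                    feats, ref_pron), 3))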
Regarding Claim 8, Caldwell teaches: An electronic device, comprising: a memory, configured to store a computer executable instruction; and a processor, configured to execute the computer executable instruction stored in the memory and cause the electronic device to perform a speech scoring method including ([0040] The audio signal processor 220 includes the audio input circuitry 222, an optional digital signal processor 224, and audio output circuitry 226. In various embodiments the audio signal processor 220 may be implemented as an integrated circuit comprising analog circuitry, digital circuitry and the digital signal processor 224, which is operable to execute program instructions stored in firmware. The audio input circuitry 222, for example, may include an interface to the audio sensor component 205, anti-aliasing filters, analog-to-digital converter circuitry, echo cancellation circuitry, and other audio processing circuitry and components as disclosed herein. The digital signal processor 224 is operable to process a multichannel digital audio signal to generate an enhanced target audio signal, which is output to one or more of the host system components 250. In various embodiments, the digital signal processor 224 may be operable to perform echo cancellation, noise cancellation, target signal enhancement, post-filtering, and other audio signal processing functions. In some embodiments, the host system components 250 are operable to enter into a low power mode (e.g., a sleep mode) during periods of inactivity, and the audio signal processor 220 is operable to listen for a trigger word and wake up one or more of the host system components 250 when the trigger word is detected.): receiving speech information and associated reference speech text; performing noise reduction processing on the speech information based on a speech noise reduction model to obtain noise-reduced speech information; performing speech recognition on the noise-reduced speech information to recognize text in the noise-reduced speech information and acoustic features associated with the speech information; and predicting a pronunciation score for indicating pronunciation similarity between the speech information and the reference pronunciation corresponding to the reference speech text based on (i) content differences between the recognized text and the associated reference speech text and (ii) pronunciation differences between the acoustic features of the audio signal and the reference pronunciation (See rejection of claim 1).
Regarding Claim 15, Caldwell teaches: A non-transitory computer-readable storage medium, storing a computer executable instruction that, when executed by a processor of an electronic device, causes the electronic device to perform a speech scoring method including ([0040] The audio signal processor 220 includes the audio input circuitry 222, an optional digital signal processor 224, and audio output circuitry 226. In various embodiments the audio signal processor 220 may be implemented as an integrated circuit comprising analog circuitry, digital circuitry and the digital signal processor 224, which is operable to execute program instructions stored in firmware. The audio input circuitry 222, for example, may include an interface to the audio sensor component 205, anti-aliasing filters, analog-to-digital converter circuitry, echo cancellation circuitry, and other audio processing circuitry and components as disclosed herein. The digital signal processor 224 is operable to process a multichannel digital audio signal to generate an enhanced target audio signal, which is output to one or more of the host system components 250. In various embodiments, the digital signal processor 224 may be operable to perform echo cancellation, noise cancellation, target signal enhancement, post-filtering, and other audio signal processing functions. In some embodiments, the host system components 250 are operable to enter into a low power mode (e.g., a sleep mode) during periods of inactivity, and the audio signal processor 220 is operable to listen for a trigger word and wake up one or more of the host system components 250 when the trigger word is detected.): receiving speech information and associated reference speech text; performing noise reduction processing on the speech information based on a speech noise reduction model to obtain noise-reduced speech information; performing speech recognition on the noise-reduced speech information to recognize text in the noise-reduced speech information and acoustic features associated with the speech information; and predicting a pronunciation score for indicating pronunciation similarity between the speech information and the reference pronunciation corresponding to the reference speech text based on (i) content differences between the recognized text and the associated reference speech text and (ii) pronunciation differences between the acoustic features of the audio signal and the reference pronunciation (See rejection of claim 1).
Allowable Subject Matter
Claims 2-7, 9-14 and 16-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for allowance: The prior art of record, alone or in combination, fails to teach, with respect to claims 2-7, 9-14, and 16-20, at least the limitations of claims 2, 9, and 16.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Hennig (US 2022/0328050 A1) teaches adversarially robust voice biometrics, secure recognition, and identification.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras Shah, can be reached at 571-270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2653