Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy of priority Application No. CN202111025632.X, filed on 09/02/2021, has been filed in the parent application.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 05/08/2023, 03/25/2024, and 03/25/2025 have been considered by the examiner.
Drawings
The drawings submitted on 04/25/2023 have been considered by the examiner.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/23/2026 has been entered.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Caldwell (US 2019/0311722 A1) in view of Streit (US 2020/0044852 A1).
Regarding Claim 1, Caldwell teaches: A speech scoring method performed by an electronic device, comprising ([0009] In one embodiment, a user instructing a voice interaction device to authorize an electronic payment is requested to authenticate his or her identity using voice biometrics. The user may access the application on a smart phone or other user device and read the pass phrase presented to the user.): receiving speech information (input audio signal) and associated reference speech text (sample phrases or pass phrase that appears on the display screen for the user to read), wherein the reference speech text has a corresponding reference pronunciation (stored voice feature characteristics or voiceprint or voice sample) and the speech information represents an audio signal of a person (user's stored biometric voiceprint or known user voice feature characteristics or User voice feature characteristics) reading the associated reference speech text ([0051] During the enrollment process, the user 302 may provide identification, user account name, password, and other account information. In one embodiment, sample phrases are displayed on the user device 330, the user 302 repeats the phrases that appear and voice samples 306 are received by a microphone 312 of the voice authentication system 300. User voice feature characteristics are extracted from the received voice samples and stored in a memory or database 308. [0052] When the user 302 needs authentication (e.g., to access an electronic device, to initiate an online payment) the user 302 can run a voice authentication application on the user device 330. The user device 330 repeatedly plays an acoustic tone, series of chirps or other audio information that are unique for the acoustic code. At the same time, the user device 330 prompts the user 302 to read text that appears on the screen of the user device 330. [0053] The audible speech from the user 302 (voice 306) and the acoustic code 316 output from a speaker of the user device 330 are both received by the microphone 312 which generates an input audio signal 320.); performing noise reduction processing on the speech information based on a speech noise reduction model to obtain noise-reduced speech information ([0024] The audio input processing components 114 may estimate a direction and/or location of the user 102 based on the audio input signals received by the audio sensing component(s) 112 (e.g., an array of microphones) and process the audio input signals to enhance the target audio and suppress noise based on the estimated location. [0040] In various embodiments, the digital signal processor 224 may be operable to perform echo cancellation, noise cancellation, target signal enhancement, post-filtering, and other audio signal processing functions.); performing speech recognition on the noise-reduced speech information to recognize text in the noise-reduced speech information and acoustic features (voice feature characteristics or voiceprint) associated with the speech information ([0059] In step 410, a speech recognition engine identifies the pass phrase from the audio input signal(s), including a beginning frame and an ending frame. Next, feature vectors for the pass phrase are determined (step 412) and compared against feature vectors for the user's stored biometric voiceprint which was created during a training sequence to determine a degree of match (i.e. confidence score) (step 414).); and predicting (generating a confidence score indicating a likelihood that voice feature characteristics of the speech match known user voice feature characteristics) a pronunciation score (a confidence score indicating the strength of the match) for indicating pronunciation similarity (match) between the speech information and the reference pronunciation corresponding to the reference speech text based on the recognized text and the acoustic features ([0005] Voice processing components identify speech in the audio input signal, match voice feature characteristics of the speech with known user voice feature characteristics, and generate a confidence score indicating a likelihood that the user has been authenticated. [0009] The user may access the application on a smart phone or other user device and read the pass phrase presented to the user. [0052] At the same time, the user device 330 prompts the user 302 to read text that appears on the screen of the user device 330. [0054] The speech subbands 324 are passed to a speech recognition engine 340 to identify the presence of the voiced sample phrase, including identifying a speech segment comprising a sequence of audio frames. [0055] Voice authentication 350 is also performed by extracting feature characteristics from the pass phrase received from the speech subbands of the input audio signal and comparing the extracted feature characteristics with the user's voice features stored during enrollment and training. The output of the voice authentication 350 is a confidence score indicating the strength of the match.).
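For orientation only, the following is a minimal hypothetical sketch of the claim 1 chain as mapped above (noise reduction, then recognition with acoustic feature extraction, then a confidence-style comparison against enrolled data). The function names, the energy-gate noise reduction, the toy spectral features, and the cosine comparison are assumptions of this sketch; they are not drawn from the application or from Caldwell.

# Hypothetical illustration only: the names, the energy-gate "noise reduction",
# the toy spectral "features", and the cosine comparison are assumptions of
# this sketch, not taken from the application or from Caldwell.
import numpy as np

def reduce_noise(audio, frame=400):
    """Stand-in for a speech noise reduction model: zero frames whose RMS
    energy falls below an assumed noise floor."""
    out = np.asarray(audio, dtype=float).copy()
    for start in range(0, len(out), frame):
        seg = out[start:start + frame]
        if seg.size and np.sqrt(np.mean(seg ** 2)) < 0.01:
            out[start:start + frame] = 0.0
    return out

def recognize(audio):
    """Stand-in for speech recognition: returns (recognized text, acoustic
    features). A real system would run an ASR model; here the features are
    a toy magnitude spectrum and the text is a fixed placeholder."""
    features = np.abs(np.fft.rfft(audio, n=256))[:128]
    return "placeholder transcript", features

def voiceprint_confidence(features, enrolled):
    """Caldwell-style comparison of extracted features against an enrolled
    voiceprint, returning a cosine-similarity confidence score."""
    denom = np.linalg.norm(features) * np.linalg.norm(enrolled)
    return float(np.dot(features, enrolled) / denom) if denom else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.normal(scale=0.1, size=16000)   # mock one second of audio
    enrolled_voiceprint = rng.random(128)        # mock enrollment data
    text, feats = recognize(reduce_noise(speech))
    print(text, round(voiceprint_confidence(feats, enrolled_voiceprint), 3))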
Caldwell does not explicitly teach the underlined limitation: predicting a pronunciation score for indicating pronunciation similarity between the speech information and the reference pronunciation corresponding to the reference speech text based on (i) content differences between the recognized text and the associated reference speech text and (ii) pronunciation differences between the acoustic features of the audio signal and the reference pronunciation.
Streit teaches: authentication and verification of liveness of a person by assessing pronunciation similarity between the speech information and the reference pronunciation corresponding to the reference speech text based on (i) content differences between the recognized text and the associated reference speech text and (ii) pronunciation differences between the acoustic features of the audio signal and the reference pronunciation (identification on a private voice biometric) ([0121] In one example (Row 8), the random biometric instances include a set of random words selected for liveness validation in conjunction with voice based identification. [0122] According to one embodiment, an authentication system, assesses liveness by asking the user to read a few random words. For example, an authentication system can concurrently or simultaneously process the received voice signal through two algorithms (e.g., liveness algorithm and identity algorithm (e.g., by executing 904 of process 900), returning a result in less than one second. The first algorithm (e.g., liveness) performs a text to speech function to compare the pronounced text to the requested text (e.g., random words) to verify that the words were read correctly, and the second algorithm uses a prediction function (e.g., a prediction application programming interface (API)) to perform a one-to-many (1:N) identification on a private voice biometric to ensure that the input correctly identifies the expected person. [0124] In conjunction with liveness, the system compares the random text voice input and performs an identity assertion on the same input to ensure the voice that spoke the random words matches the user's identity.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Caldwell to include the teaching of Streit above in order to improve security and convenience in conjunction with verification of liveness: a text to speech function compares the pronounced text to the requested random text, and an identity assertion is performed on the same voice input to ensure the voice that spoke the random words matches the user's identity.
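To make the combined rationale concrete, a second hypothetical sketch follows; it blends (i) a content difference between recognized and reference text, in the spirit of Streit's pronounced-versus-requested text check, with (ii) a pronunciation difference between extracted acoustic features and a reference pronunciation, in the spirit of Caldwell's voiceprint comparison. The word-level edit distance, the cosine distance, and the equal weighting are assumptions made solely for illustration.

# Hypothetical sketch: the word-level edit distance, cosine distance, and
# 0.5/0.5 weighting are illustrative assumptions, not taken from the
# application, Caldwell, or Streit.
import numpy as np

def content_difference(recognized, reference):
    """Normalized word-level edit distance in [0, 1]; 0 means the texts match."""
    a, b = recognized.lower().split(), reference.lower().split()
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return d[len(a), len(b)] / max(len(a), len(b), 1)

def pronunciation_difference(features, reference):
    """Cosine distance between acoustic features and a reference pronunciation
    representation; 0 means the pronunciations align."""
    denom = np.linalg.norm(features) * np.linalg.norm(reference)
    return (1.0 - float(np.dot(features, reference) / denom)) if denom else 1.0

def pronunciation_score(recognized, reference_text, features, reference_pron,
                        w_content=0.5, w_pron=0.5):
    """Higher score means the reading is closer to the reference text and to
    the reference pronunciation."""
    return 1.0 - (w_content * content_difference(recognized, reference_text)
                  + w_pron * pronunciation_difference(features, reference_pron))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ref_pron = rng.random(128)                            # mock reference pronunciation
    feats = ref_pron + rng.normal(scale=0.05, size=128)   # mock "close" reading
    print(round(pronunciation_score("please pay now", "please pay now",
                                    feats, ref_pron), 3))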
Regarding Claim 8, Caldwell teaches: An electronic device, comprising: a memory, configured to store a computer executable instruction; and a processor, configured to execute the computer executable instruction stored in the memory and cause the electronic device to perform a speech scoring method including ([0040] The audio signal processor 220 includes the audio input circuitry 222, an optional digital signal processor 224, and audio output circuitry 226. In various embodiments the audio signal processor 220 may be implemented as an integrated circuit comprising analog circuitry, digital circuitry and the digital signal processor 224, which is operable to execute program instructions stored in firmware. The audio input circuitry 222, for example, may include an interface to the audio sensor component 205, anti-aliasing filters, analog-to-digital converter circuitry, echo cancellation circuitry, and other audio processing circuitry and components as disclosed herein. The digital signal processor 224 is operable to process a multichannel digital audio signal to generate an enhanced target audio signal, which is output to one or more of the host system components 250. In various embodiments, the digital signal processor 224 may be operable to perform echo cancellation, noise cancellation, target signal enhancement, post-filtering, and other audio signal processing functions. In some embodiments, the host system components 250 are operable to enter into a low power mode (e.g., a sleep mode) during periods of inactivity, and the audio signal processor 220 is operable to listen for a trigger word and wake up one or more of the host system components 250 when the trigger word is detected.): receiving speech information and associated reference speech text; performing noise reduction processing on the speech information based on a speech noise reduction model to obtain noise-reduced speech information; performing speech recognition on the noise-reduced speech information to recognize text in the noise-reduced speech information and acoustic features associated with the speech information; and predicting a pronunciation score for indicating pronunciation similarity between the speech information and the reference pronunciation corresponding to the reference speech text based on (i) content differences between the recognized text and the associated reference speech text and (ii) pronunciation differences between the acoustic features of the audio signal and the reference pronunciation (See rejection of claim 1).
Regarding Claim 15, Caldwell teaches: A non-transitory computer-readable storage medium, storing a computer executable instruction that, when executed by a processor of an electronic device, causes the electronic device to perform a speech scoring method including ([0040] The audio signal processor 220 includes the audio input circuitry 222, an optional digital signal processor 224, and audio output circuitry 226. In various embodiments the audio signal processor 220 may be implemented as an integrated circuit comprising analog circuitry, digital circuitry and the digital signal processor 224, which is operable to execute program instructions stored in firmware. The audio input circuitry 222, for example, may include an interface to the audio sensor component 205, anti-aliasing filters, analog-to-digital converter circuitry, echo cancellation circuitry, and other audio processing circuitry and components as disclosed herein. The digital signal processor 224 is operable to process a multichannel digital audio signal to generate an enhanced target audio signal, which is output to one or more of the host system components 250. In various embodiments, the digital signal processor 224 may be operable to perform echo cancellation, noise cancellation, target signal enhancement, post-filtering, and other audio signal processing functions. In some embodiments, the host system components 250 are operable to enter into a low power mode (e.g., a sleep mode) during periods of inactivity, and the audio signal processor 220 is operable to listen for a trigger word and wake up one or more of the host system components 250 when the trigger word is detected.): receiving speech information and associated reference speech text; performing noise reduction processing on the speech information based on a speech noise reduction model to obtain noise-reduced speech information; performing speech recognition on the noise-reduced speech information to recognize text in the noise-reduced speech information and acoustic features associated with the speech information; and predicting a pronunciation score for indicating pronunciation similarity between the speech information and the reference pronunciation corresponding to the reference speech text based on (i) content differences between the recognized text and the associated reference speech text and (ii) pronunciation differences between the acoustic features of the audio signal and the reference pronunciation (See rejection of claim 1).
Allowable Subject Matter
Claims 2-7, 9-14 and 16-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for allowance: The prior art of record, alone or in combination, fails to teach, with respect to claims 2-7, 9-14, and 16-20, at least the limitations of claims 2, 9, and 16.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Hennig (US 2022/0328050 A1) teaches adversarially robust voice biometrics, secure recognition, and identification.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras Shah, can be reached at 571-270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2653