Prosecution Insights
Last updated: April 19, 2026
Application No. 18/646,375

ACTIVE VOICE LIVENESS DETECTION SYSTEM

Non-Final Office Action (§102, §103, §112)

Filed: Apr 25, 2024
Examiner: BECKER, TYLER JUSTIN
Art Unit: 2657
Tech Center: 2600 — Communications
Assignee: Pindrop Security Inc.
OA Round: 1 (Non-Final)

Grant Probability: 74% (Favorable)
OA Rounds: 1-2
To Grant: 2y 10m
With Interview: 93%

Examiner Intelligence

Career Allow Rate: 74%, above average (14 granted / 19 resolved; +11.7% vs TC avg)
Interview Lift: strong, +19.0% for resolved cases with an interview
Typical Timeline: 2y 10m avg prosecution; 22 applications currently pending
Career History: 41 total applications across all art units

Statute-Specific Performance

§101: 23.1% (-16.9% vs TC avg)
§103: 45.4% (+5.4% vs TC avg)
§102: 14.9% (-25.1% vs TC avg)
§112: 16.7% (-23.3% vs TC avg)
Tech Center averages are estimates; based on career data from 19 resolved cases.

Office Action

Rejections under §102, §103, and §112
DETAILED ACTION

This action is in response to the application filed on April 25, 2024. Claims 1-20 are pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

The disclosure is objected to because of the following informalities:

Line 17 of [0009] of the specification reads "provides an improved a speech sample" but should read "provides an improved speech sample".
Line 5 of [0010] of the specification reads "according to preconfigure passphrases" but should read "according to preconfigured passphrases".
Line 2 of [0016] of the specification reads "a computer comprising at least one processor, configured to: a computer comprising at least one processor, configured to:" but should read "a computer comprising at least one processor, configured to:".
Line 7 of [0017] of the specification reads "a third set one or more features" but should read "a third set of one or more features".
Line 7 of [0018] of the specification reads "a third set one or more features" but should read "a third set of one or more features".
Line 16 of [0041] of the specification reads "suggesting the user move close the microphone" but should read "suggesting the user move close to the microphone".
Line 10 of [0048] of the specification reads "104d" but should read "114d".
Line 3 of [0067] of the specification reads "ease of description an understanding" but should read "ease of description and understanding".
Line 2 of [0074] of the specification reads "models trained classify" but should read "models trained to classify".
Line 3 of [0078] of the specification reads "classifying the inbound audio signal a fraudulent or genuine" but should read "classifying the inbound audio signal as fraudulent or genuine".
Line 3 of [0080] of the specification reads "ease of description an understanding" but should read "ease of description and understanding".
Line 10 of [0145] of the specification reads "there is not match" but should read "there is not a match".

Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claims 1, 9, and 17 each recite the limitation "the one or more features" in the second limitation. There is insufficient antecedent basis for this limitation in the claim. By virtue of their respective dependencies, dependent claims 2-8, 10-16, and 18-20 are also rejected, as they inherit the indefiniteness issue and do not correct it.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 4-6, 9-10, 12-14, 17-18, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hennig et al. (US Pat. Pub. No. 2022/0328050 A1, hereinafter Hennig).

Regarding claim 1, Hennig discloses a computer-implemented method for detecting fraud in calls by repeated recordings, the method comprising:

receiving, by a computer, an inbound audio signal from a user device associated with a caller containing a speech signal for one or more utterances of the caller (Hennig, Fig. 4, 402; [0082]: "At 402, voice data that can correspond to a user account can be received. The authenticator component can receive voice data (e.g., an audio signal comprising the voice data) that can correspond to the user account associated with a user. The voice data can represent a voice of an unverified (e.g., unidentified, unknown, or undetermined) user, which may or may not be the user attempting to authenticate with regard to the user account, or may be an adversarial (e.g., fraudulent or malicious) user attempting to improperly gain access to the user account of the user or services or products related to the user account.");

extracting, by the computer, an inbound audioprint for the inbound audio signal using the one or more features extracted from the speech signal of the inbound audio signal (Hennig, Fig. 4, 404; [0083]: "At 404, the voice data can be analyzed to determine one or more characteristics of the voice data. The authenticator component can comprise a voice verification component that can analyze the voice data to determine the one or more characteristics of the voice data.");

generating, by the computer, an audio replay score for the inbound audio signal indicating an audio recording recognition likelihood that the inbound audio signal matches a prior audio signal based upon a distance between the inbound audioprint and a prior audioprint for the prior audio signal (Hennig, Fig. 4, 410; [0086]: "At 410, in response to determining that the first similarity score is above the first threshold similarity score, the one or more characteristics of the voice data can be compared to one or more characteristics of a set of previously stored voice fingerprints that can correspond to the user account."); and

identifying, by the computer, the inbound audio signal as a replayed recording or unrecognized recording based upon comparing the audio replay score against a replay detection threshold (Hennig, Fig. 4, 412-418; [0087]: "At 412, a second similarity score can be determined based at least in part on the comparing of the one or more characteristics of the voice data to the one or more characteristics of the set of previously stored voice fingerprints."; [0090]: "At 418, in response to determining that the second similarity score is above the second threshold similarity score, the voice data can be determined to be fraudulent. In some embodiments, in response to the voice verification component determining that the second similarity score is above the second threshold (e.g., threshold minimum) similarity score, the voice verification component can determine that the voice data is fraudulent, and can deny authentication of the unidentified user that presented the voice data in an attempt to access the user account. For instance, in response to the voice verification component determining that the second similarity score is above the second threshold similarity score, the voice verification component can determine that the voice data is too close of a match to a previously stored voice fingerprint of the set of previously stored voice fingerprints, which can thereby indicate that the voice data can be a replay of a recording of the voice of the user or can be an artificially generated voice that emulates the voice of the user.").

[Hennig, Fig. 4 reproduced for reference.]

Regarding claim 2, the rejection of claim 1 is incorporated. Hennig discloses all of the elements of the current invention as stated above. Hennig further discloses obtaining, by the computer, a plurality of prior audio signals, each prior audio signal comprises a corresponding prior speech signal for one or more prior utterances; extracting, by the computer, a plurality of audioprints corresponding to the plurality of prior audio signals using the one or more features extracted for the prior speech signal; storing, by the computer, each prior audioprint into a database (Hennig, [0029]: "With regard to each user, the authenticator component 102 can store a set of previous voice prints (e.g., 124) that can be representative of the voice of the user, as verified by the authenticator component 102. The authenticator component 102 can generate the set of previous voice prints (e.g., 124) associated with the user from voice information (e.g., audio signals comprising voice information) of the user obtained during previous interactions (e.g., previous authentication attempts or other interactions) between the user and the system 100 and/or associated service entity (e.g., service representative of or associated with the service entity), as more fully described herein.").

Regarding claim 4, the rejection of claim 1 is incorporated. Hennig discloses all of the elements of the current invention as stated above. Hennig further discloses, in response to identifying the inbound audio signal as an unrecognized recording, storing, by the computer, the inbound audioprint into a database as a next prior audioprint (Hennig, [0037]: "In some embodiments, even if the voice verification component 118 determines that received voice data is not a match to the designated voice print 130 (or previous voice print) associated with a user account, the voice verification component 118 can store the voice print associated with (e.g., generated from) the voice data in a file associated with the user in the voice print repository 122").

Regarding claim 5, the rejection of claim 1 is incorporated. Hennig discloses all of the elements of the current invention as stated above. Hennig further discloses, in response to identifying the inbound audio signal as a replayed recording, generating, by the computer, an alert notification indicating that the computer detected the replayed recording (Hennig, [0037]: "if the authenticator component 102 detects multiple failed authentication attempts associated with a user account 104 of a user, the authenticator component 102 can perform responsive or remedial actions to mitigate fraudulent or malicious access to the user account 104 of the user, wherein such responsive or remedial actions can comprise, for example, sending a notification (e.g., an alert) message to a service representative or the user to indicate that one or more potentially fraudulent or malicious attempts to access the user account 104 of the user have been detected").

Regarding claim 6, the rejection of claim 1 is incorporated. Hennig discloses all of the elements of the current invention as stated above. Hennig further discloses generating, by the computer, a message indicating that the computer has identified the inbound audio signal as a replayed recording or unrecognized recording; and transmitting, by the computer, a notification containing the message for presentation at a user interface associated with an agent-user (Hennig, [0037]: "if the authenticator component 102 detects multiple failed authentication attempts associated with a user account 104 of a user, the authenticator component 102 can perform responsive or remedial actions to mitigate fraudulent or malicious access to the user account 104 of the user, wherein such responsive or remedial actions can comprise, for example, sending a notification (e.g., an alert) message to a service representative or the user to indicate that one or more potentially fraudulent or malicious attempts to access the user account 104 of the user have been detected").
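[Editorial note: for orientation, the receive/extract/score/threshold flow recited in claim 1 and mapped above can be sketched in code. This is a minimal illustrative sketch only; the toy energy-statistics "audioprint", the Euclidean distance, the inverted-distance score, and the 0.9 threshold are assumptions for illustration, not features taken from the application or from Hennig.]

```python
import numpy as np

def extract_audioprint(speech_signal: np.ndarray) -> np.ndarray:
    """Toy 'audioprint': summary statistics of per-frame energies.

    A real system would use spectral or learned embeddings; this
    stand-in only makes the pipeline runnable."""
    n = len(speech_signal) // 160 * 160          # truncate to whole 160-sample frames
    frames = speech_signal[:n].reshape(-1, 160)
    energies = (frames ** 2).mean(axis=1)
    return np.array([energies.mean(), energies.std(), energies.max()])

def replay_score(inbound_print: np.ndarray, prior_print: np.ndarray) -> float:
    # Smaller distance between audioprints -> higher likelihood the
    # inbound audio matches the prior audio, so invert the distance.
    return 1.0 / (1.0 + float(np.linalg.norm(inbound_print - prior_print)))

def classify(inbound: np.ndarray, prior_prints: list[np.ndarray],
             threshold: float = 0.9) -> str:
    """Label the inbound signal by comparing its best replay score
    against a replay detection threshold (assumed value)."""
    inbound_print = extract_audioprint(inbound)
    score = max((replay_score(inbound_print, p) for p in prior_prints),
                default=0.0)
    return "replayed recording" if score >= threshold else "unrecognized recording"
```

An exact replay yields zero distance and a score of 1.0, so it is flagged as a replayed recording; a sufficiently different signal falls below the threshold and is labeled unrecognized.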
Regarding claim 9, Hennig discloses a system for detecting fraud in calls by repeated recordings, the system comprising: a computer comprising at least one processor (Hennig, [0077]: "The authenticator component 102 also can comprise a processor component 310 that can work in conjunction with the other components (e.g., voice verification component 118, data store 206, voice print generator component 216, and/or other component) to facilitate performing the various functions of the authenticator component 102."), configured to:

receive an inbound audio signal from a user device associated with a caller containing a speech signal for one or more utterances of the caller (Hennig, Fig. 4, 402; [0082]: "At 402, voice data that can correspond to a user account can be received. The authenticator component can receive voice data (e.g., an audio signal comprising the voice data) that can correspond to the user account associated with a user. The voice data can represent a voice of an unverified (e.g., unidentified, unknown, or undetermined) user, which may or may not be the user attempting to authenticate with regard to the user account, or may be an adversarial (e.g., fraudulent or malicious) user attempting to improperly gain access to the user account of the user or services or products related to the user account.");

extract an inbound audioprint for the inbound audio signal using the one or more features extracted from the speech signal of the inbound audio signal (Hennig, Fig. 4, 404; [0083]: "At 404, the voice data can be analyzed to determine one or more characteristics of the voice data. The authenticator component can comprise a voice verification component that can analyze the voice data to determine the one or more characteristics of the voice data.");

generate an audio replay score for the inbound audio signal indicating an audio recording recognition likelihood that the inbound audio signal matches a prior audio signal based upon a distance between the inbound audioprint and a prior audioprint for the prior audio signal (Hennig, Fig. 4, 410; [0086]: "At 410, in response to determining that the first similarity score is above the first threshold similarity score, the one or more characteristics of the voice data can be compared to one or more characteristics of a set of previously stored voice fingerprints that can correspond to the user account."); and

identify the inbound audio signal as a replayed recording or unrecognized recording based upon comparing the audio replay score against a replay detection threshold (Hennig, Fig. 4, 412-418; [0087]: "At 412, a second similarity score can be determined based at least in part on the comparing of the one or more characteristics of the voice data to the one or more characteristics of the set of previously stored voice fingerprints."; [0090]: "At 418, in response to determining that the second similarity score is above the second threshold similarity score, the voice data can be determined to be fraudulent. In some embodiments, in response to the voice verification component determining that the second similarity score is above the second threshold (e.g., threshold minimum) similarity score, the voice verification component can determine that the voice data is fraudulent, and can deny authentication of the unidentified user that presented the voice data in an attempt to access the user account. For instance, in response to the voice verification component determining that the second similarity score is above the second threshold similarity score, the voice verification component can determine that the voice data is too close of a match to a previously stored voice fingerprint of the set of previously stored voice fingerprints, which can thereby indicate that the voice data can be a replay of a recording of the voice of the user or can be an artificially generated voice that emulates the voice of the user.").

Regarding claim 10, the rejection of claim 9 is incorporated. Hennig discloses all of the elements of the current invention as stated above. Hennig further discloses obtain a plurality of prior audio signals, each prior audio signal comprises a corresponding prior speech signal for one or more prior utterances; extract a plurality of audioprints corresponding to the plurality of prior audio signals using the one or more features extracted for the prior speech signal; store each prior audioprint into a database (Hennig, [0029]: "With regard to each user, the authenticator component 102 can store a set of previous voice prints (e.g., 124) that can be representative of the voice of the user, as verified by the authenticator component 102. The authenticator component 102 can generate the set of previous voice prints (e.g., 124) associated with the user from voice information (e.g., audio signals comprising voice information) of the user obtained during previous interactions (e.g., previous authentication attempts or other interactions) between the user and the system 100 and/or associated service entity (e.g., service representative of or associated with the service entity), as more fully described herein.").

Regarding claim 12, the rejection of claim 9 is incorporated. Hennig discloses all of the elements of the current invention as stated above. Hennig further discloses wherein the computer is further configured to, in response to identifying the inbound audio signal as an unrecognized recording, store the inbound audioprint into a database as a next prior audioprint (Hennig, [0037]: "In some embodiments, even if the voice verification component 118 determines that received voice data is not a match to the designated voice print 130 (or previous voice print) associated with a user account, the voice verification component 118 can store the voice print associated with (e.g., generated from) the voice data in a file associated with the user in the voice print repository 122").

Regarding claim 13, the rejection of claim 9 is incorporated. Hennig discloses all of the elements of the current invention as stated above. Hennig further discloses wherein the computer is further configured to, in response to identifying the inbound audio signal as a replayed recording, generate an alert notification indicating that the computer detected the replayed recording (Hennig, [0037]: "if the authenticator component 102 detects multiple failed authentication attempts associated with a user account 104 of a user, the authenticator component 102 can perform responsive or remedial actions to mitigate fraudulent or malicious access to the user account 104 of the user, wherein such responsive or remedial actions can comprise, for example, sending a notification (e.g., an alert) message to a service representative or the user to indicate that one or more potentially fraudulent or malicious attempts to access the user account 104 of the user have been detected").

Regarding claim 14, the rejection of claim 9 is incorporated. Hennig discloses all of the elements of the current invention as stated above. Hennig further discloses wherein the computer is further configured to: generate a message indicating that the computer has identified the inbound audio signal as a replayed recording or unrecognized recording; and transmit a notification containing the message for presentation at a user interface associated with an agent-user (Hennig, [0037]: "if the authenticator component 102 detects multiple failed authentication attempts associated with a user account 104 of a user, the authenticator component 102 can perform responsive or remedial actions to mitigate fraudulent or malicious access to the user account 104 of the user, wherein such responsive or remedial actions can comprise, for example, sending a notification (e.g., an alert) message to a service representative or the user to indicate that one or more potentially fraudulent or malicious attempts to access the user account 104 of the user have been detected").

Regarding claim 17, Hennig discloses a non-transitory computer-readable media configured to store machine-executable instructions that when executed by one or more processors cause the processors to (Hennig, [0123]: "These computer readable program instructions can be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks."):

receive an inbound audio signal from a user device associated with a caller containing a speech signal for one or more utterances of the caller (Hennig, Fig. 4, 402; [0082]: "At 402, voice data that can correspond to a user account can be received. The authenticator component can receive voice data (e.g., an audio signal comprising the voice data) that can correspond to the user account associated with a user. The voice data can represent a voice of an unverified (e.g., unidentified, unknown, or undetermined) user, which may or may not be the user attempting to authenticate with regard to the user account, or may be an adversarial (e.g., fraudulent or malicious) user attempting to improperly gain access to the user account of the user or services or products related to the user account.");

extract an inbound audioprint for the inbound audio signal using the one or more features extracted from the speech signal of the inbound audio signal (Hennig, Fig. 4, 404; [0083]: "At 404, the voice data can be analyzed to determine one or more characteristics of the voice data. The authenticator component can comprise a voice verification component that can analyze the voice data to determine the one or more characteristics of the voice data.");

generate an audio replay score for the inbound audio signal indicating an audio recording recognition likelihood that the inbound audio signal matches a prior audio signal based upon a distance between the inbound audioprint and a prior audioprint for the prior audio signal (Hennig, Fig. 4, 410; [0086]: "At 410, in response to determining that the first similarity score is above the first threshold similarity score, the one or more characteristics of the voice data can be compared to one or more characteristics of a set of previously stored voice fingerprints that can correspond to the user account."); and

identify the inbound audio signal as a replayed recording or unrecognized recording based upon comparing the audio replay score against a replay detection threshold (Hennig, Fig. 4, 412-418; [0087]: "At 412, a second similarity score can be determined based at least in part on the comparing of the one or more characteristics of the voice data to the one or more characteristics of the set of previously stored voice fingerprints."; [0090]: "At 418, in response to determining that the second similarity score is above the second threshold similarity score, the voice data can be determined to be fraudulent. In some embodiments, in response to the voice verification component determining that the second similarity score is above the second threshold (e.g., threshold minimum) similarity score, the voice verification component can determine that the voice data is fraudulent, and can deny authentication of the unidentified user that presented the voice data in an attempt to access the user account. For instance, in response to the voice verification component determining that the second similarity score is above the second threshold similarity score, the voice verification component can determine that the voice data is too close of a match to a previously stored voice fingerprint of the set of previously stored voice fingerprints, which can thereby indicate that the voice data can be a replay of a recording of the voice of the user or can be an artificially generated voice that emulates the voice of the user.").

Regarding claim 18, the rejection of claim 17 is incorporated. Hennig discloses all of the elements of the current invention as stated above. Hennig further discloses wherein the instructions further cause the one or more processors to: obtain a plurality of prior audio signals, each prior audio signal comprises a corresponding prior speech signal for one or more prior utterances; extract a plurality of audioprints corresponding to the plurality of prior audio signals using the one or more features extracted for the prior speech signal; and store each prior audioprint into a database (Hennig, [0029]: "With regard to each user, the authenticator component 102 can store a set of previous voice prints (e.g., 124) that can be representative of the voice of the user, as verified by the authenticator component 102. The authenticator component 102 can generate the set of previous voice prints (e.g., 124) associated with the user from voice information (e.g., audio signals comprising voice information) of the user obtained during previous interactions (e.g., previous authentication attempts or other interactions) between the user and the system 100 and/or associated service entity (e.g., service representative of or associated with the service entity), as more fully described herein.").

Regarding claim 20, the rejection of claim 17 is incorporated. Hennig discloses all of the elements of the current invention as stated above. Hennig further discloses wherein the instructions further cause the one or more processors to, in response to identifying the inbound audio signal as an unrecognized recording, store the inbound audioprint into a database as a next prior audioprint (Hennig, [0037]: "In some embodiments, even if the voice verification component 118 determines that received voice data is not a match to the designated voice print 130 (or previous voice print) associated with a user account, the voice verification component 118 can store the voice print associated with (e.g., generated from) the voice data in a file associated with the user in the voice print repository 122").
Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claim(s) 3, 11, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hennig as applied to claims 1-2, 4-6, 9-10, 12-14, 17-18, and 20 above, and further in view of Wu et al. (US Pat. Pub. No. 2025/0078841 A1 hereinafter Wu). Regarding claim 3, the rejection of claim 2 is incorporated. Hennig discloses all of the elements of the current invention as stated above. 
However, Hennig fails to expressly recite executing, by the computer, one or more data augmentation operations using the one or more features extracted from the prior audio signal to generate one or more simulated prior audio signals; and extracting, by the computer, the one or more features from each simulated prior audio signal of the one or more simulated prior audio signals, wherein the computer extracts the prior audioprint using the one or more features extracted from the prior audio signal and from the one or more simulated prior audio signals. Wu teaches executing, by the computer, one or more data augmentation operations using the one or more features extracted from the prior audio signal to generate one or more simulated prior audio signals (Wu, [0021]: "a synthesized voice is generated based on the voice of the user on which voiceprint recognition is successful and is also used as the training data, so that a large amount of training data can be obtained in a short period of time, and efficiency of update of the voiceprint model can be improved."); and extracting, by the computer, the one or more features from each simulated prior audio signal of the one or more simulated prior audio signals (Wu, [0177]: "The voiceprint feature corresponding to the synthesized voice obtained based on the voice of the registered user is used as the training data, so that more training data can be obtained in a short time, which saves the time for training and updating the voiceprint model. "), wherein the computer extracts the prior audioprint using the one or more features extracted from the prior audio signal and from the one or more simulated prior audio signals (Wu, [0179]: "The electronic device may select part or all of the voice and/or the synthesized voice of the registered user and extract a voiceprint feature therefrom by using, but not limited to, a de-finetune and de-incremental training method, or another algorithm to update the preset voiceprint feature. 
A principle of the de-finetune and de-incremental training method involves: keeping some parameters in the preset voiceprint model unchanged, taking other parameters as adjustment parameters, and then inputting the voiceprint feature extracted from the voice and/or the synthesized voice of the registered user as training data into the preset voiceprint model for training."). Hennig and Wu are analogous arts because they each belong to the same field of voice authentication. It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice biometrics system of Hennig to incorporate the teachings of Wu to generate simulated prior audio signals. Generating simulated prior audio signals allows for the creation of additional training data (Wu, [0021]). This ensures that the system’s machine learning models have a large amount of training data. Regarding claim 11, the rejection of claim 10 is incorporated. Hennig discloses all of the elements of the current invention as stated above. However, Hennig fails to expressly recite wherein the computer is further configured to: execute one or more data augmentation operations using the one or more features extracted from the prior audio signal to generate one or more simulated prior audio signals; and extract the one or more features from each simulated prior audio signal of the one or more simulated prior audio signals, wherein the computer extracts the prior audioprint using the one or more features extracted from the prior audio signal and from the one or more simulated prior audio signals. 
Wu teaches wherein the computer is further configured to: execute one or more data augmentation operations using the one or more features extracted from the prior audio signal to generate one or more simulated prior audio signals (Wu, [0021]: "a synthesized voice is generated based on the voice of the user on which voiceprint recognition is successful and is also used as the training data, so that a large amount of training data can be obtained in a short period of time, and efficiency of update of the voiceprint model can be improved."); and extract the one or more features from each simulated prior audio signal of the one or more simulated prior audio signals (Wu, [0177]: "The voiceprint feature corresponding to the synthesized voice obtained based on the voice of the registered user is used as the training data, so that more training data can be obtained in a short time, which saves the time for training and updating the voiceprint model. "), wherein the computer extracts the prior audioprint using the one or more features extracted from the prior audio signal and from the one or more simulated prior audio signals (Wu, [0179]: "The electronic device may select part or all of the voice and/or the synthesized voice of the registered user and extract a voiceprint feature therefrom by using, but not limited to, a de-finetune and de-incremental training method, or another algorithm to update the preset voiceprint feature. A principle of the de-finetune and de-incremental training method involves: keeping some parameters in the preset voiceprint model unchanged, taking other parameters as adjustment parameters, and then inputting the voiceprint feature extracted from the voice and/or the synthesized voice of the registered user as training data into the preset voiceprint model for training."). Hennig and Wu are analogous arts because they each belong to the same field of voice authentication. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice biometrics system of Hennig to incorporate the teachings of Wu to generate simulated prior audio signals. Generating simulated prior audio signals allows for the creation of additional training data (Wu, [0021]). This ensures that the system’s machine learning models have a large amount of training data.

Regarding claim 19, the rejection of claim 18 is incorporated. Hennig discloses all of the elements of the current invention as stated above. However, Hennig fails to expressly recite wherein the instructions further cause the one or more processors to: execute one or more data augmentation operations using the one or more features extracted from the prior audio signal to generate one or more simulated prior audio signals; and extract the one or more features from each simulated prior audio signal of the one or more simulated prior audio signals, wherein the computer extracts the prior audioprint using the one or more features extracted from the prior audio signal and from the one or more simulated prior audio signals. 
Wu teaches wherein the instructions further cause the one or more processors to: execute one or more data augmentation operations using the one or more features extracted from the prior audio signal to generate one or more simulated prior audio signals (Wu, [0021]: "a synthesized voice is generated based on the voice of the user on which voiceprint recognition is successful and is also used as the training data, so that a large amount of training data can be obtained in a short period of time, and efficiency of update of the voiceprint model can be improved."); and extract the one or more features from each simulated prior audio signal of the one or more simulated prior audio signals (Wu, [0177]: "The voiceprint feature corresponding to the synthesized voice obtained based on the voice of the registered user is used as the training data, so that more training data can be obtained in a short time, which saves the time for training and updating the voiceprint model. "), wherein the computer extracts the prior audioprint using the one or more features extracted from the prior audio signal and from the one or more simulated prior audio signals (Wu, [0179]: "The electronic device may select part or all of the voice and/or the synthesized voice of the registered user and extract a voiceprint feature therefrom by using, but not limited to, a de-finetune and de-incremental training method, or another algorithm to update the preset voiceprint feature. A principle of the de-finetune and de-incremental training method involves: keeping some parameters in the preset voiceprint model unchanged, taking other parameters as adjustment parameters, and then inputting the voiceprint feature extracted from the voice and/or the synthesized voice of the registered user as training data into the preset voiceprint model for training."). Hennig and Wu are analogous arts because they each belong to the same field of voice authentication. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice biometrics system of Hennig to incorporate the teachings of Wu to generate simulated prior audio signals. Generating simulated prior audio signals allows for the creation of additional training data (Wu, [0021]). This ensures that the system’s machine learning models have a large amount of training data.

Claim(s) 7 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hennig as applied to claims 1-2, 4-6, 9-10, 12-14, 17-18, and 20 above, and further in view of Aull et al. (US Pat. Pub. No. 2005/0240779 A1 hereinafter Aull).

Regarding claim 7, the rejection of claim 1 is incorporated. Hennig discloses all of the elements of the current invention as stated above. However, Hennig fails to expressly recite wherein the inbound audio signal is an enrollment audio signal received by the computer during an enrollment of the caller, and wherein the computer halts the enrollment for a preconfigured period of time in response to identifying that the inbound audio signal as the enrollment audio signal is a replayed recording. Aull teaches wherein the inbound audio signal is an enrollment audio signal received by the computer during an enrollment of the caller (Aull, [0037]: "Referring to FIG. 2, the enrollment of the BIOTOKEN is initiated by power on, as shown in block 320."; [0038]: "In block 380, after receiving a matching unique serial number, the BIOTOKEN enables its biometric(s) reader(s), and takes a reading."; [0032]: "The liveness check algorithm applied is unique to the biometric(s) selected. 
The biometric may be a finger print, voice sample, retina scan, DNA sample, other biometric or any combination thereof."), and wherein the computer halts the enrollment for a preconfigured period of time in response to identifying that the inbound audio signal as the enrollment audio signal is a replayed recording (Aull, [0038]: "In block 360, the enrollment process computes a nonce. Here as previously, the purpose of the nonce is to prevent any replay attack."; [0040]: "The computed nonce is compared to the decrypted nonce received from the BIOTOKEN. If the nonce does not agree, there has been a replay attack or a communication failure. The enrollment for the particular serial number is aborted and the enrollment operator is informed."; Here, aborting the enrollment is seen as halting the enrollment for a predetermined amount of time, wherein the predetermined amount of time is indefinite.). Hennig and Aull are analogous arts because they each belong to the same field of biometric authentication. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice biometrics system of Hennig to incorporate the teachings of Aull to halt the enrollment process in response to detecting a replay attack. This stops the process to ensure that no user is fraudulently enrolled (Aull, [0040]). This generally helps prevent fraud.

Regarding claim 15, the rejection of claim 9 is incorporated. Hennig discloses all of the elements of the current invention as stated above. However, Hennig fails to expressly recite wherein the inbound audio signal is an enrollment audio signal received by the computer during an enrollment of the caller, and wherein the computer halts the enrollment for a preconfigured period of time in response to identifying that the inbound audio signal as the enrollment audio signal is a replayed recording. 
Aull teaches wherein the inbound audio signal is an enrollment audio signal received by the computer during an enrollment of the caller (Aull, [0037]: "Referring to FIG. 2, the enrollment of the BIOTOKEN is initiated by power on, as shown in block 320."; [0038]: "In block 380, after receiving a matching unique serial number, the BIOTOKEN enables its biometric(s) reader(s), and takes a reading."; [0032]: "The liveness check algorithm applied is unique to the biometric(s) selected. The biometric may be a finger print, voice sample, retina scan, DNA sample, other biometric or any combination thereof."), and wherein the computer halts the enrollment for a preconfigured period of time in response to identifying that the inbound audio signal as the enrollment audio signal is a replayed recording (Aull, [0038]: "In block 360, the enrollment process computes a nonce. Here as previously, the purpose of the nonce is to prevent any replay attack."; [0040]: "The computed nonce is compared to the decrypted nonce received from the BIOTOKEN. If the nonce does not agree, there has been a replay attack or a communication failure. The enrollment for the particular serial number is aborted and the enrollment operator is informed."; Here, aborting the enrollment is seen as halting the enrollment for a predetermined amount of time, wherein the predetermined amount of time is indefinite.). Hennig and Aull are analogous arts because they each belong to the same field of biometric authentication. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice biometrics system of Hennig to incorporate the teachings of Aull to halt the enrollment process in response to detecting a replay attack. This stops the process to ensure that no user is fraudulently enrolled (Aull, [0040]). This generally helps prevent fraud.

Claim(s) 8 and 16 is/are rejected under 35 U.S.C. 
103 as being unpatentable over Hennig as applied to claims 1-2, 4-6, 9-10, 12-14, 17-18, and 20 above, and further in view of Sivaraman et al. (US Pat. Pub. No. 2022/0084509 A1 hereinafter Sivaraman).

Regarding claim 8, the rejection of claim 1 is incorporated. Hennig discloses all of the elements of the current invention as stated above. However, Hennig fails to expressly recite detecting, by the computer, the one or more utterances occurring in one or more speech portions of the input audio signal; and generating, by the computer, the speech signal comprising the one or more speech portions of the inbound audio signal, wherein the one or more speech portions are filtered away from a plurality of non-speech portions of the input audio signal. Sivaraman teaches detecting, by the computer, the one or more utterances occurring in one or more speech portions of the input audio signal; and generating, by the computer, the speech signal comprising the one or more speech portions of the inbound audio signal, wherein the one or more speech portions are filtered away from a plurality of non-speech portions of the input audio signal (Sivaraman, [0026]: "Described herein are systems and methods for processing various types of data associated with inbound calls, including audio signals containing a mixture of one or more speaker signals or utterances, protocol metadata, and caller inputs, to generate an enhanced audio signal in which utterances of interfering speakers and noise are suppressed compared to the input audio signal received by the system."). Hennig and Sivaraman are analogous arts because they each belong to the same field of speech processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice biometrics system of Hennig to incorporate the teachings of Sivaraman to detect speech and filter noise in an audio signal. 
Detecting speech and filtering noise enhances the audio signal for further processing (Sivaraman, [0013]). This ensures that an audio signal is clear and usable in other parts of the system.

Regarding claim 16, the rejection of claim 9 is incorporated. Hennig discloses all of the elements of the current invention as stated above. However, Hennig fails to expressly recite wherein the computer is further configured to: detect the one or more utterances occurring in one or more speech portions of the input audio signal; generate the speech signal comprising the one or more speech portions of the inbound audio signal, wherein the one or more speech portions are filtered away from a plurality of non-speech portions of the input audio signal. Sivaraman teaches wherein the computer is further configured to: detect the one or more utterances occurring in one or more speech portions of the input audio signal; generate the speech signal comprising the one or more speech portions of the inbound audio signal, wherein the one or more speech portions are filtered away from a plurality of non-speech portions of the input audio signal (Sivaraman, [0026]: "Described herein are systems and methods for processing various types of data associated with inbound calls, including audio signals containing a mixture of one or more speaker signals or utterances, protocol metadata, and caller inputs, to generate an enhanced audio signal in which utterances of interfering speakers and noise are suppressed compared to the input audio signal received by the system."). Hennig and Sivaraman are analogous arts because they each belong to the same field of speech processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice biometrics system of Hennig to incorporate the teachings of Sivaraman to detect speech and filter noise in an audio signal. 
Detecting speech and filtering noise enhances the audio signal for further processing (Sivaraman, [0013]). This ensures that an audio signal is clear and usable in other parts of the system.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Lesso, John Paul (US Pat. No. 12,026,241 B2) discloses a system for detecting replay attacks. Aley-Raz et al. (US Pat. Pub. No. 2010/0131273 A1) discloses a system for liveness detection utilizing voice biometrics.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TYLER J BECKER whose telephone number is (703)756-1271. The examiner can normally be reached M-Th, 7:15am-5:45pm PT.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 
/TYLER BECKER/
Examiner, Art Unit 2657

/DANIEL C WASHBURN/
Supervisory Patent Examiner, Art Unit 2657
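The §103 combination over Wu turns on a standard machine-learning technique: augmenting one enrollment recording into several simulated variants, extracting features from the original and from each variant, and pooling everything into a single audioprint. A minimal sketch of that flow, assuming toy augmentation operations and a toy log-energy feature extractor (all function names and parameters here are hypothetical illustrations, not code from any cited reference):

```python
import numpy as np

def augment(signal, rng):
    """Hypothetical augmentation ops: noise injection, gain scaling,
    and naive resampling to simulate channel/tempo variation."""
    noisy = signal + 0.01 * rng.standard_normal(signal.shape)
    louder = 1.5 * signal
    stretched = np.interp(
        np.linspace(0, len(signal) - 1, int(len(signal) * 1.1)),
        np.arange(len(signal)), signal)
    return [noisy, louder, stretched]

def extract_features(signal, frame=160):
    """Toy feature extractor: per-frame log energy (a stand-in for the
    spectral features a real liveness/voiceprint system would use)."""
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    return np.log(np.mean(frames ** 2, axis=1) + 1e-9)

def make_audioprint(prior_signal, rng):
    """Pool features from the prior signal plus its simulated variants
    into one embedding, mirroring the claimed augmentation flow."""
    signals = [prior_signal] + augment(prior_signal, rng)
    feats = [extract_features(s) for s in signals]
    width = min(len(f) for f in feats)  # align feature-vector lengths
    return np.mean([f[:width] for f in feats], axis=0)

rng = np.random.default_rng(0)
audio = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s at 16 kHz
print(make_audioprint(audio, rng).shape)  # → (100,)
```

The design point the rejection leans on is visible in `make_audioprint`: one genuine sample yields several feature sets, so the model-update step has more training data than the single recording alone would provide.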

Prosecution Timeline

Apr 25, 2024
Application Filed
Jan 21, 2026
Non-Final Rejection — §102, §103, §112
Apr 07, 2026
Interview Requested
Apr 15, 2026
Examiner Interview Summary
Apr 15, 2026
Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597433
SPEECH SIGNAL ENHANCEMENT METHOD AND APPARATUS, AND ELECTRONIC DEVICE
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12585893
Full Media Translator
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12518777
SYSTEMS AND METHODS FOR AUTHENTICATION USING SOUND-BASED VOCALIZATION ANALYSIS
Granted Jan 06, 2026 (2y 5m to grant)

Patent 12499869
SOUND SYNTHESIS METHOD, SOUND SYNTHESIS APPARATUS, AND RECORDING MEDIUM STORING INSTRUCTIONS TO PERFORM SOUND SYNTHESIS METHOD
Granted Dec 16, 2025 (2y 5m to grant)

Patent 12499311
Language Model Preprocessing with Weighted N-grams
Granted Dec 16, 2025 (2y 5m to grant)
Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
74%
Grant Probability
93%
With Interview (+19.0%)
2y 10m
Median Time to Grant
Low
PTA Risk
Based on 19 resolved cases by this examiner. Grant probability derived from career allow rate.
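The projection figures above follow from simple arithmetic over the examiner's career counts shown earlier on the page (14 granted of 19 resolved, plus the +19.0-point interview lift). A quick sketch with illustrative variable names reproduces them:

```python
# Reproduce the dashboard's headline projections from the raw counts.
granted, resolved = 14, 19                   # career outcomes (from the page)
base_rate = round(100 * granted / resolved)  # 73.68...% -> shown as 74%
interview_lift = 19.0                        # percentage points (from the page)

with_interview = base_rate + interview_lift  # 74 + 19.0 = 93.0
print(base_rate, with_interview)             # → 74 93.0
```

Note the "With Interview" figure is simply the career allow rate plus the interview lift; it is a point estimate from 19 resolved cases, not a per-case model.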
