Prosecution Insights
Last updated: April 19, 2026
Application No. 18/646,493

ACTIVE VOICE LIVENESS DETECTION SYSTEM

Non-Final OA: §102, §103, §112
Filed
Apr 25, 2024
Examiner
SHAIKH, ZEESHAN MAHMOOD
Art Unit
2658
Tech Center
2600 — Communications
Assignee
Pindrop Security Inc.
OA Round
1 (Non-Final)
Grant Probability: 52% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 3y 2m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 52% (grants 52% of resolved cases; 16 granted / 31 resolved; -10.4% vs TC avg)
Interview Lift: +55.0% (strong) among resolved cases with an interview vs. without
Typical Timeline: 3y 2m avg prosecution; 32 currently pending
Career History: 63 total applications across all art units

Statute-Specific Performance

§101: 25.7% (-14.3% vs TC avg)
§103: 45.8% (+5.8% vs TC avg)
§102: 17.3% (-22.7% vs TC avg)
§112: 5.8% (-34.2% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 31 resolved cases
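As a quick sanity check on the headline figures above, the 52% career allow rate follows directly from the raw counts shown in the panel (16 granted of 31 resolved):

```python
granted, resolved = 16, 31          # examiner's career counts from the panel above
allow_rate = granted / resolved     # 0.516..., displayed as 52%

print(f"{allow_rate:.1%}")          # → 51.6%
```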

Office Action

Grounds of rejection: §102, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 10/10/2024, 11/7/2024, 7/8/2025, 7/30/2025, 8/13/2025, 8/26/2025, 10/27/2025, 11/18/2025, and 12/24/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections

Claims 12-15 and 17-19 are objected to because of the following informalities: the claims listed above are dependent on the incorrect independent claim. Claims 12-13 should be dependent on claim 11. Next, claims 14-15 should be dependent on claim 13. Lastly, claims 17-19 should be dependent on claim 11. Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 3-6 and 13-16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 3 and 13 recite, “determining, by the computer, a plurality of audio quality parameters for the input audio signal, including the magnitude value and at least one of a net speech value, SNR, DRR, T60, or C50, wherein the passive liveness detector is trained to calibrate the liveness score further using the plurality of audio quality parameters”. The acronyms should be spelled out in word format before using their abbreviated form. Dependent claims 4-6 and 14-16 are also rejected under 35 U.S.C. 112(b) since they incorporate the deficiencies of the claim(s) upon which they depend.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-2, 7-10, 11-12, and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chen et al., US 20210233541 A1 (hereinafter Chen).

Regarding independent claims 1 and 11, Chen teaches a computer-implemented method for generating liveness scores for detecting fraud occurring in calls / a system for generating liveness scores for detecting fraud occurring in calls, comprising: obtaining, by a computer, an input audio signal including one or more speech signals representing one or more utterances of a speaker (FIG. 3, 302, [0068] “The server may receive or request clean audio signals from one or more speech corpora databases. The clean audio signals may include speech originating from any number speakers”); extracting, by the computer, a fakeprint for the input audio signal using one or more fraud artifact features extracted from the input audio signal (FIG. 4, 404, [0076] “In step 404, the server applies the neural network architecture to each enrollee audio signal to extract the enrollee spoofprint. The neural network architecture generates spoofprint feature vectors for the enrollee audio signals using the relevant set of extracted features.” [0009] “The feature vectors generated when extracting the spoofprint are based on a set of features including various audio spoof characteristics indicating spoofing artifacts”); determining, by the computer, a magnitude value for the fakeprint based upon a vector length of the fakeprint ([0045] “The neural network generates the inbound embeddings (e.g., spoofprint, voiceprint, combined embedding) for the caller and then determines one or more similarity scores indicating the distances between these feature vectors and the corresponding enrollee feature vectors”); executing, by the computer, a passive liveness detector having one or more layers of a machine-learning architecture ([0009] “Embodiments described herein provide for systems and methods for implementing a neural network architecture for spoof detection in audio signals. The neural network architecture contains one or more layers…”) to generate a liveness score for the input audio signal ([0010] “generating, by the computer, a spoof likelihood score for the inbound audio signal based upon one or more similarities between the inbound spoofprint and the enrollee spoofprint.”), the passive liveness detector trained to determine the liveness score taking the fakeprint as an input (FIG. 6, 602, [0091] “the server feeds the training audio signals 602 into the input layers 601, where the training audio signals may include any number of genuine and spoofed or false audio signals”), and calibrate the liveness score using the magnitude value ([0065] “The pre-processing operations may also include parsing the audio signals into frames or sub-frames, and performing various normalization or scaling operations”).

Regarding claims 2 and 12, Chen teaches all of the limitations of claim 1, upon which claims 2 and 12 depend. Additionally, Chen teaches determining, by the computer, a net speech value for the speaker indicating an aggregate amount of each utterance obtained for the speaker ([0009] “The feature vectors generated when extracting the voiceprint are based on a set of features reflecting the speaker's voice. The feature vectors generated when extracting the spoofprint are based on a set of features including various audio spoof characteristics indicating spoofing artifacts, such as specific aspects of how the speaker speaks”), wherein the passive liveness detector is trained to calibrate the liveness score further using the net speech value ([0065] “The pre-processing operations may also include parsing the audio signals into frames or sub-frames, and performing various normalization or scaling operations”).

Regarding claims 7 and 17, Chen teaches all of the limitations of claim 1, upon which claims 7 and 17 depend. Additionally, Chen teaches wherein the input audio signal includes an inbound audio signal received from a user device that originated an inbound call during a deployment phase ([0036] “The neural network architecture operates logically in several operational phases, including a training phase, an enrollment phase, and a deployment phase (sometimes referred to as a test phase or testing).”).

Regarding claims 8 and 18, Chen teaches all of the limitations of claim 1, upon which claims 8 and 18 depend.
Additionally, Chen teaches wherein the input audio signal includes a training audio signal received from a database during a training phase ([0082] “The server may receive the enrollee audio signals directly from a device (e.g., telephone, IoT device) of the enrollee, a database, or a device of a third-party (e.g., customer call center system). In some implementations, the server may perform one or more data augmentation operations on the enrollee audio signals, which could include the same or different augmentation operations performed during a training phase”).

Regarding claims 9 and 19, Chen teaches all of the limitations of claim 1, upon which claims 9 and 19 depend. Additionally, Chen teaches wherein the input audio signal includes an enrollment audio signal received from a user device during an enrollment phase ([0094] “During the optional enrollment phase, the server feeds one or more enrollment audio signals 603 into the embedding extractor 606 to extract an enrollee spoofprint embedding for an enrollee”).

Regarding claims 10 and 20, Chen teaches all of the limitations of claims 1 and 11, upon which claims 10 and 20 depend. Additionally, Chen teaches further comprising parsing, by the computer, the input audio signal into one or more segments, wherein the computer determines the liveness score for each successive segment of the input audio signal ([0065] “The pre-processing operations may also include parsing the audio signals into frames or sub-frames, and performing various normalization or scaling operations”).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3-5 and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Lopez Espejo et al., US 20210125619 A1 (hereinafter Lopez Espejo).

Regarding claims 3 and 13, Chen teaches all of the limitations of claim 1, upon which claims 3 and 13 depend. Additionally, Chen teaches determining, by the computer, a plurality of audio quality parameters for the input audio signal, including the magnitude value ([0009] “The neural network architecture extracts a set of features from audio signals for spoofprints that are (at least in part) different from the set of features extracted for voiceprints. The feature vectors generated when extracting the voiceprint are based on a set of features reflecting the speaker's voice”). Chen fails to teach at least one of a net speech value, SNR, DRR, T60, or C50, wherein the passive liveness detector is trained to calibrate the liveness score further using the plurality of audio quality parameters.
However, Lopez Espejo teaches at least one of a net speech value, SNR, DRR, T60, or C50 ([0067] “The background noise level may be determined by applying any known technique with said purpose or similar, such as e.g., a signal-to-noise ratio (SNR)-based technique”), wherein the passive liveness detector is trained to calibrate the liveness score further using the plurality of audio quality parameters ([0049] “A score calibration σ(x) of the authentication may be also performed at block 306, wherein x represents a score of the authentication”).

Chen and Lopez Espejo are considered analogous to the claimed invention because both are in the same field of voice authentication. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network architecture techniques for spoof detection and speaker recognition in audio signals of Chen with the technique of determining the SNR of audio signals taught by Lopez Espejo in order to authenticate a user or speaker (see Lopez Espejo [0002]).

Regarding claims 4 and 14, Chen in view of Lopez Espejo teaches all of the limitations of claim 3, upon which claims 4 and 14 depend. Additionally, Lopez Espejo teaches executing, by the computer, a content verifier of the machine-learning architecture to generate a content verification score for the input audio signal, the content verifier trained to determine the content verification score taking challenge content and response content and calibrate the content verification score using one or more audio quality parameters (FIG. 7, 718, [0087] “The server then determines whether the similarity score satisfies a verification threshold. The server verifies the inbound audio signal as matching the enrollee voice with the speaker and as genuine (not spoofed) when the server determines the inbound combined embedding satisfies the corresponding verification threshold score”).
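The SNR-based technique Lopez Espejo is cited for appears only at a high level in the quoted passage. As an illustrative sketch (not drawn from either reference), a signal-to-noise ratio over raw sample frames can be computed as:

```python
import math

def snr_db(signal_frame, noise_frame):
    """Illustrative SNR in dB: ratio of mean signal power to mean noise power.

    Frames are plain sequences of samples; a real system would estimate the
    noise floor from non-speech regions, a detail the quoted passages omit.
    """
    p_signal = sum(x * x for x in signal_frame) / len(signal_frame)
    p_noise = sum(x * x for x in noise_frame) / len(noise_frame)
    return 10.0 * math.log10(p_signal / p_noise)

# Signal amplitude twice the noise amplitude → 4x the power → ~6.02 dB
print(round(snr_db([2.0, -2.0, 2.0], [1.0, -1.0, 1.0]), 2))  # → 6.02
```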
Regarding claims 5 and 15, Chen in view of Lopez Espejo teaches all of the limitations of claim 3, upon which claims 5 and 15 depend. Additionally, Chen teaches executing, by the computer, a speaker verifier of the machine-learning architecture to generate a speaker verification score for the input audio signal, the speaker verifier trained to determine the speaker verification score taking a voiceprint for the inbound audio signal and calibrate the speaker verification score using one or more audio quality parameters ([0061] “the server evaluates the spoofprints and voiceprints without regard to the sequencing, yet require the extracted inbound embeddings to satisfy corresponding thresholds. In some implementations, the server generates a combined similarity score using a voice similarity score (based on comparing the voiceprints) and a spoof likelihood or detection score (based on comparing the spoofprints). The server generates the combined similarity score by summing or otherwise algorithmically combining the voice similarity score and the spoof likelihood score. The server then determines whether the combined similarity score satisfies an authentication or verification threshold score”).

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Lopez Espejo as shown above in claim 3, and further in view of Rao et al. (US 20220059121 A1).

Regarding claims 6 and 16, Chen in view of Lopez Espejo teaches all of the limitations of claims 3 and 13, upon which claims 6 and 16 depend. Chen in view of Lopez Espejo fails to teach generating, by the computer, a fused liveness score for the input audio signal based upon the liveness score and at least one of a content verification score for the input audio signal or a speaker verification score for the input audio signal.
However, Rao teaches generating, by the computer, a fused liveness score for the input audio signal based upon the liveness score and at least one of a content verification score for the input audio signal or a speaker verification score for the input audio signal ([0012] “The quality measures can then be fused at the score-level with the speaker recognition's embedding comparisons for verifying the speaker. Fusing the quality measures with the similarity scoring essentially calibrates the speaker recognition's outputs based on the realities of what is actually expected for the enrolled caller and what was actually observed for the current inbound caller”; [0034] “The score fusion operation 130 generates a fusion (or calibration) the initial similarity score with the quality measures by algorithmically combining the initial similarity score with the total quality measure to generate a final similarity score.”).

Chen in view of Lopez Espejo in further view of Rao are considered analogous to the claimed invention because all are in the same field of voice authentication. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the caller authentication techniques of Chen in view of Lopez Espejo with the technique of fusing verification scores taught by Rao in order to deploy a fraud detection engine that operates relative to tailored fraud importance for the various types of fraud events (see Rao [0009]).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Rao et al. (US 20190037081 A1) teaches a system and method for fraud detection and management. The system includes a first communication device that receives a phone call from a second communication device, wherein a call flow of the phone call comprises one or more distinct phases.
The system also includes a fraud detection and management system (FDMS) platform that determines whether the phone call exceeds a predetermined risk threshold at each distinct phase of the call flow.

Wolinsky et al. (US 11943387 B1) teaches a system and method for monitoring calls or other communications between two parties by a monitoring service. The system may be configured to determine whether the caller is a known or unknown caller. In the event that the system determines that the caller is an unknown caller (i.e., not in either a “good” caller or “bad” caller database), the system may process dialog between the two parties to determine whether a potential fraud is being committed. The processing may be performed using artificial intelligence (AI) in real time. In response to determining that a potential fraud is being committed, the call may be interdicted by automatically adding a third party, such as a security agent, to the call. The system may be configured to selectively play a pre-call message to a non-subscriber of the monitoring service to notify the non-subscriber that the call is being monitored.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZEESHAN SHAIKH, whose telephone number is (703) 756-1730. The examiner can normally be reached Monday-Friday, 7:30 AM-5:00 PM.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ZEESHAN MAHMOOD SHAIKH/
Examiner, Art Unit 2658

/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658
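To make the disputed claim language concrete, here is a minimal, hypothetical sketch of the claimed elements: the "magnitude value" as the vector length (L2 norm) of a fakeprint embedding, calibration of a liveness score against audio-quality parameters (claims 3/13), and score-level fusion (claims 6/16). All function names, weights, and values are invented for illustration and appear in neither the application nor the cited art.

```python
import math

def magnitude(fakeprint):
    """'Magnitude value' of a fakeprint embedding as its vector length (L2 norm)."""
    return math.sqrt(sum(x * x for x in fakeprint))

def calibrate(raw_score, quality_params, weights, bias=0.0):
    """Hypothetical calibration: shift the raw liveness score by a weighted
    sum of audio-quality parameters, squashed into (0, 1) with a sigmoid."""
    z = raw_score + bias + sum(w * q for w, q in zip(weights, quality_params))
    return 1.0 / (1.0 + math.exp(-z))

def fuse(liveness, speaker_score, content_score, w=(0.5, 0.3, 0.2)):
    """Score-level fusion in the spirit of claims 6/16: a weighted combination
    of the liveness, speaker verification, and content verification scores."""
    return w[0] * liveness + w[1] * speaker_score + w[2] * content_score

fakeprint = [0.6, -0.8]                  # toy 2-D embedding
m = magnitude(fakeprint)                 # → 1.0
live = calibrate(0.2, [m, 15.0], weights=[0.5, 0.01])   # magnitude + SNR (dB)
fused = fuse(live, speaker_score=0.9, content_score=0.8)
```

The sigmoid calibration and fixed fusion weights are one plausible reading of "calibrate" and "fused liveness score"; the application itself may use trained calibration layers rather than hand-set weights.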

Prosecution Timeline

Apr 25, 2024
Application Filed
Jan 22, 2026
Non-Final Rejection — §102, §103, §112
Apr 02, 2026
Interview Requested
Apr 10, 2026
Examiner Interview Summary
Apr 10, 2026
Applicant Interview (Telephonic)

Precedent Cases

Applications granted by the same examiner involving similar technology

Patent 12579373: SYSTEM AND METHOD FOR SYNTHETIC TEXT GENERATION TO SOLVE CLASS IMBALANCE IN COMPLAINT IDENTIFICATION
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12555575: Wakeup Indicator Monitoring Method, Apparatus and Electronic Device
Granted Feb 17, 2026 (2y 5m to grant)
Patent 12518090: LOGICAL ROLE DETERMINATION OF CLAUSES IN CONDITIONAL CONSTRUCTIONS OF NATURAL LANGUAGE
Granted Jan 06, 2026 (2y 5m to grant)
Patent 12511318: MULTI-SYSTEM-BASED INTELLIGENT QUESTION ANSWERING METHOD AND APPARATUS, AND DEVICE
Granted Dec 30, 2025 (2y 5m to grant)
Patent 12512088: METHOD AND SYSTEM FOR USER-INTERFACE ADAPTATION OF TEXT-TO-SPEECH SYNTHESIS
Granted Dec 30, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 52%
With Interview: 99% (+55.0%)
Median Time to Grant: 3y 2m
PTA Risk: Low
Based on 31 resolved cases by this examiner. Grant probability derived from career allow rate.
