DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 9, 2026 has been entered.
Response to Arguments
Applicants argue that the cited prior art fails to teach generating, based on the one or more style embeddings and the one or more linguistic embeddings, one or more dependency embeddings representing dependencies between the one or more style embeddings and the one or more linguistic embeddings. Applicants' arguments are persuasive with respect to the previously cited art, but are moot in view of the new grounds of rejection set forth below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Altaf et al. (PGPUB 2024/0363099), hereinafter referenced as Altaf, in view of Shekhar et al. (PGPUB 2022/0228367), hereinafter referenced as Shekhar.
Regarding claims 1 and 19-20, Altaf discloses a method, a system, and a medium, hereinafter referenced collectively as a method, for classifying audio data, the method comprising:
inputting the audio data into a trained machine-learning model, wherein the trained machine-learning model (trained machine learning; p. 0041, 0138-0139) is configured to:
generate, using a style encoder of the machine-learning model, one or more style embeddings (style and tone) representing nonverbal characteristics of the audio data (text; p. 0178, 0196);
generate, using a linguistic encoder of the machine-learning model (encoder), one or more linguistic embeddings (semantic) representing textual content (textual content) of the audio data (p. 0165, 0196-0197);
generate one or more dependency embeddings representing dependencies between the one or more style embeddings and the one or more linguistic embeddings (embeddings and classification; p. 0179-0180, 0237-0242);
inputting the one or more dependency embeddings into a classification head of the machine-learning model (trained to classify the audio data as real or fake; p. 0103-0107, 0172-0181, 0209-0220); and
obtaining, from the trained machine-learning model, a classification result of whether the audio data is real or fake (trained to classify the audio data as real or fake; p. 0103-0107, 0172-0181, 0209-0220). Altaf, however, does not specifically teach generating, based on the one or more style embeddings and the one or more linguistic embeddings, one or more dependency embeddings representing dependencies between nonverbal speech characteristics and verbal speech characteristics.
Shekhar discloses a method comprising generating, based on the one or more style embeddings (style features such as pitch, emotion, etc.; p. 0019) and the one or more linguistic embeddings, one or more dependency embeddings representing dependencies between nonverbal speech characteristics and verbal speech characteristics (conveyed by the audio generated by text; p. 0019, 0103-0111), to more accurately capture the expressiveness of an input text.
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method of Altaf to generate the dependency embeddings from the style and linguistic embeddings as taught by Shekhar, to increase efficiency and more accurately capture the expressiveness of an input text.
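As an illustration only, and not as a characterization of Altaf or Shekhar or of the claims as filed, the following minimal sketch shows a pipeline of the kind recited in claim 1: separate style and linguistic encoders whose outputs are combined into dependency embeddings and passed to a real/fake classification head. The framework (PyTorch), module choices, names, and dimensions are all hypothetical assumptions.

    # Illustrative sketch only; all module names and dimensions are hypothetical.
    import torch
    import torch.nn as nn

    class DependencyClassifier(nn.Module):
        """Hypothetical pipeline: style encoder + linguistic encoder ->
        dependency embeddings -> classification head (real vs. fake)."""

        def __init__(self, feat_dim=80, embed_dim=128):
            super().__init__()
            # Style encoder: maps acoustic frames to nonverbal-style embeddings.
            self.style_encoder = nn.GRU(feat_dim, embed_dim, batch_first=True)
            # Linguistic encoder: maps the same frames to content embeddings.
            self.linguistic_encoder = nn.GRU(feat_dim, embed_dim, batch_first=True)
            # Dependency module: models interactions between the two embedding sets.
            self.dependency = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
            # Classification head: outputs logits for {real, fake}.
            self.head = nn.Linear(embed_dim, 2)

        def forward(self, audio_feats):
            style, _ = self.style_encoder(audio_feats)             # (B, T, D)
            linguistic, _ = self.linguistic_encoder(audio_feats)   # (B, T, D)
            # Dependency embeddings: style attends to linguistic content.
            dep, _ = self.dependency(style, linguistic, linguistic)
            return self.head(dep.mean(dim=1))                      # (B, 2) logits

    # Example: classify a batch of 3 utterances of 200 frames of 80-dim features.
    logits = DependencyClassifier()(torch.randn(3, 200, 80))
    print(logits.shape)  # torch.Size([3, 2])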
Regarding claim 2, Altaf discloses a method wherein the audio data comprises real human speech, synthetic human speech, or both real human speech and synthetic human speech (machine or human; p. 0020, 0130, 0179-0185).
Regarding claim 3, Altaf discloses a method wherein the one or more machine learning models have been trained using bona fide audio data to learn dependencies between nonverbal characteristics and textual content in real human speech (bona fide enrollment audio signals; p. 0121, 0172-0181, 0209-0220).
Regarding claim 4, Altaf discloses a method wherein determining a first subset of the one or more dependency embeddings comprises:
inputting the one or more style embeddings into a style compressor (style embedding; p. 0178-0179); and
compressing the one or more style embeddings to create one or more style dependency embeddings (compress; p. 0104-0105).
Regarding claim 5, Altaf discloses a method wherein determining a second subset of the one or more dependency embeddings comprises:
inputting the one or more linguistic embeddings into a linguistic compressor (grammar; p. 0178-0179); and
compressing the one or more linguistic embeddings to create one or more linguistic dependency embeddings (compress; p. 0104-0105).
Regarding claim 13, Altaf discloses a method wherein the one or more style embeddings represent one or more attributes selected from the group comprising: speaker identity (individual identity; p. 0104), gender, emotion (p. 0174-0176), accent (p. 0174-0176), tone (p. 0178), speech rate (speech rate; p. 0284), health state, age, vocal pitch (p. 0174-0176), vocal intensity (p. 0174, 0179, 0199), and cognitive state (p. 0174-0176).
Regarding claim 14, Altaf discloses a method wherein the classification head has been trained to classify audio as real or fake via supervised learning using labeled audio data (supervised learning; p. 0140).
Regarding claim 15, Altaf discloses a method wherein the style compressor and the linguistic compressor are trained in a first training phase using only bona fide audio data, and wherein the classification head is trained during a second training phase (second authentication) using labeled bona fide audio data and labeled fake audio data (label indicating human or machine speech; p. 0066-0071).
Regarding claim 16, Altaf discloses a method comprising:
permitting access to a computing resource or protected endpoint based on the classification result (p. 0116), wherein the classification result indicates that the audio is real (p. 0103-0107, 0172-0181, 0209-0220).
Regarding claim 17, Altaf discloses a method comprising:
restricting access to a computing resource or protected endpoint based on the classification result, wherein the classification result indicates that the audio is fake (authenticate/restrict; p. 0319-0320).
Regarding claim 18, Altaf discloses a method comprising:
displaying an alert via a user interface based on the classification result, wherein the classification result indicates that the audio is fake (display genuine or fake; p. 0173).
Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Altaf in view of Shekhar, and further in view of Chen et al. (PGPUB 2024/0005905), hereinafter referenced as Chen.
Regarding claim 9, Altaf in view of Shekhar discloses a method as described above, but does not specifically teach a method comprising:
generating one or more supplementary style embeddings based on the one or more style embeddings, wherein the one or more supplementary style embeddings include information-rich portions of the input audio data.
Chen discloses a method comprising:
generating one or more supplementary style embeddings based on the one or more style embeddings, wherein the one or more supplementary style embeddings include information-rich portions of the input audio data (p. 0227), to make training simple and efficient.
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method as described above with the supplementary style embeddings of Chen, to improve naturalness and emotion richness.
Regarding claim 10, it is interpreted and rejected for similar reasons as set forth above. In addition, Chen discloses a method comprising:
generating one or more supplementary linguistic embeddings based on the one or more linguistic embeddings, wherein the one or more supplementary linguistic embeddings include information-rich portions of the input audio data (p. 0180, 0227).
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Altaf in view of Shekhar and Chen, and further in view of Yan et al. (PGPUB 2025/0094718), hereinafter referenced as Yan.
Regarding claim 11, Altaf in view of Shekhar and Chen discloses a method as described above, but does not specifically disclose a method comprising concatenating the one or more supplementary style embeddings, one or more supplementary linguistic embeddings, one or more style dependency embeddings, and one or more linguistic dependency embeddings to one another and inputting the concatenated embeddings into the classifier module to condense embeddings.
Yan discloses a method of concatenating the one or more supplementary style embeddings, one or more supplementary linguistic embeddings, one or more style dependency embeddings, and one or more linguistic dependency embeddings to one another (p. 0080-0081); and
inputting the concatenated embeddings into the classifier module (p. 0050-0055), to condense embeddings.
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method as described above with the embedding concatenation of Yan, to condense the embeddings and assist with improving the classification task.
Regarding claim 12, it is interpreted and rejected for similar reasons as set forth above. In addition, Yan discloses a method wherein the one or more supplementary style embeddings and one or more supplementary linguistic embeddings are generated using an attentive statistics pooling module (p. 0080) and a multi-layer perceptron module (p. 0066-0067).
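As an illustration only, and not as a characterization of Yan or of the claims as filed, the following minimal sketch shows one generic way an attentive statistics pooling module and a multi-layer perceptron can produce supplementary embeddings that are then concatenated with dependency embeddings and fed to a classifier module, as recited in claims 11-12. All names, dimensions, and the use of PyTorch are hypothetical assumptions; the dependency embeddings below are random stand-ins.

    # Illustrative sketch only; a generic reconstruction, not a specific cited module.
    import torch
    import torch.nn as nn

    class AttentiveStatsPooling(nn.Module):
        """Weighted mean and standard deviation over time, using learned attention."""
        def __init__(self, dim):
            super().__init__()
            self.attn = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

        def forward(self, x):                       # x: (B, T, D)
            w = torch.softmax(self.attn(x), dim=1)  # (B, T, 1) attention weights
            mean = (w * x).sum(dim=1)
            var = (w * (x - mean.unsqueeze(1)) ** 2).sum(dim=1)
            return torch.cat([mean, var.clamp(min=1e-8).sqrt()], dim=-1)  # (B, 2D)

    dim = 64
    pool = AttentiveStatsPooling(dim)
    mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    style_seq, ling_seq = torch.randn(2, 50, dim), torch.randn(2, 50, dim)
    supp_style, supp_ling = mlp(pool(style_seq)), mlp(pool(ling_seq))  # supplementary embeddings
    style_dep, ling_dep = torch.randn(2, dim), torch.randn(2, dim)     # stand-in dependency embeddings

    # Concatenate all four embedding sets and feed the result to a classifier module.
    classifier = nn.Linear(4 * dim, 2)
    logits = classifier(torch.cat([supp_style, supp_ling, style_dep, ling_dep], dim=-1))
    print(logits.shape)  # torch.Size([2, 2])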
Allowable Subject Matter
Claims 6-8 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAKIEDA R JACKSON whose telephone number is (571) 272-7619. The examiner can normally be reached Monday-Friday, 6:30 a.m. to 2:30 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JAKIEDA R JACKSON/Primary Examiner, Art Unit 2657