Prosecution Insights
Last updated: April 19, 2026
Application No. 19/213,893

GENERALIZING AUDIO DEEPFAKE DETECTION BY EXPLORING STYLE-LINGUISTICS MISMATCH

Status: Non-Final OA (§103)
Filed: May 20, 2025
Examiner: JACKSON, JAKIEDA R
Art Unit: 2657
Tech Center: 2600 (Communications)
Assignee: Reality Defender Inc.
OA Round: 3 (Non-Final)

Grant Probability: 74% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 0m
Grant Probability with Interview: 89%

Examiner Intelligence

Career Allow Rate: 74%, above average (669 granted / 905 resolved; +11.9% vs Tech Center average)
Interview Lift: +15.4% (resolved cases with interview)
Average Prosecution: 3y 0m
Currently Pending: 35 applications
Total Applications: 940 (across all art units)

Statute-Specific Performance

§101: 25.8% (-14.2% vs TC avg)
§103: 42.5% (+2.5% vs TC avg)
§102: 21.8% (-18.2% vs TC avg)
§112: 3.5% (-36.5% vs TC avg)

Deltas are measured against the Tech Center average estimate. Based on career data from 905 resolved cases.
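As a sanity check, each per-statute delta can be inverted to recover the baseline it was measured against. A short sketch using only the numbers in the table above (variable names are illustrative, not part of the tool):

```python
# Each entry is (rate_pct, delta_vs_tc_avg_pct) from the table above.
# The implied Tech Center average estimate is the rate minus its signed delta.
rates = {
    "§101": (25.8, -14.2),
    "§103": (42.5, +2.5),
    "§102": (21.8, -18.2),
    "§112": (3.5, -36.5),
}

for statute, (rate, delta) in rates.items():
    tc_avg = rate - delta  # implied Tech Center average estimate
    print(f"{statute}: {rate:.1f}% (TC avg estimate {tc_avg:.1f}%)")
```

All four statutes imply the same ~40.0% baseline, which suggests the tool compares every statute against a single flat Tech Center average estimate rather than per-statute averages.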

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 9, 2026 has been entered.

Response to Arguments

Applicants argue that the cited prior art fails to teach generating, based on the one or more style embeddings and the one or more linguistic embeddings, one or more dependency embeddings representing dependencies between the one or more style embeddings and the one or more linguistic embeddings. Applicants' arguments are persuasive, but are moot in view of the new grounds of rejection.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Altaf et al. (PGPUB 2024/0363099), hereinafter referenced as Altaf, in view of Shekhar et al. (PGPUB 2022/0228367), hereinafter referenced as Shekhar.

Regarding claims 1 and 19-20, Altaf discloses a method, system and medium, hereinafter referenced as a method, for classifying audio data, the method comprising: inputting the audio data into a trained machine-learning model (trained machine learning; p. 0041, 0138-0139), wherein the trained machine-learning model is configured to: generate, using a style encoder of the machine-learning model, one or more style embeddings (style and tone) representing nonverbal characteristics of the audio data (text; p. 0178, 0196); generate, using a linguistic encoder of the machine-learning model (encoder), one or more linguistic embeddings (semantic) representing textual content (textual content) of the audio data (p. 0165, 0196-0197); generate one or more dependency embeddings representing dependencies between the one or more style embeddings and the one or more linguistic embeddings (embeddings and classification; p. 0179-0180, 0237-0242); inputting the one or more dependency embeddings into a classification head of the machine-learning model (trained to classify the audio data as real or fake; p. 0103-0107, 0172-0181, 0209-0220); and obtaining, from the trained machine-learning model, a classification result of whether the audio data is real or fake (trained to classify the audio data as real or fake; p. 0103-0107, 0172-0181, 0209-0220). Altaf does not specifically teach generating, based on the one or more style embeddings and the one or more linguistic embeddings, one or more dependency embeddings representing dependencies between nonverbal speech characteristics and verbal speech characteristics.

Shekhar discloses a method comprising generating, based on the one or more style embeddings (style features such as pitch, emotion, etc.; p. 0019) and the one or more linguistic embeddings, one or more dependency embeddings representing dependencies between nonverbal speech characteristics and verbal speech characteristics (conveyed by the audio generated by text; p. 0019, 0103-0111), to more accurately capture the expressiveness of an input text. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method as described above to increase efficiency.

Regarding claim 2, Altaf discloses a method wherein the audio data comprises real human speech, synthetic human speech, or both real human speech and synthetic human speech (machine or human; p. 0020, 0130, 0179-0185).

Regarding claim 3, Altaf discloses a method wherein the one or more machine learning models have been trained using bona fide audio data to learn dependencies between nonverbal characteristics and textual content in real human speech (bona fide enrollment audio signals; p. 0121, 0172-0181, 0209-0220).

Regarding claim 4, Altaf discloses a method wherein determining a first subset of the one or more dependency embeddings comprises: inputting the one or more style embeddings into a style compressor (style embedding; p. 0178-0179); and compressing the one or more style embeddings to create one or more style dependency embeddings (compress; p. 0104-0105).

Regarding claim 5, Altaf discloses a method wherein determining a second subset of the one or more dependency embeddings comprises: inputting the one or more linguistic embeddings into a linguistic compressor (grammar; p. 0178-0179); and compressing the one or more linguistic embeddings to create one or more linguistic dependency embeddings (compress; p. 0104-0105).

Regarding claim 13, Altaf discloses a method wherein the one or more style embeddings represent one or more attributes selected from the group comprising: speaker identity (individual identity; p. 0104), gender, emotion (p. 0174-0176), accent (p. 0174-0176), tone (p. 0178), speech rate (speech rate; p. 0284), health state, age, vocal pitch (p. 0174-0176), vocal intensity (p. 0174, 0179, 0199), and cognitive state (p. 0174-0176).

Regarding claim 14, Altaf discloses a method wherein the classification head has been trained to classify audio as real or fake via supervised learning using labeled audio data (supervised learning; p. 0140).

Regarding claim 15, Altaf discloses a method wherein the style compressor and the linguistics compressor are trained in a first training phase using only bona fide audio data, and wherein the classification head is trained during a second training phase (second authentication) using labeled bona fide audio data and labeled fake audio data (label indicating human or machine speech; p. 0066-0071).

Regarding claim 16, Altaf discloses a method comprising: permitting access to a computing resource or protected endpoint based on the classification result (p. 0116), wherein the classification result indicates that the audio is real (p. 0103-0107, 0172-0181, 0209-0220).

Regarding claim 17, Altaf discloses a method comprising: restricting access to a computing resource or protected endpoint based on the classification result, wherein the classification result indicates that the audio is fake (authenticate/restrict; p. 0319-0320).

Regarding claim 18, Altaf discloses a method comprising: displaying an alert via a user interface based on the classification result, wherein the classification result indicates that the audio is fake (display genuine or fake; p. 0173).

Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Altaf in view of Shekhar, and in further view of Chen et al. (PGPUB 2024/0005905), hereinafter referenced as Chen.

Regarding claim 9, Altaf in view of Shekhar discloses a method as described above, but does not specifically teach a method comprising: generating one or more supplementary style embeddings based on the one or more style embeddings, wherein the one or more supplementary style embeddings include information-rich portions of the input audio data. Chen discloses a method comprising: generating one or more supplementary style embeddings based on the one or more style embeddings, wherein the one or more supplementary style embeddings include information-rich portions of the input audio data (p. 0227), to make training simple and efficient. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method as described above to improve naturalness and emotion richness.

Regarding claim 10, it is interpreted and rejected for similar reasons as set forth above. In addition, Chen discloses a method comprising: generating one or more supplementary linguistic embeddings based on the one or more linguistic embeddings, wherein the one or more supplementary linguistic embeddings include information-rich portions of the input audio data (p. 0180, 0227).

Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Altaf in view of Shekhar and Chen, and in further view of Yan et al. (PGPUB 2025/0094718), hereinafter referenced as Yan.

Regarding claim 11, Altaf in view of Shekhar and Chen discloses a method as described above, but does not specifically disclose a method comprising concatenating the one or more supplementary style embeddings, one or more supplementary linguistic embeddings, one or more style dependency embeddings, and one or more linguistic dependency embeddings to one another and inputting the concatenated embeddings into the classifier module, to condense embeddings. Yan discloses a method of concatenating the one or more supplementary style embeddings, one or more supplementary linguistic embeddings, one or more style dependency embeddings, and one or more linguistic dependency embeddings to one another (p. 0080-0081); and inputting the concatenated embeddings into the classifier module (p. 0050-0055), to condense embeddings. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method as described above to assist with improving tasks.

Regarding claim 12, it is interpreted and rejected for similar reasons as set forth above. In addition, Yan discloses a method wherein the one or more supplementary style embeddings and one or more supplementary linguistic embeddings are generated using an attentive statistics pooling module (p. 0080) and a multi-layer perceptron module (p. 0066-0067).

Allowable Subject Matter

Claims 6-8 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAKIEDA R JACKSON, whose telephone number is (571) 272-7619. The examiner can normally be reached Mon-Fri, 6:30a-2:30p. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JAKIEDA R JACKSON/
Primary Examiner, Art Unit 2657

Prosecution Timeline

May 20, 2025: Application Filed
Jun 27, 2025: Non-Final Rejection (§103)
Sep 25, 2025: Applicant Interview (Telephonic)
Sep 26, 2025: Examiner Interview Summary
Oct 01, 2025: Response Filed
Oct 10, 2025: Final Rejection (§103)
Dec 08, 2025: Applicant Interview (Telephonic)
Dec 10, 2025: Examiner Interview Summary
Dec 15, 2025: Response after Non-Final Action
Jan 09, 2026: Request for Continued Examination
Jan 23, 2026: Response after Non-Final Action
Feb 02, 2026: Non-Final Rejection (§103)
Apr 15, 2026: Applicant Interview (Telephonic)
Apr 15, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603079: PROVIDING A REPOSITORY OF AUDIO FILES HAVING PRONUNCIATIONS FOR TEXT STRINGS TO PROVIDE TO A SPEECH SYNTHESIZER (granted Apr 14, 2026; 2y 5m to grant)
Patent 12603088: TRAINING A DEVICE SPECIFIC ACOUSTIC MODEL (granted Apr 14, 2026; 2y 5m to grant)
Patent 12598092: SYSTEMS, METHODS, AND APPARATUS FOR NOTIFYING A TRANSCRIBING AND TRANSLATING SYSTEM OF SWITCHING BETWEEN SPOKEN LANGUAGES (granted Apr 07, 2026; 2y 5m to grant)
Patent 12597427: CONFIGURABLE NATURAL LANGUAGE OUTPUT (granted Apr 07, 2026; 2y 5m to grant)
Patent 12597418: AUDIO SIGNAL PROCESSING DEVICE AND METHOD FOR SYNCHRONIZING SPEECH AND TEXT BY USING MACHINE LEARNING MODEL (granted Apr 07, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 74% (89% with interview; +15.4% lift)
Median Time to Grant: 3y 0m
PTA Risk: High

Based on 905 resolved cases by this examiner. Grant probability derived from career allow rate.
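The projected figures above are simple arithmetic on the examiner's career data. A minimal sketch of how they are presumably derived (function names are illustrative, not part of the tool; only the input numbers come from this page):

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved

def with_interview(base_pct: float, lift_pct: float) -> float:
    """Grant probability after applying the reported interview lift,
    capped at 100%."""
    return min(base_pct + lift_pct, 100.0)

base = allow_rate(669, 905)           # 669 granted / 905 resolved
boosted = with_interview(base, 15.4)  # +15.4% interview lift

print(f"Career allow rate: {base:.0f}%")    # ~74%
print(f"With interview:    {boosted:.0f}%") # ~89%
```

Here 669/905 is about 73.9%, which the dashboard rounds to 74%, and 73.9% + 15.4% is about 89%, matching the "With Interview" figure.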
