Prosecution Insights
Last updated: April 19, 2026
Application No. 18/376,822

DEEPFAKE AUDIO DETECTION SYSTEM AND METHOD

Status: Non-Final Office Action (§103)
Filed: Oct 04, 2023
Examiner: NGUYEN, QUYNH H
Art Unit: 2693
Tech Center: 2600 — Communications
Assignee: Mitel Networks Corporation
OA Round: 3 (Non-Final)
Grant Probability: 87% (Favorable)
Projected OA Rounds: 3-4
Projected Time to Grant: 2y 8m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 87% (941 granted / 1078 resolved; +25.3% vs Tech Center average; above average)
Interview Lift: +17.2% allowance rate on resolved cases with an interview (strong)
Typical Timeline: 2y 8m average prosecution; 29 applications currently pending
Career History: 1107 total applications across all art units

Statute-Specific Performance

§101: 18.6% (-21.4% vs Tech Center average)
§103: 42.7% (+2.7% vs Tech Center average)
§102: 7.4% (-32.6% vs Tech Center average)
§112: 10.3% (-29.7% vs Tech Center average)
Tech Center averages are estimates. Based on career data from 1078 resolved cases.
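The rates above are simple ratios over resolved cases. A minimal sketch of how such metrics can be derived; the 941 granted / 1078 resolved totals come from this page, while the interview split is a hypothetical illustration (the page does not publish those counts):

```python
# Sketch: deriving the dashboard's ratio metrics from case counts.
# 941 granted / 1078 resolved is from the page above; the interview
# split below is hypothetical, invented only to illustrate the formula.

def allow_rate(granted: int, resolved: int) -> float:
    """Allowance rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved

career = allow_rate(941, 1078)  # displayed as 87%

# Hypothetical split of the same 1078 resolved cases:
with_interview = allow_rate(485, 500)
without_interview = allow_rate(456, 578)
lift = with_interview - without_interview  # percentage-point lift

print(f"career allow rate: {career:.1f}%")
print(f"interview lift: {lift:+.1f} points")
```

The "interview lift" shown on the page is presumably this kind of percentage-point difference between the two subpopulations.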

Office Action (§103)

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

1. The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claim Rejections - 35 USC § 103

2. Claims 1-2, 4-6 are rejected under 35 U.S.C. 103 as being unpatentable over submitted prior art Sima (EP 3893477 A1) in view of Verma et al. (2024/0419983) and Traynor et al. (2023/0343342).

As to claim 1, Sima teaches a computer system with deepfake audio capability ([0009]), the computer system comprising: a plurality of agents that include at least a primary agent ([0009, lines 25-26] – an answering and calling-out module, allocating a human agent/attendant after a call is connected) and a secondary agent ([0009, lines 45-53] – random attendant); a deepfake processor in communication with the server, wherein the deepfake processor includes automatic speech recognition (ASR) software and a deepfake audio replicator (claim 1 and [0012-0013] – speech intention understanding module and speech of a random attendant converted into text by a speech recognition submodule in the voice cloning module and being then generated into speech by the voice cloning module to reply the client); a first database of the one or more users and a primary agent associated with each of the one or more users, wherein the first database is in communication with the deepfake processor (claim 1 – a manual intervention module, replying the client by speech of the corresponding attendant and by processing reply content of a random attendant into voice of the corresponding attendant through invoking the voice cloning module; [0073] step S3006 – collect voice corpus of the human agent and extract a spectral feature in the voice corpus…; [0080] – obtain content of each question-and-answer session contained in the call content where the question-and-answer
session is a process in which both parties of the call inquire and reply); and a second database of voices for each primary agent and the content of prior sessions with each one or more users and the primary agent associated with each of the one or more users, wherein the second database is in communication with the deepfake processor ([0072-0073] steps S3005 through S3011 – training the voice cloning module, collect voice corpus of the human agent, check the voice corpus and corresponding literal text, preprocess the checked literal text by the text normalization rule and the natural language processing technology and input the spectral feature and the literal text after the vector conversion into the voice cloning module to obtain the trained voice cloning module, hence it would have been obvious that this is done using voice of human agent in prior sessions with user); wherein the server is configured to connect a user device to a user device of the secondary agent when the primary agent is not available ([0075] – answer a call by the cloned audio where the cloned audio is an audio whose voice feature matches the human agent, hence the human agent or primary agent is not available), and the deepfake processor is configured to (a) utilize the ASR software to recognize the user’s voice (claim 1 and [0012-0013] – speech intention understanding module and speech of a random attendant converted into text by a speech recognition submodule in the voice cloning module and being then generated into speech by the voice cloning module to reply the client), (b) query the first database to identify the primary agent for the user, and (c) using the deepfake audio replicator, substitute the primary agent voice for the secondary agent’s voice (claim 1 and [0009-0010] – allocating a human agent after a call is connected and that a random attendant replies to the user using the human agent’s voice (replying the client by speech of the corresponding attendant and by processing 
reply content of a random attendant into voice of the corresponding attendant through invoking the voice cloning module); [0012-0013]).

Sima does not explicitly discuss a call center server configured to communicate with one or more user devices and route communications from each of the one or more user devices to a call center agent device based on the availability of a call center agent associated with the call center agent device; a voice characteristic adjuster (VCA) in communication with the deepfake processor and configured to analyze a user’s voice to determine desired voice prosody characteristics; using the VCA, determine desired voice prosody characteristics of the user based on a fundamental frequency (F0) of the user’s voice; modifying the secondary agent’s voice to have a voice prosody characteristic based on a fundamental frequency of the primary agent’s voice.

Verma teaches a call center server ([0047-0054]) configured to communicate with one or more user devices and route communications from each of the one or more user devices (Fig. 1, terminals 141, 151; [0073] – terminals 141 and 151 may be computers where a user interacts with an application) to a call center agent device ([0085] – navigating to different screens of an agent computer) based on the availability of a call center agent associated with the call center agent device (Fig. 4, 401 – route call to machine learning agent; [0002-0003] – call centers and provide agents to service the customer’s needs and it would have been obvious to route customer to an available agent in order to provide service to the customer); agent voice and speech profile created using any known voice cloning software; the voice cloning software recorded during a call with a customer or recorded separately by the human agent; the voice and speech pattern used to create the agent voice and speech profile and the agent voice and speech profile stored within an agent speech profile database ([0024]).
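The routing-and-substitution flow that the rejection maps onto Sima and Verma (route to the primary agent when available; otherwise connect a secondary agent and substitute the primary agent's voice) can be sketched as follows. All names are hypothetical illustrations of the claimed architecture, not code from any cited reference:

```python
# Hedged sketch of the claimed routing flow (claim 1 as the rejection
# characterizes it). All class/function names are hypothetical; no
# cited reference discloses source code.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Agent:
    name: str
    available: bool

def route_call(user_id: str,
               primary_by_user: Dict[str, Agent],   # the claimed "first database"
               secondary_pool: List[Agent]) -> Tuple[Agent, Optional[Agent]]:
    """Return (answering_agent, agent_whose_voice_to_clone).

    If the user's primary agent is free, no voice substitution is needed.
    Otherwise an available secondary agent answers and the deepfake
    replicator would substitute the primary agent's voice.
    """
    primary = primary_by_user[user_id]           # (b) query the first database
    if primary.available:
        return primary, None
    for agent in secondary_pool:                 # route on agent availability
        if agent.available:
            return agent, primary                # (c) clone the primary's voice
    raise RuntimeError("no agent available")     # a fallback (e.g., AI bot) could go here

# Usage: the primary agent is busy, so a secondary agent answers.
alice = Agent("alice", available=False)
bob = Agent("bob", available=True)
answering, clone_of = route_call("user-1", {"user-1": alice}, [bob])
print(answering.name, clone_of.name if clone_of else None)  # bob alice
```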
Traynor teaches detecting audio deepfakes through acoustic prosodic modeling. For example, improved audio deepfakes detection techniques and/or improved audio deepfake machine learning models that employ prosody features associated with audio samples to distinguish between organic audio and deepfake audio can be provided. Prosody features relate to high-level linguistic features of human speech such as, for example, pitch, pitch variance, pitch rate of change, pitch acceleration, intonation (e.g., peaking intonation and/or dipping intonation), vocal jitter, fundamental frequency (F0), vocal shimmer, rhythm, stress, harmonic to noise ratio (HNR), one or more metrics based on vocal range, and/or one or more other prosody features related to human speech ([0030]). System 100 includes a feature extractor 104 that receives one or more audio samples 102. In certain embodiments, the one or more audio samples 102 can be one or more speech samples associated with human speech. Additionally, the one or more audio samples 102 can correspond to a potential audio deepfake or organically generated audio ([0032]); The feature extractor 104 can process the one or more audio samples 102 to determine one or more prosodic features 106 associated with the one or more audio samples 102. The one or more prosodic features 106 can be configured as a feature set F for the model 110. 
Additionally, the one or more prosodic features 106 can include one or more pitch features, one or more pitch variance features, one or more pitch rate of change features, one or more pitch acceleration features, one or more intonation features (e.g., one or more peaking intonation features and/or one or more dipping intonation features), one or more vocal jitter features, one or more fundamental frequency features, one or more vocal shimmer features, one or more rhythm features, one or more stress features, one or more HNR features, one or more metrics features related to vocal range, and/or one or more other prosody features related to the one or more audio samples 102 ([0033]); and at least a portion of the one or more prosodic features 106 can be derived features associated with the one or more audio samples 102. For example, the feature extractor 104 can derive vocal range, pitch rate of change, pitch acceleration, and/or intonation based on the fundamental frequency sequence of the one or more audio samples 102. In various embodiments, the feature extractor 104 can store a fundamental frequency sequence for each audio sample from the one or more audio samples 102. The feature extractor 104 can employ the fundamental frequency sequence to calculate the derived features included in the one or more prosodic features 106. A fundamental frequency sequence can be a series of F0 values sampled with respect to time ([0053]). 
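Traynor's derived features are computed from a fundamental frequency sequence, i.e., a series of F0 values sampled with respect to time. A minimal sketch of such derivations, assuming a uniformly sampled F0 contour; this illustrates the general idea only and is not Traynor's implementation:

```python
# Sketch: deriving a few of the prosodic features described above from a
# fundamental frequency (F0) sequence sampled every `dt` seconds.
# Illustrative only; not the reference's actual implementation.
from statistics import mean, pvariance

def prosody_features(f0, dt=0.01):
    """Derive features from an F0 sequence (Hz) sampled every `dt` seconds."""
    rate = [(b - a) / dt for a, b in zip(f0, f0[1:])]        # pitch rate of change (Hz/s)
    accel = [(b - a) / dt for a, b in zip(rate, rate[1:])]   # pitch acceleration (Hz/s^2)
    # Vocal jitter: mean sample-to-sample F0 deviation relative to mean F0.
    jitter = mean(abs(b - a) for a, b in zip(f0, f0[1:])) / mean(f0)
    return {
        "mean_pitch": mean(f0),
        "pitch_variance": pvariance(f0),
        "vocal_range": max(f0) - min(f0),
        "mean_rate_of_change": mean(rate),
        "mean_acceleration": mean(accel),
        "jitter": jitter,
    }

# Usage: a short synthetic rising F0 contour.
feats = prosody_features([110.0, 112.0, 115.0, 119.0, 118.0])
print(feats["vocal_range"])  # 9.0
```

A classifier like the model 110 described above would consume such a feature set; real systems first have to estimate F0 itself from the waveform, which this sketch takes as given.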
Hence, while Sima teaches using the deepfake audio replicator, substitute the primary agent voice for the secondary agent’s voice (claim 1 and [0009-0010] – allocating a human agent after a call is connected and that a random attendant replies to the user using the human agent’s voice (replying the client by speech of the corresponding attendant and by processing reply content of a random attendant into voice of the corresponding attendant through invoking the voice cloning module) and [0012-0013]) using the voice cloning module to have a voice close to the human agent/attendant; and Traynor teaches improving audio deepfakes detection techniques and/or improved audio deepfake machine learning models that employ prosody features associated with audio samples to distinguish between organic audio and deepfake audio can be provided. Prosody features relate to high-level linguistic features of human speech such as, for example, pitch, pitch variance, pitch rate of change, pitch acceleration, intonation (e.g., peaking intonation and/or dipping intonation), vocal jitter, fundamental frequency (F0), vocal shimmer, rhythm, stress, harmonic to noise ratio (HNR), one or more metrics based on vocal range, and/or one or more other prosody features related to human speech ([0030]); a feature extractor 104 that receives one or more audio samples 102 that can be one or more speech samples associated with human speech. Additionally, the one or more audio samples 102 can correspond to a potential audio deepfake or organically generated audio ([0032]); The feature extractor 104 can process the one or more audio samples 102 to determine one or more prosodic features 106 associated with the one or more audio samples 102. 
Additionally, the one or more prosodic features 106 can include one or more pitch features, one or more pitch variance features, one or more pitch rate of change features, one or more pitch acceleration features, one or more intonation features (e.g., one or more peaking intonation features and/or one or more dipping intonation features), one or more vocal jitter features, one or more fundamental frequency features, one or more vocal shimmer features, one or more rhythm features, one or more stress features, one or more HNR features, one or more metrics features related to vocal range, and/or one or more other prosody features related to the one or more audio samples 102 ([0033]); and at least a portion of the one or more prosodic features 106 can be derived features associated with the one or more audio samples 102. For example, the feature extractor 104 can derive vocal range, pitch rate of change, pitch acceleration, and/or intonation based on the fundamental frequency sequence of the one or more audio samples 102. In various embodiments, the feature extractor 104 can store a fundamental frequency sequence for each audio sample from the one or more audio samples 102. The feature extractor 104 can employ the fundamental frequency sequence to calculate the derived features included in the one or more prosodic features 106 ([0053]).

It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Verma and Traynor into the teachings of Sima for the purpose of setting up offices or call centers with one or more user devices and providing agents to service the customers’ needs and detecting audio deepfakes through acoustic prosodic modeling.
As to claim 2, Sima teaches the call center computer system of claim 1, wherein the second database further includes resonant characteristics of the primary agent’s voice and the deepfake audio replicator is further configured to copy the resonant characteristics ([0073] – …based on the predicted pronunciation feature, convert the text for speech synthesis into Pinyin and phonemes and unitedly code it…; collect voice corpus of the human agent and extract a spectral feature in the voice corpus, check the voice corpus and corresponding literal text…; input the spectral feature and the literal text after the vector conversion into the voice cloning model and operate the back propagation algorithm to perform iterative optimization until the voice cloning model is converge to obtain the trained voice cloning model).

As to claim 4, Sima teaches the call center computer system of claim 1, wherein the call center server is configured to substitute the primary agent voice for each organization representative ([0041] – a voice cloning model matching the corresponding attendant needs to be trained for every human agent…).

As to claim 5, Sima teaches the call center computer system of claim 1, wherein the content in the second database includes user preferences, user transaction history, and other customer relationship management (CRM) information ([0101-0102]).

As to claim 6, Verma teaches the call center computer system of claim 5, wherein the deepfake processor is configured to retrieve the content from the second database and provide the content to the secondary agent or to an AI bot ([0090-0091]).

4. Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Sima, Verma, and Traynor in view of JP 3684521.
As to claim 3, Sima, Verma, and Traynor do not explicitly discuss the call center computer system of claim 1, wherein the call center server is further configured to (a) provide a notification to the user of the utilization of deepfake audio technology prior to the user being connected to a secondary call center agent and (b) provide an opt-out option for the user in which the deepfake audio technology is not used.

JP 3684521 teaches detecting a clone terminal being used by notifying the terminal of the fraud factor, the authorized terminal user can confirm the existence of the clone terminal and the clone terminal can be excluded… ([0013]); notifying a user when a clone terminal is detected ([0104-0105]); and the authorized subscriber can disable the clone terminal without applying to the center ([0069]); stopping transmission of a corresponding terminal when a clone terminal is detected ([0108-0109, 0124]); when the clone terminal is detected, the subscriber can cancel the transmission stop of the terminal… ([0126]); it would have been obvious to disable or opt out of the clone terminal without applying to the center when the deepfake audio technology is not used, for the purpose of utilizing system resources for other applications.

It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of JP 3684521 into the teachings of Sima and Verma for the purpose of allowing the user to be notified when deepfake audio technology is being used and having the option to opt out of the deepfake audio technology.

5. Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Sima, Verma, and Traynor in view of Deole (2021/0407527).
As to claim 7, Sima teaches the call center computer system of claim 1 that further includes a text-to-speech (TTS) engine in communication with the call center server and configured to convert text entered on the device of the secondary agent into speech of the primary agent ([0046] – inputted text for speech synthesis is analyzed; digits, dates, decimal number, unit symbols and the like converted into Chinese characters according a text normalization rule…and the acoustic feature is converted into a speech waveform by a voice coder so as to obtain corresponding speech).

Sima, Verma, and Traynor do not explicitly teach the call center computer system of claim 1, wherein the deepfake processor is configured to modify the secondary agent’s voice to provide intelligible assistance in a target language when the user is a non-native speaking user.

Deole teaches an AI, such as a trained neural network, takes the language content (e.g., words, phrases, utterances) of an agent’s speech and modifies the audio to comprise altered audio content, e.g., deepfake, altered voice attribute or vocal quality, such as to convey a particular emotional content that was determined to be absent (e.g., empathetic), or present but determined to require removal (e.g., irritation), from the vocalizations provided by the agent’s unaltered voice. As used herein, “words” and “phrases” has their ordinary and customary meaning. An “utterance,” as used herein comprises a sound, quasi-word, pseudo-word(s), etc., vocalized by the agent. For example, an utterance such as “uh huh” is generally understood to mean “yes,” an acknowledgement of understanding, an acknowledgement of hearing, etc.; a vocalization of “hmm,” is generally understood to mean puzzlement, curiosity, uncertainty, etc.; a vocalization of “oh” is generally understood to mean surprise, confusion, disappointment, etc.; and so on.
Other utterance examples may include particular sounds provided, such as those utilized by non-English speakers ([0010]). It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Deole into the teachings of Sima and Verma for the purpose of conveying emotion or sentiment that is determined to be more suitable for a given communication content or customer.

6. Claims 8, 10-11, 15, 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sima and Verma in view of Traynor et al. (2023/0343342) and Gupta et al. (2022/0392452). Claims 8 and 15 are rejected for the same reasons discussed above with respect to claim 1.

Sima, Verma, and Traynor do not explicitly discuss each of the one or more user devices is associated with a unique user; identify each unique user by the unique user’s voice. However, Verma teaches the AI agent may be programmed to process the voice inputs from a human caller using natural language processing ([0016]), and each of a plurality of valid mappings associated with one of a plurality of intents of a call from a human caller ([0025]). Gupta teaches speaker recognition (voice biometrics) utilizes unique characteristics of a person’s voice to identify or authenticate the person as a user of a device or service ([0024]). It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Gupta into the teachings of Sima, Verma, and Traynor for the purpose of generating feature vectors combined from multiple samples of the user to produce an embedding vector and authenticating the person as a user of a device.
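Gupta's voice-biometric rationale (combining feature vectors from multiple samples into an embedding vector and authenticating the speaker against it) can be sketched as follows; cosine similarity and the threshold are common choices in speaker verification, assumed here for illustration rather than taken from Gupta:

```python
# Sketch of voice-biometric authentication as characterized in the
# rejection: average per-sample feature vectors into an enrollment
# embedding, then authenticate by similarity. The cosine metric and
# the 0.9 threshold are illustrative assumptions, not from Gupta.
import math

def embed(samples):
    """Combine feature vectors from multiple voice samples into one embedding."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def authenticate(enrolled, probe, threshold=0.9):
    return cosine(enrolled, probe) >= threshold

# Usage: enroll from three samples, then verify new utterances.
enrolled = embed([[1.0, 0.2, 0.0], [0.9, 0.3, 0.1], [1.1, 0.25, 0.05]])
print(authenticate(enrolled, [1.0, 0.25, 0.05]))   # similar voice -> True
print(authenticate(enrolled, [0.0, 1.0, 0.0]))     # different voice -> False
```

Real systems derive the feature vectors from learned speaker embeddings rather than raw measurements, but the enroll/compare structure is the same.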
As to claims 10 and 19, Verma teaches agent voice and speech profile created using any known voice cloning software; the voice cloning software recorded during a call with a customer or recorded separately by the human agent; the voice and speech pattern used to create the agent voice and speech profile and the agent voice and speech profile stored within an agent speech profile database ([0024]); and Sima teaches replying the client by speech of the corresponding attendant and by processing reply content of a random attendant into voice of the corresponding attendant through invoking the voice cloning module and further including a retrieving and prompting submodule that can invoke the speech intention understanding module to display a call intention and give the random attendant a prompt for conversation ([0009]), hence assisting the secondary agent or random attendant with continuity in providing user service.

As to claim 11, Sima teaches the call center computer method of claim 8, wherein the call center server is further configured to enable the secondary agent to provide an assistant role during which the secondary agent voice is substituted for the primary agent’s voice during the communication ([0013, 0033]).

As to claim 17, Verma teaches the non-transient computer readable medium of claim 15, wherein the deepfake processor stores the communication and the modified secondary agent’s voice in the second database ([0024] – voice cloning software records voice samples from human agent during a call with a customer; the agent voice and speech profile stored within an agent speech profile database located within the voice AI container and [0087-0088] - recording voice samples from a human agent. The voice samples may be recorded during a call with a customer or may be recorded separately. At step 503, methods may include sending the voice samples, recorded in step 501, to a voice cloning software.
At step 505, methods may include the voice cloning software creating an agent voice and speech profile using the received voice samples in step 503 and associating the human agent with the agent voice and speech profile created in step 505. At step 509, methods may include storing the voice and speech profile, created in step 505, and associated with the human agent in step 507, in an agent speech profile database) for subsequent retrieval and communications with the unique user based on a customer relationship management profile associated with the unique user ([0090] – the AI agent retrieves an agent voice and speech profile from the agent speech profile database to use when responding to a human caller 600).

As to claim 18, Sima teaches the non-transient computer readable medium of claim 15, wherein the processor is configured to provide the generated deepfake audio to the secondary agent (claim 1 and [0009-0010] – replying the client by speech of the corresponding attendant and by processing reply content of a random attendant into voice of the corresponding attendant through invoking the voice cloning module).

As to claim 20, Verma teaches a call center server ([0047-0054]) configured to communicate with one or more user devices and route communications from each of the one or more user devices (Fig. 1, terminals 141, 151; [0073] – terminals 141 and 151 may be computers where a user interacts with an application) to a call center agent device ([0085] – navigating to different screens of an agent computer).
Sima teaches using the deepfake audio replicator, substitute the primary agent voice for the secondary agent’s voice (claim 1 and [0009-0010] – allocating a human agent after a call is connected and that a random attendant replies to the user using the human agent’s voice (replying the client by speech of the corresponding attendant and by processing reply content of a random attendant into voice of the corresponding attendant through invoking the voice cloning module) and [0012-0013]) using the voice cloning module to have a voice close to the human agent/attendant; and Traynor teaches the classification 250 can be a deepfake audio prediction for the one or more audio samples 102 and visual data associated with the classification 250 can be rendered via a graphical user interface of a computing device ([0067]); improving audio deepfakes detection techniques and/or improved audio deepfake machine learning models that employ prosody features associated with audio samples to distinguish between organic audio and deepfake audio can be provided. Prosody features relate to high-level linguistic features of human speech such as, for example, pitch, pitch variance, pitch rate of change, pitch acceleration, intonation (e.g., peaking intonation and/or dipping intonation), vocal jitter, fundamental frequency (F0), vocal shimmer, rhythm, stress, harmonic to noise ratio (HNR), one or more metrics based on vocal range, and/or one or more other prosody features related to human speech ([0030]); a feature extractor 104 that receives one or more audio samples 102 that can be one or more speech samples associated with human speech. Additionally, the one or more audio samples 102 can correspond to a potential audio deepfake or organically generated audio ([0032]); The feature extractor 104 can process the one or more audio samples 102 to determine one or more prosodic features 106 associated with the one or more audio samples 102. 
Additionally, the one or more prosodic features 106 can include one or more pitch features, one or more pitch variance features, one or more pitch rate of change features, one or more pitch acceleration features, one or more intonation features (e.g., one or more peaking intonation features and/or one or more dipping intonation features), one or more vocal jitter features, one or more fundamental frequency features, one or more vocal shimmer features, one or more rhythm features, one or more stress features, one or more HNR features, one or more metrics features related to vocal range, and/or one or more other prosody features related to the one or more audio samples 102 ([0033]); and at least a portion of the one or more prosodic features 106 can be derived features associated with the one or more audio samples 102. For example, the feature extractor 104 can derive vocal range, pitch rate of change, pitch acceleration, and/or intonation based on the fundamental frequency sequence of the one or more audio samples 102. In various embodiments, the feature extractor 104 can store a fundamental frequency sequence for each audio sample from the one or more audio samples 102. The feature extractor 104 can employ the fundamental frequency sequence to calculate the derived features included in the one or more prosodic features 106 ([0053]).

7. Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Sima, Verma, Traynor, and Gupta in view of Deole et al. (CN 115776552 A).

As to claim 9, Sima teaches a voice cloning model for synthesizing voice of a corresponding attendant, communicate with the client by the voice of the corresponding attendant and a manual intervention module replying the client by speech of the corresponding attendant and by processing reply content of a random attendant into voice of the corresponding attendant by invoking the voice cloning module (claim 1).
Sima, Verma, Traynor, and Gupta do not explicitly discuss the method of claim 8, wherein the processing by the deepfake processor further queries the second database and analyzes the periodic tone, tempo, pronunciation, enunciation and other voice characteristics specific to the primary agent.

Deole teaches the audio of the human agent is analyzed to determine attributes other than the word explicitly spoken…the sound attribute includes a tone, a voice speed, a flutter, a whole pitch, breathing and other sound attributes and changes, a change rate, a degree of change or a difference between two or more portions of the speed (3rd paragraph after Fig. 8 and related text). It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Deole into the teachings of Sima, Verma, Traynor, and Gupta for the purpose of manipulating a real time audio including speech to become modified audio such as audio depth counterfeiting, voice clone and so on.

8. Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Sima, Verma, Traynor, and Gupta in view of Lebedev et al. (2019/0306252).

As to claim 12, Sima teaches the call center computer method of claim 8 wherein the AI bot is in communication with the deepfake processor and the deepfake audio replicator provides the primary agent’s voice to the AI bot ([0028] – the voice of the attendant corresponding to the human agent is generated by the voice cloning module makes the client feel that he or she is communicating with an attendant of the human agent all the time, achieving a seamless switchover between a person and the speech robot; [0040] – the voice corpus of the attendant corresponding to the human agent is collected for training the voice cloning model and therefore voice of the trained voice cloning model is relatively highly similar to that of the corresponding attendant, thus a seamless switchover between the speech robot and the human agent achieved).
Sima, Verma, Traynor, and Gupta do not explicitly discuss routing the unique user call to an AI bot if the primary agent and the secondary agent are not available. Lebedev teaches that when the user later returns and asks a question but no agent is available, the chat bot can provide or engage in an automated response ([0098]). It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Lebedev into the teachings of Sima, Verma, Traynor, and Gupta for the purpose of better providing customer service when agents are not available to provide services.

9. Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Sima, Verma, Traynor, and Gupta in view of Kirchhoff (EP 4266659 A1).

As to claim 13, Sima teaches a manual attendant of the human agent answers the caller by speech based on historical call intention and the preset call prompt text ([0101-0102]). Sima, Verma, Traynor, and Gupta do not explicitly discuss the call center server changing the decibel level of the primary agent voice. Kirchhoff teaches measuring an increase in decibel levels in the agent stream data or call audio data (claims 2, 9, 12, 19, 22, 29). It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Kirchhoff into the teachings of Sima, Verma, Traynor, and Gupta for the purpose of measuring the decibel level of each frame of audio stream data in predicting whether a call was answered by a human agent or sent to voice mail depending on characteristics of the calls for a given business.

10. Claims 14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sima, Verma, Traynor, and Gupta in view of Fard (2025/0078873).
As to claims 14 and 16, Gupta teaches the call center computer method of claim 8 and the non-transient computer readable medium of claim 15, wherein the identification server 102 receives the inbound contact of a transaction request that involves a voice assistant device 114c, at 9:00 am on a Wednesday, then extracts time-related features (and other types of features) and one or more feature vector embeddings, and matches these features or embeddings against the time-related features or feature vectors of other identities that regularly use the device around 9:00 am on Wednesdays ([0087]).

Sima, Verma, Traynor, and Gupta do not explicitly discuss changing the deep fake voice during the communication. Fard teaches changing/switching the voice on songs and singing elements within the media content, for example, a user can change the voice of Ryan Gosling in the movie “La La Land” where he is singing; changing the voice of the artist on songs and music videos ([0010-0011]); allowing users to use deepfake and voice changing technology to replace voice and the accent of the original cast in media content with their own and celebrities or other voice and accent (abstract; [0038]). It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Fard into the teachings of Sima, Verma, Traynor, and Gupta for the purpose of establishing a deeper connection with potential customers.

Response to Arguments

11. Applicant’s arguments with respect to claims 1-20 have been considered but are moot in view of new ground(s) of rejection(s). With respect to independent claims 1, 8, and 15, Applicant argues that “Neither Sima, alone, nor Sima and Verma combined, teach the limitations of the amended claims”. Examiner respectfully disagrees. Please refer to the above claim rejections, in which the combination of Sima, Verma, and Traynor teaches the newly amended claims.
Furthermore, Applicant argues that claims 7, 17, and 20 have been amended and that reexamination and allowance in light of the arguments and remarks made herein are respectfully requested. Examiner respectfully submits that Applicant should also refer to the rejections of claims 7, 17, and 20 above.

Conclusion

12. Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUYNH H NGUYEN, whose telephone number is (571) 272-7489. The examiner can normally be reached Monday-Friday, 7:30 AM-3:30 PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Ahmad Matar, can be reached at 571-272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/QUYNH H NGUYEN/
Primary Examiner, Art Unit 2693
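The Kirchhoff rejection in paragraph 9 above turns on measuring the decibel level of each frame of audio stream data and detecting an increase across frames. As a minimal sketch of that general technique (not the patent's or Kirchhoff's actual implementation), the following computes per-frame RMS levels in dBFS for a mono PCM signal; the 160-sample frame size (20 ms at 8 kHz telephony rate) and the -96 dB silence floor are illustrative assumptions.

```python
import math

def frame_dbfs(samples, frame_size=160):
    """Split mono PCM samples (floats in [-1.0, 1.0]) into fixed-size
    frames and return each frame's RMS level in dBFS (dB relative to
    full scale). A rise across consecutive frames is the kind of
    decibel increase the cited reference measures in call audio."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        # Guard against log10(0) on silent frames with a -96 dB floor.
        levels.append(20 * math.log10(rms) if rms > 0 else -96.0)
    return levels

# A quiet frame followed by a louder one shows a measurable increase:
# constant 0.01 -> -40 dBFS, constant 0.1 -> -20 dBFS.
print(frame_dbfs([0.01] * 160 + [0.1] * 160))  # second frame is 20 dB above the first
```

Detecting "an increase in decibel levels" then reduces to comparing successive entries of the returned list against a threshold.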

Prosecution Timeline

Oct 04, 2023: Application Filed
Jun 14, 2025: Non-Final Rejection — §103
Sep 15, 2025: Response Filed
Oct 06, 2025: Final Rejection — §103
Dec 04, 2025: Examiner Interview Summary
Dec 04, 2025: Applicant Interview (Telephonic)
Dec 22, 2025: Request for Continued Examination
Jan 08, 2026: Response after Non-Final Action
Jan 18, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591740
METHODS AND SYSTEMS FOR GENERATING TEXTUAL FEATURES
2y 5m to grant; granted Mar 31, 2026
Patent 12567409
RESTRICTING THIRD PARTY APPLICATION ACCESS TO AUDIO DATA CONTENT
2y 5m to grant; granted Mar 03, 2026
Patent 12566920
System and Method to Generate and Enhance Dynamic Interactive Applications from Natural Language Using Artificial Intelligence
2y 5m to grant; granted Mar 03, 2026
Patent 12563141
SYSTEM AND METHOD OF CONNECTING A CALLER TO A RECIPIENT BASED ON THE RECIPIENT'S STATUS AND RELATIONSHIP TO THE CALLER
2y 5m to grant; granted Feb 24, 2026
Patent 12554761
DATA SOURCE CURATION FOR LARGE LANGUAGE MODEL (LLM) PROMPTS
2y 5m to grant; granted Feb 17, 2026
Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 87%
With Interview: 99% (+17.2%)
Median Time to Grant: 2y 8m
PTA Risk: High
Based on 1078 resolved cases by this examiner. Grant probability derived from career allow rate.
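The projection figures above are consistent with simple arithmetic on the examiner's career data (941 granted of 1078 resolved ≈ 87%, plus the +17.2% interview lift, capped below certainty at 99%). The additive-lift-with-cap formula and the function name `grant_projection` are assumptions for illustration, not the tool's documented method.

```python
def grant_projection(granted, resolved, interview_lift=0.172, cap=0.99):
    """Sketch of how the displayed probabilities could be derived:
    a base rate from career grants, then an additive interview lift
    capped at 99%. This formula is an assumption, not documented."""
    base = granted / resolved                       # 941 / 1078 ~= 0.873
    with_interview = min(base + interview_lift, cap)  # 1.045 capped to 0.99
    return base, with_interview

base, with_iv = grant_projection(941, 1078)
print(f"{base:.0%} base, {with_iv:.0%} with interview")  # 87% base, 99% with interview
```

Note that the uncapped sum exceeds 100%, which is why a cap (or a multiplicative model) is needed to reproduce the displayed 99%.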
