DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 19-22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 19 recites "synthesizing voices in the target language based on the original audio stream and the at least one speaker feature vector corresponding to at least one speaker represented as speech in the original audio stream that controllably sounds like the at least one speaker in the original audio." The clause "…that controllably sounds like the at least one speaker in the original audio" is unclear because, as written, it modifies the original audio stream; the original audio necessarily comes from the speaker, so it is indefinite what it means for that stream to "controllably sound like" the speaker.
Claims 20-22 are likewise rejected due to their dependency on claim 19.
Claim 22 recites the limitation "The system according to claim 19." Claim 19, however, is a method claim, not a system claim. There is therefore insufficient antecedent basis for this limitation in the claim.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3 and 19-22 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (US 2022/0238095) in view of Kim et al. (US 2020/0082806).
For claim 1, Liu et al. teach a speaker-adaptive system for dubbing in different languages, comprising:
inputs configured to receive text in a target language (e.g., figure 4: S460, "obtain a to-be-synthesized text"), an original audio stream (e.g., figure 4: S410, "obtain a to-be-synthesized audio file"), and at least one speaker feature vector corresponding to at least one speaker represented as speech in the original audio stream (e.g., paragraph 36: speech feature vectors and acoustic parameters may be obtained through an acoustic processor; figure 4: "perform analysis by using an acoustic processor," S420), and wherein the original audio stream includes speech in a first language different from the target language (e.g., paragraph 47: a cross-language output may be made between an original audio file and a generated speech),
a trained TTS system capable of receiving the inputs and synthesizing voices in the target language, wherein in response to receiving the inputs, the trained TTS system is configured to synthesize voices as output in the target language, based on the at least one speaker feature vector and the original audio, that controllably sound like the at least one speaker in the original audio (e.g., figure 4: a TTS synthesizer performs synthesis; paragraph 46: universal speech feature vectors are obtained and are used in a TTS model, so that the TTS model may adapt to an unknown speaker and may even generate a speaker autonomously).
Liu et al. do not further specify translated text. Kim et al. teach translated text (e.g., figure 6: input text of second language 612; paragraph 67: the machine translator 820 included in the speech translation system 800 may convert or translate the input text of the first language into an input text of a second language and deliver the input text of the second language to the speech synthesizer 830). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Kim et al. into the teaching of Liu et al. to translate text or a voice signal related to the voices of the respective speakers into the target language, to improve convenience for the user (e.g., paragraph 4 of Kim et al.).
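For clarity of the record, the data flow of the combined system as mapped above can be illustrated schematically. The following is a minimal, hypothetical sketch in which all identifiers, types, and the placeholder synthesis step are the examiner's illustrative assumptions for purposes of explanation only, and not code from Liu et al. or Kim et al.:

    # Hypothetical sketch of a speaker-adaptive dubbing pipeline of the kind
    # mapped above. All names below are illustrative assumptions only.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class DubbingInputs:
        translated_text: str          # text in the target language (cf. Kim, machine translator 820)
        original_audio: List[float]   # original audio stream, first language (cf. Liu, S410)
        speaker_vector: List[float]   # speaker feature vector (cf. Liu, acoustic processor, S420)

    def trained_tts(inputs: DubbingInputs) -> List[float]:
        """Stand-in for a trained TTS model conditioned on the speaker vector,
        so the target-language output sounds like the original speaker."""
        # A real system would run a neural synthesizer here (cf. Liu, figure 4);
        # this placeholder returns a silent buffer only to show the data flow.
        samples_per_char = 100
        return [0.0] * (len(inputs.translated_text) * samples_per_char)

    inputs = DubbingInputs(
        translated_text="Hola, mundo",        # target language (e.g., Spanish)
        original_audio=[0.1, -0.2, 0.05],     # speech in the first language
        speaker_vector=[0.3, 0.7, -0.1],      # embedding extracted from the original speaker
    )
    dubbed_audio = trained_tts(inputs)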
For claim 19, Liu et al. teach a speaker-adaptive method for dubbing in different languages, comprising:
receiving text in a target language (e.g., figure 4: S460, "obtain a to-be-synthesized text"), an original audio stream (e.g., figure 4: S410, "obtain a to-be-synthesized audio file"), and at least one speaker feature vector corresponding to at least one speaker represented as speech in the original audio stream (e.g., paragraph 36: speech feature vectors and acoustic parameters may be obtained through an acoustic processor; figure 4: "perform analysis by using an acoustic processor," S420), and wherein the original audio stream includes speech in a first language different from the target language (e.g., paragraph 47: a cross-language output may be made between an original audio file and a generated speech),
synthesizing voices in the target language based on the original audio stream and the at least one speaker feature vector corresponding to at least one speaker represented as speech in the original audio stream that controllably sounds like the at least one speaker in the original audio (e.g., figure 4: a TTS synthesizer performs synthesis; paragraph 46: universal speech feature vectors are obtained and are used in a TTS model, so that the TTS model may adapt to an unknown speaker and may even generate a speaker autonomously).
Liu et al. do not further specify translated text. Kim et al. teach translated text (e.g., figure 6: input text of second language 612; paragraph 67: the machine translator 820 included in the speech translation system 800 may convert or translate the input text of the first language into an input text of a second language and deliver the input text of the second language to the speech synthesizer 830). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Kim et al. into the teaching of Liu et al. to translate text or a voice signal related to the voices of the respective speakers into the target language, to improve convenience for the user (e.g., paragraph 4 of Kim et al.).
Claim 21 is rejected for the same reasons as discussed for claim 1 above; in addition, figure 1 of Liu et al. shows processing unit 130.
For claims 20 and 22, Liu et al. do not further disclose generating an emotion label based on the speech in the first language, and synthesizing the voices as output in the target language based further on the emotion label. Kim et al. teach generating an emotion label based on the speech in the first language (e.g., paragraph 60: the emotion feature 613 may represent at least one of joy, sadness, anger, fear, trust, disgust, surprise, and expectation); and synthesizing the voices as output in the target language based further on the emotion label (e.g., figure 6; paragraph 60: the emotion feature 613 may be generated by extracting a feature vector from speech data, and the speech synthesizer 620 may input the articulatory feature 611 of the speaker regarding the first language, the input text 612 of the second language, and the emotion feature 613 to the single artificial neural network text-to-speech synthesis model to generate the output speech data 630). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Kim et al. into the teaching of Liu et al. to translate text or a voice signal related to the voices of the respective speakers into the target language, to improve convenience for the user (e.g., paragraph 4 of Kim et al.).
For claim 2, Liu et al. do not further disclose a speech emotion classifier that is capable of receiving speech in the first language and generating an emotion label based on the speech; and wherein the trained TTS system is capable of receiving the emotion label and synthesizing the voices as output in the target language based further on the emotion label. Kim et al. teach a speech emotion classifier that is capable of receiving speech in the first language and generating an emotion label based on the speech (e.g., paragraph 60: the emotion feature 613 may represent at least one of joy, sadness, anger, fear, trust, disgust, surprise, and expectation); and wherein the trained TTS system is capable of receiving the emotion label and synthesizing the voices as output in the target language based further on the emotion label (e.g., figure 6; paragraph 60: the emotion feature 613 may be generated by extracting a feature vector from speech data, and the speech synthesizer 620 may input the articulatory feature 611 of the speaker regarding the first language, the input text 612 of the second language, and the emotion feature 613 to the single artificial neural network text-to-speech synthesis model to generate the output speech data 630). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Kim et al. into the teaching of Liu et al. to translate text or a voice signal related to the voices of the respective speakers into the target language, to improve convenience for the user (e.g., paragraph 4 of Kim et al.).
For claim 3, Liu et al. do not further disclose a speech emotion classifier that is capable of receiving speech segments in the first language corresponding to one of at least two speakers represented in the original audio and generating an emotion vector based on the speech in the speech segment; and wherein the trained TTS system is further capable of synthesizing the voices as output in the target language based on the speaker represented in the speech segment, the speaker vector, and the emotion vector for the segment. Kim et al. teach a speech emotion classifier that is capable of receiving speech segments in the first language corresponding to one of at least two speakers represented in the original audio and generating an emotion vector based on the speech in the speech segment (e.g., paragraphs 41-42: in the shown example, the single artificial neural network text-to-speech synthesis model may learn Korean data and English data together; the speech synthesizer 110 may receive an English text and an articulatory feature of a Korean speaker; for example, the English text may be "Hello?" and the articulatory feature of the Korean speaker may be a feature vector extracted from speech data uttered by the Korean speaker in Korean). The number of speakers does not establish a patentable difference, since the function of the TTS remains the same regardless of the number of speakers. Kim et al. further teach wherein the trained TTS system is further capable of synthesizing the voices as output in the target language based on the speaker represented in the speech segment, the speaker vector, and the emotion vector for the segment (e.g., figure 6; paragraph 60: the emotion feature 613 may be generated by extracting a feature vector from speech data, and the speech synthesizer 620 may input the articulatory feature 611 of the speaker regarding the first language, the input text 612 of the second language, and the emotion feature 613 to the single artificial neural network text-to-speech synthesis model to generate the output speech data 630). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Kim et al. into the teaching of Liu et al. to translate text or a voice signal related to the voices of the respective speakers into the target language, to improve convenience for the user (e.g., paragraph 4 of Kim et al.).
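Schematically, the combination relied upon for claims 2-3, 20, and 22 adds an emotion feature alongside the speaker feature as a conditioning input to the synthesizer (cf. Kim et al., figure 6). The following hypothetical sketch, in which every identifier and the toy classifier heuristic are assumptions for illustration only and not the implementation of either reference, shows that data flow:

    # Hypothetical sketch: an emotion label derived from first-language speech
    # conditions the synthesizer together with the speaker vector.
    # All names and the toy heuristic are illustrative assumptions only.
    from typing import List

    def classify_emotion(speech_segment: List[float]) -> str:
        """Stand-in for a trained speech emotion classifier (cf. Kim, paragraph 60)."""
        # A trained classifier would be used in practice; this heuristic merely
        # shows that the label is derived from the speech segment itself.
        energy = sum(abs(x) for x in speech_segment) / max(len(speech_segment), 1)
        return "anger" if energy > 0.5 else "joy"

    def trained_tts_with_emotion(text: str, speaker_vector: List[float],
                                 emotion_label: str) -> List[float]:
        """Stand-in synthesizer conditioned on both speaker and emotion features."""
        # Placeholder output; a real model would vary prosody with the label.
        return [0.0] * (len(text) * 100)

    segment = [0.6, -0.7, 0.8]             # first-language speech segment
    label = classify_emotion(segment)      # e.g., "anger"
    out = trained_tts_with_emotion("Hola", [0.3, 0.7], label)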
Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. and Kim et al., as applied to claims 1-3 and 19-22 above, and further in view of Holt et al. (US 2018/0032871).
For claims 4 and 5, Liu et al. and Kim et al. do not further disclose a signal processing system that receives the audio output of the trained TTS system and the original audio and reproduces corresponding audio conditions present in the original audio stream together with the synthesized speech in the target language. Holt et al. teach a signal processing system that receives the audio output of the trained TTS system and the original audio and reproduces corresponding audio conditions present in the original audio stream together with the synthesized speech in the target language (e.g., figure 1: processor; paragraph 49: in addition to the textual data, the speech-to-text model can also output data that describes an inflection, a voice, or other characteristics associated with the utterance described by the original audio data). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Holt et al. into the teaching of Liu et al. and Kim et al. to allow the text-to-speech model to use additional data in simulating the original speaker's inflection and voice (e.g., paragraph 49 of Holt et al.), thereby improving convenience for the user.
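The role attributed to the signal processing system of Holt et al. can likewise be illustrated. The sketch below is hypothetical; all names and the simple per-sample mix are assumptions for purposes of explanation, not Holt's implementation. It shows audio conditions of the original stream being reproduced together with the synthesized target-language speech:

    # Hypothetical sketch: overlay the original stream's background conditions
    # onto the synthesized speech (cf. Holt, paragraph 49).
    # The per-sample mix and all names are illustrative assumptions only.
    from typing import List

    def reproduce_audio_conditions(original: List[float],
                                   synthesized: List[float],
                                   background_gain: float = 0.3) -> List[float]:
        """Mix synthesized target-language speech with the original stream's
        background so the dub retains the original audio conditions."""
        n = max(len(original), len(synthesized))
        out = []
        for i in range(n):
            bg = original[i] if i < len(original) else 0.0
            sp = synthesized[i] if i < len(synthesized) else 0.0
            out.append(sp + background_gain * bg)
        return out

    mixed = reproduce_audio_conditions([0.05, 0.02, -0.03], [0.0, 0.1, -0.1])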
Allowable Subject Matter
Claims 6-18 are allowed.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAQUAN ZHAO whose telephone number is (571)270-1119. The examiner can normally be reached M-Thur: 7:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thai Tran can be reached on 571-272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Email: daquan.zhao1@uspto.gov.
Phone: (571)270-1119
/DAQUAN ZHAO/Primary Examiner, Art Unit 2484