DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4, 6-10, 13, and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Tokunaka et al. (US 2008/0002949) in view of Rodriguez et al. (US 8,744,239).
For claim 1, Tokunaka et al. teach a multimedia data generating method, comprising:
receiving text information inputted by a user (e.g. paragraph 111: “A user uses the mouse and the keyboard provided as the input unit 25 to input the text data and the time limit to the screen.”);
displaying the text information (figures 5-6) and acquiring a first reading speech of the text information (e.g. paragraph 120: “…an operation for supporting recording of narration sound is performed on the basis of the read-out data…”; also see paragraphs 126-127; figures 5-6 are recording screens used to record audio (or speech) of the user reading the text data, so a recording operation must be the trigger for the screen to be displayed); and
generating a first multimedia data based on the text information and the first reading speech and displaying the first multimedia data (e.g. figure 20, paragraphs 338-339);
wherein the first multimedia data comprise the first reading speech and a video image matched with the text information (e.g. figure 20, abstract: “…audio data based on an input sound to be recorded in a recording medium”), the first multimedia data comprise a plurality of first multimedia segments, the plurality of first multimedia segments corresponding to a plurality of text segments included in the text information, respectively (e.g. figure 20, clip A, clip B, clip C, and clip D); wherein a first target multimedia segment comprises a first target video segment and a first target speech segment, the first target multimedia segment referring to a first multimedia segment in the plurality of first multimedia segments corresponding to a first target text segment in the plurality of text segments, the first target video segment including a video image matched with the first target text segment, the first target speech segment including a reading speech of the first target text segment (e.g. figures 10A-10B, take 1 has corresponding text data and the user’s narration; also see paragraph 9: “When news contents are produced to fit in a predetermined time frame set in advance as described above, a video and a narration sound forming the news contents also need to be generated to fit in the time frame. In other words, concerning the video, it is necessary to edit photographed videos (material videos) to fit in a predetermined time. Concerning the narration sound, it is necessary to read out a script within the time.”).
Tokunaka et al. do not further specify displaying, in response to a recording trigger operation for the text information, the text information, or that the first target video segment and the first target speech segment are displayed in editing tracks.
Rodriguez et al. teach displaying, in response to a recording trigger operation for the text information, the text information (e.g. column 7, lines 50-58), and that the first target video segment and the first target speech segment are displayed in editing tracks (e.g. figure 1, video and text are displayed on the same screen; column 4, line 62-column 5, line 5: “The composite display area 130 includes multiple tracks that span a timeline 160, and displays one or more graphical representations of media clips in the composite presentation. As shown, the composite display area 130 displays a music clip representation 165 and a video clip representation 170.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Rodriguez et al. into the teaching of Tokunaka et al. to correlate the scripted words with the voice of the narrator and thereby reduce the number of takes the narrator has to perform (e.g. column 1, lines 33-46), improving convenience for the narrator.
Claim 10 is rejected for the same reasons as discussed in claim 1 above, wherein figure 2 shows CPU 5.
Claim 19 is rejected for the same reasons as discussed in claim 10 above.
For claim 7, Tokunaka et al. teach prominently displaying a text segment currently read by the user while acquiring the first reading speech (e.g. figure 6, paragraph 132, bar IB).
Claim 16 is rejected for the same reasons as discussed in claim 7 above.
For claims 9 and 18, Tokunaka et al. teach wherein the generating a first multimedia data based on the text information and the first reading speech and displaying the first multimedia data comprises: generating the first multimedia data based on the text information and the fourth reading speech, and displaying the first multimedia data (e.g. figure 20). Tokunaka et al. do not further disclose subjecting, after the first reading speech is acquired, the first reading speech to voice change processing and/or speed change processing to obtain a fourth reading speech. Rodriguez et al. teach subjecting, after the first reading speech is acquired, the first reading speech to voice change processing and/or speed change processing to obtain a fourth reading speech (e.g. column 7, lines 20-30). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Rodriguez et al. into the teaching of Tokunaka et al. to correlate the scripted words with the voice of the narrator and thereby reduce the number of takes the narrator has to perform (e.g. column 1, lines 33-46), improving convenience for the narrator.
For claims 4 and 13, Tokunaka et al. teach displaying the first target text segment corresponding to the first target speech segment, and acquiring a read segment of the first target text segment; and displaying the read segment in an area corresponding to the first target speech segment (e.g. figure 20 of Tokunaka et al.; also see figure 1 of Rodriguez et al.). Tokunaka et al. do not further disclose deleting, in response to a rerecording operation on the first target speech segment, the first target speech segment. Rodriguez et al. teach deleting, in response to a rerecording operation on the first target speech segment, the first target speech segment (e.g. column 28, lines 6-16: “text is deleted”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Rodriguez et al. into the teaching of Tokunaka et al. to delete the speech segment and thereby improve storage efficiency.
For claims 6 and 15, Tokunaka et al. do not teach moving, in response to a speech segment sliding operation and sliding of a first cursor pointing to the first reading speech till the first target speech segment, a second cursor pointing to the text information till the first target text segment. Rodriguez et al. teach moving, in response to a speech segment sliding operation and sliding of a first cursor pointing to the first reading speech till the first target speech segment, a second cursor pointing to the text information till the first target text segment (e.g. figure 1, column 7, lines 14-19). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Rodriguez et al. into the teaching of Tokunaka et al. to correlate the scripted words with the voice of the narrator and thereby reduce the number of takes the narrator has to perform (e.g. column 1, lines 33-46), improving convenience for the narrator.
For claims 8 and 17, Tokunaka et al. teach generating a fourth multimedia data based on the target text information and the third reading speech, and displaying the fourth multimedia data; wherein the fourth multimedia data comprise the third reading speech and the video image matched with the text information; the fourth multimedia data comprise a plurality of fourth multimedia segments, the plurality of fourth multimedia segments corresponding to a plurality of text segments included in the text information, respectively; a fourth target multimedia segment comprises a fourth target video segment and a fourth target speech segment, the fourth target multimedia segment referring to a fourth multimedia segment in the plurality of fourth multimedia segments corresponding to a fourth target text segment in the plurality of text segments, the fourth target video segment including the video image matched with the fourth target text segment, the fourth target speech segment including the reading speech of the fourth target text segment (e.g. figure 20, or see discussion in claim 1 above).
Tokunaka et al. do not teach editing, after the first multimedia data are generated and displayed, the text information in response to an edit operation for the text information to obtain a modified target text information; displaying, in response to a recording trigger operation for the target text information, the target text information and acquiring a target reading speech of the target text information; and updating the first reading speech based on the target reading speech to obtain a third reading speech.
Rodriguez et al. teach editing, after the first multimedia data are generated and displayed, the text information in response to an edit operation for the text information to obtain a modified target text information; displaying, in response to a recording trigger operation for the target text information, the target text information and acquiring a target reading speech of the target text information; and updating the first reading speech based on the target reading speech to obtain a third reading speech (e.g. column 8, lines 46-52: “…user can select media clips to add…”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Rodriguez et al. into the teaching of Tokunaka et al. to correlate the scripted words with the voice of the narrator and thereby reduce the number of takes the narrator has to perform (e.g. column 1, lines 33-46), improving convenience for the narrator.
Claims 2-3, 11-12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Tokunaka et al. and Rodriguez et al., as applied to claims 1, 4, 6-10, 13, and 15-19 above, and further in view of Narayanan (US 2023/0122824).
For claims 2, 11, and 20, Tokunaka et al. teach generating second multimedia data based on the text information and the speech data, and displaying the second multimedia data, wherein the second multimedia data comprise the speech data and the video image matched with the text information; the second multimedia data comprise a plurality of second multimedia segments, the plurality of second multimedia segments corresponding to a plurality of text segments included in the text information, respectively; a second target multimedia segment comprises a second target video segment and a second target speech segment, the second target multimedia segment referring to a second multimedia segment in the plurality of second multimedia segments corresponding to a second target text segment in the plurality of text segments, the second target video segment including a video image matched with the second target text segment, the second target speech segment including a reading speech of the second target text segment (e.g. see figure 20). Tokunaka et al. and Rodriguez et al. do not further disclose converting the text information into speech data in response to a multimedia synthesis operation. Narayanan teaches converting the text information into speech data in response to a multimedia synthesis operation (e.g. abstract, figure 1). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Narayanan into the teaching of Tokunaka et al. and Rodriguez et al. to correlate the scripted words with the voice of the narrator and thereby reduce the number of takes the narrator has to perform, improving convenience for the narrator.
For claims 3 and 12, Tokunaka et al. teach acquiring a second reading speech of the text information; and generating a third multimedia data based on the text information and the second reading speech, and displaying the third multimedia data to overwrite the second multimedia data; wherein the third multimedia data comprise the second reading speech and the video image matched with the text information; the third multimedia data comprise a plurality of third multimedia segments, the plurality of third multimedia segments corresponding to a plurality of text segments included in the text information, respectively; a third target multimedia segment includes a third target video segment and a third target speech segment, the third target multimedia segment referring to a third multimedia segment in the plurality of third multimedia segments corresponding to a third target text segment in the plurality of text segments, the third target video segment including a video image matched with the third target text segment, and the third target speech segment including a reading speech of the third target text segment (e.g. figure 20, or see discussion in claim 1 above).
Tokunaka et al. do not further specify displaying, in response to a recording trigger operation for the text information, the text information. Rodriguez et al. teach displaying, in response to a recording trigger operation for the text information, the text information (e.g. column 7, lines 50-58). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Rodriguez et al. into the teaching of Tokunaka et al. to correlate the scripted words with the voice of the narrator and thereby reduce the number of takes the narrator has to perform (e.g. column 1, lines 33-46), improving convenience for the narrator.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-4, 6-13, and 15-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-18 of U.S. Patent No. 12,206,955 B2 in view of Rodriguez et al. (US 8,744,239).
Claims 1-4, 6-13, and 15-20 of the instant application correspond to claims 1-18 of the Patent, respectively. Independent claims 1, 9, and 17 of the Patent do not further specify that the first target video segment and the first target speech segment are displayed in editing tracks.
Rodriguez et al. teach that the first target video segment and the first target speech segment are displayed in editing tracks (e.g. figure 1, video and text are displayed on the same screen; column 4, line 62-column 5, line 5: “The composite display area 130 includes multiple tracks that span a timeline 160, and displays one or more graphical representations of media clips in the composite presentation. As shown, the composite display area 130 displays a music clip representation 165 and a video clip representation 170.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Rodriguez et al. into claims 1-18 of the Patent to correlate the scripted words with the voice of the narrator and thereby reduce the number of takes the narrator has to perform (e.g. column 1, lines 33-46), improving convenience for the narrator.
Allowable Subject Matter
Claims 5 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAQUAN ZHAO whose telephone number is (571)270-1119. The examiner can normally be reached M-Thur: 7:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thai Tran, can be reached at 571-272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Email: daquan.zhao1@uspto.gov.
Phone: (571)270-1119
/DAQUAN ZHAO/Primary Examiner, Art Unit 2484