DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of the Claims
Claims 1-3, 6-8, 10 and 12-22 are pending for examination.
Claims 1, 17 and 20 are independent claims.
Claims 1-3, 6-8, 10 and 12-22 are rejected under 35 U.S.C. §103.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 6-8 and 15-22 are rejected under 35 U.S.C. 103 as being unpatentable over Blinnikka (U.S. 2008/0126387, hereinafter Blinnikka) in view of Din (U.S. 6,754,631, hereinafter Din), in further view of Lim et al. (U.S. 2016/0021333, hereinafter Lim).
Regarding Claim 1, Blinnikka teaches a method for multimedia processing, comprising:
displaying first text content, wherein the first text content corresponds to a first multimedia content (Blinnikka (¶0038 line 1-6, fig. 3 item 306), text content is displayed in area 306, and the text corresponds to the media stream)
playing second multimedia content in response to a triggering operation for the second multimedia content (Blinnikka (¶0037 line 4-8), the user can play and watch the media stream),
Blinnikka does not explicitly disclose:
and is obtained by automatic speech recognition of the first multimedia content; and
Din teaches:
and is obtained by automatic speech recognition of the first multimedia content (Din (col. 5 line 21-23 and col. 6 line 23-33), the session is recorded and a transcript is generated for the captured audio); and
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the transcript of Blinnikka to instead be an automatic transcript as taught by Din, with a reasonable expectation of success. The motivation would be to conveniently enable "recording meeting minutes based upon speech recognition" (Din (col. 7 line 41-42)).
Blinnikka in view of Din does not explicitly disclose:
wherein the second multimedia content comprises a segment of the first multimedia content that is associated with at least one second text content, and the at least one second text content is extracted from the first text content, and
wherein the second multimedia content is a summarized multimedia segment comprising at least two multimedia sub-segments, each multimedia sub-segment corresponds to a summary type selected from a plurality of summary types, and the at least one second text content comprises a text summary of at least one of the plurality of summary types; and
wherein the method further comprises:
during the playing of the second multimedia content, displaying an indicator of the at least one summary type corresponding to the text summary in association with a play timeline of the second multimedia content.
Lim teaches:
wherein the second multimedia content comprises a segment of the first multimedia content that is associated with at least one second text content (Lim (¶0048 line 1-2), “A video summary is technology to convert a long archived video into a short video summary”), and the at least one second text content is extracted from (Lim (¶0048 line 1-3), “The video summarizer 130 generates a video summary script regarding each original video by using the metadata generated by the metadata generator 140”) the first text content (Lim (¶0044 line 1-4), “The image reproducer 110 may also provide the metadata generator 140 with the original videos that are input. The metadata generator 140 extracts metadata from the original videos”), and
wherein the second multimedia content is a summarized multimedia segment comprising at least two multimedia sub-segments, each multimedia sub-segment corresponds to a summary type selected from a plurality of summary types (Lim (¶0073 line 1-7, fig. 8), “user selects a certain channel and may watch an original video and video summaries of certain time sections of the selected channel. For example, the user may watch video summaries of office hours 820, lunch time 830, and office leaving period 830 in a first channel 810 receiving an input video from a network surveillance camera that monitors entrances and exits of an office”), and the at least one second text content comprises a text summary of at least one of the plurality of summary types (Lim (¶0073 line 1-7), “For example, the user may watch video summaries of office hours 820, lunch time 830, and office leaving period 830”); and
wherein the method further comprises:
during the playing of the second multimedia content, displaying an indicator of the at least one summary type corresponding to the text summary in association with a play timeline of the second multimedia content (Lim (¶0073 line 1-7, fig. 8), “For example, the user may watch video summaries of office hours 820, lunch time 830, and office leaving period 830”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the closed captioning of Blinnikka in view of Din to instead be the summarized videos taught by Lim, with a reasonable expectation of success. The motivation would be convenience, since "the user selects a certain channel and may watch an original video and video summaries of certain time sections of the selected channel" (Lim (¶0073 line 1-3)).
Regarding Claim 2, which depends on Claim 1, Blinnikka in view of Din in further view of Lim teaches wherein the at least one second text content comprises at least two consecutive text segments extracted from the first text content (Blinnikka (¶0043 line 1-8, ¶0052 line 1-4), the text includes a whole line of text or multiple words).
Regarding Claim 3, which depends on Claim 2, Blinnikka in view of Din in further view of Lim teaches wherein the playing second multimedia content in response to a triggering operation for the second multimedia content comprises:
in response to the triggering operation for the second multimedia content, according to an order of associated time periods of the at least two consecutive text segments of the at least one target second content in the first multimedia content (Lim (¶0048 line 1-3), “The video summarizer 130 generates a video summary script regarding each original video by using the metadata generated by the metadata generator 140”),
playing by jumping across multimedia segments in the first multimedia content, each of the multimedia segments corresponding to an associated time period of one of the consecutive text segments (Lim (¶0048 line 1-3), “The video summarizer 130 generates a video summary script regarding each original video by using the metadata generated by the metadata generator 140”).
Regarding Claim 6, which depends on Claim 1, Blinnikka in view of Din in further view of Lim teaches wherein the generating the second multimedia content based on the associated time period of the at least one second text content comprises:
in response to that there are a plurality of associated time periods, generating the second multimedia content by joining a plurality of multimedia segments corresponding to the plurality of associated time periods according to an order of the plurality of associated time periods in the first multimedia content (Lim (¶0048 line 1-3), “The video summarizer 130 generates a video summary script regarding each original video by using the metadata generated by the metadata generator 140”).
Regarding Claim 7, which depends on Claim 1, Blinnikka in view of Din in further view of Lim teaches wherein the generating the second multimedia content based on the associated time period of the at least one first text content comprises:
adjusting the associated time period of the at least one second text content based on a sentence integrity of an associated text of the at least one second text content (Blinnikka (¶0029 line 1-10), the user can overwrite the start/end points of the portion); and
generating the second multimedia content based on the adjusted associated time period (Blinnikka (¶0029 line 1-10), the user can overwrite the start/end points of the portion).
Regarding Claim 8, which depends on Claim 7, Blinnikka in view of Din in further view of Lim teaches wherein the second text is a text corresponding to the at least one first text content in the initial text content (Blinnikka (¶0051 line 1-4, ¶0053 line 1-5), the user selects a starting point and an ending point of a text segment within the text body).
Regarding Claim 15, which depends on Claim 1, Blinnikka in view of Din in further view of Lim teaches further comprising:
receiving a download operation for the second multimedia content from a user, downloading and storing the second multimedia content (Blinnikka (¶0034 last 5 lines), display device 204 accesses content over a network).
Regarding Claim 16, which depends on Claim 1, Blinnikka in view of Din in further view of Lim teaches wherein the displaying first text content comprises: displaying the first text content in response to a triggering operation for a list page, wherein the list page displays abstract information of a plurality of multimedia contents (Blinnikka (¶0038 line 1-5), the text display includes multiple rows of text).
Regarding Claim 17, Blinnikka teaches an electronic device, comprising:
a processor (Blinnikka (¶0031 line 4-5), processor and memory); and
a memory configured to store instructions that are executable by the processor (Blinnikka (¶0031 line 4-5), processor and memory);
the processor being configured to read the instructions from the memory and execute the instructions to implement a method for multimedia processing (Blinnikka (¶0031 line 4-5), processor and memory) comprising:
The remaining limitations are rejected for the same reasons as Claim 1.
Claim 18 is rejected for the same reasons as Claim 2.
Claim 19 is rejected for the same reasons as Claim 3.
Claim 20 is rejected for the same reasons as Claim 1.
Regarding Claim 21, which depends on Claim 1, Blinnikka in view of Din in further view of Lim teaches determining the second multimedia content based on the at least one second text content (Lim (¶0048 line 1-2), "A video summary is technology to convert a long archived video into a short video summary").
Regarding Claim 22, which depends on Claim 21, Blinnikka in view of Din in further view of Lim teaches wherein the determining the second multimedia content based on the at least one second text content comprises:
generating the second multimedia content based on the associated time period of the at least one second text content, wherein the associated time period of the at least one second text content is used to characterize a time period of speech information corresponding to the at least one second text content in the second multimedia content (Lim (¶0048 line 1-2), “A video summary is technology to convert a long archived video into a short video summary”).
Claim Rejections - 35 USC § 103
Claims 10 and 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Blinnikka and Din in view of Lim, in further view of Gilson (U.S. 2012/0059954, hereinafter Gilson).
Regarding Claim 10, which depends on Claim 1, Blinnikka in view of Din in further view of Lim does not explicitly disclose:
wherein the playing second multimedia content in response to a triggering operation for the second multimedia content comprises:
determining a type of the summary corresponding to the triggering operation in response to the triggering operation for the second multimedia content; and
obtaining a target multimedia sub-segment corresponding to the type of the summary and playing the target multimedia sub-segment; or
obtaining the summarized multimedia segment and playing the summarized multimedia segment based on a time period of the type of the summary in the summarized multimedia segment.
Gilson teaches wherein the playing second multimedia content in response to a triggering operation for the second multimedia content comprises:
determining a type of the summary corresponding to the triggering operation in response to the triggering operation for the second multimedia content (Gilson (¶0074 line 17-26), the user selects a caption stream for display); and
obtaining a target multimedia sub-segment corresponding to the type of the summary and playing the target multimedia sub-segment; or
obtaining the summarized multimedia segment and playing the summarized multimedia segment based on a time period of the type of the summary in the summarized multimedia segment.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the caption content of Blinnikka in view of Din in further view of Lim to instead be the multiple caption system taught by Gilson, with a reasonable expectation of success. The motivation would be to provide more convenient, usable and/or advanced captioning functionalities (Gilson (¶0002)).
Regarding Claim 12, which depends on Claim 1, Blinnikka in view of Din in further view of Lim does not explicitly disclose:
wherein the displaying an identification of the type of the summary corresponding to the text summary in association on the play timeline of the second multimedia content comprises:
displaying the identification of the type of the summary corresponding to the target text summary at an associated time point corresponding to the text summary on the play timeline of the second multimedia content.
Gilson teaches wherein the displaying an identification of the type of the summary corresponding to the text summary in association on the play timeline of the second multimedia content comprises:
displaying the identification of the type of the summary corresponding to the target text summary at an associated time point corresponding to the text summary on the play timeline of the second multimedia content (Gilson (¶0030 line 1-5, fig. 4A), the caption type is indicated along with the playback speed, and the caption is displayed according to the scene).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the caption content of Blinnikka in view of Din in further view of Lim to instead be the multiple caption system taught by Gilson, with a reasonable expectation of success. The motivation would be to provide more convenient, usable and/or advanced captioning functionalities (Gilson (¶0002)).
Regarding Claim 13, which depends on Claim 12, Blinnikka and Din in view of Lim in further view of Gilson teaches wherein the associated time point corresponding to the text summary is a time point in the associated time period of the text summary (Gilson (¶0030 line 1-5, fig. 4A), the caption type is indicated along with the playback speed, and the caption is displayed according to the scene).
Regarding Claim 14, which depends on Claim 1, Blinnikka in view of Din in further view of Lim does not explicitly disclose:
further comprising:
during the playing of second multimedia content, prominently displaying at least one second text content corresponding to a playing progress of the second multimedia content in sequence.
Gilson teaches:
during the playing of second multimedia content, prominently displaying at least one second text content corresponding to a playing progress of the second multimedia content in sequence (Gilson (¶0030 line 1-5, fig. 4A), the caption type is indicated along with the playback speed, and the caption is displayed according to the scene).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the caption content of Blinnikka in view of Din in further view of Lim to instead be the multiple caption system taught by Gilson, with a reasonable expectation of success. The motivation would be to provide more convenient, usable and/or advanced captioning functionalities (Gilson (¶0002)).
Response to Arguments
Rejections under 35 U.S.C. §§ 102 and 103:
Regarding Claim 1, Applicants argue that the cited references do not disclose "second multimedia content comprises a segment of the first multimedia content that is associated with at least one target text content" (third paragraph of page 9 of the remarks).
Applicants’ arguments are moot because new reference Lim teaches the limitation(s).
Regarding Claim 1, Applicants further argue that the cited references do not disclose "a summarized multimedia segment" (sixth paragraph of page 9 of the remarks).
Applicants’ arguments are moot because new reference Lim teaches the limitation(s).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NHAT HUY T NGUYEN whose telephone number is (571)270-7333. The examiner can normally be reached M-F: 12:00-8:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached on 571-270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NHAT HUY T NGUYEN/Primary Examiner, Art Unit 2147