DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA. This action is made non-final.
Claims 1-13 and 15-21 are pending in the case. Claims 1, 15, and 16 are independent claims. Claim 14 has been canceled.
Priority
Acknowledgement is made of Applicant’s claim of foreign priority to Chinese application CN202111176742.6, filed 10/09/2021. The instant application is a national stage entry (35 U.S.C. 371) of PCT/CN2022/119224, filed 09/16/2022.
Claim Objections
Claims 7, 8, and 12 are objected to because of the following informalities:
Claim 7 introduces “the images” but should recite “images”.
Claim 8 introduces “the main frame” but should recite “a main frame”.
Claim 12 introduces “the preset color ring type” but should recite “a preset color ring type”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 4 and 11-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 4 recites “the corresponding area”, which lacks antecedent basis and is indefinite. Without additional details that sufficiently define the “corresponding area”, the scope of a “corresponding area” of a given action sequence frame in relation to the claimed “region” in each action sequence frame is unclear. For the sake of compact prosecution, the Examiner has provided a mapping for this claim to the furthest logical extent possible in light of the Specification. However, appropriate correction is required.
Claim 11 recites “adding a description text”, but this limitation renders the claim indefinite because “a description text” is recited earlier in the claim, and interpreting these recitations as distinct elements would be illogical. The Examiner recommends the following amendments to the claim:
“determining a hue, a saturation, and a brightness [[of]] for a description text”
“adding [[a]] the description text at a specified position in the single image according to the determined hue, saturation, and brightness”
Appropriate correction is required.
Dependent claims 12 and 13 are also rejected due to inheriting the deficiencies of claim 11.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-4, 6-10, 15-19, and 21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Girgensohn et al. (US 7149974 B2).
Regarding claim 1, Girgensohn teaches a video cover generation method, comprising:
extracting at least two key frames in the video, wherein the key frame comprises feature information to be displayed in a cover (step 310 of FIG. 3 and Col. 6, lines 18-67: at least two key frames are extracted, the key frame comprises feature information, including at least one of an instance/foreground and a background);
according to an action relevance of the at least two key frames, fusing the feature information in the at least two key frames in a single image to generate a cover of the video, wherein the action relevance comprises being relevant or being irrelevant (FIG. 3 and Col. 6, line 18 to Col. 7, line 18: feature information across the at least two key frames is fused into a single image; FIG. 8 and Col. 8, line 30 to Col. 9, line 29: feature information is fused according to an action relevance of at least two key frames. For example, action relevance which is relevant involves motion of the object, in this case people during a meeting, throughout the keyframes that are part of a sequence of frames; Col. 3, lines 16-46: the single/composite image is used as a cover of the video).
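For illustration only, the following minimal Python sketch shows one naive way the mapped steps (extracting two key frames and fusing them into a single cover image) could be realized. The maximum-frame-difference key-frame heuristic and the alpha-blend fusion are assumptions of the sketch, not Girgensohn’s implementation or the claimed method.

```python
import cv2
import numpy as np

def make_cover(video_path):
    # read every frame of the video
    cap = cv2.VideoCapture(video_path)
    frames = []
    ok, frame = cap.read()
    while ok:
        frames.append(frame)
        ok, frame = cap.read()
    cap.release()
    # "key frames": the first frame plus the frame differing most from it
    # (a naive stand-in for real key-frame extraction)
    diffs = [cv2.absdiff(f, frames[0]).sum() for f in frames]
    key_a, key_b = frames[0], frames[int(np.argmax(diffs))]
    # "fusing": a simple 50/50 alpha blend of the two key frames
    return cv2.addWeighted(key_a, 0.5, key_b, 0.5, 0)
```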
Regarding claim 2, Girgensohn further teaches the method of claim 1, wherein extracting at least two key frames in a video comprises:
based on an action recognition algorithm, recognizing at least two action sequence frames in the video, and using each action sequence frame as the key frame;
wherein, the action relevance is being relevant (FIG. 8 and Col. 8, line 30 to Col. 9, line 29).
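For illustration only, a minimal sketch of flagging action sequence frames by motion energy between consecutive frames; the fixed threshold is an assumption standing in for a true action recognition algorithm.

```python
import cv2

def action_sequence_frames(frames, thresh=1_000_000):
    # flag frames whose pixel-wise change from the previous frame is large,
    # a crude stand-in for an action recognition algorithm
    keys = []
    for prev, cur in zip(frames, frames[1:]):
        if cv2.absdiff(cur, prev).sum() > thresh:
            keys.append(cur)
    return keys
```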
Regarding claim 3, Girgensohn further teaches the method of claim 2, wherein according to an action relevance of the at least two key frames, merging the feature information in the at least two key frames in a single image to generate a cover of the video comprises:
in a case that the action relevance is being relevant, performing instance segmentation on each action sequence frame to obtain feature information of each action sequence frame, wherein the feature information comprises an instance and a background (FIG. 3 and Col. 6, line 18 to Col. 7, line 18: instance segmentation is performed on each action sequence frame to obtain feature information, including an instance, like a foreground object, and a background);
generating a cover background based on backgrounds of at least two action sequence frames (FIG. 3 and Col. 6, line 18 to Col. 7, line 18; FIG. 8 and Col. 8, line 30 to Col. 9, line 29: a cover background is generated based on backgrounds of at least two action sequence frames);
fusing instances of at least two action sequence frames into the cover background to obtain the single image, and using the single image as the cover of the video (FIG. 3 and Col. 6, line 18 to Col. 7, line 18; FIG. 8 and Col. 8, line 30 to Col. 9, line 29: instances, such as objects in motion, are fused into the cover background to obtain the single image, used as the cover as supported in Col. 3, lines 16-46).
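For illustration only, a minimal sketch of fusing segmented instances from several frames into a shared cover background. The frame-difference masks are an assumed stand-in for a real instance-segmentation model.

```python
import cv2

def fuse_instances(frames, background):
    cover = background.copy()
    for frame in frames:
        # mask where the frame differs from the background: the "instance"
        diff = cv2.cvtColor(cv2.absdiff(frame, background), cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
        cover[mask > 0] = frame[mask > 0]  # paste instance pixels onto cover
    return cover
```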
Regarding claim 4, Girgensohn further teaches the method of claim 3, wherein generating a cover background based on backgrounds of at least two action sequence frames comprises:
for each action sequence frame, removing a corresponding instance from each action sequence frame, and filling a region corresponding to the removed instance in each action sequence frame according to characteristic information of the corresponding area of a given action sequence frame, obtaining a filling result corresponding to each action sequence frame, wherein the given action sequence frame comprises an action sequence frame of the at least two action sequence frames different from a current action sequence frame;
generating the cover background based on the filling results of at least two action sequence frames (FIG. 3 and Col. 6, line 18 to Col. 7, line 18; FIG. 8 and Col. 8, line 30 to Col. 9, line 29: a corresponding instance, such as an object in motion represented by pixel areas deemed to have changed according to a certain threshold, are removed according to a given action sequence frame/base frame. A filling result corresponding to each action sequence frame is obtained, the filling result showing objects in motion in their corresponding color to indicate past or future movement with respect to the base frame which is an action sequence frame different from a current action sequence frame. The cover background is generated based on the filling results as seen in FIG. 8).
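For illustration only, a minimal sketch of removing an instance and filling the vacated region with co-located pixels from a different (“given”) frame, falling back to classical inpainting when no such frame is available; the mask input is assumed.

```python
import cv2

def fill_removed_instance(current, other, mask):
    # mask: uint8 array, nonzero where the removed instance was
    filled = current.copy()
    if other is not None:
        filled[mask > 0] = other[mask > 0]  # borrow co-located pixels
    else:
        # no "given" frame available: fall back to classical inpainting
        filled = cv2.inpaint(current, mask, 3, cv2.INPAINT_TELEA)
    return filled
```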
Regarding claim 6, Girgensohn further teaches the method of claim 3, wherein a degree of fusion of instances of the at least two action sequence frames with the cover background decreases sequentially in a chronological order (FIG. 8 and Col. 8, line 30 to Col. 9, line 29: “Each frame is compared to its adjacent frames and areas that changed between two subsequent frames are colored in the still image. The colorized differences are added to the base frame. In an embodiment, different colors are used to visualize time. For example, events in the past are colored red whereas events in the future are blue. The red varies from a shade of orange in the far past to a solid red in the recent past. Similarly, a light blue is used far in the future and a solid blue closer to the present.”).
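For illustration only, a minimal sketch of one hypothetical reading of a “degree of fusion” that decreases sequentially in chronological order: later instances are blended more faintly into the cover. The linear fade is an assumption, not Girgensohn’s red/blue time coloring.

```python
import cv2

def fade_instances(cover, instance_layers):
    # instance_layers: list of (frame, mask) pairs in chronological order
    n = len(instance_layers)
    for i, (frame, mask) in enumerate(instance_layers):
        alpha = 1.0 - i / n  # strongest for the earliest instance
        blended = cv2.addWeighted(frame, alpha, cover, 1.0 - alpha, 0)
        cover[mask > 0] = blended[mask > 0]
    return cover
```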
Regarding claim 7, Girgensohn further teaches the method of claim 1, wherein extracting at least two key frames in a video comprises:
clustering the images in the video to obtain at least two categories;
extracting a key frame corresponding to each category based on an image quality evaluation algorithm;
wherein, an action relevance of the at least two key frames is being irrelevant (Col. 7, lines 47-56; FIG. 3 and Col. 6, line 18 to Col. 7, line 18; FIG. 8 and Col. 8, line 30 to Col. 9, line 29: images may be clustered as either a base frame or not. The base frame is extracted as the main frame, while a key frame that is not a base frame is also extracted based on an image quality evaluation algorithm that allows, for example, selection of a key frame most similar to, or representative of, other frames. In addition, an action relevance may be being irrelevant when certain pixels of a key frame do not change by a certain threshold, thereby determining that a corresponding object is not in motion).
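For illustration only, a minimal sketch of clustering frames and keeping the sharpest frame per cluster as its key frame. The thumbnail features, k-means clustering, and Laplacian-variance sharpness score are assumptions standing in for the claimed clustering and image quality evaluation algorithms.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def key_frames_by_cluster(frames, n_clusters=2):
    # cluster tiny grayscale thumbnails so visually similar frames group
    feats = np.array([cv2.resize(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY),
                                 (16, 16)).ravel() for f in frames])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    keys = []
    for c in range(n_clusters):
        members = [f for f, l in zip(frames, labels) if l == c]
        # Laplacian variance as a crude sharpness/quality score
        scores = [cv2.Laplacian(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY),
                                cv2.CV_64F).var() for f in members]
        keys.append(members[int(np.argmax(scores))])
    return keys
```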
Regarding claim 8, Girgensohn further teaches the method of claim 7, wherein according to an action relevance of the at least two key frames, fusing the feature information in the at least two key frames in a single image to generate a cover of the video comprises:
in a case where the action relevance is being irrelevant, selecting a key frame as the main frame;
recognizing feature information in each key frame based on a target recognition algorithm, wherein the feature information comprises a foreground target;
fusing the foreground target in each of the at least two key frames except the main frame into the main frame to obtain the single image, and using the single image as the cover of the video (FIG. 3 and Col. 6, line 18 to Col. 7, line 18, and FIG. 8 and Col. 8, line 30 to Col. 9, line 29: the foreground target, including objects in motion, is recognized in each key frame based on a target recognition algorithm. The foreground target in each of the at least two key frames except the base frame is fused into the base frame to obtain the single image, as seen in FIG. 8, to be used as the cover of the video as supported in Col. 3, lines 16-46).
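For illustration only, a minimal sketch of selecting a main frame and pasting detected foreground regions from the remaining key frames into it. The detect_foreground callable is hypothetical; any detector returning bounding boxes could stand in for the target-recognition step.

```python
def fuse_into_main(key_frames, detect_foreground, main_index=0):
    # detect_foreground: hypothetical detector returning (x, y, w, h) boxes
    main = key_frames[main_index].copy()
    for i, frame in enumerate(key_frames):
        if i == main_index:
            continue
        for (x, y, w, h) in detect_foreground(frame):
            main[y:y + h, x:x + w] = frame[y:y + h, x:x + w]  # paste target
    return main
```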
Regarding claim 9, Girgensohn further teaches the method of claim 8, after obtaining the single image, further comprising:
performing a blur process on the background of the single image, wherein the blur process includes a fuzzy process or a feather process (Col. 8, lines 56-65: “In a further embodiment, the frames are slightly blurred before the comparison addition to filter out very small movements. This blur is only used in the comparison and not for the display of the center frame.”).
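For illustration only, a minimal sketch of blurring only the background of the single image, with the mask edge feathered so the transition is gradual; the foreground mask and kernel size are assumptions.

```python
import cv2
import numpy as np

def blur_background(image, fg_mask, ksize=21):
    blurred = cv2.GaussianBlur(image, (ksize, ksize), 0)
    # feather: soften the mask edge so foreground blends into the blur
    soft = cv2.GaussianBlur(fg_mask.astype(np.float32), (ksize, ksize), 0)
    soft = (soft / max(float(soft.max()), 1.0))[..., None]  # [0,1], HxWx1
    return (image * soft + blurred * (1.0 - soft)).astype(np.uint8)
```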
Regarding claim 10, Girgensohn teaches the method of claim 7, wherein according to an action relevance of the at least two key frames, fusing the feature information in the at least two key frames in a single image to generate a cover of the video comprises:
in a case where the action relevance is being irrelevant, extracting an image block containing the feature information in each key frame;
stitching all image blocks to obtain the single image (FIG. 3 and Col. 6, line 18 to Col. 7, line 18, and FIG. 8 and Col. 8, line 30 to Col. 9, line 29: image block extracted and image blocks from key frames are all stitched to obtain the single image, as seen in FIG. 8).
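For illustration only, a minimal sketch of stitching extracted image blocks side by side after resizing them to a common height; the horizontal layout is an assumption.

```python
import cv2
import numpy as np

def stitch_blocks(blocks, height=360):
    # resize every block to a common height, then place them side by side
    resized = [cv2.resize(b, (max(1, round(b.shape[1] * height / b.shape[0])),
                              height)) for b in blocks]
    return np.hstack(resized)
```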
Regarding claims 15, 17-19, and 21, the claims recite an electronic device comprising: at least one processor; and a storage device configured to store at least one program, wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement operations comprising processes corresponding to the methods of claims 1-4 and 6, respectively; the claims are therefore rejected on the same premises.
Regarding claim 16, the claim recites a non-transitory computer readable medium storing a computer program, wherein the program, when executed by a processor, implements operations comprising processes corresponding to the method of claim 1 and is therefore rejected on the same premise.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Girgensohn et al. (US 7149974 B2), in view of Sull et al. (US 2006/0064716 A1), and further in view of Lebrun et al. (US 2021/0035281 A1).
Regarding claim 11, Girgensohn teaches the method of claim 1.
Girgensohn does not explicitly teach after fusing the feature information in the at least two key frames in a single image, further comprising: determining a hue, a saturation, and a brightness of a description text based on a color value of the single image, wherein the color value is converted from the red, green, and blue RGB color mode to the hue saturation and brightness HSV color mode; adding a description text at a specified position in the single image according to the hue, saturation, and brightness of the description text.
Sull teaches, after generating a single image,
determining a hue, a saturation, and a brightness of a description text based on a color value of the single image;
adding a description text at a specified position in the single image according to the hue, saturation, and brightness of the description text (FIG. 1B and [0297], FIG. 1C and [0298], FIGS. 7A-B and [0336-0341]: after a single image is generated via key frame generator 162, the image is processed via color analyzer 164 to determine a hue, saturation, and a brightness of a description text based on color value of the single image; For examples of description text added see FIGS. 5A-F and corresponding paragraphs).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Girgensohn by incorporating the teachings of Sull so as to include after fusing the feature information in the at least two key frames in a single image, further comprising determining a hue, a saturation, and a brightness of a description text based on a color value of the single image; adding a description text at a specified position in the single image according to the hue, saturation, and brightness of the description text. Doing so would create description text with visual characteristics that do not detract from or clash with the visual appearance of the single image. Furthermore, doing so could also ensure readability of the description text by accounting for the color of the single image.
Girgensohn in view of Sull does not explicitly teach wherein the color value is converted from the red, green, and blue RGB color mode to the hue saturation and brightness HSV color mode.
Lebrun teaches wherein the color value is converted from the red, green, and blue RGB color mode to the hue saturation and brightness HSV color mode ([0044] and [0073-0083]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Girgensohn in view of Sull by incorporating the teachings of Lebrun so as to include wherein the color value is converted from the red, green, and blue RGB color mode to the hue saturation and brightness HSV color mode. Doing so would allow separation of color information from brightness information for increased accuracy of visually processing the single image. In this way, specific visual characteristics gleaned from HSV, as incorporated from Lebrun’s teachings, can create more visually apt description text, as set forth in Sull’s teachings.
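For illustration only, a minimal sketch of converting an image to the HSV color mode and summarizing its hue, saturation, and brightness. Note that OpenCV reads images as BGR and scales hue to [0, 179]; the circular-mean treatment of hue is an assumption of the sketch.

```python
import cv2
import numpy as np

def hsv_stats(image_bgr):
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)  # OpenCV hue: 0-179
    h, s, v = cv2.split(hsv)
    # hue is circular, so average it as an angle on the color wheel
    angles = h.astype(np.float32) * (np.pi / 90.0)
    mean_angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())
    mean_hue = (mean_angle % (2 * np.pi)) * (90.0 / np.pi)
    return mean_hue, float(s.mean()), float(v.mean())
```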
Regarding claim 12, Girgensohn in view of Sull and Lebrun further teaches the method of claim 11, wherein determining a hue of description text based on a color value of the single image comprises:
determining multiple hue types of the single image and a proportion of each hue type based on a clustering algorithm;
using a hue type with a highest proportion as a main hue of the single image;
using a hue corresponding to a hue value closest to a hue value of the main hue within a designated area of the preset color ring type as a hue of the description text (Sull, FIG. 1B and [0297], FIG. 1C and [0298], FIGS. 7A-B and [0336-0341]: hue types and the proportion of each hue type are determined via the image analyzer. A hue type with a highest proportion is the dominant hue. A hue closest to the hue value of the main hue is selected, with a slight difference in saturation) (Lebrun, [0044] and [0073-0083]) (See rationales provided for claim 11).
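For illustration only, a minimal sketch of determining a main hue by clustering pixel hues and choosing a text hue at a fixed offset on the color wheel. The 90-step offset (the opposite side of OpenCV’s 0-179 hue wheel) is an arbitrary stand-in for the claimed “designated area of the preset color ring type”.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def pick_text_hue(image_bgr, n_hues=5):
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hues = hsv[..., 0].reshape(-1, 1).astype(np.float32)
    # cluster hues (wrap-around ignored for simplicity) and take the
    # largest cluster as the main hue
    km = KMeans(n_clusters=n_hues, n_init=10).fit(hues)
    counts = np.bincount(km.labels_, minlength=n_hues)
    main_hue = float(km.cluster_centers_[counts.argmax()][0])
    # text hue: the opposite side of OpenCV's 0-179 hue wheel
    return (main_hue + 90.0) % 180.0
```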
Regarding claim 13, Girgensohn in view of Sull and Lebrun further teaches the method of claim 11, wherein determining a saturation and a lightness of a description text based on a color value of the single image comprises:
determining a saturation of the description text based on an average saturation within a set range around the specified position;
determining a brightness of the description text based on an average brightness within a set range around the specified position (Sull, FIG. 1B and [0297], FIG. 1C and [0298], FIGS. 7A-B and [0336-0341]: based on an average saturation within a range of the specified position, the saturation of the description text may be increased or decreased. Brightness of the text is dictated by the brightness of the font at the specified position. For example, a dark font calls for increased brightness of the text by application of a bright outline) (Lebrun, [0044] and [0073-0083]) (See rationales provided for claim 11).
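For illustration only, a minimal sketch of sampling the average saturation and brightness in a window around the intended text position and choosing text values that contrast with the local average; the window size and thresholds are assumptions.

```python
import cv2

def text_saturation_brightness(image_bgr, x, y, half=40):
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    patch = hsv[max(0, y - half):y + half, max(0, x - half):x + half]
    avg_s = float(patch[..., 1].mean())
    avg_v = float(patch[..., 2].mean())
    # bright text over a dark region, dark text over a bright region
    text_v = 230 if avg_v < 128 else 40
    return avg_s, text_v
```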
Allowable Subject Matter
Claims 5 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure, including:
US 8503523 B2: extracting key frames representative of video item
US 6052492 A: generating representative image based on determined distinct units within video sequence
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KENNY NGUYEN whose telephone number is (571)272-4980. The examiner can normally be reached M-Th 7AM to 5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KIEU D VU can be reached at (571)272-4057. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KENNY NGUYEN/Primary Examiner, Art Unit 2171