Prosecution Insights
Last updated: April 19, 2026
Application No. 18/568,755

VIDEO GENERATION METHOD AND DEVICE

Non-Final OA: §103, §112
Filed: Dec 08, 2023
Examiner: ZHAI, KYLE
Art Unit: 2611
Tech Center: 2600 — Communications
Assignee: BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.
OA Round: 1 (Non-Final)
Grant Probability: 75% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 0m
With Interview: 93%

Examiner Intelligence

Career Allow Rate: 75% (above average): 353 granted / 473 resolved, +12.6% vs TC avg
Interview Lift: +18.6% on resolved cases with interview (strong)
Typical Timeline: 3y 0m average prosecution, 31 applications currently pending
Career History: 504 total applications across all art units

Statute-Specific Performance

§101: 10.6% (-29.4% vs TC avg)
§103: 61.2% (+21.2% vs TC avg)
§102: 7.9% (-32.1% vs TC avg)
§112: 15.1% (-24.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 473 resolved cases.

Office Action

§103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 4 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claim 4 recites the limitation "the scene type". There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al.
(DanceIt: Music-inspired Dancing Video Synthesis, Computer Vision and Pattern Recognition, 2020) in view of Zhong et al. (Generative adversarial networks with decoder–encoder output noises, Neural Networks, 2020).

Regarding claim 1, Guo et al. (hereinafter Guo) discloses a video generation method (Guo, Fig. 6 illustrates the pipeline of the proposed framework), including: acquiring a target audio (Guo, Fig. 1 illustrates a given piece of input audio); generating an image sequence according to characteristic information of the target audio (Guo, Fig. 1 illustrates automatically generating a dancing sequence by taking the human pose as an intermediary, with the representative pose and generated dancing frames corresponding to the audio peaks) and an image generation model (Guo, Fig. 6 illustrates that the imagination module synthesizes the final dancing videos from the processed pose sequences), wherein the image generation model is used for generating a corresponding image (Guo, Fig. 6); and combining the target audio and the image sequence to generate a target video corresponding to the target audio (Guo, Fig. 6 illustrates the generation phase: the spatial alignment is used to re-predict dance movements between discontinuous pose fragments, the temporal algorithm aligns the beats of the music and pose sequence, and the imagination module synthesizes the final dancing videos from the processed pose sequences).

Guo does not expressly disclose "based on a randomly input vector". Zhong et al. (hereinafter Zhong) discloses generating an image based on a randomly input vector (Zhong, 1. Introduction, [0006]: "The decoder-encoder structure can transform the noninformative Gaussian noises to informative ones. Because the decoder-encoder structure carries information of the real images, the output noise vectors could accelerate the training process of the adversarial networks and improve the quality of the generated images". The Gaussian noise vectors are randomly input vectors). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to generate Guo's videos by applying Zhong's noise vector as an input to Guo's video generation process. The motivation for doing so would have been improving the quality of the generated images.

Regarding claim 9, Guo discloses generating the image sequence according to the characteristic information of the target audio and a generator in the image generation model (Guo, Fig. 1 illustrates automatically generating a dancing sequence by taking the human pose as an intermediary, with the representative pose and generated dancing frames corresponding to the audio peaks; in addition, Fig. 6 illustrates that the imagination module synthesizes the final dancing videos from the processed pose sequences). Guo as modified by Zhong, with the same motivation as for claim 1, discloses a generative adversarial model (Zhong, 3. Generative adversarial networks with decoder-encoder output noises).

Claims 2 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al. in view of Zhong et al., as applied to claims 1 and 15, and further in view of Liem et al. (When music makes a scene: Characterizing music in multimedia contexts via user scene descriptions, Int J Multimed Info Retr, 2013) and Qiu et al. (Image generation associated with music data, CVPR 2018 Sight and Sound Workshop).

Regarding claim 2, Guo teaches the target audio (Guo, Fig. 1) and the generating of an image sequence according to characteristic information of the target audio and an image generation model (Guo, Fig. 1). Guo as modified by Zhong, with the same motivation as for claim 1, discloses a randomly input vector (Zhong, 1. Introduction, [0006], as quoted above). Liem et al. (hereinafter Liem) discloses determining a target scene type to which an audio belongs (Liem, 6.1 Many descriptions of the same music fragment, [0001]: "the described scene outdoors, and from the 44 respondents who went as far as indicating a geographic location for the scene, frequently mentioned regions are Europe… To the worker respondents, the fragment strongly evokes dancing/party scenes… To the worker respondents, it evokes mysterious, unknown and sometimes unpleasant situations. The most frequently mentioned actor category is that of adventurers"). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Guo's video synthesis process to incorporate Liem's data for determining a scene type corresponding to a target audio. The motivation for doing so would have been improving audio-visual consistency.

Guo as modified by Zhong and Liem does not expressly disclose "determining an image generation model corresponding to the target scene type". Qiu et al. (hereinafter Qiu) discloses determining an image generation model corresponding to an audio feature (Qiu, 3.4. Image generation from music data, [0001]: "we can obtain music features that can correspond to images through our trained CNN-LSTM model. We then fuse those extracted music features into DCGAN to generate images") and generating an image corresponding to the target type (Qiu, Fig. 8).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use Qiu's generation of images from extracted audio features to generate the images used in the video synthesis process of Guo as modified by Zhong and Liem. The motivation for doing so would have been improving semantic consistency between audio content and generated video output.

Regarding claim 21, claim 21 recites functions that are similar in scope to the method steps recited in claim 2 and is therefore rejected under the same rationale.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Guo et al. in view of Zhong et al. and further in view of Jin et al. (US 2021/0398336).

Regarding claim 15, Guo teaches video generation (Guo, Fig. 1). Guo as modified by Zhong does not expressly disclose "a video generation device". Jin et al. (hereinafter Jin) discloses a video generation device (Jin, [0091], "a user may generate and acquire fusion images via a user apparatus 300 such as a personal computer") with at least one processor and a memory (Jin, Fig. 19), the memory storing computer-executable instructions, wherein the computer-executable instructions upon execution by the at least one processor cause the at least one processor to implement operations (Jin, [0097], "a CPU (Central Processing Unit) or the like may implement them through information processing of the software items…and functions of at least a portion thereof are stored in a storage medium and may be loaded into a computer for execution"). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to perform Guo's video synthesis process using Jin's video generation device. The motivation for doing so would have been enabling practical implementation of the video synthesis process.

The remaining limitations recited in claim 15 are similar in scope to the method recited in claim 1 and are therefore rejected under the same rationale.

Allowable Subject Matter

Claims 3-8 and 22-25 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYLE ZHAI, whose telephone number is (571) 270-3740. The examiner can normally be reached 9AM-5PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Ke Xiao, can be reached at (571) 272-7776. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KYLE ZHAI/
Primary Examiner, Art Unit 2611
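For orientation, the claim 1 pipeline the examiner maps onto Guo and Zhong (audio characteristic information plus a randomly input noise vector fed to an image generation model, with the resulting image sequence recombined with the audio into a video) can be sketched roughly as follows. This is a toy illustration only: every function and variable name is hypothetical, and the arithmetic "generator" is a stand-in for a trained GAN generator of the kind Zhong describes.

```python
import numpy as np

def extract_audio_features(audio: np.ndarray, n_frames: int) -> np.ndarray:
    """Toy per-frame feature: mean absolute amplitude of each audio chunk
    (a stand-in for real characteristic information such as beats/energy)."""
    chunks = np.array_split(audio, n_frames)
    return np.array([[np.abs(c).mean()] for c in chunks])  # shape (n_frames, 1)

def generator(features: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Stand-in for a trained GAN generator: maps (feature, noise vector)
    pairs to 8x8 grayscale 'frames'. A real model would be learned."""
    n = features.shape[0]
    frames = np.zeros((n, 8, 8))
    for i in range(n):
        frames[i] = features[i, 0] + 0.1 * noise[i].reshape(8, 8)
    return frames

def generate_video(audio: np.ndarray, n_frames: int, seed: int = 0) -> dict:
    """Claim 1 shape: features + random vectors -> image sequence -> video."""
    rng = np.random.default_rng(seed)
    feats = extract_audio_features(audio, n_frames)
    noise = rng.standard_normal((n_frames, 64))   # randomly input vectors
    frames = generator(feats, noise)              # image sequence
    return {"audio": audio, "frames": frames}     # combined target video

video = generate_video(np.sin(np.linspace(0, 20, 1600)), n_frames=4)
print(video["frames"].shape)  # (4, 8, 8)
```

The point of the sketch is the dispute's structure, not the model: Guo supplies everything except the noise input, and Zhong's Gaussian noise vectors are what the examiner reads onto "based on a randomly input vector".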

Prosecution Timeline

Dec 08, 2023
Application Filed
Jan 16, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602879: METHOD AND DEVICE FOR PROVIDING SURGICAL GUIDE USING AUGMENTED REALITY
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12594123: VIRTUAL REALITY SYSTEM WITH CUSTOMIZABLE OPERATION ROOM
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12590811: METHOD, APPARATUS, AND PROGRAM FOR PROVIDING IMAGE-BASED DRIVING ASSISTANCE GUIDANCE IN WEARABLE HELMET
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12573162: MODELLING METHOD FOR MAKING A VIRTUAL MODEL OF A USER'S HEAD
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12566580: HOLOGRAPHIC PROJECTION SYSTEM, METHOD FOR PROCESSING HOLOGRAPHIC PROJECTION IMAGE, AND RELATED APPARATUS
Granted Mar 03, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 75%
With Interview: 93% (+18.6%)
Median Time to Grant: 3y 0m
PTA Risk: Low
Based on 473 resolved cases by this examiner. Grant probability derived from career allow rate.
