Prosecution Insights
Last updated: April 19, 2026
Application No. 18/913,939

METHOD, DEVICE, APPARATUS AND STORAGE MEDIUM FOR VIDEO PRODUCTION

Status: Non-Final OA, §103
Filed: Oct 11, 2024
Examiner: TRAN, LOI H
Art Unit: 2484
Tech Center: 2400 — Computer Networks
Assignee: BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.
OA Round: 1 (Non-Final)

Grant Probability: 64% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 2y 10m
With Interview: 88%

Examiner Intelligence

Career Allow Rate: 64% (394 granted / 611 resolved; +6.5% vs TC avg)
Interview Lift: +23.6% higher allow rate for resolved cases with an interview (strong)
Typical Timeline: 2y 10m average prosecution; 25 applications currently pending
Career History: 636 total applications across all art units
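
The interview-lift figure compares the allow rate among this examiner's resolved cases that had an interview against those that did not. A minimal sketch of that computation, using HYPOTHETICAL cohort counts chosen only to be consistent with the reported aggregate (394 granted / 611 resolved); the dashboard does not publish the actual with/without split:

```python
# Illustrative computation of "interview lift": the allow-rate gap between
# resolved cases with an interview and those without. Cohort counts are
# HYPOTHETICAL; only their totals (394 granted / 611 resolved) match the page.
with_iv = {"granted": 100, "resolved": 120}     # hypothetical cohort
without_iv = {"granted": 294, "resolved": 491}  # hypothetical cohort

rate_with = 100 * with_iv["granted"] / with_iv["resolved"]           # 83.3%
rate_without = 100 * without_iv["granted"] / without_iv["resolved"]  # 59.9%

# Prints roughly +23.5 percentage points, close to the reported +23.6% lift.
print(f"interview lift: {rate_with - rate_without:+.1f} percentage points")
```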

Statute-Specific Performance

Statute   Rate     vs TC Avg
§101      6.3%     -33.7%
§103      54.9%    +14.9%
§102      14.8%    -25.2%
§112      12.5%    -27.5%

Tech Center average is an estimate. Based on career data from 611 resolved cases.
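
The per-statute deltas let you back out the implied Tech Center baseline for each rejection type. A minimal sketch, assuming the "vs TC avg" figures are plain percentage-point differences (variable names are illustrative):

```python
# Recover the implied Tech Center average for each statute from the table
# above, assuming "vs TC avg" is a simple percentage-point difference.
examiner_rate = {"§101": 6.3, "§103": 54.9, "§102": 14.8, "§112": 12.5}
delta_vs_tc = {"§101": -33.7, "§103": +14.9, "§102": -25.2, "§112": -27.5}

for statute, rate in examiner_rate.items():
    tc_avg = rate - delta_vs_tc[statute]  # e.g., §103: 54.9 - 14.9 = 40.0
    print(f"{statute}: examiner {rate:.1f}% vs implied TC avg {tc_avg:.1f}%")
```

Notably, all four statutes imply the same ~40.0% baseline, which suggests the deltas are measured against a single overall Tech Center average rather than per-statute averages.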

Office Action (Non-Final, §103)

DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3 and 17-20 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Li (English translation of Chinese Publication CN 111787395, 10-2020) in view of Zhang et al. (US Publication 2022/0415071) and further in view of Li et al. (US Publication 2025/0310594, hereinafter LiLija).
Regarding claim 1, Li discloses a method for producing a video, comprising:

presenting a text setup interface in a displayed video producing window, the text setup interface comprising: an information edit area, a first information display area (Li, paras. 0009, 0081, receiving user input of original raw data for video production, the raw data including at least one details page link; receiving a detail page link as a user input implicitly indicates presenting a text setup interface that includes a text edit area "an information edit area"; paras. 0111-0118, after obtaining the product details page, image recognition technology can be used to decompose the content of the original details page, performing text recognition and extraction, and image recognition; based on the text recognition results, key content is extracted and presented as text data for generating the video; presenting/displaying key content as text data implicitly indicates providing a first information display area);

receiving a first trigger operation, wherein the first trigger operation triggers a first trigger control comprised [in the information edit area] after object association information of a video production object is inputted to an information edit box included in the information edit area (Li, paras. 0010, 0013, obtaining, via an implied trigger operation control, materials and supplemental data for generating the video based on the raw data "detail page link" being received in an implied area "edit box" of the display interface, the materials including at least one of image materials and video material, and supplemental data including text data, video data, and image data; para. 0098, a details page can be understood as a page that contains detailed information about a certain object; the objects can be set according to specific application scenarios, such as product detail pages, news detail pages, etc.);

displaying object attribute information corresponding to the video production object in the first information display area, the object attribute information being determined based on the object association information (Li, para. 0098, a details page can be understood as a page that contains detailed information about a certain object "object attribute information"; the objects can be set according to specific application scenarios, such as product detail pages, news detail pages; paras. 0114-0118, after obtaining the product details page, image recognition technology can be used to decompose the content of the original details page, performing text recognition and extraction, and image recognition; based on the text recognition results, obtaining key content for generating the video; presenting/displaying key content on a display area "the first information display area").
It is noted that receiving a trigger operation after entering object information on an information presentation display area, and displaying an object attribute corresponding to the object information in an area "first information display area", is also well known in the art, as cited below and disclosed by Li et al., US Publication 2025/0310594, paras. 0007-0009, displaying an information presentation page of a first object; receiving an operation for entering a video playback page triggered on the information presentation page; and in response to the operation for entering the video playback page, displaying the video playback page, and presenting attribute information of a set corresponding to the video playback page on the video playback page.

Li does not explicitly disclose: wherein the first trigger operation triggers a first trigger control comprised in the information edit area; the text setup interface comprising a second information display area; receiving a second trigger operation, wherein the second trigger operation triggers a second trigger control comprised in the first information display area; and displaying, in the second information display area, video related text corresponding to the video production object, the video related text being generated based on the object attribute information.

Zhang discloses: the text setup interface comprising a second information display area; receiving a second trigger operation, wherein the second trigger operation triggers a second trigger control; and obtaining video related text corresponding to the video production object, the video related text being generated based on the object attribute information (Zhang et al., US 2022/0415071, paras. 0146-0149, performing text recognition on the object to be recognized based on the pre-trained text recognition model to obtain text content corresponding to the object to be recognized includes the following steps: performing feature-extraction processing on the text to be recognized, to obtain semantic features of the text to be recognized; performing, by adopting the text recognition model, text recognition on the text to be recognized according to the semantic features of the text to be recognized, to obtain text content corresponding to the text to be recognized; para. 0138, the tool for loading the object to be recognized can be an interface for connecting with an external device, or a display apparatus, for example, a text recognition apparatus can input an interface for loading the object to be recognized on the display apparatus, and the user can import the object to be recognized into the text recognition apparatus through the interface; para. 0223, functions/operations specified in the flowcharts and/or block diagrams can be implemented via an implied trigger operation).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Zhang's features into Li's invention, enhancing the user's video generating experience by using text data corresponding to the video object to generate the video.
Li-Zhang discloses obtaining video related text corresponding to the video production object, but does not explicitly disclose: wherein the first trigger operation triggers a first trigger control comprised in the information edit area; the text setup interface comprising a second information display area, wherein the second trigger operation triggers a second trigger control comprised in the first information display area; and displaying, in the second information display area, video related text corresponding to the video production object.

LiLija discloses: wherein the first trigger operation triggers a first trigger control comprised in the information edit area; the text setup interface comprising a second information display area, wherein the second trigger operation triggers a second trigger control comprised in the first information display area; and displaying, in the second information display area, video related text corresponding to the video production object (LiLija, paras. 0058-0059, the information presentation page of the first object may include at least one interactive control, and the user may trigger a corresponding interactive operation by using the interactive control; receiving an operation for entering a video playback page triggered on the information presentation page; para. 0071, the object attribute information presentation interface may include at least one region; since each of the at least one region can be used for presenting attribute information of the object, it is obvious that one region of the interface can also be used to display video related text corresponding to the video production object).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate LiLija's features into Li-Zhang's invention, enhancing the user's video editing experience by providing convenience for the user to initiate an operation.

Regarding claim 2, Li-Zhang-LiLija discloses the method of claim 1, wherein the object association information is access link information of the video production object, and a step of determining the object attribute information based on the object association information comprises: accessing an information page associated with the video production object by the access link information, the information page including object service information of the video production object; and extracting and parsing the object service information from the information page to obtain the object attribute information of the video production object (Li, paras. 0009, 0081, receiving user input of original raw data for video production, the raw data including at least one details page link; Li, para. 0098, a details page can be understood as a page that contains detailed information about a certain object "object attribute information"; the objects can be set according to specific application scenarios, such as product detail pages, news detail pages).
Regarding claim 3, Li-Zhang-LiLija discloses the method of claim 1, wherein a step of generating the video related text based on the object attribute information comprises: using the object attribute information as input data, or using the object attribute information and object context information extracted relative to the video production object from a data vector library as input data; inputting the input data into a trained text generation model, and determining output text description information as the video related text; and the video related text comprising video title information and text content to be displayed in the video (Zhang, paras. 0146-0149, performing text recognition on the object to be recognized based on the pre-trained text recognition model to obtain text content corresponding to the object to be recognized includes the following steps: performing feature-extraction processing on the text to be recognized, to obtain semantic features of the text to be recognized; performing, by adopting the text recognition model, text recognition on the text to be recognized according to the semantic features of the text to be recognized, to obtain text content corresponding to the text to be recognized; para. 0138, the tool for loading the object to be recognized can be an interface for connecting with an external device, or a display apparatus, for example, a text recognition apparatus can input an interface for loading the object to be recognized on the display apparatus, and the user can import the object to be recognized into the text recognition apparatus through the interface; para. 0223, functions/operations specified in the flowcharts and/or block diagrams can be implemented via an implied trigger operation). The motivation to combine the references and the obviousness arguments are the same as for claim 1.

Claims 17-20 are rejected for the same reasons set forth in claims 1-3. Li-Zhang-LiLija further discloses processor(s), memory module(s), and computer readable medium (see Li, paras. 0076-0080 and 0219).

Claims 4-5 and 8-11 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Li-Zhang-LiLija, as applied to claim 1 above, in view of Jiang et al. (US Publication 2024/0399235) and Jin et al. (US Publication 2024/0290016).

Regarding claim 4, Li-Zhang-LiLija discloses the method of claim 1, comprising the text setup interface and a material uploading interface, and uploading material, the material file being associated with the video production object, as disclosed by Li above (see Li, para. 0096, uploading material via an implied interface).

Li-Zhang-LiLija does not explicitly disclose, but Jiang discloses, in response to a trigger operation on a NEXT execution control in the text setup interface, switching the text setup interface to a material uploading interface (Jiang, figs. 6(a), 6(b), and 12(a), providing NEXT control and BACK control for switching from one interface to another interface). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Jiang's features into Li-Zhang-LiLija's invention, enhancing the user's video editing experience by providing convenience for the user to switch between interfaces.

Li-Zhang-LiLija-Jiang does not explicitly disclose, but Jin discloses, displaying a material thumbnail of an uploaded material file in the material uploading interface (Jin, para. 0077, the material display page may display materials as thumbnails).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Jin's features into Li-Zhang-LiLija-Jiang's invention, enhancing the user's video editing experience by displaying video material in the form of thumbnails.

Regarding claim 5, Li-Zhang-LiLija-Jiang-Jin discloses the method of claim 4, further comprising: in response to a trigger operation on a BACK execution control in the material uploading interface, switching from the material uploading interface to the text setup interface to display; and in response to a trigger operation on a NEXT execution control in the material uploading interface, switching the material uploading interface to a video preview interface, and displaying a generated video file in the video preview interface, wherein the video file is generated based on the uploaded material file and is associated with the video production object (Li, paras. 0163-0165, displaying generated video on a preview interface; Jiang, figs. 7(a), 7(b), and 12(a), providing NEXT control and BACK control for switching from one interface to another interface, i.e., a video preview interface). The motivation to combine the references and the obviousness arguments are the same as for claim 4.

Regarding claim 8, Li-Zhang-LiLija-Jiang-Jin discloses the method of claim 5, wherein the video preview interface comprises a display type selection box, the display type selection box comprising: an all video display item, a first screening condition display item, a second screening condition display item, and a conventional video display item; and the displaying a generated video file in the video preview interface comprises: displaying all generated video files in the video preview interface by using the all video display item as a default display type, presenting a first screening result label on a first video file satisfying the first screening condition, and presenting a second screening result label on a second video file satisfying the second screening condition; in response to a trigger operation on the first screening condition display item, determining the first video file satisfying the first screening condition from all of the generated video files, and displaying the first video file in the video preview interface in a form of carrying the first screening result label; in response to a trigger operation on the second screening condition display item, determining the second video file satisfying the second screening condition from all of the generated video files, and displaying the second video file in the video preview interface in a form of carrying the second screening result label; and in response to a trigger operation on the conventional video display item, displaying other video files than the first video file and the second video file in all of the generated video files in the video preview interface; wherein screening content comprised in the first screening condition is different from screening content comprised in the second screening condition (Li, para. 0113, specifically, for the text data in the supplementary data, the location of each text data and image data in the video (e.g., which image it is displayed in and its specific location within the image, or which video frame it is displayed in and its specific location within the video frame), and its format (e.g., font size, font color, font type, image size, etc.), are all customizable according to requirements; for the audio data in the supplementary data, the location of each audio data in the video (e.g., which videos and/or images are displayed) and the format (e.g., audio volume, audio timbre, whether multiple audios are mixed) are all customizable according to requirements; Zhang, paras. 0146-0149, 0138, and 0223, as cited for claim 1; LiLija, paras. 0058-0059 and 0071, as cited for claim 1). (The secondary reference, Wang et al., is not discussed in this rejection.) The motivation to combine the references and the obviousness arguments are the same as for claim 5.
Regarding claim 9, Li-Zhang-LiLija-Jiang-Jin discloses the method of claim 8, wherein a step of determining the first video file and the second video file from the generated video files comprises: analyzing, for each of the generated video files, a material file source corresponding to video content in the video file; if none of the material files contained in the material file source participates in generation of a further video file, determining the video file as the first video file satisfying the first screening condition; and if a number of material files, which participates in generation of the further video file and belongs to material files comprised in the material file source, is less than a predetermined number, determining the video file as the second video file satisfying the second screening condition (Li, paras. 0144-0148, obtain the materials and supplementary data used to generate each video segment, and determine the order of each material in the video segment, as well as the correspondence between the text data and image data in the supplementary data and the materials; for each video segment, determine the transition effect between two adjacent materials based on the material attributes of any two adjacent materials contained in the video segment; based on the correspondence, display the text data and the image data in the material, and generate a video clip based on the transition effect between the two adjacent materials; the purpose of generating N videos is mainly to provide users with multiple options, which makes it easier for users to choose videos that meet their own needs; therefore, the materials and supplementary data when generating each video can be different, so the first step is to obtain the source material and supplementary data used to generate each video segment, and determine the order of each source material in the video segment, as well as the correspondence between the text data and image data in the supplementary data and the source material; paras. 0114-0122, after obtaining the product details page, image recognition technology can be used to decompose the content of the original details page, performing text recognition and extraction, image recognition and processing, and music style selection; then, based on the text recognition results, selling point copy is generated as text data for generating the video; based on the image recognition results, high-quality product images (including images and video frames) are selected as material for generating the video; based on the music style selection results, rhythm analysis is performed to select appropriate audio data; in addition, important icons can be obtained through image recognition as image data for generating the video; correspondingly, text data and materials can be combined for image-text matching, and audio data and materials can be combined to achieve audio-visual linkage; if a video is generated from a product detail page, the system can also capture core high-quality reviews, identify the text in the detail page, extract text data such as highlights, and display this text data on the image to enhance the visualization of key information; for each of the details pages, text recognition is performed on the details page to obtain text that meets preset text conditions, which is used as text data for generating the video, wherein the text includes at least one of evaluation information and descriptive information; paras. 0141-0142, in order to obtain materials for generating videos, image recognition can be performed on each details page, and then materials that meet the preset material conditions in the corresponding details page can be obtained based on the image recognition results; the preset material conditions can be set in advance according to requirements; overall, the criteria for determining the first video file and the second video file from the generated video files are common knowledge as known in the art or merely are design choices; Zhang et al., US 2022/0415071, paras. 0146-0149, 0138, and 0223, as cited for claim 1; LiLija, paras. 0058-0059 and 0071, as cited for claim 1). The motivation to combine the references and the well-known techniques in the art, and the obviousness arguments, are the same as for claim 5.
Regarding claim 10, Li-Zhang-LiLija-Jiang-Jin discloses the method of claim 8, wherein a step of determining the first video file and the second video file from the generated video files comprises: obtaining a material similarity value that the uploaded material file has, the material similarity value being determined by performing similarity calculation on the comprised material content after the material file is uploaded; determining, for each of the generated video files, a target material file composing the video content in the video file; determining a video similarity of the video file by performing weighted calculation on the material similarity values of the target material file; in response to that the video similarity is less than a first similarity threshold, determining the video file as the first video file satisfying the first screening condition; and in response to that the video similarity is greater than or equal to the first similarity threshold and less than a second similarity threshold, determining the video file as the second video file satisfying the second screening condition (Li, para. 0152, in order to improve the visual effect of text and image data in video clips and highlight the key content contained in text and image data, the correspondence between text and image data and the material can be determined; specifically, this can include which text and/or image data to display in which image data or which frames of which video material, etc.; the correspondence can be generated randomly, or it can be based on the similarity between text data and/or image data and the material, establishing a correspondence for each piece of text data and/or image data with the material that has the highest similarity, and so on; the specific method for determining the correspondence can be preset according to requirements; overall, the criteria for determining the first video file and the second video file from the generated video files are common knowledge as known in the art or merely are design choices; Zhang, paras. 0146-0149, 0138, and 0223, as cited for claim 1; LiLija, paras. 0058-0059 and 0071, as cited for claim 1). The motivation to combine the references and the well-known techniques in the art, and the obviousness arguments, are the same as for claim 5.

Regarding claim 11, Li-Zhang-LiLija-Jiang-Jin discloses the method of claim 5, further comprising: in response to a playback trigger operation for any of displayed video files, performing video content playback of a corresponding video file in a pop-up video preview playback window, wherein the video preview playback window is presented in a created layer, and the layer is above a layer where the video preview interface is located; or in response to the playback triggering operation for any of displayed video files, expanding a preview playback area in a converged state in the video preview interface, and displaying video content of the corresponding video file in the preview display area, wherein the preview playback area enters the convergence state after completion of the playing of the video file, and the display area size for displaying the video file is adjusted according to expansion or convergence of the preview playback area (Li, para. 0166, adjustment effect controls can be set up for each video in the preview interface; when a user clicks on the adjustment effect control below any video, an editing page for that video will pop up; then, the user can trigger editing instructions for that video through the controls on the editing page; in addition, to make it convenient for users to view the edited video effect in real time, a preview area for the edited video can be set up in the editing page, using the "Video Editing" function on the editing page to fine-tune the video in terms of subtitle style, transition effects, cover stickers, etc.; Zhang, paras. 0146-0149, 0138, and 0223, as cited for claim 1; LiLija, paras. 0058-0059 and 0071, as cited for claim 1). (The secondary reference, Wang et al., is not discussed in this rejection.) The motivation to combine the references and the well-known techniques in the art, and the obviousness arguments, are the same as for claim 5.

Claims 6-7 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Li-Zhang-LiLija-Jiang-Jin, as applied to claim 5 above, in view of Wang et al. (US Publication 2024/0404283).

Regarding claim 6, Li-Zhang-LiLija-Jiang-Jin discloses the method of claim 5, comprising combining the image or video material with text data (Li, para. 0114, text data and materials can be combined for image-text matching). Li-Zhang-LiLija-Jiang-Jin does not explicitly disclose, but Wang discloses, wherein a step of generating the video file based on the uploaded material file comprises: in response to that there is one uploaded material file, fusing the material file with the video related text, and performing enhancement processing on video content of a generated fused video to be displayed according to pre-configured video production attribute information to obtain a video file formed after enhancement processing (Wang, para. 0079, fusing the video embeddings, text embeddings, and in some embodiments, the user query embeddings concatenated to the video embeddings and text embeddings). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Wang's features into Li-Zhang-LiLija-Jiang-Jin's invention, enhancing the user's playback experience by providing high quality edited video using fused features generated from video data and text data.

Regarding claim 7, Li-Zhang-LiLija-Jiang-Jin discloses the method of claim 5, wherein a step of generating the video file based on the uploaded material file comprises: in response to that there are at least two uploaded material files (Li, para. 0100, if the details page contains multiple optional materials, one or more of them can be selected as the materials for generating the video; paras. 0102-0188, determining the transition effect between each pair of adjacent materials used to generate the video; transition effects refer to the use of certain techniques, such as wipes, overlays, and page rolls, between two scenes (i.e., two pieces of footage) to achieve a smooth transition between scenes or plots, or to enrich the visuals and attract the audience; the transition effect between any two adjacent materials can be customized according to requirements), parsing the uploaded material files and obtaining predetermined synthesis video configuration information (Li, para. 0114, decomposing the detail page); dividing and cropping the uploaded material file according to a material analysis result to form a plurality of material segments corresponding to the material file, and selecting material segments from different material files for material content splicing according to the synthesis video configuration information to generate a plurality of synthesized videos (Li, paras. 0154-0161, dividing the material into multiple sets, wherein each set contains all materials under a category, and the categories corresponding to each set do not overlap; cropping the material according to the requirement to generate the video is well known in the art). Li-Zhang-LiLija-Jiang-Jin does not explicitly disclose, but Wang discloses, performing a fusion processing on the video related text and different synthesized videos respectively, and performing enhancement processing on video content of a generated fused video to be displayed according to pre-configured video production attribute information to obtain a video file formed after enhancement processing (Wang, para. 0079, fusing the video embeddings, text embeddings, and in some embodiments, the user query embeddings concatenated to the video embeddings and text embeddings). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Wang's features into Li-Zhang-LiLija-Jiang-Jin's invention, enhancing the user's playback experience by providing high quality edited video using fused features generated from video data and text data.

Claims 12 and 15-16 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Li-Zhang-LiLija-Jiang-Jin, as applied to claim 5 above, in view of Clarke et al. (US Publication 2013/0283999).

Regarding claim 12, Li-Zhang-LiLija-Jiang-Jin discloses the method of claim 5, further comprising: in response to a trigger operation on a BACK execution control in the video preview interface, switching from the video preview interface to the material uploading interface for display; and in response to a trigger operation on a NEXT execution control in the video preview interface, switching the video preview interface to a video interface, and displaying a selected video file in the video interface, wherein the selected video file is preselected from video files displayed on the video preview interface (Li, paras. 0163-0165, displaying generated video on a preview interface; Jiang, figs. 7(a), 7(b), and 12(a), providing NEXT control and BACK control for switching from one interface to another interface, i.e., a video preview interface). Li-Zhang-LiLija-Jiang-Jin does not explicitly disclose, but Clarke discloses, the video interface is a video export interface, and displaying a selected video file as a to-be-exported video file in the video interface, wherein the selected video file is preselected from video files displayed on the video preview interface (Clarke, para. 0023, fig. 2, the user may initiate the generation and export of the animated video playback, for example by pressing a button labeled "export"; a preview pane (FIG. 3, 302) displays the consequences of changes in settings). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Clarke's features into Li-Zhang-LiLija-Jiang-Jin's invention, enhancing the user's playback experience by allowing the user to export an edited video file via a video export interface.
Regarding claim 15, Li-Zhang-LiLija-Jiang-Jin discloses the method of claim 5. Li-Zhang-LiLija-Jiang-Jin does not explicitly disclose, but Clarke discloses, in response to a trigger operation on a video publishing control, presenting the video publishing control in the video preview interface; and obtaining a selected video file selected from presented video files, and publishing the selected video file as a video publishing file (Clarke, para. 0023, fig. 2, the user may initiate the generation and export "publish" of the animated video playback, for example by pressing a button labeled "export"; a preview pane (FIG. 3, 302) displays the consequences of changes in settings). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Clarke's features into Li-Zhang-LiLija-Jiang-Jin's invention, enhancing the user's playback experience by allowing the user to export an edited video file via a video export interface.

Regarding claim 16, Li-Zhang-LiLija-Jiang-Jin-Clarke discloses the method of claim 15, further comprising: aggregating playback effect description data generated after publishing the video publishing file; and updating a processing policy involved in video file generation according to the playback effect description data (Li, paras. 0061-0062, determining transition effect; Clarke, para. 0023, fig. 2, the user may initiate the generation and export "publish" of the animated video playback; aggregating playback effect description data generated after publishing the video publishing file, and updating a processing policy involved in video file generation according to the playback effect description data, are processes well known in the art). The motivation to combine the references and the well-known techniques in the art, and the obviousness arguments, are the same as for claim 15.

Claim 13 is rejected under AIA 35 U.S.C. 103 as being unpatentable over Li-Zhang-LiLija-Jiang-Jin-Clarke, as applied to claim 12 above, in view of Wang (English translation of Chinese Publication CN 116527994, 08-2023).

Regarding claim 13, Li-Zhang-LiLija-Jiang-Jin-Clarke discloses the method of claim 12, wherein before switching the video preview interface to the video export interface, the method further comprises: popping out a video configuration window (Li, para. 0166, adjustment effect controls can be set up for each video in the preview interface; when a user clicks on the adjustment effect control below any video, an editing page for that video will pop up; then, the user can trigger editing instructions for that video through the controls on the editing page; in addition, to make it convenient for users to view the edited video effect in real time, a preview area for the edited video can be set up in the editing page, using the "Video Editing" function on the editing page to fine-tune the video in terms of subtitle style, transition effects, cover stickers, etc.).

Li-Zhang-LiLija-Jiang-Jin-Clarke does not explicitly disclose, but Wang discloses, the video configuration window including a video deduplication configuration window comprising: a deduplication option and a non-deduplication option; in response to receiving a select operation on the deduplication option, deduplicating the selected video file, and reserving the deduplicated selected video file; and in response to receiving a select operation on the non-deduplication option, reserving all selected video files (Wang, paras. 0067-0068 and 0121-0125, performing deduplication process; the non-deduplication option is well known in the art). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Wang's features and well-known techniques in the art into Li-Zhang-LiLija-Jiang-Jin-Clarke's invention, enhancing the user's playback experience by allowing the user to perform a deduplication process on generated videos.

Claim 14 is rejected under AIA 35 U.S.C. 103 as being unpatentable over Li-Zhang-LiLija-Jiang-Jin-Clarke, as applied to claim 12 above, in view of Bian (US Publication 2022/0291794).

Regarding claim 14, Li-Zhang-LiLija-Jiang-Jin-Clarke discloses the method of claim 12, wherein the to-be-exported video file is provided, and in response to a select operation on the video view item, playing video content of the to-be-exported video file in a pop-up video playback window (Li, para. 0166, adjustment effect controls can be set up for each video in the preview interface; when a user clicks on the adjustment effect control below any video, an editing page for that video will pop up; then, the user can trigger editing instructions for that video through the controls on the editing page; in addition, to make it convenient for users to view the edited video effect in real time, a preview area for the edited video can be set up in the editing page, using the "Video Editing" function on the editing page to fine-tune the video in terms of subtitle style, transition effects, cover stickers, etc.; Clarke, para. 0023, fig. 2, the user may initiate the generation and export of the animated video playback, for example by pressing a button labeled "export"; a preview pane (FIG. 3, 302) displays the consequences of changes in settings). Li-Zhang-LiLija-Jiang-Jin-Clarke does not explicitly disclose, but Bian discloses, wherein the to-be-exported video file is provided with a hidden operation bar; the method further comprising: in response to detecting a hover event on any of the to-be-exported video files, displaying a corresponding hidden operation bar, the hidden operation bar comprising a video download item and a video view item; and in response to a select operation on the video download item, storing the to-be-exported video file according to a set storage path (Bian, para. 0168, hidden operation bar for storing video file). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Bian's features into Li-Zhang-LiLija-Jiang-Jin-Clarke's invention, enhancing storage control operation by allowing the user to store a file using a hidden operation bar.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LOI H TRAN, whose telephone number is (571) 270-5645. The examiner can normally be reached 8:00 AM to 5:00 PM PST, with the first Friday of each biweek off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, THAI TRAN, can be reached at 571-272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LOI H TRAN/
Primary Examiner, Art Unit 2484

Prosecution Timeline

Oct 11, 2024: Application Filed
Mar 21, 2026: Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this examiner for similar technology

Patent 12598366: CONTENT DATA PROCESSING METHOD AND CONTENT DATA PROCESSING APPARATUS
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12593112: METHOD, DEVICE, AND COMPUTER PROGRAM FOR ENCAPSULATING REGION ANNOTATIONS IN MEDIA TRACKS
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12592261: VIDEO EDITING METHOD AND APPARATUS, AND DEVICE AND STORAGE MEDIUM
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12576798: CAMERA SYSTEM AND ASSISTANCE SYSTEM FOR A VEHICLE AND A METHOD FOR OPERATING A CAMERA SYSTEM
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12579810: SYSTEM AND METHOD FOR AUTOMATIC EVENTS IDENTIFICATION ON VIDEO
Granted Mar 17, 2026 (2y 5m to grant)
Based on this examiner's 5 most recent grants; studying what changed in those cases can suggest how to advance prosecution with this examiner.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 64%
With Interview: 88% (+23.6%)
Median Time to Grant: 2y 10m
PTA Risk: Low

Based on 611 resolved cases by this examiner. Grant probability is derived from the career allow rate.
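
Since the page states that grant probability is derived from the career allow rate, the headline projections can be reproduced with simple arithmetic. A minimal sketch, assuming the with-interview figure simply adds the interview lift in percentage points to the base rate (variable names are illustrative):

```python
# Reproduce the headline projections from the career counts shown above,
# assuming grant probability = career allow rate (as the page states) and
# with-interview probability = base rate + lift in percentage points.
granted, resolved = 394, 611
interview_lift_pp = 23.6

base_rate = 100 * granted / resolved            # 394/611 = 64.5%
with_interview = base_rate + interview_lift_pp  # 64.5 + 23.6 = 88.1%

print(f"grant probability: {base_rate:.0f}%")       # 64%
print(f"with interview:    {with_interview:.0f}%")  # 88%
```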
