Prosecution Insights
Last updated: April 19, 2026
Application No. 18/577,081

METHOD AND APPARATUS FOR VIDEO PROCESSING, DEVICE, AND MEDIUM

Non-Final OA: §101, §103
Filed
Jan 05, 2024
Examiner
ABDI, AMARA
Art Unit
2668
Tech Center
2600 — Communications
Assignee
BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.
OA Round
1 (Non-Final)
Grant Probability: 83% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 7m
Grant Probability With Interview: 76%

Examiner Intelligence

Career Allow Rate: 83%, above average (677 granted / 816 resolved; +21.0% vs TC avg)
Interview Lift: -7.5% (minimal negative lift, across resolved cases with an interview)
Typical Timeline: 2y 7m average prosecution; 33 applications currently pending
Career History: 849 total applications across all art units

Statute-Specific Performance

§101: 9.8% (-30.2% vs TC avg)
§103: 60.7% (+20.7% vs TC avg)
§102: 10.2% (-29.8% vs TC avg)
§112: 10.0% (-30.0% vs TC avg)
Based on career data from 816 resolved cases. Tech Center averages are estimates.

Office Action

§101, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation

This application includes one or more claim limitations that do not use the word "means," but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:
- "a shoot frame distribution module, configured to distribute first shoot frames to a recording unit and a content analysis unit…";
- "shoot frame processing module, configured to perform a recording processing on the first shoot frames by the recording unit…"; and
- "perform a content analysis processing on the first shoot frames by the content analysis unit to obtain content analysis results of the first shoot frames", in claim 12.

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 14 recites the limitation: "a computer-readable storage medium storing a computer program, wherein the computer program is used to perform the method for video processing according to claim 1". On the other hand, the specification states: "the computer program product may include various forms of computer readable storage media … Various contents, such as an input signal, a signal component, a noise component, and the like, may also be stored in the computer-readable storage medium" (see Par. 0107, published specification). As shown in paragraph 0107 of the specification, the computer-readable storage media are not limited to physical devices (ROM, RAM, etc.) and include a signal carrier wave; and a "signal", "carrier wave", or "transmission medium" is deemed non-statutory.

As a remedy, the Examiner suggests amending claim 14 to recite: "a non-transitory computer-readable storage medium storing a computer program, wherein the computer program is used to perform the method for video processing according to claim 1", to be consistent with the guidelines for Subject Matter Eligibility of Computer Readable Media, 1351 OG 212, Feb. 23, 2010.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7, and 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Patten et al. (US-PGPUB 20070074115).

In regards to claim 1, Patten discloses a method for video processing, comprising:

distributing first shoot frames to a recording unit and a content analysis unit during a shooting process (see at least: Fig. 2 and Par. 0021, the digital video camera 206 records a visual image or series of visual images and generates the video stream 205 representative of the visual image or series of visual images; and from Par. 0023, the capture tool 211 transfers the digital data directly to the MEA 204 or directly to the CRM 212 (e.g., hard drive or random access memory (RAM)) of the computer 202 for storage as a video file 214 containing, for example, DV data; i.e., distributing ("transferring") first shoot frames ("series of visual images or video stream 205") to a recording unit ("MEA 204 or CRM 212"). Further, Par. 0029 and Fig. 3 illustrate basic components of MEA 300 (e.g., MEA 204) comprising an analysis component 310, which analyzes video frames 104 included in the video stream 205 being captured or transferred from a video source (e.g., video camera 206); i.e., distributing first shoot frames to a recording unit ("to the MEA 204 or to the CRM 212") and a content analysis unit ("an analysis component 310 within the MEA 204") during a shooting process ("during the recording of the visual image or the series of visual images by the digital video camera 206"));

performing a recording processing on the first shoot frames by the recording unit to obtain a recorded video (see at least: Par. 0020, the video data stream 205 may transfer video data from the video source as the video data 208 is recorded (e.g., live feed or streaming video), or may transfer video data from a video file 102 stored on the video source 206, to the computer executing MEA 204; i.e., performing a recording processing on the first shoot frames ("transferring the video data from the video source as the video data 208 is recorded") by the recording unit ("implicitly by the memory 222 included in the MEA 204") to obtain a recorded video ("implicitly obtaining the recorded video data 208"));

performing a content analysis processing on the first shoot frames by the content analysis unit to obtain content analysis results of the first shoot frames (see at least: Fig. 3 and Par. 0029, MEA 300 (e.g., MEA 204) comprises an analysis component 310, which analyzes video frames 104 included in the video stream 205 being captured or transferred from a video source (e.g., video camera 206) to determine a content property value for each video frame 104; and from Par. 0033, the property value determination techniques include a face recognition algorithm; i.e., performing a content analysis processing on the first shoot frames ("analyzes video frames 104 included in the video stream 205") by the content analysis unit ("analysis component 310") to obtain content analysis results of the first shoot frames ("determine a content property value for each video frame 104, including face recognition"));

in case of obtaining the recorded video (see at least: Par. 0020, "as the video data 208 is recorded"; also from Par. 0036, while the video data stream is being captured from the video source, implicitly as the video data 208 is recorded; i.e., in case of obtaining the recorded video, "in case when the video data 208 is recorded"); and

determining a content analysis result of the recorded video based on the content analysis results of the first shoot frames, wherein the content analysis result of the recorded video is used when performing an edit processing on the recorded video (see at least: Par. 0036, the edit component 318 determines a default special effect to apply to a particular video clip as a function of the determined property values of the video frames included in that particular video clip and/or based on special effect selection rules stored in a memory; the default special effect is technically equivalent to the content analysis result of the recorded video, as it is determined based on the property values of the video frames; i.e., determining a content analysis result of the recorded video ("determining a default special effect to apply to a particular video clip") based on the content analysis results of the first shoot frames ("based on the property values of the video frames included in that particular video clip and/or based on special effect selection rules stored in a memory"), wherein the content analysis result of the recorded video is used when performing an edit processing on the recorded video ("the edit component 318 implicitly is used for performing an edit processing on the recorded video data 208")).

Therefore, Patten is functionally equivalent to the recited limitations of claim 1 as addressed above.
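For orientation, the flow the examiner maps onto Patten can be pictured as a small pipeline: each captured frame goes both to a recorder and to an analyzer, and the per-frame results are rolled up into a video-level result once recording completes. The sketch below is a loose illustration under that reading, with hypothetical names (analyze_frame, record_and_analyze); it is not code from the application or from Patten.

```python
# Minimal sketch of the claim 1 flow; all names are hypothetical.
from collections import Counter

def analyze_frame(frame):
    # Stand-in for the content analysis unit (e.g., face/object
    # recognition returning a per-frame label).
    return "face" if frame % 2 == 0 else "landscape"

def record_and_analyze(shoot_frames):
    recorded_video = []         # recording unit: accumulates frames
    frame_results = []          # content analysis unit: per-frame results
    for frame in shoot_frames:  # distribution during the shooting process
        recorded_video.append(frame)
        frame_results.append(analyze_frame(frame))
    # Once the recorded video is obtained, derive a video-level result
    # from the per-frame results; later used by the edit processing.
    video_result = Counter(frame_results).most_common(1)[0][0]
    return recorded_video, video_result

video, result = record_and_analyze(range(10))
print(result)  # -> "face"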
In regards to claim 7, Patten obviously discloses the limitations of claim 1. Patten further discloses wherein the first shoot frames comprise each of all shoot frames obtained during the shooting process, or the first shoot frames are shoot frames extracted at a specified interval during the shooting process (see at least: Par. 0019, as shown by timeline 108, each video clip 106 represents a continuously recorded portion of the digital video 102 between a record operation R and a stop operation S of the recording device; i.e., shoot frames extracted at a specified interval during the shooting process correspond to the recorded video footage during the time interval between a record operation R and a stop operation S of the recording device).

Regarding claim 12, claim 12 recites substantially similar limitations as set forth in claim 1. As such, claim 12 is rejected for at least similar rationale. The Examiner further acknowledges the following additional limitation: "apparatus for video processing". However, Patten discloses the "apparatus for video processing" (see at least: Par. 0005, "embodiments of the invention may comprise various other methods and apparatuses").

Regarding claim 13, claim 13 recites substantially similar limitations as set forth in claim 1. As such, claim 13 is rejected for at least similar rationale. The Examiner further acknowledges the following additional limitations: "electronic device, comprising: a processor; a memory used to store an executable instruction". However, Patten discloses the "electronic device, comprising: a processor; a memory used to store an executable instruction" (see at least: Fig. 2, "computer" (i.e., device); Par. 0027, "multiprocessor" (i.e., processor); and Par. 0023, random access memory (RAM) of the computer 202 (i.e., memory)).

Regarding claim 14, claim 14 recites substantially similar limitations as set forth in claim 1. As such, claim 14 is rejected for at least similar rationale. The Examiner further acknowledges the following additional limitation: "a computer-readable storage medium storing a computer program". However, Patten discloses the "computer-readable storage medium storing a computer program" (see at least: Par. 0026, a computing device typically has at least some form of computer readable media (e.g., computer-readable medium 212), where the computer readable media comprise computer storage media).

Claims 2-3 are rejected under 35 U.S.C. 103 as being unpatentable over Patten et al. (US-PGPUB 20070074115) in view of Xiong et al. (US-PGPUB 20230370710).

In regards to claim 2, Patten obviously discloses the limitations of claim 1. Patten further discloses distributing first shoot frames to a recording unit and a content analysis unit during a shooting process (see at least: Par. 0021, 0023, and 0029; see the rejection of claim 1 above for more details). Patten does not expressly disclose wherein the distributing first shoot frames to a recording unit and a content analysis unit during a shooting process comprises: in case of meeting a specified condition, distributing the first shoot frames to the recording unit and the content analysis unit during the shooting process. Xiong discloses in case of meeting a specified condition, distributing the first shoot frames to the recording unit and the content analysis unit during the shooting process (see at least: Par. 0094, event logic 336.4 may include logical rules configured to trigger video camera control, video storage, analytics, and/or user notification … to determine system responses to generated video streams and related conditions and analysis; i.e., in case of meeting a specified condition ("logical rules"), distributing the first shoot frames to the recording unit and the content analysis unit during the shooting process ("implicit by receiving and/or maintaining state information in the video storage and analytics")). Patten and Xiong are combinable because they are both concerned with video processing. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Patten to use event logic, as taught by Xiong, in order to trigger video camera control in response to real-time analytics (Xiong, Par. 0094).

In regards to claim 3, the combined teaching of Patten and Xiong as a whole discloses the limitations of claim 2. Furthermore, Xiong discloses wherein the specified condition comprises: a specified control on a shooting interface being triggered (Xiong, see at least: Par. 0094, the event logic 336.4 may include logical rules configured to trigger video camera control; i.e., a specified control on a shooting interface being triggered, "logical rules configured to trigger video camera control").

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Patten et al. (US-PGPUB 20070074115) in view of Xie et al. (US-PGPUB 20230252786).

Patten obviously discloses the limitations of claim 1. Patten further discloses performing a content analysis processing on the first shoot frames (see at least: Par. 0029, 0033; see the rejection of claim 1 above for more details). Patten does not expressly disclose wherein the performing a content analysis processing on the first shoot frames comprises: inputting the first shoot frames to a content recognition model, which is preset, to perform a recognition processing on screen contents of the first shoot frames by the content recognition model. However, Xie discloses inputting the first shoot frames to a content recognition model, which is preset, to perform a recognition processing on screen contents of the first shoot frames by the content recognition model (see at least: Par. 0079, after the target video frame is obtained, the target video frame is input into the text recognition model, so that text information displayed in the target video frame can be recognized by using the text recognition model; i.e., inputting the first shoot frames to a content recognition model ("input video frame into the text recognition model"), which is preset, to perform a recognition processing on screen contents of the first shoot frames ("performing recognition for text information displayed in the target video frame") by the content recognition model ("using the text recognition model")). Patten and Xie are combinable because they are both concerned with video processing. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Patten to use the recognition model, as taught by Xie, in order to recognize text information displayed in the target video frame (Xie, Par. 0079).
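The Xie-style limitation reduces to running each shoot frame through a preset recognition model. A minimal sketch under that reading, with the model left as a hypothetical placeholder (recognize_contents and fake_model are illustration only, not Xie's implementation):

```python
# Hedged sketch of the claim 4 pattern: a preset recognition model
# applied to each shoot frame. All names are hypothetical.
from typing import Callable, List

def recognize_contents(frames: List[bytes],
                       model: Callable[[bytes], str]) -> List[str]:
    """Run a preset content recognition model over each shoot frame."""
    return [model(frame) for frame in frames]

# Stand-in for a preset text/object recognition model.
fake_model = lambda frame: f"text:{len(frame)} bytes"
print(recognize_contents([b"frame-0", b"frame-01"], fake_model))
# -> ['text:7 bytes', 'text:8 bytes']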
Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Patten et al. (US-PGPUB 20070074115) in view of Kunieda et al. (US-PGPUB 20210287412).

In regards to claim 5, Patten obviously discloses the limitations of claim 1. Patten further discloses determining a content analysis result of the recorded video based on the content analysis results of the first shoot frames (see at least: Par. 0036; see the rejection of claim 1 for more details). Patten does not expressly disclose wherein determining a content analysis result of the recorded video based on the content analysis results of the first shoot frames comprises: performing statistics based on the content analysis results of the first shoot frames to determine a type of a shot subject in the recorded video; and obtaining the content analysis result of the recorded video based on the type of the shot subject in the recorded video. However, Kunieda discloses performing statistics based on the content analysis results of the first shoot frames to determine a type of a shot subject in the recorded video (see at least: Par. 0060, a histogram can be formed by counting the frequency of appearance of a certain value within each range; for example, a deep neural network may be used for the feature value; specifically, inputting an image into a network that performs object recognition yields an interim value in the process of computation in addition to the recognition results, which are the type of the object and a numerical value indicating the probability thereof; i.e., performing statistics based on the content analysis results of the first shoot frames ("counting the frequency of appearance of a certain value within each range") to determine a type of a shot subject in the recorded video ("type of the object")); and obtaining the content analysis result of the recorded video based on the type of the shot subject in the recorded video (see at least: Par. 0060, the recognition results correspond to the content analysis result). Patten and Kunieda are combinable because they are both concerned with video processing. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Patten to use the neural network, as taught by Kunieda, to yield an interim value in the process of computation in addition to the recognition results, which are the type of the object and a numerical value indicating the probability thereof (Kunieda, Par. 0060).

In regards to claim 6, the combined teaching of Patten and Kunieda as a whole discloses the limitations of claim 5. Furthermore, Kunieda discloses wherein the performing statistics based on the content analysis results of the first shoot frames to determine a type of a shot subject in the recorded video comprises: counting a frequency of appearance of a shot object of each specified type in all of the first shoot frames (Kunieda, Par. 0060, a histogram can be formed by counting the frequency of appearance of a certain value within each range); and determining the type of the shot subject in the recorded video based on the frequency of appearance of the shot object of each specified type (Kunieda, see at least: Par. 0060, inputting an image into a network that performs object recognition yields an interim value in the process of computation in addition to the recognition results, which are the type of the object and a numerical value indicating the probability thereof).
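Kunieda's histogram-style statistics amount to tallying per-frame detections and taking the dominant type as the shot subject. A short sketch under that assumption (the object labels are hypothetical):

```python
# Hedged sketch of the claims 5-6 pattern: tally per-frame detections
# and pick the most frequent subject type. Labels are hypothetical.
from collections import Counter

def shot_subject_type(frame_detections):
    """frame_detections: list of per-frame lists of detected object types."""
    counts = Counter()
    for detections in frame_detections:
        counts.update(detections)          # frequency of each specified type
    subject, _ = counts.most_common(1)[0]  # most frequent type wins
    return subject

frames = [["person", "dog"], ["person"], ["person", "tree"]]
print(shot_subject_type(frames))  # -> "person"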
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Patten et al. (US-PGPUB 20070074115) in view of Stojancic et al. (US-PGPUB 20190354763).

Patten obviously discloses the limitations of claim 1. Patten does not expressly disclose wherein the recorded video further comprises a second shoot frame, the second shoot frame being a video frame image which is not used in the content analysis processing. Stojancic discloses wherein the recorded video further comprises a second shoot frame, the second shoot frame being a video frame image which is not used in the content analysis processing (see at least: Par. 0158, if the option to process each video frame 300 in succession is not chosen, then N video frames 300 may be skipped in a step 810, implicitly reading the next video frame; i.e., the one or more skipped N video frames in step 810 correspond to the second shoot frame, that is, a video frame image which is not used in the face recognition (content analysis processing)). Patten and Stojancic are combinable because they are both concerned with video processing. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Patten to skip N video frames, as taught by Stojancic, in order to conduct face recognition on the selected video frames (Stojancic, Par. 0158), and to thereby identify a highlight based on two disconnected appearances of the same face (Stojancic, Par. 0160).
Claims 9 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Patten et al. (US-PGPUB 20070074115) in view of Griffin (US-PGPUB 20070201815).

In regards to claim 9, Patten obviously discloses the limitations of claim 1. Patten does not expressly disclose: obtaining a matching clip template of the recorded video based on the content analysis result of the recorded video; and performing the edit processing on the recorded video through the matching clip template to obtain a target video. However, Griffin discloses obtaining a matching clip template of the recorded video based on the content analysis result of the recorded video (see at least: Fig. 5 and Par. 0048, the video processor 22 (see Fig. 2) may be instructed to automatically align or match the video files from the different video streams based on the time stamp values associated with each video file; i.e., obtaining a matching clip template of the recorded video ("implicit by matching the video files from the different video streams") based on the content analysis result of the recorded video ("implicit by time stamp analysis")); and performing the edit processing on the recorded video through the matching clip template to obtain a target video (see at least: Par. 0050, the interface page 250 may allow a video editor to visually confirm that the various video segments have been aligned or matched correctly ("performing the edit processing on the recorded video through the matching clip template") to determine a particular video file ("obtain a target video")). Patten and Griffin are combinable because they are both concerned with video processing. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Patten to use the video editor, as taught by Griffin, in order to visually confirm that the various video segments have been matched correctly (Griffin, Par. 0050).

In regards to claim 11, the combined teaching of Patten and Griffin as a whole discloses the limitations of claim 9. Griffin further discloses saving the recorded video by default, and/or saving the target video upon receiving a save instruction issued by the user for the target video (see at least: Par. 0051, once the video files from the various video streams have been aligned or matched, the video editor may save the results; implicitly upon receiving a save instruction issued by the user for the target video through the video editor).

Claims 10 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Patten and Griffin, as applied to claim 9 above, and further in view of Mo (US-PGPUB 20230368817).

In regards to claim 10, the combined teaching of Patten and Griffin as a whole discloses the limitations of claim 9. Griffin further discloses displaying template identifiers of a plurality of candidate clip templates together with the target video on a display page (see at least: Par. 0050, the video segments corresponding to the same play or event may be grouped together and displayed by the user interface 26; Fig. 6 shows an interface page 250 for displaying groups of aligned or matched video segments; i.e., displaying template identifiers of a plurality of candidate clip templates together with the target video on a display page, "implicit by displaying groups of matched video segments"); and, in response to monitoring that a user selects a template identifier of a target clip template from the template identifiers of the plurality of candidate clip templates, performing the edit processing on the recorded video through the target clip template to obtain a formulated video (see at least: Par. 0051, when a video editor requests the video segments for a particular event or play ("in response to monitoring that a user selects a template identifier … of candidate clip templates"), all of the video segments available from the various video streams will be returned ("implicitly performing the edit processing on the recorded video through the target clip template"), and the video editor may select whichever segment best fits his requirements ("implicitly obtaining a formulated video (best-fit segment)")). The combined teaching of Patten and Griffin as a whole does not expressly disclose replacing the target video on the display page with the formulated video. However, Mo discloses replacing the target video on the display page with the formulated video (see at least: Par. 0082, as shown in Fig. 10, a video material editing window 100 may be displayed for the user to select a segment in the video and use the segment as the material; … and the material displayed in the preview page at the current moment may be replaced by the segment selected by the user; i.e., replacing the target video on the display page ("material displayed in the preview page") with the formulated video ("video segment selected by the user")). Patten, Griffin, and Mo are combinable because they are all concerned with video processing. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Patten and Griffin to use the determination control 102 in the video material editing window 100, as taught by Mo, in order to enable the user to click the determination control 102 when the user wants to replace the material currently displayed in the preview page with another material (Mo, Par. 0082).

In regards to claim 21, the combined teaching of Patten, Griffin, and Mo as a whole discloses the limitations of claim 10. Griffin further discloses saving the recorded video by default, and/or saving the target video upon receiving a save instruction issued by the user for the target video (Griffin, see at least: Par. 0051, once the video files from the various video streams have been aligned or matched, the video editor may save the results; implicitly upon receiving a save instruction issued by the user for the target video through the video editor).
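The claims 9-10 pattern, as the examiner frames it, is a template matched to the video-level analysis result and then applied in an edit step. The sketch below illustrates that shape only; the template table and the edit step are hypothetical placeholders, not anything from Griffin or Mo.

```python
# Hedged sketch of the claims 9-10 pattern; all names are hypothetical.
TEMPLATES = {
    "person": "portrait-template",
    "landscape": "travel-template",
}

def edit_with_template(recorded_video, analysis_result):
    # Obtain a matching clip template from the content analysis result.
    template = TEMPLATES.get(analysis_result, "default-template")
    # Stand-in edit processing: tag the video with the chosen template.
    return {"frames": recorded_video, "template": template}

print(edit_with_template([0, 1, 2], "person"))
# -> {'frames': [0, 1, 2], 'template': 'portrait-template'}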
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Patten and Xiong, as applied to claim 2 above, and further in view of Xie et al. (US-PGPUB 20230252786).

The combined teaching of Patten and Xiong as a whole discloses the limitations of claim 2. Patten further discloses performing a content analysis processing on the first shoot frames (see at least: Par. 0029, 0033; see the rejection of claim 1 above for more details). The combined teaching of Patten and Xiong as a whole does not expressly disclose wherein the performing a content analysis processing on the first shoot frames comprises: inputting the first shoot frames to a content recognition model, which is preset, to perform a recognition processing on screen contents of the first shoot frames by the content recognition model. However, Xie discloses inputting the first shoot frames to a content recognition model, which is preset, to perform a recognition processing on screen contents of the first shoot frames by the content recognition model (see at least: Par. 0079, after the target video frame is obtained, the target video frame is input into the text recognition model, so that text information displayed in the target video frame can be recognized by using the text recognition model; i.e., inputting the first shoot frames to a content recognition model ("input video frame into the text recognition model"), which is preset, to perform a recognition processing on screen contents of the first shoot frames ("performing recognition for text information displayed in the target video frame") by the content recognition model ("using the text recognition model")). Patten, Xiong, and Xie are combinable because they are all concerned with video processing. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Patten and Xiong to use the recognition model, as taught by Xie, in order to recognize text information displayed in the target video frame (Xie, Par. 0079).

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Patten and Xiong, as applied to claim 2 above, and further in view of Kunieda et al. (US-PGPUB 20210287412).

The combined teaching of Patten and Xiong as a whole discloses the limitations of claim 2. Patten further discloses determining a content analysis result of the recorded video based on the content analysis results of the first shoot frames (see at least: Par. 0036; see the rejection of claim 1 for more details). The combined teaching of Patten and Xiong as a whole does not expressly disclose wherein determining a content analysis result of the recorded video based on the content analysis results of the first shoot frames comprises: performing statistics based on the content analysis results of the first shoot frames to determine a type of a shot subject in the recorded video; and obtaining the content analysis result of the recorded video based on the type of the shot subject in the recorded video. However, Kunieda discloses performing statistics based on the content analysis results of the first shoot frames to determine a type of a shot subject in the recorded video (see at least: Par. 0060, a histogram can be formed by counting the frequency of appearance of a certain value within each range; for example, a deep neural network may be used for the feature value; specifically, inputting an image into a network that performs object recognition yields an interim value in the process of computation in addition to the recognition results, which are the type of the object and a numerical value indicating the probability thereof; i.e., performing statistics based on the content analysis results of the first shoot frames ("counting the frequency of appearance of a certain value within each range") to determine a type of a shot subject in the recorded video ("type of the object")); and obtaining the content analysis result of the recorded video based on the type of the shot subject in the recorded video (see at least: Par. 0060, the recognition results correspond to the content analysis result). Patten, Xiong, and Kunieda are combinable because they are all concerned with video processing. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Patten and Xiong to use the neural network, as taught by Kunieda, to yield an interim value in the process of computation in addition to the recognition results, which are the type of the object and a numerical value indicating the probability thereof (Kunieda, Par. 0060).

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Patten and Xiong, as applied to claim 2 above, and further in view of Griffin (US-PGPUB 20070201815).

The combined teaching of Patten and Xiong as a whole discloses the limitations of claim 2. The combined teaching of Patten and Xiong as a whole does not expressly disclose: obtaining a matching clip template of the recorded video based on the content analysis result of the recorded video; and performing the edit processing on the recorded video through the matching clip template to obtain a target video. However, Griffin discloses obtaining a matching clip template of the recorded video based on the content analysis result of the recorded video (see at least: Fig. 5 and Par. 0048, the video processor 22 (see Fig. 2) may be instructed to automatically align or match the video files from the different video streams based on the time stamp values associated with each video file; i.e., obtaining a matching clip template of the recorded video ("implicit by matching the video files from the different video streams") based on the content analysis result of the recorded video ("implicit by time stamp analysis")); and performing the edit processing on the recorded video through the matching clip template to obtain a target video (see at least: Par. 0050, the interface page 250 may allow a video editor to visually confirm that the various video segments have been aligned or matched correctly ("performing the edit processing on the recorded video through the matching clip template") to determine a particular video file ("obtain a target video")). Patten, Xiong, and Griffin are combinable because they are all concerned with video processing. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Patten and Xiong to use the video editor, as taught by Griffin, in order to visually confirm that the various video segments have been matched correctly (Griffin, Par. 0050).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Patten, Xiong, and Griffin, as applied to claim 18 above, and further in view of Mo (US-PGPUB 20230368817).

The combined teaching of Patten, Xiong, and Griffin as a whole discloses the limitations of claim 18. Griffin further discloses displaying template identifiers of a plurality of candidate clip templates together with the target video on a display page (see at least: Par. 0050, the video segments corresponding to the same play or event may be grouped together and displayed by the user interface 26; Fig. 6 shows an interface page 250 for displaying groups of aligned or matched video segments; i.e., displaying template identifiers of a plurality of candidate clip templates together with the target video on a display page, "implicit by displaying groups of matched video segments"); and, in response to monitoring that a user selects a template identifier of a target clip template from the template identifiers of the plurality of candidate clip templates, performing the edit processing on the recorded video through the target clip template to obtain a formulated video (see at least: Par. 0051, when a video editor requests the video segments for a particular event or play ("in response to monitoring that a user selects a template identifier … of candidate clip templates"), all of the video segments available from the various video streams will be returned ("implicitly performing the edit processing on the recorded video through the target clip template"), and the video editor may select whichever segment best fits his requirements ("implicitly obtaining a formulated video (best-fit segment)")). The combined teaching of Patten, Xiong, and Griffin as a whole does not expressly disclose replacing the target video on the display page with the formulated video. However, Mo discloses replacing the target video on the display page with the formulated video (see at least: Par. 0082, as shown in Fig. 10, a video material editing window 100 may be displayed for the user to select a segment in the video and use the segment as the material; … and the material displayed in the preview page at the current moment may be replaced by the segment selected by the user; i.e., replacing the target video on the display page ("material displayed in the preview page") with the formulated video ("video segment selected by the user")). Patten, Xiong, Griffin, and Mo are combinable because they are all concerned with video processing. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Patten, Xiong, and Griffin to use the determination control 102 in the video material editing window 100, as taught by Mo, in order to enable the user to click the determination control 102 when the user wants to replace the material currently displayed in the preview page with another material (Mo, Par. 0082).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Patten and Xie, as applied to claim 4 above, and further in view of Griffin (US-PGPUB 20070201815).

The combined teaching of Patten and Xie as a whole discloses the limitations of claim 4. The combined teaching of Patten and Xie as a whole does not expressly disclose: obtaining a matching clip template of the recorded video based on the content analysis result of the recorded video; and performing the edit processing on the recorded video through the matching clip template to obtain a target video. However, Griffin discloses obtaining a matching clip template of the recorded video based on the content analysis result of the recorded video (see at least: Fig. 5 and Par. 0048, the video processor 22 (see Fig. 2) may be instructed to automatically align or match the video files from the different video streams based on the time stamp values associated with each video file; i.e., obtaining a matching clip template of the recorded video ("implicit by matching the video files from the different video streams") based on the content analysis result of the recorded video ("implicit by time stamp analysis")); and performing the edit processing on the recorded video through the matching clip template to obtain a target video (see at least: Par. 0050, the interface page 250 may allow a video editor to visually confirm that the various video segments have been aligned or matched correctly ("performing the edit processing on the recorded video through the matching clip template") to determine a particular video file ("obtain a target video")). Patten, Xie, and Griffin are combinable because they are all concerned with video processing. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Patten and Xie to use the video editor, as taught by Griffin, in order to visually confirm that the various video segments have been matched correctly (Griffin, Par. 0050).

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMARA ABDI, whose telephone number is (571) 272-0273. The examiner can normally be reached 9:00am-5:30pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Vu Le, can be reached at (571) 272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AMARA ABDI/
Primary Examiner, Art Unit 2668
01/05/2026

Prosecution Timeline

Jan 05, 2024
Application Filed
Jan 05, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602822
METHOD DEVICE AND STORAGE MEDIUM FOR BACK-END OPTIMIZATION OF SIMULTANEOUS LOCALIZATION AND MAPPING
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12597252
METHOD OF TRACKING OBJECTS
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12576595
SYSTEMS AND METHODS FOR IMPROVED VOLUMETRIC ADDITIVE MANUFACTURING
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12574469
VIDEO SURVEILLANCE SYSTEM, VIDEO PROCESSING APPARATUS, VIDEO PROCESSING METHOD, AND VIDEO PROCESSING PROGRAM
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12563154
VIDEO SURVEILLANCE SYSTEM, VIDEO PROCESSING APPARATUS, VIDEO PROCESSING METHOD, AND VIDEO PROCESSING PROGRAM
Granted Feb 24, 2026 (2y 5m to grant)
Based on this examiner's five most recent grants in similar technology.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview: 76% (-7.5%)
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 816 resolved cases by this examiner. Grant probability derived from career allow rate.
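For readers checking the numbers, the projections appear to follow directly from the examiner statistics above. The snippet below is an assumed reconstruction of that arithmetic, not the tool's actual code.

```python
# Assumed reconstruction of the projection arithmetic; not the tool's code.
granted, resolved = 677, 816
allow_rate = granted / resolved    # 677/816 = 0.8297 -> reported as 83%
interview_lift = -7.5              # percentage points, from the stats above
with_interview = round(allow_rate * 100) + interview_lift
print(f"{allow_rate:.1%} career allow rate")  # 83.0% career allow rate
print(f"{with_interview}% with interview")    # 75.5%, shown rounded as 76%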
