DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Interpretation under 35 USC § 112
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
3. The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) except as otherwise indicated in an Office action.
4. This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f), because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: a clip acquisition module, an image processing module, an audio processing module, and a clip merging module in claim 13.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f), they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f), applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f).
Claim Rejections - 35 USC § 112
5. The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
6. Claim 7 is rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claim 7 recites “wherein the image quality conversion algorithm comprises a conversion algorithm between LDR and HDR”. The abbreviated terms “LDR” and “HDR” are not defined in the claim, and their intended meaning is therefore unclear, rendering the claim indefinite.
Claim Rejections - 35 USC § 103
7. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
8. Claims 1-6, 8, and 10-14 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Li et al. (English Translation of Chinese Publication CN112367481, published 02-2021) in view of Cui (US Publication 2020/0411059).
Regarding claim 1, Li discloses a video merging method, comprising:
obtaining a first video clip and a second video clip which are to be merged (Li, para’s 0006-0009, acquiring multiple video clips to be processed, wherein each video clip includes multiple video frames and the processed clips are to be spliced together);
performing image processing on the first video clip and the second video clip, wherein the first video clip after the image processing and the second video clip after the image processing have a same picture display effect, and wherein the picture display effect comprises an image quality and/or a picture style (Li, para’s 0010-0012, preprocessing the video frames for the plurality of video clips, wherein the preprocessing includes at least one of adjusting resolution and correcting video rotation direction; performing rendering processing on the pre-processed video clips to obtain a plurality of normalized video clips, wherein the rendering processing includes at least one of filter processing and background rendering);
merging the first video clip after the image processing and the audio processing with the second video clip after the image processing and the audio processing (Li, para. 0012, splicing the normalized video clips based on a specified splicing method to obtain a target video).
Li does not explicitly disclose, but Cui discloses, performing audio processing on the first video clip and the second video clip, wherein the first video clip after the audio processing and the second video clip after the audio processing have a same background sound (Cui, para’s 0041-0058, acquiring a first video; separating the first video through an application program having a sound file separation function to obtain a first video signal and an audio signal; the first video signal may be placed in a video track and the audio signal may be placed in an audio track; acquiring a second video signal; synthesizing the acquired second video signal with the separated audio signal to obtain a second video; as a result, the audio signal of the first video and the audio signal of the second video are the same; clipping the first video and the second video to obtain multiple first short videos and multiple second short videos, respectively; and stitching the first short videos and the second short videos to obtain a target synthesized video).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Cui’s features into Li’s invention in order to enhance the user’s auditory experience by providing a synthesized video having consistent and uniform audio.
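For illustration only (and not as a characterization of the claims or of the actual implementations of Li or Cui), the following minimal Python sketch shows a merge flow of the kind mapped above: both clips are scaled to one resolution, their original audio is dropped, and a single shared background track is laid over the spliced result. The file names, the 1280x720 target, and the use of ffmpeg are assumptions made solely for this sketch.

```python
import subprocess

# Illustrative sketch only: file names and the 1280x720 target are assumptions.
# Both clips are scaled to one resolution (image processing), their original
# audio streams are dropped, and a single background track is muxed over the
# spliced result so both clips share the same background sound (audio processing).
subprocess.run([
    "ffmpeg", "-y",
    "-i", "clip_a.mp4", "-i", "clip_b.mp4", "-i", "background.m4a",
    "-filter_complex",
    "[0:v]scale=1280:720,setsar=1[v0];"
    "[1:v]scale=1280:720,setsar=1[v1];"
    "[v0][v1]concat=n=2:v=1:a=0[v]",
    "-map", "[v]", "-map", "2:a", "-shortest",
    "merged.mp4",
], check=True)
```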
Regarding claim 2, Li-Cui discloses the method according to claim 1, wherein the performing image processing on the first video clip and the second video clip comprises: determining a target picture display effect; and converting both an original picture display effect of the first video clip and an original picture display effect of the second video clip into the target picture display effect (Li, para’s 0010-0012, performing at least one of adjusting resolution and correcting video rotation direction; performing at least one of filter processing and background rendering on the pre-processed video clips to obtain a plurality of normalized video clips; para’s 0047-0057, the resolutions of the video frames in different video segments may be different and inconsistent; after determining the first resolution corresponding to the video window, for any video frame in the multiple video clips, it can be determined whether the resolution of the video frame is greater than the first resolution. If it is greater, the resolution of the video frame can be reduced; if it is equal to or less than the first resolution, the resolution of the video frame may not be adjusted. It would also have been obvious to select the original resolution of one of the video clips as a target resolution and to adjust the resolution of the video frames in the other clip(s) according to the selected target resolution).
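By way of a non-limiting illustration of the conditional resolution adjustment cited from Li’s paragraphs 0047-0057 above (reduce a frame only when it exceeds the target resolution, otherwise leave it unchanged), the following Python sketch uses OpenCV; the function name and the choice of OpenCV are assumptions made for illustration and are not drawn from the reference.

```python
import cv2

def normalize_resolution(frame, target_w, target_h):
    """Reduce a frame to the target resolution only when it is larger than
    the target; frames at or below the target are returned unchanged."""
    h, w = frame.shape[:2]
    if w > target_w or h > target_h:
        return cv2.resize(frame, (target_w, target_h),
                          interpolation=cv2.INTER_AREA)
    return frame
```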
Regarding claim 3, Li-Cui discloses the method according to claim 2, wherein the determining the target picture display effect comprises: using a preset picture display effect as the target picture display effect; or determining the target picture display effect according to the original picture display effect of the first video clip and the original picture display effect of the second video clip (Li, para’s 0010-0012, performing at least one of adjusting resolution and correcting video rotation direction; performing at least one of filter processing and background rendering on the pre-processed video clips to obtain a plurality of normalized video clips).
Regarding claim 4, Li-Cui discloses the method according to claim 3, wherein the picture display effect comprises the image quality and the picture style (Li, para’s 0010-0012, performing at least one of adjusting resolution and correcting video rotation direction; performing at least one of filter processing and background rendering on the pre-processed video clips to obtain a plurality of normalized video clips);
the determining the target picture display effect according to the original picture display effect of the first video clip and the original picture display effect of the second video clip comprises: selecting one of an original image quality of the first video clip and an original image quality of the second video clip as a target image quality (Li, para’s 0047-0048, the resolutions of the video frames in different video segments may be different; para. 0057, after determining the first resolution corresponding to the video window, for any video frame in the multiple video clips, it can be determined whether the resolution of the video frame is greater than the first resolution. If it is greater, the resolution of the video frame can be reduced; if it is equal to or less than the first resolution, the resolution of the video frame may not be adjusted. It would also have been obvious and/or well known in the art to select the original resolution of one of the video clips as the target resolution and to adjust the resolution of the video frames in the other clip(s) according to the selected target resolution);
selecting one of an original picture style of the first video clip and an original picture style of the second video clip as a target picture style (Li, para’s 0051, 0061, and 0067, the multiple video clips may be captured in portrait mode and/or landscape mode; if the video window is a window displayed in landscape mode, and the video frames in the video clip are displayed in portrait mode, the rotation direction of the video frames can be corrected, and the vertical display of the video frames can be adjusted to horizontal display. Similarly, if the video window is a window displayed in portrait mode, and the video frames in the video clip are displayed in landscape mode, the horizontal display of the video frames can be adjusted to vertical display; it would have been obvious and/or well known in the art to select the portrait mode or landscape mode corresponding to one of the captured clips as the target display mode or target picture style for displaying the video frames, and similarly to select a background, using either the original video background or a solid color or gradient color as the background; see the illustrative orientation sketch following the discussion of claim 4 below); and
determining the target picture display effect based on the target image quality and the target picture style (Li, para’s 0010-0012, generating the target video having the selected target resolution and target picture style).
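By way of a non-limiting illustration of the portrait/landscape orientation correction cited from Li in the mapping of claim 4 above, the following Python sketch rotates a frame when its orientation does not match the target window; the helper name and the OpenCV call are assumptions made for illustration only.

```python
import cv2

def match_orientation(frame, window_is_landscape):
    """Rotate a frame 90 degrees when its orientation does not match the
    target window, so a portrait frame fits a landscape window and vice versa."""
    h, w = frame.shape[:2]
    frame_is_landscape = w >= h
    if frame_is_landscape != window_is_landscape:
        return cv2.rotate(frame, cv2.ROTATE_90_CLOCKWISE)
    return frame
```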
Regarding claim 5, Li-Cui discloses the method according to claim 4, wherein the selecting one of the original image quality of the first video clip and the original image quality of the second video clip as the target image quality comprises: selecting one of the original image quality of the first video clip and the original image quality of the second video clip as the target image quality according to a preset quality selection strategy; wherein the quality selection strategy comprises: quality selection based on user instructions or quality selection based on an image quality comparison result between the first video clip and the second video clip (Li, para’s 0047-0048, the resolutions of the video frames in different video segments may be different. It would have been obvious and/or well known in the art that a user can select the original resolution of one of the video clips as a preferred target resolution based on an image quality comparison and display compatibility with the user’s display device).
Regarding claim 6, Li-Cui discloses the method according to claim 4, wherein the selecting one of the original picture style of the first video clip and the original picture style of the second video clip as the target picture style comprises: selecting one of the original picture style of the first video clip and the original picture style of the second video clip as the target picture style according to a preset style selection strategy; wherein the style selection strategy comprises: style selection based on user instructions, style selection based on video sources, or style selection based on clip sorting positions (Li, para’s 0051, 0061, and 0067, obtaining multiple video clips with the same resolution, video rotation, and effects such as filters, backgrounds; if the video window is a window displayed in landscape mode, and the video frames in the video clip are displayed in portrait mode, the rotation direction of the video frames can be corrected, and the vertical display of the video frames can be adjusted to horizontal display. Similarly, if the video window is a window displayed in portrait mode, and the video frames in the video clip are displayed in landscape mode, the horizontal display of the video frames can be adjusted to vertical display).
Regarding claim 8, Li-Cui discloses the method according to claim 1, wherein the performing audio processing on the first video clip and the second video clip comprises: obtaining an original background sound of the first video clip and an original background sound of the second video clip; determining a target background sound; and converting both the original background sound of the first video clip and the original background sound of the second video clip into the target background sound (Cui, para’s 0041-0058, acquiring a first video; separating the first video through an application program having a sound file separation function to obtain a first video signal and an audio signal; the first video signal may be placed in a video track and the audio signal may be placed in an audio track; acquiring a second video signal; synthesizing the acquired second video signal with the separated audio signal to obtain a second video; as a result, the audio signal of the first video and the audio signal of the second video are the same; clipping the first video and the second video to obtain multiple first short videos and multiple second short videos, respectively; and stitching the first short videos and the second short videos to obtain a target synthesized video).
The motivation to combine the references and the obviousness rationale are the same as those set forth for claim 1.
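For illustration of the audio handling cited from Cui above (separating a first video’s audio and synthesizing it with a second video so that both share the same background sound), the following Python sketch extracts the audio track from one clip and muxes that same track onto the other; the file names and the use of ffmpeg are assumptions made only for this sketch.

```python
import subprocess

# Illustrative sketch: pull the audio track out of the first clip, then mux
# that same track onto the second clip so the two share one background sound.
subprocess.run(["ffmpeg", "-y", "-i", "first.mp4", "-vn", "-acodec", "copy",
                "shared_audio.m4a"], check=True)
subprocess.run(["ffmpeg", "-y", "-i", "second.mp4", "-i", "shared_audio.m4a",
                "-map", "0:v", "-map", "1:a", "-c:v", "copy", "-shortest",
                "second_with_shared_audio.mp4"], check=True)
```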
Regarding claim 10, Li-Cui discloses the method according to claim 8, wherein the determining the target background sound comprises: using a preset background sound as the target background sound; or determining the target background sound based on the original background sound of the first video clip and the original background sound of the second video clip (Cui, para’s 0041-0058, acquiring a first video; separating the first video through an application program having a sound file separation function to obtain a first video signal and an audio signal; the first video signal may be placed in a video track and the audio signal may be placed in an audio track; acquiring a second video signal; synthesizing the acquired second video signal with the separated audio signal to obtain a second video; as a result, the audio signal of the first video and the audio signal of the second video are the same; clipping the first video and the second video to obtain multiple first short videos and multiple second short videos, respectively; and stitching the first short videos and the second short videos to obtain a target synthesized video).
The motivation to combine the references and the obviousness rationale are the same as those set forth for claim 1.
Regarding claim 11, Li-Cui discloses the method according to claim 10, wherein the determining the target background sound based on the original background sound of the first video clip and the original background sound of the second video clip comprises: selecting one of the original background sound of the first video clip and the original background sound of the second video clip as the target background sound; or fusing the original background sound of the first video clip and the original background sound of the second video clip to obtain the target background sound (Cui, para’s 0041-0058, acquiring a first video; separating the first video through an application program having a sound file separation function to obtain a first video signal and an audio signal; the first video signal may be placed in a video track and the audio signal may be placed in an audio track; acquiring a second video signal; synthesizing the acquired second video signal with the separated audio signal to obtain a second video; as a result, the audio signal of the first video and the audio signal of the second video are the same; clipping the first video and the second video to obtain multiple first short videos and multiple second short videos, respectively; and stitching the first short videos and the second short videos to obtain a target synthesized video).
The motivation to combine the references and the obviousness rationale are the same as those set forth for claim 1.
Regarding claim 12, Li-Cui discloses the method according to claim 8, wherein the converting both the original background sound of the first video clip and the original background sound of the second video clip into the target background sound comprises: deleting the original background sound of the first video clip and the original background sound of the second video clip; and uniformly adding the target background sound to the first video clip and the second video clip (Cui, para’s 0041-0058, separating and obtaining an audio signal from a first audio/video content; acquiring a second video signal; separating or deleting the original audio track from the acquired second video signal is obviously achievable, as disclosed by Cui; synthesizing/adding the acquired second video signal with the separated audio signal to obtain the second video; a similar process can be performed on another video signal; it is also noted that replacing an original audio track of a video with another audio file is well known in the art, see Yu et al., US Patent 9,813,725, col. 6, lines 31-33).
The motivation to combine the references and the obviousness rationale are the same as those set forth for claim 1.
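As a non-limiting illustration of the replace-and-add flow mapped for claim 12 above (delete each clip’s original background sound, then uniformly add the same target background sound to both clips), the following Python sketch strips the original audio and muxes a shared target track; the file names, the helper function, and the use of ffmpeg are assumptions made only for this sketch.

```python
import subprocess

def replace_background(video_in, target_audio, video_out):
    """Delete the clip's original soundtrack (-an drops the audio stream),
    then mux the shared target background sound onto the silent video."""
    subprocess.run(["ffmpeg", "-y", "-i", video_in, "-an", "-c:v", "copy",
                    "silent_tmp.mp4"], check=True)
    subprocess.run(["ffmpeg", "-y", "-i", "silent_tmp.mp4", "-i", target_audio,
                    "-map", "0:v", "-map", "1:a", "-c:v", "copy", "-shortest",
                    video_out], check=True)

# Uniformly add the same target background sound to both clips (names assumed).
replace_background("clip_a.mp4", "target_bgm.m4a", "clip_a_new.mp4")
replace_background("clip_b.mp4", "target_bgm.m4a", "clip_b_new.mp4")
```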
Claims 13-14 are rejected for the same reasons set forth for claim 1. Li-Cui further discloses a processor and a memory (see Li, para’s 0019-0020).
9. Claim 7 is rejected under AIA 35 U.S.C. 103 as being unpatentable over Li-Cui, as applied to claim 2 above, in view of Zhang et al. (English Translation of Chinese Publication CN106506983, published 03-2017).
Regarding claim 7, Li-Cui discloses the method according to claim 2, wherein the converting the original picture display effect of the first video clip and the original picture display effect of the second video clip into the target picture display effect comprises: determining the original picture display effect that is inconsistent with the target picture display effect based on the original picture display effect of the first video clip and the original picture display effect of the second video clip, and using the inconsistent original picture display effect as a picture display effect to be converted; converting the original image quality in the picture display effect to be converted into the target image quality in the target picture display effect by using a preset image quality conversion algorithm (Li, para’s 0010-0012, performing at least one of adjusting resolution and correcting video rotation direction; performing at least one of filter processing and background rendering on the pre-processed video clips to obtain a plurality of normalized video clips; para’s 0047-0057, the resolutions of the video frames in different video segments may be different and inconsistent; after determining the first resolution corresponding to the video window, for any video frame in the multiple video clips, it can be determined whether the resolution of the video frame is greater than the first resolution. If it is greater, the resolution of the video frame can be reduced; if it is equal to or less than the first resolution, the resolution of the video frame may not be adjusted. It would also have been obvious to select the original resolution of one of the video clips as a target resolution and to adjust the resolution of the video frames in the other clip(s) according to the selected target resolution; para’s 0051, 0061, and 0067, the multiple video clips may be captured in portrait mode and/or landscape mode; if the video window is a window displayed in landscape mode, and the video frames in the video clip are displayed in portrait mode, the rotation direction of the video frames can be corrected, and the vertical display of the video frames can be adjusted to horizontal display. Similarly, if the video window is a window displayed in portrait mode, and the video frames in the video clip are displayed in landscape mode, the horizontal display of the video frames can be adjusted to vertical display; it would have been obvious and/or well known in the art to select the portrait mode or landscape mode corresponding to one of the captured clips as the target display mode or target picture style for displaying the video frames, and similarly to select a background, using either the original video background or a solid color or gradient color as the background); and
migrating the target picture style in the target picture display effect to the picture display effect to be converted by using a preset style migration algorithm, so that the original picture style of the picture display effect to be converted is adjusted to match the target picture style (Li, para’s 0051, 0061, and 0067, the multiple video clips may be captured in portrait mode and/or landscape mode; if the video window is a window displayed in landscape mode, and the video frames in the video clip are displayed in portrait mode, the rotation direction of the video frames can be corrected, and the vertical display of the video frames can be adjusted to horizontal display. Similarly, if the video window is a window displayed in portrait mode, and the video frames in the video clip are displayed in landscape mode, the horizontal display of the video frames can be adjusted to vertical display; it would have been obvious and/or well known in the art to select the portrait mode or landscape mode corresponding to one of the captured clips as the target display mode or target picture style for displaying the video frames, and similarly to select a background, using either the original video background or a solid color or gradient color as the background).
Li-Cui does not explicitly disclose, but Zhang discloses, wherein the image quality conversion algorithm comprises a conversion algorithm between LDR and HDR (Zhang, para’s 0029, 0039, 0055, and 0068, generating a curve fit between the LDR image and the HDR image; the generated HDR image is perceived by the human eye, and the generated fitting curve is corrected with reference to the size of the information entropy, so as to find an optimal fitting curve as the curve fitting result between the LDR image and the HDR image).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Zhang’s features into Li-Cui’s invention in order to enhance the user’s visual playback experience by providing a synthesized video having an optimal visual effect.
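To illustrate the general nature of an LDR-to-HDR conversion of the kind Zhang describes, the following Python sketch expands an 8-bit LDR frame to linear HDR values with a fixed power curve; this fixed gamma is only a stand-in for Zhang’s fitted curve, and the gamma value and peak luminance are assumptions made for illustration.

```python
import numpy as np

def ldr_to_hdr(ldr_frame, gamma=2.4, peak_nits=1000.0):
    """Expand an 8-bit LDR frame to linear HDR luminance with a simple power
    curve; a fitted curve such as Zhang describes would replace this fixed gamma."""
    normalized = ldr_frame.astype(np.float32) / 255.0
    return peak_nits * np.power(normalized, gamma)
```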
10. Claim 9 is rejected under AIA 35 U.S.C. 103 as being unpatentable over Li-Cui, as applied to claim 8 above, in view of Warnick et al. (US Patent 12,334,047).
Regarding claim 9, Li-Cui discloses the method according to claim 8.
Li-Cui does not explicitly disclose, but Warnick discloses, wherein the obtaining the original background sound of the first video clip and the original background sound of the second video clip comprises: extracting a first specified type of sound contained in the first video clip, and using other sounds except the first specified type of sound as the original background sound of the first video clip; and extracting a second specified type of sound contained in the second video clip, and using other sounds except the second specified type of sound as the original background sound of the second video clip (Warnick, col. 2, line 57 through col. 3, line 13, extracting unwanted sound and replacing the unwanted sound with other sound).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Warnick’s features into Li-Cui’s invention in order to enhance the user’s auditory experience by providing a synthesized video having preferred audio content.
Conclusion
11. Any inquiry concerning this communication or earlier communications from the examiner should be directed to LOI H TRAN whose telephone number is (571) 270-5645. The examiner can normally be reached 8:00 AM - 5:00 PM PST, with the first Friday of each biweek off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, THAI TRAN can be reached at 571-272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LOI H TRAN/ Primary Examiner, Art Unit 2484