DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are objected to because ‘RBD’ in Fig. 2 should read ‘RDB’ (where ‘RDB’ stands for residual dense block). Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent-eligible subject matter because Claim 11 is directed to a ‘computer-readable storage medium’. Under the broadest reasonable interpretation, this limitation could encompass a transitory computer-readable storage medium, such as a carrier wave (see MPEP § 2106.03).
The examiner suggests rewriting Claim 11 to read ‘A non-transitory computer-readable storage medium’.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3 and 6-8 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Nah et al. (S. Nah, H. Dong, et al., "NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 2019), hereinafter Nah.
As to Claim 1, Nah teaches a video super-resolution method, comprising (see pg. 1991, Section 4.7, “XJTU-IAIR team proposes a flow-guided spatio-temporal dense network (FSTDN) for the joint video deblurring and super-resolution task as shown in Fig. 9.”, and see the corresponding network shown in Fig. 9):
acquiring a first feature, wherein the first feature is a feature obtained by merging an initial feature of a target video frame and an initial feature of each of neighborhood video frames of the target video frame (see Fig. 9, where the 5D tensor is the first feature, formed by extracting features from the target frame LR_t and the neighborhood frames LR_t+1, LR_t+2, LR_t-1, and LR_t-2).
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
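For clarity of the record, the feature-merging step as mapped above may be illustrated with a minimal PyTorch sketch. All layer sizes, channel counts, and names below are illustrative assumptions and are not taken from Nah or the instant application:

```python
import torch
import torch.nn as nn

# Illustrative only: extract an initial feature from each of five input
# frames (the target LR_t plus neighbors LR_t-2 ... LR_t+2) and merge them
# along a new temporal axis into a single 5D "first feature" tensor.
feature_extractor = nn.Conv2d(3, 64, kernel_size=3, padding=1)

frames = [torch.randn(1, 3, 64, 64) for _ in range(5)]     # LR_t-2 ... LR_t+2
initial_features = [feature_extractor(f) for f in frames]  # each (1, 64, 64, 64)

first_feature = torch.stack(initial_features, dim=2)       # merge per frame
print(first_feature.shape)  # torch.Size([1, 64, 5, 64, 64])
```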
processing the first feature by concatenated multistage residual dense blocks (RDBs) (see Fig. 9, multiple residual dense blocks labeled 3D-RDB),
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
to obtain a fusion feature output by a RDB in each stage (see Fig. 9, the features output from the 3D-RDBs, labelled F_1, F_d, and F_D);
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
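The concatenated multistage 3D-RDBs may be sketched as follows; the layer count and growth rate are assumptions, as neither Nah nor the claim chart above specifies them:

```python
import torch
import torch.nn as nn

class RDB3D(nn.Module):
    """Minimal 3D residual dense block: each conv sees the concatenation of
    all earlier outputs, and a 1x1x1 fusion conv plus a residual sum yields
    the per-stage fusion feature (F_1, F_d, ..., F_D)."""
    def __init__(self, channels=64, growth=32, layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv3d(channels + i * growth, growth, 3, padding=1)
            for i in range(layers)
        )
        self.fuse = nn.Conv3d(channels + layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual

# Concatenated multistage RDBs: each block feeds the next, and each
# block's output is read as a per-stage fusion feature.
blocks = nn.Sequential(*[RDB3D() for _ in range(3)])
x = torch.randn(1, 64, 5, 32, 32)   # the 5D "first feature"
stage_outputs = []
for block in blocks:
    x = block(x)
    stage_outputs.append(x)         # F_1, F_d, F_D
```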
for the fusion feature output by the RDB in each stage, aligning each of neighborhood features of the fusion feature with a target feature of the fusion feature to obtain an alignment feature corresponding to the RDB that outputs the fusion feature (see Fig. 9, where F_1^Warp, F_d^Warp, and F_D^Warp are all alignment features corresponding to their respective 3D-RDB blocks, and see the Feature Warping Layer of Fig. 9, where the neighborhood features comprising the ‘fusion feature’ F_D are warped to a target feature; an illustrative sketch of this warping follows the Claim 1 analysis below),
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
[Image: Fig. 9 of Nah]
[Image: Fig. 4 of Instant Application]
wherein each of the neighborhood features of the fusion feature is a feature corresponding to each of the neighborhood video frames, and the target feature of the fusion feature is a feature corresponding to the target video frame (see Fig. 9, where the ‘fusion feature’ F_D is split per frame, the target feature F_d,t corresponds to a feature of the target frame, and the neighborhood features F_d,t+1, F_d,t+2, F_d,t-1, F_d,t-2 correspond to LR_t+1, LR_t+2, LR_t-1, LR_t-2, respectively),
[Image: Fig. 9 of Nah]
and generating a super-resolution video frame corresponding to the target video frame on the basis of the alignment feature corresponding to the RDB in each stage and the initial feature of the target video frame (see Fig. 9, the super-resolution video frame HR_t, generated from the alignment features and the initial feature F_t, which is connected by the red dotted arrow).
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
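The flow-guided feature warping relied on throughout the Claim 1 mapping may be illustrated by the following sketch of a generic bilinear warp; the function name warp_to_target is hypothetical, and this is not asserted to be Nah's exact Feature Warping Layer:

```python
import torch
import torch.nn.functional as F

def warp_to_target(feat, flow):
    """Warp a neighbor feature map toward the target frame using a dense
    optical flow field (flow[:, 0] = horizontal, flow[:, 1] = vertical).
    Generic bilinear warping, offered only as an illustration."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).float() + flow
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    grid_x = 2.0 * grid[:, 0] / (w - 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((grid_x, grid_y), dim=-1)  # (n, h, w, 2)
    return F.grid_sample(feat, sample_grid, align_corners=True)

neighbor_feat = torch.randn(1, 64, 32, 32)     # e.g. F_d,t+1
flow = torch.randn(1, 2, 32, 32)               # e.g. Flow_t+1
aligned = warp_to_target(neighbor_feat, flow)  # contribution to F_d^Warp
```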
As to Claim 2, Nah teaches acquiring an optical flow between each of the neighborhood video frames and the target video frame respectively (see pg. 1991, Section 4.7, “XJTU-IAIR team proposes a flow-guided spatio-temporal dense network (FSTDN) for the joint video deblurring and super-resolution task as shown in Fig. 9.”, and see the calculated flows Flow_t+1, Flow_t+2, Flow_t-1, and Flow_t-2, which represent the optical flow between the target frame LR_t and each respective neighboring frame)
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
and aligning each of neighborhood features of the fusion feature with a target feature of the fusion feature on the basis of the optical flow between each of the neighborhood video frames and the target video frame (see Fig. 9, ‘Feature Warping Layer’, where each fusion feature F_d is warped (aligned) using the flow calculated from the neighboring frames and the target frame),
[Image: Fig. 9 of Nah]
to obtain an alignment feature corresponding to the RDB that outputs the fusion feature (see Fig. 9, where F_d^Warp is generated for its respective 3D-RDB block).
[Image: Fig. 9 of Nah]
As to Claim 3, Nah teaches splitting the fusion feature to obtain each of the neighborhood features and the target feature (see Fig. 9, ‘Feature Warping Layer’, where the ‘fusion feature’ F_d is split to obtain the target feature F_d,t and the neighboring features F_d,t+1, F_d,t+2, F_d,t-1, F_d,t-2),
[Image: Fig. 9 of Nah]
aligning each of the neighborhood features with the target feature on the basis of the optical flow between each of the neighborhood video frames and the target video frame, to obtain an alignment feature for each of the neighborhood video frames (see Fig. 9, ‘Feature Warping Layer’, where each feature of the fusion feature F_d (F_d,t+1, F_d,t+2, F_d,t, F_d,t-1, F_d,t-2) is warped (or aligned) using the flow calculated from the neighboring frames and the target frame);
and merging the target feature and the alignment feature of each of the neighborhood video frames to obtain an alignment feature corresponding to the RDB that outputs the fusion feature (see Fig. 9, where the warped features of the fusion feature are concatenated to form F_d^Warp).
[Image: Fig. 9 of Nah]
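The split-warp-merge reading applied in Claim 3 may be summarized in a short sketch; the identity warp stands in for the flow-based warp shown under Claim 1, and all shapes are assumptions:

```python
import torch

# Illustrative split-warp-merge: the per-stage fusion feature F_d of shape
# (n, c, 5, h, w) is split per frame, each neighbor feature is warped
# toward the target feature (temporal index 2 of 5 here), and the results
# are merged back together.
fusion_feature = torch.randn(1, 64, 5, 32, 32)   # F_d
per_frame = fusion_feature.unbind(dim=2)         # F_d,t-2 ... F_d,t+2
target = per_frame[2]                            # F_d,t

def warp(feat, flow):
    # Placeholder for the flow-based warp sketched under Claim 1; identity
    # here just to keep this sketch self-contained and runnable.
    return feat

flows = [torch.randn(1, 2, 32, 32) for _ in range(4)]
neighbors = [f for i, f in enumerate(per_frame) if i != 2]
aligned = [warp(f, fl) for f, fl in zip(neighbors, flows)]

# Merging the target feature with the warped neighbors yields F_d^Warp.
f_d_warp = torch.cat([target] + aligned, dim=1)  # (1, 320, 32, 32)
```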
As to Claim 6, Nah teaches that generating a super-resolution video frame corresponding to the target video frame on the basis of the alignment feature corresponding to the RDB in each stage and the initial feature of the target video frame comprises: merging alignment features corresponding to the multistage RDBs to obtain a second feature (see Fig. 9, the alignment features F_1^Warp, F_d^Warp, and F_D^Warp being concatenated to form a 5D tensor),
[Image: Fig. 9 of Nah]
and converting, based on a feature conversion network, the second feature into a feature having the same tensor as an initial feature of the target video frame to obtain a third feature (see Fig. 9, ‘Temporal Fusion’, and see how the initial 5D tensor (with dimensions n*(64*D)*5*h*w) is converted to a 4D tensor (with dimensions n*64*h*w); additionally, see how the initial feature of the target frame is summed with the third feature to form the fourth feature, implying that the third feature has the same dimensions as the initial feature),
[Image: Fig. 9 of Nah]
and generating a super-resolution video frame corresponding to the target video frame on the basis of the third feature and the initial feature of the target video frame (see Fig. 9, the super-resolution video frame HR_t, generated from the 4D tensor and the initial feature F_t, which is connected by the red dotted arrow).
[Image: Fig. 9 of Nah]
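The dimensionality bookkeeping underlying the Claim 6 mapping (a 5D second feature converted to a 4D third feature matching the initial feature) may be checked with a hypothetical sketch; the single 1*1 fusion convolution is an assumption, not Nah's disclosed Temporal Fusion block:

```python
import torch
import torch.nn as nn

# Hypothetical temporal-fusion step: the concatenated alignment features
# form a 5D tensor with 5 temporal slices; collapsing the temporal and
# channel axes and applying a 2D conv yields a 4D tensor whose shape
# matches the initial feature of the target frame (n, 64, h, w).
n, D, h, w = 1, 3, 32, 32
second_feature = torch.randn(n, 64 * D, 5, h, w)    # n*(64*D)*5*h*w

flattened = second_feature.flatten(1, 2)            # (n, 64*D*5, h, w)
temporal_fusion = nn.Conv2d(64 * D * 5, 64, kernel_size=1)
third_feature = temporal_fusion(flattened)          # (n, 64, h, w)

initial_feature = torch.randn(n, 64, h, w)          # F_t
fourth_feature = third_feature + initial_feature    # summation fusion
```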
As to Claim 8, Nah teaches performing summation fusion on the third feature and the initial feature of the target video frame to obtain a fourth feature (see Fig. 9, where the 4D tensor is the ‘third feature’ and the initial feature is added, as indicated by the summation sign, to obtain the fourth feature),
[Image: Fig. 9 of Nah]
processing the fourth feature by a residual dense network RDN to obtain a fifth feature (see Fig. 9, where the fourth feature is input into a 2D RDN to obtain the fifth feature);
[Image: Fig. 9 of Nah]
and upsampling the fifth feature to obtain a super-resolution video frame corresponding to the target video frame (see the upsampling block after the 2D RDN, which then outputs the super-resolution frame HR_t).
[Image: Fig. 9 of Nah]
[Image: Fig. 4 of Instant Application]
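The tail of the pipeline as mapped for Claim 8 (summation fusion, a 2D RDN, then upsampling) may likewise be sketched; the RDN is reduced here to a plain residual stack, and the x2 pixel-shuffle factor is an assumption:

```python
import torch
import torch.nn as nn

class TinyRDN(nn.Module):
    """Stand-in for the 2D residual dense network: a plain residual conv
    stack, offered only to make the data flow concrete."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)  # global residual

upsample = nn.Sequential(
    nn.Conv2d(64, 64 * 4, 3, padding=1),
    nn.PixelShuffle(2),              # x2 spatial upscaling (assumed factor)
    nn.Conv2d(64, 3, 3, padding=1),  # project to an RGB frame
)

fourth_feature = torch.randn(1, 64, 32, 32)  # third feature + initial feature
fifth_feature = TinyRDN()(fourth_feature)
hr_frame = upsample(fifth_feature)           # (1, 3, 64, 64), i.e. HR_t
```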
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Nah et al. (S. Nah et al., "NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 2019), hereinafter Nah, in view of Gupta et al. (A. Gupta, et al., "Enhancing and experiencing spacetime resolution with videos and stills," 2009 IEEE International Conference on Computational Photography (ICCP)), hereinafter Gupta.
As to Claim 4, Nah fails to explicitly teach upsampling the target video frame and each of the neighborhood video frames of the target video frame, to obtain an upsampled video frame of the target video frame and an upsampled video frame of each of the neighborhood video frames; acquiring an optical flow between the upsampled video frame of each of the neighborhood video frames and the upsampled video frame of the target video frame; and aligning each of the neighborhood features of the fusion feature with the target feature of the fusion feature on the basis of the optical flow between the upsampled video frame of each of the neighborhood video frames and the upsampled video frame of the target video frame, to obtain an alignment feature corresponding to the RDB that outputs the fusion feature.
However, in an analogous art, Gupta teaches a method for enhancing the spacetime resolution of videos (see abstract on page 1), which includes upsampling adjacent video frames (see page 3, section 3.1, “The input consists of a stream of low-resolution frames with intermittent high-resolution stills. We upsample the low-resolution frames using bicubic interpolation to match the size of the high-resolution stills and denote them by fi. For each fi, the nearest two high-resolution stills are denoted as Sleft and Sright”),
then calculating the flow between the upsampled frames (see page 3, section 3.1, “The system estimates motion between every fi and corresponding Sleft & Sright… One approach is to compute optical flow directly from the high-resolution stills, Sleft or Sright, to the upsampled frames fi.”),
and then aligning the frames on the basis of optical flow between the upsampled video frames (see page 3, section 3.1, “Once the system has computed correspondences from Sleft to fi and Sright to fi, it warps the high-resolution stills to bring them into alignment with fi”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the upsampling taught by Gupta with the super-resolution method taught by Nah. Gupta teaches on page 3, section 3.1, “The summed motion estimation serves as initialization to bring long range motion within the operating range of the optical flow algorithm and reduces the errors accumulated from the pairwise sums.” Thus, it would have been obvious to combine the teachings of Gupta with the teachings of Nah in order to obtain the invention as claimed in Claim 4.
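The ordering for which Gupta is relied upon (upsample first, then estimate flow between the upsampled frames, then align) may be illustrated as follows; the estimate_flow stub is hypothetical and does not reproduce Gupta's motion estimation:

```python
import torch
import torch.nn.functional as F

# Sketch of the Gupta-style ordering: bicubically upsample the frames
# first, then estimate flow between the upsampled frames, then align.
target = torch.randn(1, 3, 32, 32)     # low-resolution target frame
neighbor = torch.randn(1, 3, 32, 32)   # low-resolution neighbor frame

up_target = F.interpolate(target, scale_factor=2, mode="bicubic",
                          align_corners=False)
up_neighbor = F.interpolate(neighbor, scale_factor=2, mode="bicubic",
                            align_corners=False)

def estimate_flow(a, b):
    # Stand-in for a real optical flow estimator (hypothetical).
    return torch.zeros(a.shape[0], 2, a.shape[2], a.shape[3])

flow = estimate_flow(up_neighbor, up_target)  # flow on upsampled frames
# This flow would then drive warping of up_neighbor toward up_target,
# as in the warping sketch given under Claim 1.
```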
Claims 7, 10, 13-14, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Nah et al. (S. Nah et al., "NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 2019), hereinafter Nah in view of Hu et al. (CN 112565887), hereinafter Hu.
As to Claim 7, Nah teaches that the feature conversion network comprises a first convolutional layer, a second convolutional layer, and a third convolutional layer concatenated sequentially; and the second convolutional layer and the third convolutional layer both have a kernel of 3*3*3 and have a padding parameter of 0 in a time dimension and a padding parameter of 1 in both length dimension and width dimension (see Fig. 9, ‘Temporal Fusion’ block with three convolutional layers, and see the kernel and padding labeled for the second and third convolutional layers, where ‘k’ stands for kernel and ‘pad’ stands for padding).
[Image: Fig. 9 of Nah, with kernel and padding sizes]
[Image: Fig. 5 of instant application, with kernel and padding sizes]
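The claimed feature conversion network may be transcribed directly from the kernel and padding values discussed above; the channel counts and input shape are illustrative assumptions:

```python
import torch
import torch.nn as nn

# The claimed three-layer feature conversion network, using the recited
# kernel and padding values (channel counts are assumptions):
feature_conversion = nn.Sequential(
    # first layer: 1*1*1 kernel, padding 0 in every dimension
    nn.Conv3d(64, 64, kernel_size=(1, 1, 1), padding=(0, 0, 0)),
    # second and third layers: 3*3*3 kernel, padding 0 in the time
    # dimension and 1 in the length and width dimensions
    nn.Conv3d(64, 64, kernel_size=(3, 3, 3), padding=(0, 1, 1)),
    nn.Conv3d(64, 64, kernel_size=(3, 3, 3), padding=(0, 1, 1)),
)

x = torch.randn(1, 64, 5, 32, 32)   # (n, c, time, h, w)
y = feature_conversion(x)
print(y.shape)  # torch.Size([1, 64, 1, 32, 32]): time collapses 5 -> 1
```

With these padding values the time dimension collapses from 5 to 1 while the spatial dimensions are preserved, which is consistent with converting the second feature into a feature matching the initial feature of the target frame.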
Nah fails to explicitly teach that the first convolutional layer has a kernel of 1*1*1 and has a padding parameter of 0 in each dimension.
However, Hu teaches a super-resolution method which includes a pointwise convolution kernel (see paragraph [0102], “This application introduces the depthwise separable convolution in the neural network model. The depthwise separable convolution uses different convolution kernels for each channel of the input image for operation, and the operation steps can be divided into depthwise convolution (Depthwise) and pointwise convolution (Pointwise)”, and see paragraph [0104], “The convolution kernel of deep convolution is k×k, the channel is cd, and the convolution kernel of point convolution is 1×1”, where it is known in the art that a pointwise kernel has no padding).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the super-resolution method taught by Nah with the convolutional kernel taught by Hu. The motivation for doing so would be to reduce the amount of calculation needed (see paragraphs [0104] and [0106], “Further, the depth separable convolution is to split the one-step convolution operation into two steps of deep convolution and point convolution…Compared with the standard convolution, the amount of calculation is reduced”). Thus, it would have been obvious to combine the kernel taught by Hu with the teachings of Nah in order to obtain the invention as claimed in Claim 7.
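Hu's depthwise-plus-pointwise decomposition, and the calculation reduction cited as the motivation to combine, may be checked with a short sketch; the channel counts are illustrative:

```python
import torch
import torch.nn as nn

# Depthwise separable convolution as described in Hu: a depthwise k x k
# conv (one kernel per input channel, groups=channels) followed by a
# pointwise 1x1 conv with no padding.
channels, out_channels, k = 64, 128, 3
depthwise = nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
pointwise = nn.Conv2d(channels, out_channels, kernel_size=1, padding=0)

x = torch.randn(1, channels, 32, 32)
y = pointwise(depthwise(x))  # same output shape as a full k x k conv

# Parameter-count comparison, illustrating the calculation reduction:
standard = nn.Conv2d(channels, out_channels, k, padding=k // 2)
sep_params = sum(p.numel() for p in (*depthwise.parameters(),
                                     *pointwise.parameters()))
std_params = sum(p.numel() for p in standard.parameters())
print(sep_params, std_params)  # separable uses far fewer parameters
```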
As to Claim 10, Claim 10 is directed towards an electronic device, comprising a memory and a processor, the memory being configured to store a computer program, the processor being configured to, when executing the computer program, cause the electronic device to implement the same method as claimed in Claim 1.
Nah teaches the video super-resolution method of Claim 1, but fails to explicitly teach an electronic device comprising a memory and a processor.
However, Hu teaches a video super-resolution device (see paragraph [0001], “The embodiments of the present invention provide a video processing method, device, terminal, and storage medium, which can adaptively adjust a super-resolution strategy to perform super-resolution reconstruction on a video stream, thereby effectively improving video quality”),
which comprises a memory and processor (see paragraph [0060], “In another aspect, an embodiment of the present invention provides an intelligent terminal, which includes a processor, a communication interface, and a memory”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the video super-resolution device taught by Hu with the video super-resolution method taught by Nah. The motivation for doing so would be to integrate the device into another system. Hu teaches in paragraph [0077], “The video processing system may be specifically integrated in an electronic device, and the electronic device may be a terminal or a server. For example, the video processing system can be integrated in the terminal. The terminal may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal computer (PC, Personal Computer), a TV, or other smart playback device, which is not limited in this application.” Thus, it would have been obvious to combine the video super-resolution device taught by Hu with the method taught by Nah in order to obtain the invention as claimed in Claim 10.
As to Claim 11, Claim 11 is directed towards a computer-readable storage medium, the computer-readable storage medium storing a computer program which, when executed by a computing device, causes the computing device to implement the same method as claimed in Claim 1.
Nah teaches the video super-resolution method of Claim 1, but fails to explicitly teach a computer-readable storage medium.
However, Hu teaches a computer-readable storage medium (see paragraph [0001], “The embodiments of the present invention provide a video processing method, device, terminal, and storage medium, which can adaptively adjust a super-resolution strategy to perform super-resolution reconstruction on a video stream, thereby effectively improving video quality”),
which can contain a computer program (see paragraph [0060], “The processor, the communication interface, and the memory are connected to each other, wherein the memory is used to store a computer program, The computer program includes program instructions, and the processor is configured to call the program instructions for performing operations involved in the foregoing video processing method”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the video processing device taught by Hu with the video processing method taught by Nah. The motivation for doing so would be to integrate the device into other electronic devices, as taught by Hu in paragraph [0077]. Thus, it would have been obvious to combine the video super-resolution device taught by Hu with the super-resolution method taught by Nah in order to obtain the invention as claimed in Claim 11.
As to Claim 13, Claim 13 claims the same limitation as Claim 2 and is dependent on a similarly rejected independent claim. Therefore, the rejection and rationale are analogous to that made in Claim 2.
As to Claim 14, Claim 14 claims the same limitation as Claim 3 and is dependent on a similarly rejected independent claim. Therefore, the rejection and rationale are analogous to that made in Claim 3.
As to Claim 17, Claim 17 claims the same limitation as Claim 6 and is dependent on a similarly rejected independent claim. Therefore, the rejection and rationale are analogous to that made in Claim 6.
As to Claim 18, Claim 18 claims the same limitation as Claim 7 and is dependent on a similarly rejected independent claim. Therefore, the rejection and rationale are analogous to that made in Claim 7.
As to Claim 19, Claim 19 claims the same limitation as Claim 8 and is dependent on a similarly rejected independent claim. Therefore, the rejection and rationale are analogous to that made in Claim 8.
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Nah et al. (S. Nah et al., "NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results,"2019), hereinafter Nah, in view of Gupta et al. (A. Gupta, et al., "Enhancing and experiencing spacetime resolution with videos and stills," 2009 IEEE International Conference on Computational Photography (ICCP)), hereinafter Gupta, and further in view of Hu et al. (CN 112565887), hereinafter Hu.
As to Claim 15, Nah and Hu fail to explicitly teach upsampling the target video frame and each of the neighborhood video frames of the target video frame, to obtain an upsampled video frame of the target video frame and an upsampled video frame of each of the neighborhood video frames; acquiring an optical flow between the upsampled video frame of each of the neighborhood video frames and the upsampled video frame of the target video frame; and aligning each of the neighborhood features of the fusion feature with the target feature of the fusion feature on the basis of the optical flow between the upsampled video frame of each of the neighborhood video frames and the upsampled video frame of the target video frame, to obtain an alignment feature corresponding to the RDB that outputs the fusion feature.
However, in an analogous art, Gupta teaches a method for enhancing the spacetime resolution of videos (see abstract on page 1), which includes upsampling adjacent video frames (see page 3, section 3.1, “The input consists of a stream of low-resolution frames with intermittent high-resolution stills. We upsample the low-resolution frames using bicubic interpolation to match the size of the high-resolution stills and denote them by fi. For each fi, the nearest two high-resolution stills are denoted as Sleft and Sright”),
then calculating the flow between the upsampled frames (see page 3, section 3.1, “The system estimates motion between every fi and corresponding Sleft & Sright… One approach is to compute optical flow directly from the high-resolution stills, Sleft or Sright, to the upsampled frames fi.”),
and then aligning the frames on the basis of optical flow between the upsampled video frames (see page 3, section 3.1, “Once the system has computed correspondences from Sleft to fi and Sright to fi, it warps the high-resolution stills to bring them into alignment with fi”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the upsampling taught by Gupta with the super-resolution method taught by Nah and Hu. Gupta teaches on page 3, section 3.1, “The summed motion estimation serves as initialization to bring long range motion within the operating range of the optical flow algorithm and reduces the errors accumulated from the pairwise sums.” Thus, it would have been obvious to combine the upsampling taught by Gupta with the teachings of Nah and Hu in order to obtain the invention as claimed in Claim 15.
Allowable Subject Matter
Claims 5 and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Nah, Gupta, and Hu fail to teach: upsampling each of the neighborhood features and the target feature respectively, to obtain an upsampled feature of each of the neighborhood video frames and an upsampled feature of the target video frame; aligning the upsampled feature of each of the neighborhood video frames with the upsampled feature of the target video frame on the basis of the optical flow between the upsampled video frame of each of the neighborhood video frames and the upsampled video frame of the target video frame, to obtain an upsampled alignment feature of each of the neighborhood video frames; performing a space-to-depth conversion on the upsampled feature of the target video frame and the upsampled aligned feature of each of the neighborhood video frames respectively, to obtain an equivalent feature of the target video frame and an equivalent feature of each of the neighborhood video frames; and merging the equivalent feature of the target video frame and the equivalent feature of each of the neighborhood video frames, to obtain an alignment feature corresponding to the RDB that outputs the fusion features.
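For context, the recited space-to-depth conversion corresponds to the standard pixel-unshuffle operation, sketched below under assumed shapes; the merge-by-concatenation step is one possible reading, not a disclosure of the cited art:

```python
import torch
import torch.nn as nn

# Space-to-depth conversion: trading spatial resolution for channel depth
# so that upsampled, aligned features become "equivalent features" at the
# original grid size. All shapes here are assumptions.
space_to_depth = nn.PixelUnshuffle(downscale_factor=2)

upsampled_aligned = torch.randn(1, 64, 64, 64)  # upsampled (aligned) feature
equivalent = space_to_depth(upsampled_aligned)  # (1, 256, 32, 32)

# Merging the equivalent features of the target and neighborhood frames
# (concatenation being one possibility) would then yield the alignment
# feature corresponding to the RDB that outputs the fusion feature.
equivalents = [equivalent, torch.randn(1, 256, 32, 32)]
merged = torch.cat(equivalents, dim=1)          # (1, 512, 32, 32)
```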
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Porikli (US Pub No 2022/0222776) teaches a video super-resolution method comprising acquiring a first feature, processing the first feature with a network of residual dense units, and then using the output of the residual units to output a high-resolution frame. The frame is then aligned with previously processed frames and input into another network to generate a frame with higher resolution. Porikli fails to teach a ‘fusion feature’ comprising a target feature and multiple neighboring features.
Hou (CN 113628115) teaches space-to-depth conversion of features for the purposes of super-resolution. However, Hou fails to explicitly teach upsampling each feature of the fusion feature to obtain an equivalent feature of the target video frame and an equivalent feature of each of the neighborhood video frames, and merging the equivalent feature of the target video frame and the equivalent feature of each of the neighborhood video frames.
Wang et al. (CN 111583112), cited in the Chinese Search Report, teaches a method for video super-resolution which includes aligning video frames through deformable convolution. However, the alignment occurs before the frames are input into the residual dense network, and thus each feature produced by the RDB is not aligned. The same authors published a paper (H. Wang, D. Su, C. Liu, L. Jin, X. Sun and X. Peng, "Deformable Non-Local Network for Video Super-Resolution," in IEEE Access, vol. 7, pp. 177734-177744, 2019) that teaches a similar architecture, which likewise aligns video frames before inputting the frames into a residual network.
Dai et al. (CN 112767251), cited in the Chinese Search Report, is directed towards a method of image super-resolution. Dai teaches extracting and fusing features, but fails to teach aligning features.
Du et al. (X. Du, Y. Zhou, Y. Chen, Y. Zhang, J. Yang and D. Jin, "Dense-Connected Residual Network for Video Super-Resolution," 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 2019) teaches a residual network for video super-resolution that uses optical flow to align video frames. However, the video frames are aligned before they are input into the residual network.
Su et al. (D. Su, H. Wang, L. Jin, X. Sun and X. Peng, "Local-Global Fusion Network for Video Super-Resolution," in IEEE Access, vol. 8, pp. 172443-172456, 2020) teaches a video super-resolution method that uses residual blocks to extract features, and then aligns the features. However, Su fails to teach that the features are aligned to a target feature of a fusion feature.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOUMYA THOMAS whose telephone number is (571)272-8639. The examiner can normally be reached M-F 8:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.T./Examiner, Art Unit 2664
/JENNIFER MEHMOOD/Supervisory Patent Examiner, Art Unit 2664