DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are objected to because ‘RBD’ in Fig. 2 should read ‘RDB’ (where ‘RDB’ stands for residual dense block). Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent-eligible subject matter because Claim 11 is directed to a ‘computer-readable storage medium’. Under the broadest reasonable interpretation, this limitation could encompass a transitory computer-readable storage medium, such as a carrier wave (see MPEP § 2106.03).
The examiner suggests rewriting Claim 11 to read ‘A non-transitory computer-readable storage medium’.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3 and 6-8 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Nah et al. (S. Nah, H. Dong, et al., "NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 2019), hereinafter Nah.
As to Claim 1, Nah teaches a video super-resolution method, comprising (see pg. 1991, Section 4.7, “XJTU-IAIR team proposes a flow-guided spatio-temporal dense network (FSTDN) for the joint video deblurring and super-resolution task as shown in Fig. 9.”, and see the corresponding network shown in Fig. 9):
acquiring a first feature, wherein the first feature is a feature obtained by merging an initial feature of a target video frame and an initial feature of each of neighborhood video frames of the target video frame (see Fig. 9, where the 5D tensor is the first feature, formed by extracting features from the target frame LR_t and the neighborhood frames LR_t+1, LR_t+2, LR_t-1, and LR_t-2).
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
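For clarity of the record, the feature-merging step as mapped above may be illustrated with a minimal PyTorch sketch. All layer sizes, channel counts, and names below are illustrative assumptions and are not taken from Nah or the instant application:

```python
import torch
import torch.nn as nn

# Illustrative only: extract an initial feature from each of five input
# frames (the target LR_t plus neighbors LR_t-2 ... LR_t+2) and merge them
# along a new temporal axis into a single 5D "first feature" tensor.
feature_extractor = nn.Conv2d(3, 64, kernel_size=3, padding=1)

frames = [torch.randn(1, 3, 64, 64) for _ in range(5)]     # LR_t-2 ... LR_t+2
initial_features = [feature_extractor(f) for f in frames]  # each (1, 64, 64, 64)

first_feature = torch.stack(initial_features, dim=2)       # merge per frame
print(first_feature.shape)  # torch.Size([1, 64, 5, 64, 64])
```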
processing the first feature by concatenated multistage residual dense blocks (RDBs) (see Fig. 9, multiple residual dense blocks labeled 3D-RDB),
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
to obtain a fusion feature output by a RDB in each stage (see Fig. 9, the features output from the 3D-RDBs, labelled F_1, F_d, and F_D);
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
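The concatenated multistage 3D-RDBs may be sketched as follows; the layer count and growth rate are assumptions, as neither Nah nor the claim chart above specifies them:

```python
import torch
import torch.nn as nn

class RDB3D(nn.Module):
    """Minimal 3D residual dense block: each conv sees the concatenation of
    all earlier outputs, and a 1x1x1 fusion conv plus a residual sum yields
    the per-stage fusion feature (F_1, F_d, ..., F_D)."""
    def __init__(self, channels=64, growth=32, layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv3d(channels + i * growth, growth, 3, padding=1)
            for i in range(layers)
        )
        self.fuse = nn.Conv3d(channels + layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual

# Concatenated multistage RDBs: each block feeds the next, and each
# block's output is read as a per-stage fusion feature.
blocks = nn.Sequential(*[RDB3D() for _ in range(3)])
x = torch.randn(1, 64, 5, 32, 32)   # the 5D "first feature"
stage_outputs = []
for block in blocks:
    x = block(x)
    stage_outputs.append(x)         # F_1, F_d, F_D
```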
for the fusion feature output by the RDB in each stage, aligning each of neighborhood features of the fusion feature with a target feature of the fusion feature to obtain an alignment feature corresponding to the RDB that outputs the fusion feature (see Fig. 9, where F_1^Warp, F_d^Warp, and F_D^Warp are all alignment features corresponding to their respective 3D-RDB blocks, and see the Feature Warping Layer of Fig. 9, where the neighborhood features comprising the ‘fusion feature’ F_D are warped to a target feature; an illustrative sketch of this warping follows the Claim 1 analysis below),
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
[Image: Fig. 9 of Nah]
[Image: Fig. 4 of Instant Application]
wherein each of the neighborhood features of the fusion feature is a feature corresponding to each of the neighborhood video frames, and the target feature of the fusion feature is a feature corresponding to the target video frame (see Fig. 9, where the ‘fusion feature’ F_D is split per frame, the target feature F_d,t corresponds to a feature of the target frame, and the neighborhood features F_d,t+1, F_d,t+2, F_d,t-1, F_d,t-2 correspond to LR_t+1, LR_t+2, LR_t-1, LR_t-2, respectively),
[Image: Fig. 9 of Nah]
and generating a super-resolution video frame corresponding to the target video frame on the basis of the alignment feature corresponding to the RDB in each stage and the initial feature of the target video frame (see Fig. 9, the super-resolution video frame HR_t, generated from the alignment features and the initial feature F_t, which is connected by the red dotted arrow).
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
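The flow-guided feature warping relied on throughout the Claim 1 mapping may be illustrated by the following sketch of a generic bilinear warp; the function name warp_to_target is hypothetical, and this is not asserted to be Nah's exact Feature Warping Layer:

```python
import torch
import torch.nn.functional as F

def warp_to_target(feat, flow):
    """Warp a neighbor feature map toward the target frame using a dense
    optical flow field (flow[:, 0] = horizontal, flow[:, 1] = vertical).
    Generic bilinear warping, offered only as an illustration."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).float() + flow
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    grid_x = 2.0 * grid[:, 0] / (w - 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((grid_x, grid_y), dim=-1)  # (n, h, w, 2)
    return F.grid_sample(feat, sample_grid, align_corners=True)

neighbor_feat = torch.randn(1, 64, 32, 32)     # e.g. F_d,t+1
flow = torch.randn(1, 2, 32, 32)               # e.g. Flow_t+1
aligned = warp_to_target(neighbor_feat, flow)  # contribution to F_d^Warp
```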
As to Claim 2, Nah teaches acquiring an optical flow between each of the neighborhood video frames and the target video frame respectively (see pg. 1991, Section 4.7, “XJTU-IAIR team proposes a flow-guided spatio-temporal dense network (FSTDN) for the joint video deblurring and super-resolution task as shown in Fig. 9.”, and see the calculated flows Flow_t+1, Flow_t+2, Flow_t-1, and Flow_t-2, which represent the optical flow between the target frame LR_t and each respective neighboring frame)
[Image: Fig. 9 of Nah]
[Image: Fig. 2 of Instant Application]
and aligning each of neighborhood features of the fusion feature with a target feature of the fusion feature on the basis of the optical flow between each of the neighborhood video frames and the target video frame (see Fig. 9, ‘Feature Warping Layer’, where each fusion feature F_d is warped (aligned) using the flow calculated from the neighboring frames and the target frame),
[Image: Fig. 9 of Nah]
to obtain an alignment feature corresponding to the RDB that outputs the fusion feature (see Fig. 9, where F_d^Warp is generated for its respective 3D-RDB block).
[Image: Fig. 9 of Nah]
As to Claim 3, Nah teaches splitting the fusion feature to obtain each of the neighborhood features and the target feature (see Fig. 9, ‘Feature Warping Layer’, where the ‘fusion feature’ F_d is split to obtain the target feature F_d,t and the neighboring features F_d,t+1, F_d,t+2, F_d,t-1, F_d,t-2),
[Image: Fig. 9 of Nah]
aligning each of the neighborhood features with the target feature on the basis of the optical flow between each of the neighborhood video frames and the target video frame, to obtain an alignment feature for each of the neighborhood video frames (see Fig. 9, ‘Feature Warping Layer’, where each feature of the fusion feature F_d (F_d,t+1, F_d,t+2, F_d,t, F_d,t-1, F_d,t-2) is warped (or aligned) using the flow calculated from the neighboring frames and the target frame);
and merging the target feature and the alignment feature of each of the neighborhood video frames to obtain an alignment feature corresponding to the RDB that outputs the fusion feature (see Fig. 9, where the warped features of the fusion feature are concatenated to form F_d^Warp).
[Image: Fig. 9 of Nah]
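The split-warp-merge reading applied in Claim 3 may be summarized in a short sketch; the identity warp stands in for the flow-based warp shown under Claim 1, and all shapes are assumptions:

```python
import torch

# Illustrative split-warp-merge: the per-stage fusion feature F_d of shape
# (n, c, 5, h, w) is split per frame, each neighbor feature is warped
# toward the target feature (temporal index 2 of 5 here), and the results
# are merged back together.
fusion_feature = torch.randn(1, 64, 5, 32, 32)   # F_d
per_frame = fusion_feature.unbind(dim=2)         # F_d,t-2 ... F_d,t+2
target = per_frame[2]                            # F_d,t

def warp(feat, flow):
    # Placeholder for the flow-based warp sketched under Claim 1; identity
    # here just to keep this sketch self-contained and runnable.
    return feat

flows = [torch.randn(1, 2, 32, 32) for _ in range(4)]
neighbors = [f for i, f in enumerate(per_frame) if i != 2]
aligned = [warp(f, fl) for f, fl in zip(neighbors, flows)]

# Merging the target feature with the warped neighbors yields F_d^Warp.
f_d_warp = torch.cat([target] + aligned, dim=1)  # (1, 320, 32, 32)
```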
As to Claim 6, Nah teaches that generating a super-resolution video frame corresponding to the target video frame on the basis of the alignment feature corresponding to the RDB in each stage and the initial feature of the target video frame comprises: merging alignment features corresponding to the multistage RDBs to obtain a second feature (see Fig. 9, the alignment features F_1^Warp, F_d^Warp, and F_D^Warp being concatenated to form a 5D tensor),
[Image: Fig. 9 of Nah]
and converting, based on a feature conversion network, the second feature into a feature having the same tensor as an initial feature of the target video frame to obtain a third feature (see Fig. 9, ‘Temporal Fusion’, and see how the initial 5D tensor (with dimensions n*(64*D)*5*h*w) is converted to a 4D tensor (with dimensions n*64*h*w); additionally, see how the initial feature of the target frame is summed with the third feature to form the fourth feature, implying that the third feature has the same dimensions as the initial feature),
[Image: Fig. 9 of Nah]
and generating a super-resolution video frame corresponding to the target video frame on the basis of the third feature and the initial feature of the target video frame (see Fig. 9, the super-resolution video frame HR_t, generated from the 4D tensor and the initial feature F_t, which is connected by the red dotted arrow).
[Image: Fig. 9 of Nah]
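The dimensionality bookkeeping underlying the Claim 6 mapping (a 5D second feature converted to a 4D third feature matching the initial feature) may be checked with a hypothetical sketch; the single 1*1 fusion convolution is an assumption, not Nah's disclosed Temporal Fusion block:

```python
import torch
import torch.nn as nn

# Hypothetical temporal-fusion step: the concatenated alignment features
# form a 5D tensor with 5 temporal slices; collapsing the temporal and
# channel axes and applying a 2D conv yields a 4D tensor whose shape
# matches the initial feature of the target frame (n, 64, h, w).
n, D, h, w = 1, 3, 32, 32
second_feature = torch.randn(n, 64 * D, 5, h, w)    # n*(64*D)*5*h*w

flattened = second_feature.flatten(1, 2)            # (n, 64*D*5, h, w)
temporal_fusion = nn.Conv2d(64 * D * 5, 64, kernel_size=1)
third_feature = temporal_fusion(flattened)          # (n, 64, h, w)

initial_feature = torch.randn(n, 64, h, w)          # F_t
fourth_feature = third_feature + initial_feature    # summation fusion
```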
As to Claim 8, Nah teaches performing summation fusion on the third feature and the initial feature of the target video frame to obtain a fourth feature (see Fig. 9, where the 4D tensor is the ‘third feature’ and the initial feature is added, as indicated by the summation sign, to obtain the fourth feature),
[Image: Fig. 9 of Nah]
processing the fourth feature by a residual dense network RDN to obtain a fifth feature (see Fig. 9, where the fourth feature is input into a 2D RDN to obtain the fifth feature);
[Image: Fig. 9 of Nah]
and upsampling the fifth feature to obtain a super-resolution video frame corresponding to the target video frame (see the upsampling block after the 2D RDN, which then outputs the super-resolution frame HR_t).
[Image: Fig. 9 of Nah]
[Image: Fig. 4 of Instant Application]
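The tail of the pipeline as mapped for Claim 8 (summation fusion, a 2D RDN, then upsampling) may likewise be sketched; the RDN is reduced here to a plain residual stack, and the x2 pixel-shuffle factor is an assumption:

```python
import torch
import torch.nn as nn

class TinyRDN(nn.Module):
    """Stand-in for the 2D residual dense network: a plain residual conv
    stack, offered only to make the data flow concrete."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)  # global residual

upsample = nn.Sequential(
    nn.Conv2d(64, 64 * 4, 3, padding=1),
    nn.PixelShuffle(2),              # x2 spatial upscaling (assumed factor)
    nn.Conv2d(64, 3, 3, padding=1),  # project to an RGB frame
)

fourth_feature = torch.randn(1, 64, 32, 32)  # third feature + initial feature
fifth_feature = TinyRDN()(fourth_feature)
hr_frame = upsample(fifth_feature)           # (1, 3, 64, 64), i.e. HR_t
```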
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Nah et al. (S. Nah et al., "NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 2019), hereinafter Nah, in view of Gupta et al. (A. Gupta, et al., "Enhancing and experiencing spacetime resolution with videos and stills," 2009 IEEE International Conference on Computational Photography (ICCP)), hereinafter Gupta.
As to Claim 4, Nah fails to explicitly teach upsampling the target video frame and each of the neighborhood video frames of the target video frame, to obtain an upsampled video frame of the target video frame and an upsampled video frame of each of the neighborhood video frames; acquiring an optical flow between the upsampled video frame of each of the neighborhood video frames and the upsampled video frame of the target video frame; and aligning each of the neighborhood features of the fusion feature with the target feature of the fusion feature on the basis of the optical flow between the upsampled video frame of each of the neighborhood video frames and the upsampled video frame of the target video frame, to obtain an alignment feature corresponding to the RDB that outputs the fusion feature.
However, in an analogous art, Gupta teaches a method for enhancing the spacetime resolution of videos (see abstract on page 1), which includes upsampling adjacent video frames (see page 3, section 3.1, “The input consists of a stream of low-resolution frames with intermittent high-resolution stills. We upsample the low-resolution frames using bicubic interpolation to match the size of the high-resolution stills and denote them by fi. For each fi, the nearest two high-resolution stills are denoted as Sleft and Sright”),
then calculating the flow between the upsampled frames (see page 3, section 3.1, “The system estimates motion between every fi and corresponding Sleft & Sright… One approach is to compute optical flow directly from the high-resolution stills, Sleft or Sright, to the upsampled frames fi.”),
and then aligning the frames on the basis of optical flow between the upsampled video frames (see page 3, section 3.1, “Once the system has computed correspondences from Sleft to fi and Sright to fi, it warps the high-resolution stills to bring them into alignment with fi”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the upsampling taught by Gupta with the super-resolution method taught by Nah. Gupta teaches on page 3, section 3.1, “The summed motion estimation serves as initialization to bring long range motion within the operating range of the optical flow algorithm and reduces the errors accumulated from the pairwise sums.” Thus, it would have been obvious to combine the teachings of Gupta with the teachings of Nah in order to obtain the invention as claimed in Claim 4.
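The ordering for which Gupta is relied upon (upsample first, then estimate flow between the upsampled frames, then align) may be illustrated as follows; the estimate_flow stub is hypothetical and does not reproduce Gupta's motion estimation:

```python
import torch
import torch.nn.functional as F

# Sketch of the Gupta-style ordering: bicubically upsample the frames
# first, then estimate flow between the upsampled frames, then align.
target = torch.randn(1, 3, 32, 32)     # low-resolution target frame
neighbor = torch.randn(1, 3, 32, 32)   # low-resolution neighbor frame

up_target = F.interpolate(target, scale_factor=2, mode="bicubic",
                          align_corners=False)
up_neighbor = F.interpolate(neighbor, scale_factor=2, mode="bicubic",
                            align_corners=False)

def estimate_flow(a, b):
    # Stand-in for a real optical flow estimator (hypothetical).
    return torch.zeros(a.shape[0], 2, a.shape[2], a.shape[3])

flow = estimate_flow(up_neighbor, up_target)  # flow on upsampled frames
# This flow would then drive warping of up_neighbor toward up_target,
# as in the warping sketch given under Claim 1.
```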
Claims 7, 10, 13-14, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Nah et al. (S. Nah et al., "NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 2019), hereinafter Nah in view of Hu et al. (CN 112565887), hereinafter Hu.
As to Claim 7, Nah teaches that the feature conversion network comprises a first convolutional layer, a second convolutional layer, and a third convolutional layer concatenated sequentially; and the second convolutional layer and the third convolutional layer both have a kernel of 3*3*3 and have a padding parameter of 0 in a time dimension and a padding parameter of 1 in both length dimension and width dimension (see Fig. 9, ‘Temporal Fusion’ block with three convolutional layers, and see the kernel and padding labeled for the second and third convolutional layers, where ‘k’ stands for kernel and ‘pad’ stands for padding).
[Image: Fig. 9 of Nah, with kernel and padding sizes]
[Image: Fig. 5 of instant application, with kernel and padding sizes]
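The claimed feature conversion network may be transcribed directly from the kernel and padding values discussed above; the channel counts and input shape are illustrative assumptions:

```python
import torch
import torch.nn as nn

# The claimed three-layer feature conversion network, using the recited
# kernel and padding values (channel counts are assumptions):
feature_conversion = nn.Sequential(
    # first layer: 1*1*1 kernel, padding 0 in every dimension
    nn.Conv3d(64, 64, kernel_size=(1, 1, 1), padding=(0, 0, 0)),
    # second and third layers: 3*3*3 kernel, padding 0 in the time
    # dimension and 1 in the length and width dimensions
    nn.Conv3d(64, 64, kernel_size=(3, 3, 3), padding=(0, 1, 1)),
    nn.Conv3d(64, 64, kernel_size=(3, 3, 3), padding=(0, 1, 1)),
)

x = torch.randn(1, 64, 5, 32, 32)   # (n, c, time, h, w)
y = feature_conversion(x)
print(y.shape)  # torch.Size([1, 64, 1, 32, 32]): time collapses 5 -> 1
```

With these padding values the time dimension collapses from 5 to 1 while the spatial dimensions are preserved, which is consistent with converting the second feature into a feature matching the initial feature of the target frame.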
Nah fails to explicitly teach that the first convolutional layer has a kernel of 1*1*1 and has a padding parameter of 0 in each dimension.
However, Hu teaches a super-resolution method which includes a pointwise convolution kernel (see paragraph [0102], “This application introduces the depthwise separable convolution in the neural network model. The depthwise separable convolution uses different convolution kernels for each channel of the input image for operation, and the operation steps can be divided into depthwise convolution (Depthwise) and pointwise convolution (Pointwise)”, and see paragraph [0104], “The convolution kernel of deep convolution is k×k, the channel is cd, and the convolution kernel of point convolution is 1×1”, where it is known in the art that a pointwise kernel has no padding).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the super-resolution method taught by Nah with the convolutional kernel taught by Hu. The motivation for doing so would be to reduce the amount of calculation needed (see paragraphs [0104] and [0106], “Further, the depth separable convolution is to split the one-step convolution operation into two steps of deep convolution and point convolution…Compared with the standard convolution, the amount of calculation is reduced”). Thus, it would have been obvious to combine the kernel taught by Hu with the teachings of Nah in order to obtain the invention as claimed in Claim 7.
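Hu's depthwise-plus-pointwise decomposition, and the calculation reduction cited as the motivation to combine, may be checked with a short sketch; the channel counts are illustrative:

```python
import torch
import torch.nn as nn

# Depthwise separable convolution as described in Hu: a depthwise k x k
# conv (one kernel per input channel, groups=channels) followed by a
# pointwise 1x1 conv with no padding.
channels, out_channels, k = 64, 128, 3
depthwise = nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
pointwise = nn.Conv2d(channels, out_channels, kernel_size=1, padding=0)

x = torch.randn(1, channels, 32, 32)
y = pointwise(depthwise(x))  # same output shape as a full k x k conv

# Parameter-count comparison, illustrating the calculation reduction:
standard = nn.Conv2d(channels, out_channels, k, padding=k // 2)
sep_params = sum(p.numel() for p in (*depthwise.parameters(),
                                     *pointwise.parameters()))
std_params = sum(p.numel() for p in standard.parameters())
print(sep_params, std_params)  # separable uses far fewer parameters
```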
As to Claim 10, Claim 10 is directed towards an electronic device, comprising a memory and a processor, the memory being configured to store a computer program, the processor being configured to, when executing the computer program, cause the electronic device to implement the same method as claimed in Claim 1.
Nah teaches the video super-resolution method of Claim 1, but fails to explicitly teach an electronic device comprising a memory and a processor.
However, Hu teaches a video super-resolution device (see paragraph [0001], “The embodiments of the present invention provide a video processing method, device, terminal, and storage medium, which can adaptively adjust a super-resolution strategy to perform super-resolution reconstruction on a video stream, thereby effectively improving video quality”),
which comprises a memory and processor (see paragraph [0060], “In another aspect, an embodiment of the present invention provides an intelligent terminal, which includes a processor, a communication interface, and a memory”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the video super-resolution device taught by Hu with the video super-resolution method taught by Nah. The motivation for doing so would be to integrate the device into another system. Hu teaches in paragraph [0077], “The video processing system may be specifically integrated in an electronic device, and the electronic device may be a terminal or a server. For example, the video processing system can be integrated in the terminal. The terminal may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal computer (PC, Personal Computer), a TV, or other smart playback device, which is not limited in this application.” Thus, it would have been obvious to combine the video super-resolution device taught by Hu with the method taught by Nah in order to obtain the invention as claimed in Claim 10.
As to Claim 11, Claim 11 is directed towards a computer-readable storage medium, the computer-readable storage medium storing a computer program which, when executed by a computing device, causes the computing device to implement the same method as claimed in Claim 1.
Nah teaches the video super-resolution method of Claim 1, but fails to explicitly teach a computer-readable storage medium.
However, Hu teaches a computer-readable storage medium (see paragraph [0001], “The embodiments of the present invention provide a video processing method, device, terminal, and storage medium, which can adaptively adjust a super-resolution strategy to perform super-resolution reconstruction on a video stream, thereby effectively improving video quality”),
which can contain a computer program (see paragraph [0060], “The processor, the communication interface, and the memory are connected to each other, wherein the memory is used to store a computer program, The computer program includes program instructions, and the processor is configured to call the program instructions for performing operations involved in the foregoing video processing method”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the video processing device taught by Hu with the video processing method taught by Nah. The motivation for doing so would be to integrate the device into other electronic devices, as taught by Hu in paragraph [0077]. Thus, it would have been obvious to combine the video super-resolution device taught by Hu with the super-resolution method taught by Nah in order to obtain the invention as claimed in Claim 11.
As to Claim 13, Claim 13 claims the same limitation as Claim 2 and is dependent on a similarly rejected independent claim. Therefore, the rejection and rationale are analogous to that made in Claim 2.
As to Claim 14, Claim 14 claims the same limitation as Claim 3 and is dependent on a similarly rejected independent claim. Therefore, the rejection and rationale are analogous to that made in Claim 3.
As to Claim 17, Claim 17 claims the same limitation as Claim 6 and is dependent on a similarly rejected independent claim. Therefore, the rejection and rationale are analogous to that made in Claim 6.
As to Claim 18, Claim 18 claims the same limitation as Claim 7 and is dependent on a similarly rejected independent claim. Therefore, the rejection and rationale are analogous to that made in Claim 7.
As to Claim 19, Claim 19 claims the same limitation as Claim 8 and is dependent on a similarly rejected independent claim. Therefore, the rejection and rationale are analogous to that made in Claim 8.
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Nah et al. (S. Nah et al., "NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results,"2019), hereinafter Nah, in view of Gupta et al. (A. Gupta, et al., "Enhancing and experiencing spacetime resolution with videos and stills," 2009 IEEE International Conference on Computational Photography (ICCP)), hereinafter Gupta, and further in view of Hu et al. (CN 112565887), hereinafter Hu.
As to Claim 15, Nah and Hu fail to explicitly teach upsampling the target video frame and each of the neighborhood video frames of the target video frame, to obtain an upsampled video frame of the target video frame and an upsampled video frame of each of the neighborhood video frames; acquiring an optical flow between the upsampled video frame of each of the neighborhood video frames and the upsampled video frame of the target video frame; and aligning each of the neighborhood features of the fusion feature with the target feature of the fusion feature on the basis of the optical flow between the upsampled video frame of each of the neighborhood video frames and the upsampled video frame of the target video frame, to obtain an alignment feature corresponding to the RDB that outputs the fusion feature.
However, in an analogous art, Gupta teaches a method for enhancing the spacetime resolution of videos (see abstract on page 1), which includes upsampling adjacent video frames (see page 3, section 3.1, “The input consists of a stream of low-resolution frames with intermittent high-resolution stills. We upsample the low-resolution frames using bicubic interpolation to match the size of the high-resolution stills and denote them by fi. For each fi, the nearest two high-resolution stills are denoted as Sleft and Sright”),
then calculating the flow between the upsampled frames (see page 3, section 3.1, “The system estimates motion between every fi and corresponding Sleft & Sright… One approach is to compute optical flow directly from the high-resolution stills, Sleft or Sright, to the upsampled frames fi.”),
and then aligning the frames on the basis of optical flow between the upsampled video frames (see page 3, section 3.1, “Once the system has computed correspondences from Sleft to fi and Sright to fi, it warps the high-resolution stills to bring them into alignment with fi”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the upsampling taught by Gupta with the super-resolution method taught by Nah and Hu. Gupta teaches on page 3, section 3.1, “The summed motion estimation serves as initialization to bring long range motion within the operating range of the optical flow algorithm and reduces the errors accumulated from the pairwise sums.” Thus, it would have been obvious to combine the upsampling taught by Gupta with the teachings of Nah and Hu in order to obtain the invention as claimed in Claim 15.
Allowable Subject Matter
Claims 5 and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Nah, Gupta, and Hu fail to teach: upsampling each of the neighborhood features and the target feature respectively, to obtain an upsampled feature of each of the neighborhood video frames and an upsampled feature of the target video frame; aligning the upsampled feature of each of the neighborhood video frames with the upsampled feature of the target video frame on the basis of the optical flow between the upsampled video frame of each of the neighborhood video frames and the upsampled video frame of the target video frame, to obtain an upsampled alignment feature of each of the neighborhood video frames; performing a space-to-depth conversion on the upsampled feature of the target video frame and the upsampled aligned feature of each of the neighborhood video frames respectively, to obtain an equivalent feature of the target video frame and an equivalent feature of each of the neighborhood video frames; and merging the equivalent feature of the target video frame and the equivalent feature of each of the neighborhood video frames, to obtain an alignment feature corresponding to the RDB that outputs the fusion features.
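For context, the recited space-to-depth conversion corresponds to the standard pixel-unshuffle operation, sketched below under assumed shapes; the merge-by-concatenation step is one possible reading, not a disclosure of the cited art:

```python
import torch
import torch.nn as nn

# Space-to-depth conversion: trading spatial resolution for channel depth
# so that upsampled, aligned features become "equivalent features" at the
# original grid size. All shapes here are assumptions.
space_to_depth = nn.PixelUnshuffle(downscale_factor=2)

upsampled_aligned = torch.randn(1, 64, 64, 64)  # upsampled (aligned) feature
equivalent = space_to_depth(upsampled_aligned)  # (1, 256, 32, 32)

# Merging the equivalent features of the target and neighborhood frames
# (concatenation being one possibility) would then yield the alignment
# feature corresponding to the RDB that outputs the fusion feature.
equivalents = [equivalent, torch.randn(1, 256, 32, 32)]
merged = torch.cat(equivalents, dim=1)          # (1, 512, 32, 32)
```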
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Porikli (US Pub No 2022/0222776) teaches a video super-resolution method comprising acquiring a first feature, processing the first feature with a network of residual dense units, and then using the output of the residual units to output a high-resolution frame. The frame is then aligned with previously processed frames and input into another network to generate a frame with higher resolution. Porikli fails to teach a ‘fusion feature’ comprising a target feature and multiple neighboring features.
Hou (CN 113628115) teaches space-to-depth conversion of features for the purposes of super-resolution. However, Hou fails to explicitly teach upsampling each feature of the fusion feature to obtain an equivalent feature of the target video frame and an equivalent feature of each of the neighborhood video frames, and merging the equivalent feature of the target video frame and the equivalent feature of each of the neighborhood video frames.
Wang et al. (CN 111583112), cited in the Chinese Search Report, teaches a method for video super-resolution which includes aligning video frames through deformable convolution. However, the alignment occurs before the frames are input into the residual dense network, and thus each feature produced by the RDB is not aligned. The same authors published a paper (H. Wang, D. Su, C. Liu, L. Jin, X. Sun and X. Peng, "Deformable Non-Local Network for Video Super-Resolution," in IEEE Access, vol. 7, pp. 177734-177744, 2019) that teaches a similar architecture, which likewise aligns video frames before inputting the frames into a residual network.
Dai et al. (CN 112767251), cited in the Chinese Search Report, is directed towards a method of image super-resolution. Dai teaches extracting and fusing features, but fails to teach aligning features.
Du et al. (X. Du, Y. Zhou, Y. Chen, Y. Zhang, J. Yang and D. Jin, "Dense-Connected Residual Network for Video Super-Resolution," 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 2019) teaches a residual network for video super-resolution that uses optical flow to align video frames. However, the video frames are aligned before they are input into the residual network.
Su et al. (D. Su, H. Wang, L. Jin, X. Sun and X. Peng, "Local-Global Fusion Network for Video Super-Resolution," in IEEE Access, vol. 8, pp. 172443-172456, 2020) teaches a video super-resolution method that uses residual blocks to extract features, and then aligns the features. However, Su fails to teach that the features are aligned to a target feature of a fusion feature.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOUMYA THOMAS whose telephone number is (571)272-8639. The examiner can normally be reached M-F 8:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.T./Examiner, Art Unit 2664
/JENNIFER MEHMOOD/Supervisory Patent Examiner, Art Unit 2664