DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
This application claims benefit of foreign priority under 35 U.S.C. 119(a)-(d) of KR10-2023-0045040, filed in REPUBLIC OF KOREA on 04/05/2023, and KR10-2023-0113187, filed in REPUBLIC OF KOREA on 08/28/2023.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 03/27/2024, 08/30/2024, and 06/13/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements have been considered by the examiner and are attached.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 10 is non-statutory under the most recent interpretation of the Interim Guidelines regarding 35 U.S.C. 101 because the claimed computer-readable recording medium is not positively disclosed in the specification as a statutory-only embodiment and is not limited to non-transitory media. The broadest reasonable interpretation of a claim drawn to a computer-readable medium (also called a machine-readable medium and other such variations) typically covers both forms of non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer-readable media, particularly when the specification is silent. See MPEP 2111.01. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. § 101 as covering non-statutory subject matter. See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter), and Interim Examination Instructions for Evaluating Subject Matter Eligibility Under 35 U.S.C. § 101, Aug. 24, 2009, p. 2.
To overcome this rejection, the claim may be amended to recite "a non-transitory computer-readable recording medium configured to ..." so that the claim is limited to statutory embodiments.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 4, 9, 10, 12, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over X. Wang, K. C. K. Chan, K. Yu, C. Dong and C. C. Loy, "EDVR: Video Restoration With Enhanced Deformable Convolutional Networks," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 2019, pp. 1954-1963 (hereinafter referred to as Wang), in view of Luo (U.S. Patent Pub. No. 2022/0076002).
Regarding Claim 1, Wang teaches a method of processing a video, the method comprising:
extracting a first image feature from a first input image included in a scene (Section 3.1: Given 2N+1 consecutive low-quality frames, we denote the middle frame as the reference frame and the other frames as neighboring frames; Section 3.2: We first briefly review the use of deformable convolution for alignment [40], i.e., aligning features of each neighboring frame (first input image) to that of the reference one (second image feature, taught below); this shows features are extracted and then aligned);
extracting a second image feature from a second input image included in the scene, wherein the second input image is a target frame (Section 3.1: Given 2N+1 consecutive low-quality frames, we denote the middle frame as the reference frame (second input image) and the other frames as neighboring frames (first input image); Section 3.2: We first briefly review the use of deformable convolution for alignment [40], i.e., aligning features of each neighboring frame (first input image) to that of the reference one (second input image); this shows features are extracted and then aligned);
generating, based on the first image feature and the second image feature, a temporal feature associated with information about a temporal change between the first image feature and the second image feature; and (Section 3.3: we propose TSA fusion module to assign pixel-level aggregation weights on each frame. Specifically, we adopt temporal and spatial attentions during the fusion process, as shown in Fig. 4. The goal of temporal attention is to compute frame similarity in an embedding space … The temporal attention maps (temporal feature) are then multiplied in a pixel-wise manner to the original aligned features.)
generating an output image based on the temporal feature (Section 3.1: The TSA fusion module fuses image information of different frames (temporal feature taught in above mapping)… The fused features then pass through a reconstruction module… Finally, the high-resolution frame is obtained.),
wherein the generating the temporal feature comprises:
performing a first convolution operation on the first image feature and performing a second convolution operation on the second image feature (Section 3.2: Specifically, as shown with black dash lines in Fig. 3, to generate feature … we use strided convolution filters to downsample the features);
generating an offset image feature based on a result of the first convolution operation and a result of the second convolution operation using an offset network configured to learn an amount of movement between pixels in the first image feature and pixels in the second image feature (Section 3.2: The learnable offset Δpk and the modulation scalar Δmk are predicted from concatenated features of a neighboring frame and the reference one … offsets and aligned features are predicted; the aligned features correspond to the claimed offset image feature and are calculated after the convolution filters; an illustrative, non-binding sketch of this offset prediction is provided following the claim 1 analysis below);
performing a third convolution operation on the offset image feature; performing a fourth convolution operation on the offset image feature; and (Section 3.2: Following the pyramid structure, a subsequent deformable alignment is cascaded to further refine the coarsely aligned features (the part with light purple background in Fig. 3); the convolutions can be seen in Fig. 3);
generating the temporal feature by performing a first self-attention operation that uses the result of the second convolution operation as a first query, uses a result of the third convolution operation as a first key, and uses a result of the fourth convolution operation as a first value (Section 3.3: The goal of temporal attention (first self-attention operation) is to compute frame similarity in an embedding space; the embedded reference feature φ(F^a_t) of equation 5 describes the reference feature corresponding to the result of the second convolution operation and the first query; the embedded aligned feature θ(F^a_(t+i)) of equation 5, where i is 1, describes the aligned offset feature corresponding to the result of the third convolution operation and the first key; and F^a_(t+1) of equation 7 describes the aligned offset feature itself corresponding to the result of the fourth convolution operation and the first value).
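For clarity of the query/key/value reading above, the following is a minimal, non-binding sketch (not the code of Wang or Luo, and not part of the record) of how a temporal-attention computation of the kind cited from Section 3.3 of Wang can be expressed as a query/key/value operation; the layer choices, tensor names, and shapes are assumptions introduced only for illustration.

# Illustrative sketch only: an EDVR-style temporal attention read as query/key/value.
# Layer choices, names, and shapes are assumptions, not the cited references' code.
import torch
import torch.nn as nn

class TemporalAttentionSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.phi = nn.Conv2d(channels, channels, 3, padding=1)    # embeds the reference feature (read as the first query)
        self.theta = nn.Conv2d(channels, channels, 3, padding=1)  # embeds the aligned neighbor feature (read as the first key)

    def forward(self, ref_feat: torch.Tensor, aligned_feat: torch.Tensor) -> torch.Tensor:
        # ref_feat, aligned_feat: (B, C, H, W)
        query = self.phi(ref_feat)
        key = self.theta(aligned_feat)
        value = aligned_feat  # the aligned feature itself (read as the first value)
        # Frame-similarity map: sigmoid of the per-pixel dot product across channels (cf. Wang, eq. 5).
        attention = torch.sigmoid((query * key).sum(dim=1, keepdim=True))
        # Pixel-wise modulation of the aligned feature by the attention map (cf. Wang, Sec. 3.3).
        return value * attention

Under this reading, the similarity map serves as the attention weight and the aligned feature serves as the value, which is the correspondence relied on in the mapping above.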
Wang suggests, but does not explicitly disclose, extracting a first image feature from a first input image included in a scene; extracting a second image feature from a second input image included in the scene, wherein the second input image is a target frame; and generating, based on the first image feature and the second image feature, a temporal feature associated with information about a temporal change between the first image feature and the second image feature.
Luo is in the same field of endeavor of image analysis. Further, Luo teaches extracting a first image feature from a first input image included in a scene and extracting a second image feature from a second input image included in the scene, wherein the second input image is a target frame (¶37 Step S302: Obtain image data of video data in a plurality of temporal frames; and obtain original feature submaps of each of the temporal frames on a plurality of convolutional channels by using a multi-channel convolutional layer); and
generating, based on the first image feature and the second image feature, a temporal feature associated with information about a temporal change between the first image feature and the second image feature (¶42 Step S304: Calculate, by using each of the temporal frames as a target temporal frame, motion information weights of the target temporal frame on the convolutional channels according to original feature submaps of the target temporal frame on the convolutional channels and original feature submaps of a next temporal frame adjacent to the target temporal frame on the convolutional channels.)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by explicitly extracting image features from the input images and generating a temporal feature associated with them, as taught by Luo; one of ordinary skill in the art would have been motivated to combine the references to improve recognition accuracy (Luo ¶3).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
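As referenced in the mapping of the offset-network limitation above, the following is a minimal, non-binding sketch (not the code of Wang or Luo) of predicting offsets from the concatenated neighboring and reference features and applying them through a deformable convolution to produce an aligned (offset) feature; the torchvision deform_conv2d call, layer sizes, and names are assumptions introduced only for illustration.

# Illustrative sketch only: offsets predicted from concatenated neighbor/reference
# features and applied through a deformable convolution (cf. Wang, Sec. 3.2).
# Layer sizes, initialization, and names are assumptions, not the references' code.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class OffsetAlignSketch(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        # "Offset network": maps the concatenated neighbor + reference features to sampling offsets.
        self.offset_conv = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size, 3, padding=1)
        # Weights of the deformable convolution that produces the aligned (offset) feature.
        self.weight = nn.Parameter(torch.randn(channels, channels, kernel_size, kernel_size) * 0.01)

    def forward(self, neighbor_feat: torch.Tensor, ref_feat: torch.Tensor) -> torch.Tensor:
        # Offsets are predicted from the concatenation of the two feature maps,
        # i.e., learned from the amount of movement between the two features.
        offsets = self.offset_conv(torch.cat([neighbor_feat, ref_feat], dim=1))
        # The neighbor feature is resampled at the learned offsets to align it to the reference.
        return deform_conv2d(neighbor_feat, offsets, self.weight, padding=self.kernel_size // 2)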
Regarding Claim 4, Wang in view of Luo discloses the method of claim 1, wherein the extracting of the first image feature comprises extracting the first image feature using a first convolution layer, and wherein the extracting of the second image feature comprises extracting the second image feature using a second convolution layer (Luo, ¶40 the image data is inputted into the multi-channel convolutional layer as input data of the multi-channel convolutional layer, and the convolution kernels in the multi-channel convolutional layer perform convolution calculation on the image data.)
Regarding Claim 9, Wang in view of Luo discloses the method of claim 1, further comprising determining whether the first input image and the second input image are included in the scene based on at least one of meta information about the video or a change in a frame of the video (Luo determines the action in the scene based on differences and changes between the temporal frames).
Regarding claim 10, claim 10 has been analyzed with regard to claim 1 and is rejected for the same reasons of obviousness as used above as well as in accordance with Luo further teaching on: A computer-readable recording medium configured to store one or more instructions which, when executed by at least one processor of a device for processing a video, causes the device to (¶7 One or more non-transitory computer-readable storage media are provided, storing computer-readable instructions, the computer-readable instructions, when executed by one or more processors, causing the one or more processors to perform the operations.)
Regarding claim 12, claim 12 has been analyzed with regard to claim 1 and is rejected for the same reasons of obviousness as used above as well as in accordance with Luo further teaching on: An apparatus for processing a video, the apparatus comprising: at least one processor; and a memory configured to store one or more instructions which, when executed by the at least one processor, cause the apparatus to perform operations (¶7 One or more non-transitory computer-readable storage media are provided, storing computer-readable instructions, the computer-readable instructions, when executed by one or more processors, causing the one or more processors to perform the operations.)
Claim 15 recites limitations similar to claim 4 and is rejected under the same rationale and reasoning.
Claim 20 recites limitations similar to claim 9 and is rejected under the same rationale and reasoning.
Allowable Subject Matter
Claims 2-3, 5-8, 11, 13-14, and 16-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Regarding claims 2, 11, and 13, no prior art of record teaches generating a first offset from the result of the first convolution operation using a first offset network; generating a second offset from the result of the second convolution operation using a second offset network; and generating the offset image feature by adding, to the first image feature, a third offset obtained by subtracting the first offset from the second offset.
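For clarity only, the offset combination recited in claims 2, 11, and 13 can be restated as the following arithmetic; the first and second offset networks are shown as placeholder callables, and nothing here is drawn from the prior art of record.

# Restatement of the recited offset combination (claims 2, 11, and 13) for clarity only.
# first_offset_net and second_offset_net are hypothetical placeholders for the claimed offset networks.
import torch

def offset_image_feature(first_image_feature: torch.Tensor,
                         first_conv_result: torch.Tensor,
                         second_conv_result: torch.Tensor,
                         first_offset_net,
                         second_offset_net) -> torch.Tensor:
    first_offset = first_offset_net(first_conv_result)     # first offset from the first convolution result
    second_offset = second_offset_net(second_conv_result)  # second offset from the second convolution result
    third_offset = second_offset - first_offset            # third offset: second offset minus first offset
    return first_image_feature + third_offset              # offset image feature: first image feature plus third offset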
Regarding claims 5 and 16, no prior art of record teaches generating a spatial feature for the second input image based on the second image feature and a third image feature, wherein the output image is generated further based on the spatial feature, and wherein the third image feature is an output of a k-1-th layer in a second convolutional neural network, where k denotes a number of layers included in the second convolutional neural network.
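For clarity only, the following hypothetical sketch (not drawn from the prior art of record) illustrates obtaining the output of the (k-1)-th layer of a second convolutional neural network as the third image feature and combining it with the second image feature into a spatial feature; the network depth, channel counts, and concatenation-based combination are assumptions.

# Hypothetical sketch only: tapping the (k-1)-th layer output of a second CNN as the
# third image feature and combining it with the second image feature into a spatial feature.
# Depth, channel counts, and the concatenation-plus-1x1-convolution fusion are assumptions.
import torch
import torch.nn as nn

class SecondCnnSketch(nn.Module):
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        # k layers in total; the output of the (k-1)-th layer is tapped as the third image feature.
        self.layers = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(k)]
        )
        # One possible (assumed) combination: concatenate along channels, then fuse with a 1x1 convolution.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, image: torch.Tensor, second_image_feature: torch.Tensor) -> torch.Tensor:
        x = image
        outputs = []
        for layer in self.layers:
            x = torch.relu(layer(x))
            outputs.append(x)
        third_image_feature = outputs[-2]  # output of the (k-1)-th layer
        spatial_feature = self.fuse(torch.cat([second_image_feature, third_image_feature], dim=1))
        return spatial_feature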
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DUSTIN BILODEAU whose telephone number is (571)272-1032. The examiner can normally be reached 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DUSTIN BILODEAU/Examiner, Art Unit 2664
/JENNIFER MEHMOOD/Supervisory Patent Examiner, Art Unit 2664