Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed November 25, 2025, has been entered and considered. Claims 1, 13, and 17 have been amended. In light of the amendment, the prior art rejections of claims 1, 13, and 17 are withdrawn as moot. The new grounds of rejection set forth in the present action were necessitated by Applicants' claim amendments; accordingly, this action is made final.
Drawing/Specification Objections
In view of the amendments to the specification, the objections are withdrawn as moot.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-8 and 11-20 are rejected under 35 U.S.C. 103 as being unpatentable over Jo et al. (NPL, “Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation”, published 2018) in view of Niklaus et al. (NPL, “Context-aware Synthesis for Video Frame Interpolation”, published 2018).
[media_image1.png (greyscale): Jo, Fig. 3, reprinted]
Regarding claim 1, Jo teaches a method of generating an image frame comprising: applying a neural network to at least one of one or more pre-processed image frames of a temporal sequence of image frames to generate a residual and a mask (Fig. 3, reprinted above, shows the generation of filters (mask) as well as a residual by applying a neural network to one or more pre-processed image frames);
applying the mask to features of the one or more pre-processed image frames to provide approximated features of a temporally upsampled image frame to be in the temporal sequence of image frames (Fig. 3, reprinted above, shows the filters being applied to the image frames); and combining the approximated features of the temporally upsampled image frame with the residual to generate an output temporally upsampled image frame (Fig. 3, reprinted above, shows the combination of the residual with the filtered output to generate an output temporally upsampled image frame).
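For illustration only, the mask-and-residual operation mapped above may be expressed as the following minimal Python (PyTorch) sketch; the function name, tensor shapes, and the elementwise form of the mask application are the examiner's illustrative assumptions, not code from Jo:

import torch

def combine_mask_and_residual(features, mask, residual):
    # Illustrative assumption: the mask gates the pre-processed frame
    # features elementwise to approximate the upsampled frame, and the
    # network-predicted residual is then added elementwise.
    approximated = mask * features
    return approximated + residual

# Toy tensors in (N, C, H, W) layout
features = torch.rand(1, 3, 4, 4)   # features of a pre-processed frame
mask     = torch.rand(1, 3, 4, 4)   # network-predicted mask
residual = torch.randn(1, 3, 4, 4)  # network-predicted residual
frame = combine_mask_and_residual(features, mask, residual)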
Jo does not explicitly disclose inserting the output temporally upsampled image frame in the temporal sequence of image frames.
Niklaus discloses inserting the output temporally upsampled image frame in the temporal sequence of image frames (Pg. 2, Col. 2, “Given two consecutive video frames I1 and I2, our goal is to generate an intermediate frame Ît at the temporal location t in between the two input frames”; Pg. 8, Col. 1, “…we are able to interpolate a frame at an arbitrary temporal position t ∈ [0,1], as shown in Figure 10.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jo to incorporate the teachings of Niklaus to include inserting the output temporally upsampled image frame in the temporal sequence of image frames. Niklaus discloses a system to generate intermediate frames in order to increase temporal resolution. One of ordinary skill in the art would recognize that applying these interpolation techniques to the video processing system of Jo would produce temporally upsampled frames and improve motion smoothness.
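For illustration only, the insertion step taught by Niklaus may be sketched as follows; the linear blend stands in for Niklaus's synthesis network and is the examiner's simplification:

import torch

def interpolate_and_insert(frames, i, t=0.5):
    # Naive linear blend as a stand-in for the synthesis network:
    # generate an intermediate frame at temporal position t in (0, 1)
    # between frames[i] and frames[i + 1], then insert it in sequence.
    inter = (1.0 - t) * frames[i] + t * frames[i + 1]
    return frames[:i + 1] + [inter] + frames[i + 1:]

seq = [torch.rand(3, 8, 8), torch.rand(3, 8, 8)]  # two consecutive frames
seq = interpolate_and_insert(seq, 0, t=0.5)       # sequence now holds 3 frames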
Regarding claim 2, Jo teaches all of the elements of claim 1, as stated above. Jo does not explicitly disclose warping one or more image frames of the temporal sequence of image frames to provide the one or more pre-processed image frames, but does acknowledge related works that perform warping as a preprocessing step (Pg. 3225, Col. 2).
Niklaus teaches warping one or more image frames of the temporal sequence of image frames to provide the one or more pre-processed image frames (Fig. 2 annotation, “Given two consecutive input frames, our method first estimates bidirectional flow between them and extracts per-pixel context maps. Our method then pre-warps the input frames and their corresponding context maps.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jo to incorporate the teachings of Niklaus to include warping one or more image frames of the temporal sequence of image frames to provide the one or more pre-processed image frames. Jo expressly acknowledges that other video super-resolution techniques employ warping of frames using motion vectors as a preprocessing step to better align temporal information, but chooses instead to rely on implicit compensation. One of ordinary skill in the art would recognize that substituting or supplementing Jo’s implicit temporal handling with the warping method of Niklaus would be a routine substitution as both approaches were well-known alternatives for addressing the same design need. Warping is used to enhance temporal consistency and reduce misalignment artifacts, which is the end goal in the method of Jo.
[media_image2.png (greyscale): Niklaus, Fig. 2, reprinted]
Regarding claim 3, Jo as modified teaches all of the elements of claim 2, as stated above, as well as wherein warping the one or more image frames of the temporal sequence of image frames comprises applying motion vectors from a rendering pipeline to the one or more image frames of the temporal sequence of image frames to provide one or more approximations of the temporally upsampled image frame (Niklaus; Fig. 2, reprinted above, shows the image frames being warped by applying motion (flow) vectors).
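For illustration only, flow-based pre-warping of the kind shown in Niklaus's Fig. 2 may be sketched as follows; this generic backward warp via bilinear sampling is the examiner's illustration, not Niklaus's code:

import torch
import torch.nn.functional as F

def warp_with_flow(frame, flow):
    # Backward-warp a frame (N, C, H, W) using per-pixel motion vectors
    # flow (N, 2, H, W) given in pixels, via bilinear sampling.
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # (1, 2, H, W)
    coords = base + flow                                      # displaced positions
    grid_x = 2.0 * coords[:, 0] / (w - 1) - 1.0               # normalize to [-1, 1]
    grid_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)              # (N, H, W, 2)
    return F.grid_sample(frame, grid, mode="bilinear", align_corners=True)

frame = torch.rand(1, 3, 8, 8)
flow  = torch.zeros(1, 2, 8, 8)   # zero motion: output equals the input frame
warped = warp_with_flow(frame, flow)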
Regarding claim 4, Jo teaches all of the elements of claim 1, as stated above, as well as wherein the neural network is defined, at least in part, by parameters determined in training operations including: generation of one or more image frames based, at least in part, on application of a generated mask and a generated residual (Fig. 3, reprinted above); application of a loss function to a comparison of a real image frame as a ground truth label to the generated one or more image frames (Pg. 3227, Col. 2, “We sample 160,000 ground truth training data with the spatial resolution of 144 × 144 by selecting areas with sufficient amount of motion... To train our network Gθ, we use the Huber loss as the cost function for stable convergence:”, trains the network end-to-end against ground truth frames with a Huber reconstruction loss); and update of the parameters based, at least in part, on application of a gradient to the loss function (Pg. 3228, Col. 1, “We use Adam optimizer [18] and initially set learning rate to 0.001 and multiply by 0.1 after every 10 epochs.”).
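For illustration only, the cited training regimen (Huber loss, Adam, learning rate 0.001 decayed by 0.1 every 10 epochs) corresponds to a loop of the following shape; the one-layer stand-in network and toy data are the examiner's assumptions:

import torch

net = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)  # stand-in for Jo's network Gθ
criterion = torch.nn.HuberLoss()                       # Huber reconstruction loss
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    inputs   = torch.rand(8, 3, 144, 144)   # pre-processed input frames (toy)
    gt_frame = torch.rand(8, 3, 144, 144)   # ground-truth label frames (toy)
    pred = net(inputs)                      # generated image frames
    loss = criterion(pred, gt_frame)        # loss against the ground truth label
    optimizer.zero_grad()
    loss.backward()                         # gradient of the loss
    optimizer.step()                        # parameter update
    scheduler.step()                        # lr multiplied by 0.1 every 10 epochs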
Regarding claim 5, Jo teaches all of the elements of claim 1, as stated above, and when modified in view of Niklaus also teaches computing parameters of two or more warped image frames based, at least in part, on the features of at least one image frame of the temporal sequence of image frames rendered at rendering instances to at least in part provide the features of the at least one image frame of the temporal sequence of image frames rendered at the rendering instances (Niklaus; Fig. 2 annotation, “Given two consecutive input frames, our method first estimates bidirectional flow between them and extracts per-pixel context maps. Our method then pre-warps the input frames and their corresponding context maps.”, the context maps are the parameters of the warped features), and wherein applying the mask to features of the at least one image frame of the temporal sequence of image frames rendered at the rendering instances comprises applying the mask to the computed parameters of the two or more warped image frames to at least in part generate the approximated features of the temporally upsampled image frame (Jo; Fig. 3, reprinted above, shows the filters (mask) being applied to the image frame features).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jo to incorporate the teachings of Niklaus to include applying the mask to features of the at least one image frame of the temporal sequence of image frames rendered at the rendering instances comprises applying the mask to the computed parameters of the two or more warped image frames to at least in part generate the approximated features of the temporally upsampled image frame. Jo teaches applying a mask to features of an input frame to generate approximated features. Niklaus teaches warping frames and contextual feature maps (parameters) using motion vectors to align them at rendering instances before synthesis. One of ordinary skill in the art would understand that applying Jo’s known mask mechanism to the warped feature parameters of Niklaus, rather than center-frame features, would improve temporal alignment and reduce motion artifacts.
Regarding claim 6, Jo teaches all of the elements of claim 1, as stated above, and when modified in view of Niklaus also teaches computing parameters of a first warped image frame based, at least in part, on the features of at least a first image frame of the temporal sequence of image frames rendered at a rendering instance in the temporal sequence prior to the temporally upsampled image frame (Niklaus; Fig. 2 annotation, “Given two consecutive input frames, our method first estimates bidirectional flow between them and extracts per-pixel context maps. Our method then pre-warps the input frames and their corresponding context maps.”); computing parameters of a second warped image frame based, at least in part, on features of at least a second image frame of the temporal sequence of image frames (See above citation of Niklaus); and applying the mask to the parameters of the first and second warped image frames to at least in part generate the approximated features of the temporally upsampled image frame (Jo; Fig. 3, reprinted above, shows the filters (mask) being applied to the image frame features. The analysis set forth with respect to claim 5 is incorporated herein and applies to claim 6).
Regarding claim 7, the recited elements perform substantially the same function as those of claim 6. It is rejected under the same analysis.
Regarding claim 8, Jo teaches all of the elements of claim 1, as stated above, and when modified in view of Niklaus also teaches computing an approximated motion vector based, at least in part, on at least one image frame in the temporal sequence of image frames (Niklaus; Fig. 2 annotation, “Given two consecutive input frames, our method first estimates bidirectional flow between them”); and computing a warped image frame based, at least in part, on the approximated motion vector to at least in part provide the features of the one or more pre-processed image frames (Niklaus; Fig. 2 annotation, “Our method then pre-warps the input frames and their corresponding context maps.”).
Regarding claim 11, Jo teaches all of the elements of claim 1, as stated above, and when modified in view of Niklaus also teaches wherein: the features of one or more rendered image frames comprise image signal intensity values of a warped image frame of at least one of the one or more rendered image frames (Niklaus; Fig. 2 annotation, “Our method then pre-warps the input frames and their corresponding context maps.”); the features of the mask comprise coefficients to be applied to image signal intensity values associated with pixel locations and color channels of the warped image frame of at least one of the one or more rendered image frames (Pg. 3226, Col. 1, “First, a set of input LR frames {Xt−N:t+N} (7 frames in our network: N = 3) is fed into the dynamic filter generation network. The trained network outputs a set of r² HW upsampling filters Ft of a certain size (5 × 5 in our network), which will be used to generate new pixels in the filtered HR frame Ỹt. Finally, each output HR pixel value is created by local filtering on an LR pixel in the input frame Xt with the corresponding filter Fy,x,v,u”, the image frame would be pre-warped in view of Niklaus); and the features of the residual comprise values to be additively combined with approximated image signal intensity values of the temporally upsampled image frame (Jo; Fig. 3, reprinted above, shows the Residual being additively combined with approximated image signal intensity values).
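For illustration only, the quoted local filtering operation (a predicted k × k filter applied around each pixel) may be sketched as follows; the single-channel form without the r² sub-pixel expansion is the examiner's simplification of Jo's dynamic upsampling filters:

import torch
import torch.nn.functional as F

def dynamic_local_filtering(x, filters, k=5):
    # x:       (N, 1, H, W) input frame
    # filters: (N, H, W, k*k) per-pixel filters (e.g. softmax-normalized)
    # Each output pixel is a weighted sum of the k x k input patch
    # centered on it, using the filter predicted for that location.
    n, _, h, w = x.shape
    patches = F.unfold(x, kernel_size=k, padding=k // 2)        # (N, k*k, H*W)
    patches = patches.view(n, k * k, h, w).permute(0, 2, 3, 1)  # (N, H, W, k*k)
    out = (patches * filters).sum(dim=-1)                       # per-pixel filtering
    return out.unsqueeze(1)                                     # (N, 1, H, W)

x = torch.rand(1, 1, 6, 6)
filters = torch.softmax(torch.randn(1, 6, 6, 25), dim=-1)  # 5 x 5 filters per pixel
y = dynamic_local_filtering(x, filters)  # a residual would then be added per claim 11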
Regarding claim 12, Jo teaches all of the elements of claim 1, as stated above, as well as wherein the neural network comprises activation functions defined in part by weights determined in iterations of a machine learning process according to a loss function (Pg. 3226, Col. 2, “Each part of the dense block is composed of batch normalization (BN) [12], ReLU [5], 1×1×1 convolution, BN, ReLU, and 3×3×3 convolution in order.”; Pg. 3227, Col. 2, “To train our network Gθ, we use the Huber loss as the cost function for stable convergence:”), the loss function to be based, at least in part, on a temporally upscaled image frame to a reference time instance and an image frame rendered at the reference time instance applied as a ground truth label (Pg. 3227, Eq. 3 [media_image3.png: Huber loss equation, reprinted], wherein Yt is the ground truth label).
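For reference, the standard form of the Huber loss between a generated frame Ŷt and the ground truth Yt is reproduced below in LaTeX; Jo's Eq. 3 may use a different parameterization than this textbook form:

\[
\mathcal{H}\bigl(Y_t, \hat{Y}_t\bigr) =
\begin{cases}
\tfrac{1}{2}\,\bigl\lVert Y_t - \hat{Y}_t \bigr\rVert^2, & \bigl\lVert Y_t - \hat{Y}_t \bigr\rVert \le \delta,\\[4pt]
\delta\,\bigl\lVert Y_t - \hat{Y}_t \bigr\rVert - \tfrac{1}{2}\,\delta^2, & \text{otherwise,}
\end{cases}
\]

with threshold \(\delta > 0\) controlling the transition from a quadratic to a linear penalty.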
Regarding claim 13, the article comprising a non-transitory storage medium recites substantially the same limitations as claim 1. It is rejected under the same analysis.
Regarding claim 14, the recited elements perform substantially the same function as those of claim 2. It is rejected under the same analysis.
Regarding claim 15, the recited elements perform substantially the same function as those of claim 3. It is rejected under the same analysis.
Regarding claim 16, the recited elements perform substantially the same function as those of claim 11. It is rejected under the same analysis.
Regarding claim 17, the computing device comprising a memory and one or more processors recites substantially the same limitations as claim 1. It is rejected under the same analysis.
Regarding claim 18, the recited elements perform substantially the same function as those of claim 2. It is rejected under the same analysis.
Regarding claim 19, the recited elements perform substantially the same function as those of claim 5. It is rejected under the same analysis.
Regarding claim 20, the recited elements perform substantially the same function as those of claim 6. It is rejected under the same analysis.
Claim(s) 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Jo et al. as modified in view of Niklaus further in view of Schmidt (NPL, “Recurrent Neural Networks (RNNs): A gentle Introduction and Overview”, published 2019).
Regarding claim 9, Jo as modified teaches all of the elements of claim 1, as stated above. The combination does not explicitly disclose the use of a sigmoid operation as an activation function to generate features of the mask or the use of a tanh operation as an activation function to generate features of the residual.
Schmidt teaches applying a sigmoid operation as an activation function to at least in part generate the features of the mask (Pg. 4, “The shown equations use Wxi, Wxf, Wxo ∈ Rd×h and Whi, Whf, Who ∈ Rh×h as weight matrices while bi, bf, bo ∈ R1×h are their respective biases. Further, they use the sigmoid activation function σ to transform the output, which results in a vector with entries ∈ (0, 1).”, the sigmoid function produces gate values in (0, 1), which serve as a mask); and applying a tanh operation as an activation function to at least in part generate the features of the residual (Pg. 4, “Next, we need a candidate memory cell C̃t ∈ Rn×h which has a similar computation as the previously mentioned gates but instead uses a tanh activation function to have an output ∈ (−1, 1).”, tanh is used for the candidate memory cell (residual)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jo to incorporate the teachings of Schmidt to include applying a sigmoid operation as an activation function to at least in part generate the features of the mask; and applying a tanh operation as an activation function to at least in part generate the features of the residual. Utilizing a sigmoid operation to control a mask and a tanh operation to generate a residual is not a new combination of elements but rather a description of the inherent operation of a widely known neural network architecture. One of ordinary skill in the art would recognize this as a routine application of well-known activation functions, making it obvious to incorporate them within the method disclosed by Jo.
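For illustration only, the gating pattern Schmidt describes maps onto the claimed mask and residual as in the following sketch; the linear heads, names, and shapes are the examiner's illustrative assumptions:

import torch

def mask_and_residual_heads(features, w_mask, w_res):
    # Sigmoid bounds the mask entries to (0, 1), as with LSTM gates;
    # tanh bounds the residual entries to (-1, 1), as with the
    # candidate memory cell. The linear maps are illustrative only.
    mask     = torch.sigmoid(features @ w_mask)
    residual = torch.tanh(features @ w_res)
    return mask, residual

feats  = torch.rand(4, 16)      # toy feature vectors
w_mask = torch.randn(16, 16)
w_res  = torch.randn(16, 16)
mask, residual = mask_and_residual_heads(feats, w_mask, w_res)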
Regarding claim 10, the recited elements perform substantially the same function as those of claim 9. It is rejected under the same analysis.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID A WAMBST whose telephone number is (703)756-1750. The examiner can normally be reached M-F 9-6:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Gregory Morse can be reached at (571)272-3838. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DAVID ALEXANDER WAMBST/Examiner, Art Unit 2663
/GREGORY A MORSE/Supervisory Patent Examiner, Art Unit 2698