DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are pending in this Office action.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-9, 11-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (US 20250106355 A1) in view of Petersson (US 20250069319 A1).
Regarding claim 1, Luo teaches a method (See Luo: Fig. 1, and [0096], “FIG. 1 illustrates an example diagram 100 where blending factors for frame motion are generated using a neural network, according to at least one embodiment. In at least one embodiment, a processor 102 executes or otherwise performs one or more instructions to use a neural network 110 to generate blending factors of frame motion, using systems and methods such as those described herein”), comprising:
creating first interpolated optical flow data based, at least in part, on an optical flow from a preceding frame, an optical flow from a following frame, or a combination thereof (See Luo: Figs. 4-6, and [0170], “FIG. 6 illustrates an example diagram 600 where optical flow analysis is used to generate intermediate frames, according to at least one embodiment. In at least one embodiment, a current frame 602 (which is a current frame such as current frame 402, described herein at least in connection with FIG. 4) and a previous frame 606 (which is a previous frame such as previous frame 502, described herein at least in connection with FIG. 5) are used as input to optical flow 610. In at least one embodiment, current frame 602 includes a dynamic object 604 (and a shadow) as described herein at least in connection with FIG. 4 and previous frame 606 includes a dynamic object 608 (and a shadow) as described herein at least in connection with FIG. 5. In at least one embodiment, optical flow 610 moves contents of previous frame 606 to previous to current intermediate frame 616 based on flow. In at least one embodiment, optical flow 610 moves contents of current frame 602 to current to previous intermediate frame 624 based on flow”. Note that optical flow 610, computed from the previous frame and the current frame, is mapped to the claimed optical flow from the preceding frame and optical flow from the following frame; and the previous to current intermediate frame 616 and the current to previous intermediate frame 624 are mapped to the first interpolated optical flow data),
the first interpolated optical flow data having a resolution reduced relative to the preceding frame, the following frame, or a combination thereof (See Luo: Fig. 6, and [0106], “In at least one embodiment, processor 102 pre-processes frames 126 to generate one or more pre-processed frames (e.g., performs conversion and downsampling and uses only a luma channel of the YUV color space to generate lower resolution (LR) luma motion warped frames) as described above. In at least one embodiment, pre-processed frames 128 (e.g., converted and downsampled frames) are provided as input to neural network 110, and neural network uses pre-processed frames to generate blending factors 112 and output blending factors 114, as described herein. In at least one embodiment, neural network 110 neural network 110 uses pre-processed frames 128 to generate one or more blending factors 112, using techniques, systems, and methods such as those described herein”. Note that the lower resolution luma motion warped frames are mapped to the first interpolated optical flow data having a reduced resolution);
creating first interpolated motion vector data (See Luo: Figs. 4-5, and [0166], “In at least one embodiment, a scattered motion vector is used to motion vector warp 410 dynamic object 404 to a current to previous intermediate frame 412 based on motion. In at least one embodiment, a motion vector warp 410 of a dynamic object to an intermediate frame such as current to previous intermediate frame 412 based on motion, transforms dynamic object 404 to a position in current to previous intermediate frame 412 based on motion, by applying one or more motion vectors to dynamic object 404”; and [0169], “In at least one embodiment, a scattered forward motion vector is used to motion vector warp 512 dynamic object 504 to a previous to current intermediate frame 514 based on motion”. Note that the current to previous intermediate frame 412 based on motion and the previous to current intermediate frame 514 based on motion are mapped to the first interpolated motion vector data) based, at least in part,
on motion vectors from a preceding frame (See Luo: Figs. 4-5, and [0168], “In at least one embodiment, forward motion vectors 508 are calculated, using systems and methods such as those described herein. In at least one embodiment, forward motion vectors 508 are calculated based on one or more current frame motion vectors 506”. Note that the forward motion vectors 508 are mapped to the motion vectors from the preceding frame),
a following frame, or a combination thereof (See Luo: Fig. 4, and [0163], “In at least one embodiment, one or more current frame motion vectors 406 describe motion of objects such as dynamic object 404. In at least one embodiment, current frame motion vectors 406 describe forward motion (e.g., motion from a previous frame) of dynamic objects such as dynamic object 404, as described herein”. Note that the current frame motion vectors 406 are mapped to the motion vectors from the following frame),
the first interpolated motion vector data having a resolution reduced relative to the preceding frame, the following frame, or the combination thereof (See Luo: Figs. 1-5, and [0086], “In at least one embodiment, said processor or other processor performs conversion and downsampling and uses only a luma channel of the YUV color space to generate lower resolution (LR) luma motion warped frames, where an LR luma motion warped frame (e.g., an LR frame with only luma values from the YUV color space). In at least one embodiment, this or other processor performs said downsampling to match a resolution of frames output by said game engine or other video provider”. Note that the lower resolution luma motion warped frames are mapped to the first interpolated motion vector data having a reduced resolution);
determining a motion vector nearest in depth from among the first interpolated motion vector data or an optical flow nearest in depth from among the first interpolated optical flow data, or a combination thereof, for each pixel of an interpolated frame (See Luo: Figs. 15-16, and [0204], “In at least one embodiment, if a depth difference of a neighbor 1508 is greater than a threshold value (e.g., a depth difference greater than 1, in FIG. 15), said neighbor is ignored when determining a max distance to neighbors, as described herein. In at least one embodiment, for example, in destination frame 1510, after motion, neighbor “A” is (4,0) away from a pixel, neighbor “D” is (4,1) away from said pixel, and neighbor “F” is (4,2) away from said pixel. In at least one embodiment, max distance to neighbors 1512 is (1,1) because neighbor “A”, neighbor “D”, and neighbor “F” are ignored when determining max distance to neighbors 1512 is (1,1) because they are at a different depth (e.g., that is greater than a threshold). In at least one embodiment, not illustrated in FIG. 15, if a threshold of 10 is chosen, then max distance to neighbors is (4,2). In at least one embodiment, different threshold values can be selected based, at least in part, on contents of source frames”; [0211], “In at least one embodiment, at step 1612 of example process 1600, calculates depth for a pixel selected at step 1604 that does not have valid data and that also does not have valid depth data. In at least one embodiment, at step 1612, depth is calculated using one or more neighbor pixels. In at least one embodiment, depth is calculated as a heuristic (e.g., estimated). In at least one embodiment, depth is given a default value (e.g., a very high depth value) in instances when depth cannot be calculated. In at least one embodiment, after step 1612, example process 1600 continues at step 1614”; and [0082], “In at least one embodiment, to calculate a value of a pixel in the intermediate image, values of one or more nearby pixels with similar depth (e.g., depth within a threshold of a depth value corresponding to a pixel location to be filled) is used. In at least one embodiment, for example, pixel values can be averaged or summed with a weighted average where weights of a sum depend on how far the depth is from the depth of the pixel whose value is being calculated”. Note that depth is calculated and compared to a threshold, and the pixel is discarded if the threshold is exceeded; this could be broadly mapped to the nearest in depth optical flow and the nearest in depth motion vector, but a secondary reference is cited below for this feature); and
selectively gathering one or more color signal values for at least some pixels in the interpolated frame from the preceding frame or the following frame, or a combination thereof, based, at least in part, on the determined at least one nearest in depth motion vector, or at least one nearest in depth optical flow, or a combination thereof (See Luo: Fig. 3, and [0156], “In at least one embodiment, at step 314 of example process 300, one or more intermediate frames are blended to generate one or more interpolated frames using systems and methods such as those described herein at least in connection with FIG. 2. In at least one embodiment, at step 314, one or more interpolated frames are generated by, for example, blending contents of one or more post-processed frames (e.g., frames post-processed at step 312). In at least one embodiment, for example, if there are two frames generated at step 312, at step 314, an interpolated frame is generated by combining pixels from a first frame generated at step 312 with pixels of a second frame generated at step 312 (e.g., pixels of an interpolated frame will be generated by blending colors and/or other information from frames generated at step 312). In at least one embodiment, not illustrated in FIG. 3, an interpolated frame is generated based, at least in part, on one or more blending weights such as those described herein. In at least one embodiment, after step 314, example process 300 continues at step 316”. Note that the pixel values of the interpolated frame are obtained by blending the intermediate frames, and this is mapped to the cited claim limitations of “selectively gathering one or more color signal values for at least some pixels in the interpolated frame from the preceding frame or the following frame, or a combination thereof, based, at least in part, on the determined at least one nearest in depth motion vector, or at least one nearest in depth optical flow, or a combination thereof”).
However, Luo fails to explicitly disclose determining a motion vector nearest in depth from among the first interpolated motion vector data.
However, Petersson teaches determining a motion vector nearest in depth from among interpolated motion vector data (See Petersson: Fig. 4, and [0054], “In embodiments, dilated current depth values 415 and dilated motion vectors 455 each include data emphasizing the edges of geometry (e.g., images, graphics objects) in the current rendered frame as represented by current depth values 435 as stored in one or more respective depth buffers 222. These edges of geometry, for example, often introduce discontinuities into a contiguous series of depth values. Therefore, as the depth values and motion vectors are dilated, they naturally follow the contours of the geometric edges present in current depth values 435 as stored in one or more respective depth buffers 222. According to embodiments, reconstruct and dilate circuitry 432 is configured to compute dilated current depth values 415 and dilated motion vectors 455 by considering the depth values of a respective pixel neighborhood around each pixel of the current frame as indicated by current depth values 435. Such a pixel neighborhood, for example, includes a first number of pixels in a first direction (e.g., 3) and a second number of pixels in a second direction (e.g., 3) with a corresponding pixel being in the center (e.g., the pixel around which the pixel neighborhood is being considered). Within the depth values of the pixels in a pixel neighborhood, reconstruct and dilate circuitry 432 selects the depth values and corresponding motion vectors of the pixel where the depth value is nearest (e.g., appears closest) to a viewpoint of the scene represented by the current rendered frame. Reconstruct and dilate circuitry 432 then updates the pixel in the center of the pixel neighborhood with the selected depth value and selects the corresponding motion vector 103”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Luo to determine a motion vector nearest in depth from among the first interpolated motion vector data, as taught by Petersson, in order to improve the clarity of the interpolated frame and improve user experience (See Petersson: Fig. 1, and [0026], “In this way, post-processing circuitry 120 reduces the influence of occluded pixels in the rendered frames that become disoccluded in a resulting interpolated frame 122. By reducing the influence of these occluded pixels in the rendered frames, the number of ghosting artifacts in a resulting interpolated frame 122 is reduced, helping to improve the clarity of the interpolated frame 122 and improve user experience”). Luo teaches a method and system that may generate interpolated frames based on depth information, optical flow data, and motion vector data, while Petersson teaches a system and method that may use the nearest in depth motion vectors to generate the pixel values for the interpolated frame. Therefore, it would have been obvious to one of ordinary skill in the art to modify Luo with Petersson to use the nearest in depth motion vectors to select the pixel values for the interpolated frame. The motivation to modify Luo with Petersson is the use of a known technique to improve similar devices (methods, or products) in the same way.
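Examiner's note (for illustration only; the following sketch is examiner-generated, is not a disclosure of Luo or Petersson, and uses hypothetical identifiers throughout): the nearest-in-depth motion vector selection over a pixel neighborhood taught by Petersson at [0054], followed by gathering a color from a source frame as recited in claim 1, may be sketched as follows, assuming mv is an (H, W, 2) array of per-pixel x/y displacements and depth is an (H, W) array:

    import numpy as np

    def select_nearest_depth_mv(depth, mv, radius=1, inverted=False):
        # For each pixel, examine a (2*radius+1) x (2*radius+1) neighborhood
        # and keep the motion vector of the neighbor whose depth is nearest
        # to the viewpoint (cf. Petersson's dilation at [0054]).
        h, w = depth.shape
        out = np.empty_like(mv)
        for y in range(h):
            for x in range(w):
                y0, y1 = max(0, y - radius), min(h, y + radius + 1)
                x0, x1 = max(0, x - radius), min(w, x + radius + 1)
                patch = depth[y0:y1, x0:x1]
                # "Nearest" is the maximum in an inverted depth buffer,
                # otherwise the minimum (cf. Petersson [0058]).
                flat = patch.argmax() if inverted else patch.argmin()
                dy, dx = np.unravel_index(flat, patch.shape)
                out[y, x] = mv[y0 + dy, x0 + dx]
        return out

    def gather_color(src, mv):
        # Gather a color for each interpolated-frame pixel from the source
        # frame at the location indicated by the selected motion vector
        # (nearest-neighbor sampling; a real implementation might filter).
        h, w = mv.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        sx = np.clip(np.rint(xs + mv[..., 0]).astype(int), 0, w - 1)
        sy = np.clip(np.rint(ys + mv[..., 1]).astype(int), 0, h - 1)
        return src[sy, sx]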
Regarding claim 2, Luo and Petersson teach all the features with respect to claim 1 as outlined above. Further, Luo teaches the method of claim 1, further comprising selecting between the first interpolated optical flow data and the first interpolated motion vector data for at least one pixel in the interpolated frame to provide a selected first interpolated optical flow data or first interpolated motion vector data, and using the selected first interpolated optical flow data or first interpolated motion vector data to gather at least one color signal value for the at least one pixel (See Luo: Fig. 7, and [0174], “FIG. 7 illustrates an example diagram 700 where forward motion candidates are blended, according to at least one embodiment. In at least one embodiment, a previous frame 702 (e.g., previous frame 502), a previous to current intermediate frame 704 based on motion (e.g., previous to current intermediate frame 514), and a previous to current intermediate frame 706 based on flow (e.g., previous to current intermediate frame 616) are blended using blending weights 708, using systems and methods such as those described herein. In at least one embodiment, blending weights 708 are generated by a neural network 714 (e.g., neural network 110 and/or neural network 212, as described herein at least in connection with FIGS. 1 and 2)”. Note that weighted blending of the frame motion candidate and the optical flow candidate is mapped to selecting between the first interpolated optical flow data and the first interpolated motion vector data; an illustrative sketch follows the discussion of claim 3 below).
Regarding claim 3, Luo and Petersson teach all the features with respect to claim 1 as outlined above. Further, Luo teaches the method of claim 1, further comprising blending color signal values from the preceding frame and the following frame based, at least in part, on a computed blending value, a warped interpolated optical flow record or a warped interpolated motion vector record, or a combination thereof, for one or more pixels in the interpolated frame (See Luo: Fig. 9, and [0178], “FIG. 9 illustrates an example diagram 900 where an interpolated frame is generated, according to at least one embodiment. In at least one embodiment, a blended previous to current intermediate frame 902 (e.g., blended previous to current intermediate frame 710) and a blended current to previous intermediate frame 904 (e.g., blended current to previous intermediate frame 810) are blended 906 using systems and methods such as those described herein at least in connection with FIGS. 2 and 3 to generate one or more interpolated frames 908 (e.g., to generate one or more interpolated frames 220, described herein at least in connection with FIG. 2). In at least one embodiment, generating one or more interpolated frames 908 is generating interpolated frame 120, described herein at least in connection with FIG. 1. In at least one embodiment, generating one or more interpolated frames 908 includes post-processing frames 218 and/or generate interpolated frame(s) 220, described herein at least in connection with FIG. 2”. Note that blending the blended previous to current intermediate frame 902 with the blended current to previous intermediate frame 904 to generate the interpolated frames 908 is mapped to the cited limitations of the current claim).
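Examiner's note (for illustration only; examiner-generated, hypothetical identifiers, not a disclosure of the references): the per-pixel blending of a motion-based candidate and a flow-based candidate described in Luo [0174] and [0178] may be sketched as below; the hard per-pixel selection of claim 2 is then simply the special case of a binary weight:

    import numpy as np

    def blend_candidates(cand_motion, cand_flow, weights):
        # Weighted per-pixel blend of a motion-vector-warped candidate and an
        # optical-flow-warped candidate (cf. Luo's blending weights 708).
        w = weights[..., None]   # broadcast (H, W) weights over color channels
        return w * cand_motion + (1.0 - w) * cand_flow

    def select_candidate(cand_motion, cand_flow, weights):
        # Hard selection (claim 2) as the binary special case of blending.
        return blend_candidates(cand_motion, cand_flow,
                                (weights > 0.5).astype(float))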
Regarding claim 4, Luo and Petersson teach all the features with respect to claim 3 as outlined above. Further, Luo teaches the method of claim 3, further comprising computing the at least one computed blending value using a trained neural network (See Luo: Figs. 59-60, and [0655], “FIG. 59 illustrates a super sampling neural network, in accordance with at least one embodiment. In at least one embodiment, a neural network 5906 is referred to as a super sampling neural network, deep learning super sampling (DLSS) network, super sampling network, and/or variations thereof. In at least one embodiment, an input frame 5902 and motion vectors 5904 are processed by a neural network 5906 to generate an output frame 5908. In at least one embodiment, neural networks such as those described in connection with FIGS. 59-63 are DLSS networks”; and [0663], “FIG. 60 illustrates an architecture of a super sampling neural network, in accordance with at least one embodiment. In at least one embodiment, a neural network 6006 is referred to as a super sampling neural network, DLSS network, super sampling network, and/or variations thereof. In at least one embodiment, a neural network 6006 is trained to generate output frames 6008 from input frames 6002 and motion vectors 6004. In at least one embodiment, as part of training a neural network 6006, output frames 6008 generated by a neural network 6006 are compared with reference frames 6010 to update neural network 6006”. Note that neural network techniques are used in generating the interpolated frames based on the motion vectors and optical flow in the primary reference).
Regarding claim 5, Luo and Petersson teach all the features with respect to claim 4 as outlined above. Further, Luo teaches the method of claim 4, wherein the at least one computed blending value is at lower spatial resolution than a spatial resolution of the interpolated frame, and the at least one computed blending value is upsampled to the spatial resolution of the interpolated frame (See Luo: Fig. 1, and [0083], “In at least one embodiment, a game engine (such as noted above and elsewhere herein) or other provider of video generates or otherwise provides video frames which include two successive frames (referred to respectively as a previous frame and a current frame, even though the words “previous” and “current” refer to frames between which one or more frames are to be generated where the words may not be accurate adjectives in some contexts). In at least one embodiment, said processor or another processor (such as processor 102 described below in FIG. 1) performs spatial upsampling (e.g., using a neural network technique such as described below or without a neural network) of previous frame and current frame to increase resolution of the previous and current frame (e.g., from 1080p to 4K or from 4K to 8K or otherwise) although, in some embodiments, upsampling is not applied. Upsampling can be referred to also as super sampling and upsampled frames can be referred to as super sampled frames”).
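Examiner's note (for illustration only; examiner-generated, hypothetical identifiers; Luo does not specify a particular upsampling filter): a blending-value map computed at reduced spatial resolution may be upsampled to the resolution of the interpolated frame as follows, here by nearest-neighbor replication, although a bilinear or learned filter could equally be used:

    import numpy as np

    def upsample_blend_values(blend_lr, scale):
        # Replicate each low-resolution blending value over a scale x scale
        # block of the full-resolution interpolated frame.
        return np.repeat(np.repeat(blend_lr, scale, axis=0), scale, axis=1)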
Regarding claim 6, Luo and Petersson teach all the features with respect to claim 4 as outlined above. Further, Luo teaches the method of claim 4, wherein the trained neural network is provided with a warped interpolated optical flow frame (See Luo: Figs. 1 and 7, and [0174], “FIG. 7 illustrates an example diagram 700 where forward motion candidates are blended, according to at least one embodiment. In at least one embodiment, a previous frame 702 (e.g., previous frame 502), a previous to current intermediate frame 704 based on motion (e.g., previous to current intermediate frame 514), and a previous to current intermediate frame 706 based on flow (e.g., previous to current intermediate frame 616) are blended using blending weights 708, using systems and methods such as those described herein. In at least one embodiment, blending weights 708 are generated by a neural network 714 (e.g., neural network 110 and/or neural network 212, as described herein at least in connection with FIGS. 1 and 2)”. Note that the previous to current intermediate frame 706 based on flow is mapped to a warped interpolated optical flow frame),
a warped interpolated motion vector frame (See Luo: Figs. 1 and 7, and [0084], “These frames of the first plurality of frames and second plurality of frames can be referred to as motion warped color frames (or high resolution (HR) motion warped color frames or otherwise) and these frames may have pixel values in an RGB or other color space. It should be noted that, despite this name of “motion warped,” one or more of these motion warped color frames may lack any motion warping, such as described in the next paragraph”. Note that the motion warped color frames are mapped to the warped interpolated motion vector frame),
rendered object depth parameters for at least one of the preceding frame and the following frame (See Luo: Fig. 1, and [0096], “In at least one embodiment, inputs to the neural network 110 comprise one or more frames (e.g., a previous frame 104 and/or a current frame 106) and additional frame information including, but not limited to, depth information of pixels of previous frame 104 and/or current frame 106, motion information of pixels of previous frame 104 and/or current frame 106, camera location and/or orientation, and/or other such information such as that described herein at least in connection with FIGS. 1 and 2. In at least one embodiment, outputs from the neural network 110 blending factors of the one or more intermediate frames”),
a disocclusion mask, or a combination thereof (See Luo: Fig. 7, and [0175], “In at least one embodiment, auxiliary information includes, for example, quality masks, indications as to whether motion vectors and/or flow vectors generate duplicate objects, and/or whether any additional deocclusion occurs when generating blended previous to current intermediate frame 710, depth, motion, occlusion masks, etc. In at least one embodiment, processes illustrated by example diagram 700 continue at example diagram 800 described herein at least in connection with FIG. 8”).
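Examiner's note (for illustration only; examiner-generated, hypothetical identifiers, not a disclosure of the references): one conventional way to form a disocclusion mask of the kind noted among the occlusion masks in Luo [0175] is to mark interpolated-frame pixels that no source pixel reaches when the source frame is forward-warped (scattered) along its motion, assuming flow is an (H, W, 2) array of x/y displacements:

    import numpy as np

    def disocclusion_mask(flow, height, width):
        # Scatter every source pixel along its motion and mark destination
        # pixels that receive no contribution; those pixels are disoccluded.
        covered = np.zeros((height, width), dtype=bool)
        ys, xs = np.mgrid[0:height, 0:width]
        tx = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, width - 1)
        ty = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, height - 1)
        covered[ty, tx] = True
        return ~covered   # True where the interpolated frame has a hole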
Regarding claim 7, Luo and Petersson teach all the features with respect to claim 1 as outlined above. Further, Petersson teaches the method of claim 1, wherein creating the first interpolated optical flow data or creating the first interpolated motion vector data, or a combination thereof, further comprises retaining a scattered element for one or more pixels in the interpolated frame having a nearest depth (See Petersson: Figs. 1 and 4, and [0058], “In embodiments, one or more depth values of dilated current depth values 415 are scattered among impacted depth values using, for example, backward or reverse reprojection. Because, in some embodiments, many pixels of the current rendered frame reproject into the same pixel of the interpolated frame 122, previous rendered frame, or both, reconstruct and dilate circuitry 432 uses atomic operations to resolve the value of the nearest depth value for each pixel. As an example, in embodiments, reconstruct and dilate circuitry 432 uses atomic operations including, for example, InterlockedMax or InterlockedMin provided by the High-Level Shader Language (HLSL) or comparable equivalents. According to some embodiments, reconstruct and dilate circuitry 432 performs different atomic operations (e.g., InterlockedMax or InterlockedMin) depending on whether the depth buffer storing dilated current depth values 415 is inverted or non-inverted. Reconstruct and dilate circuitry 432 then stores the reconstructed/determined depth values in a respective depth buffer as estimated previous depth values 405”. Note that the depth value is scattered to the impacted depth values, and this is equivalent to retaining the scattered element for one or more pixels in the interpolated frame having the nearest depth).
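Examiner's note (for illustration only; examiner-generated, hypothetical identifiers; a sequential stand-in for the HLSL InterlockedMin/InterlockedMax atomics cited in Petersson [0058]): retaining the scattered element nearest in depth when several source pixels land on the same destination pixel may be sketched as a depth-tested scatter:

    import numpy as np

    def scatter_nearest_depth(src_depth, src_vals, flow):
        # Depth-tested scatter: per destination pixel, keep the contribution
        # whose depth is nearest (non-inverted buffer, smaller = nearer).
        h, w = src_depth.shape
        dst_depth = np.full((h, w), np.inf)
        dst_vals = np.zeros_like(src_vals)
        for y in range(h):
            for x in range(w):
                tx = int(round(x + flow[y, x, 0]))
                ty = int(round(y + flow[y, x, 1]))
                if (0 <= tx < w and 0 <= ty < h
                        and src_depth[y, x] < dst_depth[ty, tx]):
                    dst_depth[ty, tx] = src_depth[y, x]   # retain nearest
                    dst_vals[ty, tx] = src_vals[y, x]
        return dst_vals, dst_depth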
Regarding claim 8, Luo and Petersson teach all the features with respect to claim 7 as outlined above. Further, Luo teaches the method of claim 7, wherein creating the first interpolated optical flow data or creating the first interpolated motion vector data, or a combination thereof, further comprises filling any unfilled pixels with a pixel value having the nearest depth from a mask area comprising one or more pixels near the unfilled pixel (See Luo: Fig. 16, and [0202], “In at least one embodiment, at step 1614 of example process 1600, a filter size is determined based, at least in part, on depth and location of neighboring pixels. In at least one embodiment, for example, a filter size is calculated using a maximum distance to neighbors (e.g., max distance to neighbors 1410) as described above. In at least one embodiment, for example, a filter size is calculated using a maximum distance to neighbors that discounts neighbors that are pixels of different objects (e.g., max distance to neighbors 1512) as described above. In at least one embodiment, after step 1614, example process 1600 continues at step 1616”; and [0214], “In at least one embodiment, at step 1618 of example process 1600, a filter generated at step 1616 is applied to a source frame, to inpaint a hole with valid data. In at least one embodiment, for example, if process 1600 executes while a process such as process 300 is processing intermediate frames (e.g., at step 308), process 1600 can inpaint a hole in motion data with valid motion data from neighboring pixels, as described herein. In at least one embodiment, after step 1618, example process 1600 terminates. In at least one embodiment, after step 1618, example continues at step 1604 to select another pixel. In at least one embodiment, after step 1618, example continues at step 1602 to receive another source frame”. Note that the filter size is mapped to the mask, and inpainting the hole is mapped to filling the unfilled pixels).
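Examiner's note (for illustration only; examiner-generated, hypothetical identifiers, not a disclosure of the references): filling an unfilled pixel from a mask area of nearby valid pixels, keeping the candidate nearest in depth, may be sketched as:

    import numpy as np

    def fill_holes_nearest_depth(vals, depth, holes, radius=2):
        # For each hole, search a (2*radius+1)^2 mask area and copy the value
        # of the valid neighbor whose depth is nearest (smallest).
        h, w = depth.shape
        out = vals.copy()
        for y, x in zip(*np.nonzero(holes)):
            best_val, best_depth = None, np.inf
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if not holes[ny, nx] and depth[ny, nx] < best_depth:
                        best_val, best_depth = vals[ny, nx], depth[ny, nx]
            if best_val is not None:
                out[y, x] = best_val
        return out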
Regarding claim 9, Luo and Petersson teach all the features with respect to claim 1 as outlined above. Further, Luo teaches the method of claim 1, further comprising interpolating or warping the first interpolated optical flow data or the first interpolated motion vector data, or a combination thereof, to a time between the preceding frame and the following frame (See Luo: Fig. 1, and [0120], “In at least one embodiment, an intermediate frame comprises data that, for each pixel in a frame (e.g., said current frame or said previous frame) indicates motion from that frame to a to-be-generated interpolated frame, where the motion is determined according in a way corresponding to said intermediate frame and where each of multiple intermediate frames has this information for each pixel according to a different way of determining the motion. In at least one embodiment, an intermediate frame lacks sufficient information to be rendered as an image, although in some embodiments, intermediate frames can be images. In at last one embodiment, an intermediate frame comprises information to indicate, for each pixel of said intermediate frame, motion from a previous frame to a location in time halfway between said previous frame and a current frame. In at least one embodiment, different ways of determining motion comprise: using motion vectors from a game engine or other source (which may indicate motion of some pixels, but not of other pixels); using motion calculated using standard geometrical techniques based on a change in camera position from a previous frame to a current frame, which may also used depth of pixels which can be provided from said game engine or other source; motion calculated based on an optical flow analysis, and/or motion calculated in other ways. In at least one embodiment, a blending factor indicates a weighted sum of motions of a pixel, where motions to be summed from each of multiple types of motion from multiple respective intermediate frames”. Note that the intermediate frame is at a time halfway between the previous frame and the current frame, and this is mapped to the cited limitations of the current claim).
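Examiner's note (for illustration only; examiner-generated, hypothetical identifiers; assumes motion is approximately linear over the frame interval): warping motion data to a time t between the preceding frame (t = 0) and the following frame (t = 1), such as the halfway point noted in Luo [0120], may be sketched as:

    def motion_to_time(mv, t=0.5):
        # Scale full frame-to-frame motion (a NumPy array of displacements)
        # to intermediate time t; t = 0.5 yields motion to a frame halfway
        # between the two source frames.
        return t * mv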
Regarding claim 11, Luo and Petersson teach all the features with respect to claim 1 as outlined above. Further, Luo and Petersson teach a computing device (See Luo: Fig. 1, and [0096], “FIG. 1 illustrates an example diagram 100 where blending factors for frame motion are generated using a neural network, according to at least one embodiment. In at least one embodiment, a processor 102 executes or otherwise performs one or more instructions to use a neural network 110 to generate blending factors of frame motion, using systems and methods such as those described herein”), comprising:
a memory comprising one or more storage devices (See Luo: Figs. 17A-B, and [0218], “In at least one embodiment, any portion of code and/or data storage 1701 may be internal or external to one or more processors or other hardware logic devices or circuits. In at least one embodiment, code and/or code and/or data storage 1701 may be cache memory, dynamic randomly addressable memory (“DRAM”), static randomly addressable memory (“SRAM”), non-volatile memory (e.g., flash memory), or other storage. In at least one embodiment, a choice of whether code and/or code and/or data storage 1701 is internal or external to a processor, for example, or comprising DRAM, SRAM, flash or some other storage type may depend on available storage on-chip versus off-chip, latency requirements of training and/or inferencing functions being performed, batch size of data used in inferencing and/or training of a neural network, or some combination of these factors”); and
one or more processors coupled to the memory, the one or more processors operable to execute instructions stored in the memory to, for a rendered image sequence (See Luo: Figs. 17A-B, and [0220], “In at least one embodiment, code, such as graph code, causes the loading of weight or other parameter information into processor ALUs based on an architecture of a neural network to which such code corresponds. In at least one embodiment, any portion of code and/or data storage 1705 may be included with other on-chip or off-chip data storage, including a processor's L1, L2, or L3 cache or system memory. In at least one embodiment, any portion of code and/or data storage 1705 may be internal or external to one or more processors or other hardware logic devices or circuits. In at least one embodiment, code and/or data storage 1705 may be cache memory, DRAM, SRAM, non-volatile memory (e.g., flash memory), or other storage. In at least one embodiment, a choice of whether code and/or data storage 1705 is internal or external to a processor, for example, or comprising DRAM, SRAM, flash memory or some other storage type may depend on available storage on-chip versus off-chip, latency requirements of training and/or inferencing functions being performed, batch size of data used in inferencing and/or training of a neural network, or some combination of these factors”):
create first interpolated optical flow data based, at least in part, on an optical flow from a preceding frame, a following frame, or a combination thereof (See Luo: Figs. 4-6, and [0170], “FIG. 6 illustrates an example diagram 600 where optical flow analysis is used to generate intermediate frames, according to at least one embodiment. In at least one embodiment, a current frame 602 (which is a current frame such as current frame 402, described herein at least in connection with FIG. 4) and a previous frame 606 (which is a previous frame such as previous frame 502, described herein at least in connection with FIG. 5) are used as input to optical flow 610. In at least one embodiment, current frame 602 includes a dynamic object 604 (and a shadow) as described herein at least in connection with FIG. 4 and previous frame 606 includes a dynamic object 608 (and a shadow) as described herein at least in connection with FIG. 5. In at least one embodiment, optical flow 610 moves contents of previous frame 606 to previous to current intermediate frame 616 based on flow. In at least one embodiment, optical flow 610 moves contents of current frame 602 to current to previous intermediate frame 624 based on flow”. Note that optical flow 610, computed from the previous frame and the current frame, is mapped to the claimed optical flow from the preceding frame and optical flow from the following frame; and the previous to current intermediate frame 616 and the current to previous intermediate frame 624 are mapped to the first interpolated optical flow data),
the first interpolated optical flow data having a resolution reduced relative to the preceding frame, the following frame, or a combination thereof (See Luo: Fig. 6, and [0106], “In at least one embodiment, processor 102 pre-processes frames 126 to generate one or more pre-processed frames (e.g., performs conversion and downsampling and uses only a luma channel of the YUV color space to generate lower resolution (LR) luma motion warped frames) as described above. In at least one embodiment, pre-processed frames 128 (e.g., converted and downsampled frames) are provided as input to neural network 110, and neural network uses pre-processed frames to generate blending factors 112 and output blending factors 114, as described herein. In at least one embodiment, neural network 110 neural network 110 uses pre-processed frames 128 to generate one or more blending factors 112, using techniques, systems, and methods such as those described herein”. Note that the lower resolution luma motion warped frames are mapped to the first interpolated optical flow data having a reduced resolution);
create first interpolated motion vector data (See Luo: Figs. 4-5, and [0166], “In at least one embodiment, a scattered motion vector is used to motion vector warp 410 dynamic object 404 to a current to previous intermediate frame 412 based on motion. In at least one embodiment, a motion vector warp 410 of a dynamic object to an intermediate frame such as current to previous intermediate frame 412 based on motion, transforms dynamic object 404 to a position in current to previous intermediate frame 412 based on motion, by applying one or more motion vectors to dynamic object 404”; and [0169], “In at least one embodiment, a scattered forward motion vector is used to motion vector warp 512 dynamic object 504 to a previous to current intermediate frame 514 based on motion”. Note that the current to previous intermediate frame 412 based on motion and the previous to current intermediate frame 514 based on motion are mapped to the first interpolated motion vector data) based, at least in part,
on motion vectors from a preceding frame (See Luo: Figs. 4-5, and [0168], “In at least one embodiment, forward motion vectors 508 are calculated, using systems and methods such as those described herein. In at least one embodiment, forward motion vectors 508 are calculated based on one or more current frame motion vectors 506”. Note that the forward motion vectors 508 are mapped to the motion vectors from the preceding frame),
a following frame, or a combination thereof (See Luo: Fig. 4, and [0163], “In at least one embodiment, one or more current frame motion vectors 406 describe motion of objects such as dynamic object 404. In at least one embodiment, current frame motion vectors 406 describe forward motion (e.g., motion from a previous frame) of dynamic objects such as dynamic object 404, as described herein”. Note that the current frame motion vectors 406 are mapped to the motion vectors from the following frame),
the first interpolated motion vector data having a resolution reduced relative to the preceding frame, the following frame, or the combination thereof (See Luo: Figs. 1-5, and [0086], “In at least one embodiment, said processor or other processor performs conversion and downsampling and uses only a luma channel of the YUV color space to generate lower resolution (LR) luma motion warped frames, where an LR luma motion warped frame (e.g., an LR frame with only luma values from the YUV color space). In at least one embodiment, this or other processor performs said downsampling to match a resolution of frames output by said game engine or other video provider”. Note that the lower resolution luma motion warped frames are mapped to the first interpolated motion vector data having a reduced resolution);
determine a motion vector nearest in depth from among the first interpolated motion vector data (See Petersson: Fig. 4, and [0054], “In embodiments, dilated current depth values 415 and dilated motion vectors 455 each include data emphasizing the edges of geometry (e.g., images, graphics objects) in the current rendered frame as represented by current depth values 435 as stored in one or more respective depth buffers 222. These edges of geometry, for example, often introduce discontinuities into a contiguous series of depth values. Therefore, as the depth values and motion vectors are dilated, they naturally follow the contours of the geometric edges present in current depth values 435 as stored in one or more respective depth buffers 222. According to embodiments, reconstruct and dilate circuitry 432 is configured to compute dilated current depth values 415 and dilated motion vectors 455 by considering the depth values of a respective pixel neighborhood around each pixel of the current frame as indicated by current depth values 435. Such a pixel neighborhood, for example, includes a first number of pixels in a first direction (e.g., 3) and a second number of pixels in a second direction (e.g., 3) with a corresponding pixel being in the center (e.g., the pixel around which the pixel neighborhood is being considered). Within the depth values of the pixels in a pixel neighborhood, reconstruct and dilate circuitry 432 selects the depth values and corresponding motion vectors of the pixel where the depth value is nearest (e.g., appears closest) to a viewpoint of the scene represented by the current rendered frame. Reconstruct and dilate circuitry 432 then updates the pixel in the center of the pixel neighborhood with the selected depth value and selects the corresponding motion vector 103”) or an optical flow nearest in depth from among the first interpolated optical flow data, or a combination thereof, for each pixel of an interpolated frame (See Luo: Figs. 15-16, and [0204], “In at least one embodiment, if a depth difference of a neighbor 1508 is greater than a threshold value (e.g., a depth difference greater than 1, in FIG. 15), said neighbor is ignored when determining a max distance to neighbors, as described herein. In at least one embodiment, for example, in destination frame 1510, after motion, neighbor “A” is (4,0) away from a pixel, neighbor “D” is (4,1) away from said pixel, and neighbor “F” is (4,2) away from said pixel. In at least one embodiment, max distance to neighbors 1512 is (1,1) because neighbor “A”, neighbor “D”, and neighbor “F” are ignored when determining max distance to neighbors 1512 is (1,1) because they are at a different depth (e.g., that is greater than a threshold). In at least one embodiment, not illustrated in FIG. 15, if a threshold of 10 is chosen, then max distance to neighbors is (4,2). In at least one embodiment, different threshold values can be selected based, at least in part, on contents of source frames”; [0211], “In at least one embodiment, at step 1612 of example process 1600, calculates depth for a pixel selected at step 1604 that does not have valid data and that also does not have valid depth data. In at least one embodiment, at step 1612, depth is calculated using one or more neighbor pixels. In at least one embodiment, depth is calculated as a heuristic (e.g., estimated). In at least one embodiment, depth is given a default value (e.g., a very high depth value) in instances when depth cannot be calculated. In at least one embodiment, after step 1612, example process 1600 continues at step 1614”; and [0082], “In at least one embodiment, to calculate a value of a pixel in the intermediate image, values of one or more nearby pixels with similar depth (e.g., depth within a threshold of a depth value corresponding to a pixel location to be filled) is used. In at least one embodiment, for example, pixel values can be averaged or summed with a weighted average where weights of a sum depend on how far the depth is from the depth of the pixel whose value is being calculated”. Note that depth is calculated and compared to a threshold, and the pixel is discarded if the threshold is exceeded; this could be broadly mapped to the nearest in depth optical flow and the nearest in depth motion vector, and Petersson is cited for the nearest in depth motion vector as discussed with respect to claim 1); and
selectively gather one or more color signal values for at least some pixels in the interpolated frame from the preceding frame or the following frame, or a combination thereof, based, at least in part, on the determined at least one nearest in depth motion vector, or at least one nearest in depth optical flow, or a combination thereof (See Luo: Fig. 3, and [0156], “In at least one embodiment, at step 314 of example process 300, one or more intermediate frames are blended to generate one or more interpolated frames using systems and methods such as those described herein at least in connection with FIG. 2. In at least one embodiment, at step 314, one or more interpolated frames are generated by, for example, blending contents of one or more post-processed frames (e.g., frames post-processed at step 312). In at least one embodiment, for example, if there are two frames generated at step 312, at step 314, an interpolated frame is generated by combining pixels from a first frame generated at step 312 with pixels of a second frame generated at step 312 (e.g., pixels of an interpolated frame will be generated by blending colors and/or other information from frames generated at step 312). In at least one embodiment, not illustrated in FIG. 3, an interpolated frame is generated based, at least in part, on one or more blending weights such as those described herein. In at least one embodiment, after step 314, example process 300 continues at step 316”. Note that the pixel values of the interpolated frame are obtained by blending the intermediate frames, and this is mapped to the cited claim limitations of “selectively gathering one or more color signal values for at least some pixels in the interpolated frame from the preceding frame or the following frame, or a combination thereof, based, at least in part, on the determined at least one nearest in depth motion vector, or at least one nearest in depth optical flow, or a combination thereof”).
Regarding claim 12, Luo and Petersson teach all the features with respect to claim 11 as outlined above. Further, Luo teaches the computing device of claim 11, the one or more processors further operable to execute instructions stored in the memory to select between the first interpolated optical flow data and the first interpolated motion vector data for at least one pixel in the interpolated frame to provide a selected first interpolated optical flow data or first interpolated motion vector data, and to use the selected first interpolated optical flow data or first interpolated motion vector data to gather at least one color signal value for the at least one pixel (See Luo: Fig. 7, and [0174], “FIG. 7 illustrates an example diagram 700 where forward motion candidates are blended, according to at least one embodiment. In at least one embodiment, a previous frame 702 (e.g., previous frame 502), a previous to current intermediate frame 704 based on motion (e.g., previous to current intermediate frame 514), and a previous to current intermediate frame 706 based on flow (e.g., previous to current intermediate frame 616) are blended using blending weights 708, using systems and methods such as those described herein. In at least one embodiment, blending weights 708 are generated by a neural network 714 (e.g., neural network 110 and/or neural network 212, as described herein at least in connection with FIGS. 1 and 2)”. Note that weighted blending of the frame motion candidate and the optical flow candidate is mapped to selecting between the first interpolated optical flow data and the first interpolated motion vector data).
Regarding claim 13, Luo and Petersson teach all the features with respect to claim 11 as outlined above. Further, Luo teaches the computing device of claim 11, the one or more processors further operable to execute instructions stored in the memory to blend color signal values from the preceding frame and the following frame based, at least in part, on at least one computed blending value, a warped interpolated optical flow data, and a warped interpolated motion vector data for one or more pixels in the interpolated frame (See Luo: Fig. 9, and [0178], “FIG. 9 illustrates an example diagram 900 where an interpolated frame is generated, according to at least one embodiment. In at least one embodiment, a blended previous to current intermediate frame 902 (e.g., blended previous to current intermediate frame 710) and a blended current to previous intermediate frame 904 (e.g., blended current to previous intermediate frame 810) are blended 906 using systems and methods such as those described herein at least in connection with FIGS. 2 and 3 to generate one or more interpolated frames 908 (e.g., to generate one or more interpolated frames 220, described herein at least in connection with FIG. 2). In at least one embodiment, generating one or more interpolated frames 908 is generating interpolated frame 120, described herein at least in connection with FIG. 1. In at least one embodiment, generating one or more interpolated frames 908 includes post-processing frames 218 and/or generate interpolated frame(s) 220, described herein at least in connection with FIG. 2”. Note that blending the blended previous to current intermediate frame 902 with the blended current to previous intermediate frame 904 to generate the interpolated frames 908 is mapped to the cited limitations of the current claim).
Regarding claim 14, Luo and Petersson teach all the features with respect to claim 13 as outlined above. Further, Luo teaches the computing device of claim 13, wherein the at least one blending value is predicted using a trained neural network (See Luo: Figs. 59-60, and [0655], “FIG. 59 illustrates a super sampling neural network, in accordance with at least one embodiment. In at least one embodiment, a neural network 5906 is referred to as a super sampling neural network, deep learning super sampling (DLSS) network, super sampling network, and/or variations thereof. In at least one embodiment, an input frame 5902 and motion vectors 5904 are processed by a neural network 5906 to generate an output frame 5908. In at least one embodiment, neural networks such as those described in connection with FIGS. 59-63 are DLSS networks”; and [0663], “FIG. 60 illustrates an architecture of a super sampling neural network, in accordance with at least one embodiment. In at least one embodiment, a neural network 6006 is referred to as a super sampling neural network, DLSS network, super sampling network, and/or variations thereof. In at least one embodiment, a neural network 6006 is trained to generate output frames 6008 from input frames 6002 and motion vectors 6004. In at least one embodiment, as part of training a neural network 6006, output frames 6008 generated by a neural network 6006 are compared with reference frames 6010 to update neural network 6006”. Note that neural network techniques are used in generating the interpolated frames based on the motion vectors and optical flow in the primary reference).
Regarding claim 15, Luo and Petersson teach all the features with respect to claim 14 as outlined above. Further, Luo teaches the computing device of claim 14, wherein the at least one predicted blending value is at lower spatial resolution than a spatial resolution of the interpolated frame, and the predicted blending value is upsampled to the spatial resolution of the interpolated frame (See Luo: Fig. 1, and [0083], “In at least one embodiment, a game engine (such as noted above and elsewhere herein) or other provider of video generates or otherwise provides video frames which include two successive frames (referred to respectively as a previous frame and a current frame, even though the words “previous” and “current” refer to frames between which one or more frames are to be generated where the words may not be accurate adjectives in some contexts). In at least one embodiment, said processor or another processor (such as processor 102 described below in FIG. 1) performs spatial upsampling (e.g., using a neural network technique such as described below or without a neural network) of previous frame and current frame to increase resolution of the previous and current frame (e.g., from 1080p to 4K or from 4K to 8K or otherwise) although, in some embodiments, upsampling is not applied. Upsampling can be referred to also as super sampling and upsampled frames can be referred to as super sampled frames”).
Regarding claim 16, Luo and Petersson teach all the features with respect to claim 14 as outlined above. Further, Luo teaches the computing device of claim 14, wherein the trained neural network is provided with a warped interpolated optical flow frame (See Luo: Figs. 1 and 7, and [0174], “FIG. 7 illustrates an example diagram 700 where forward motion candidates are blended, according to at least one embodiment. In at least one embodiment, a previous frame 702 (e.g., previous frame 502), a previous to current intermediate frame 704 based on motion (e.g., previous to current intermediate frame 514), and a previous to current intermediate frame 706 based on flow (e.g., previous to current intermediate frame 616) are blended using blending weights 708, using systems and methods such as those described herein. In at least one embodiment, blending weights 708 are generated by a neural network 714 (e.g., neural network 110 and/or neural network 212, as described herein at least in connection with FIGS. 1 and 2)”. Note that the previous to current intermediate frame 706 based on flow is mapped to a warped interpolated optical flow frame),
a warped interpolated motion vector frame (See Luo: Figs. 1 and 7, and [0084], “These frames of the first plurality of frames and second plurality of frames can be referred to as motion warped color frames (or high resolution (HR) motion warped color frames or otherwise) and these frames may have pixel values in an RGB or other color space. It should be noted that, despite this name of “motion warped,” one or more of these motion warped color frames may lack any motion warping, such as described in the next paragraph”. Note that the motion warped color frames are mapped to the warped interpolated motion vector frame),
rendered object depth parameters for at least one of the preceding frame and the following frame (See Luo: Fig. 1, and [0096], “In at least one embodiment, inputs to the neural network 110 comprise one or more frames (e.g., a previous frame 104 and/or a current frame 106) and additional frame information including, but not limited to, depth information of pixels of previous frame 104 and/or current frame 106, motion information of pixels of previous frame 104 and/or current frame 106, camera location and/or orientation, and/or other such information such as that described herein at least in connection with FIGS. 1 and 2. In at least one embodiment, outputs from the neural network 110 blending factors of the one or more intermediate frames”),
a disocclusion mask, or a combination thereof (See Luo: Fig. 7, and [0175], “In at least one embodiment, auxiliary information includes, for example, quality masks, indications as to whether motion vectors and/or flow vectors generate duplicate objects, and/or whether any additional deocclusion occurs when generating blended previous to current intermediate frame 710, depth, motion, occlusion masks, etc. In at least one embodiment, processes illustrated by example diagram 700 continue at example diagram 800 described herein at least in connection with FIG. 8”).
Regarding claim 17, Luo and Petersson teach all the features with respect to claim 11 as outlined above. Further, Petersson teaches the computing device of claim 11, wherein creating the first interpolated optical flow data or creating the first interpolated motion vector data, or a combination thereof, further comprises retaining a scattered element for one or more pixels in the interpolated frame having a nearest depth (See Petersson: Figs. 1 and 4, and [0058], “In embodiments, one or more depth values of dilated current depth values 415 are scattered among impacted depth values using, for example, backward or reverse reprojection. Because, in some embodiments, many pixels of the current rendered frame reproject into the same pixel of the interpolated frame 122, previous rendered frame, or both, reconstruct and dilate circuitry 432 uses atomic operations to resolve the value of the nearest depth value for each pixel. As an example, in embodiments, reconstruct and dilate circuitry 432 uses atomic operations including, for example, InterlockedMax or InterlockedMin provided by the High-Level Shader Language (HLSL) or comparable equivalents. According to some embodiments, reconstruct and dilate circuitry 432 performs different atomic operations (e.g., InterlockedMax or InterlockedMin) depending on whether the depth buffer storing dilated current depth values 415 is inverted or non-inverted. Reconstruct and dilate circuitry 432 then stores the reconstructed/determined depth values in a respective depth buffer as estimated previous depth values 405”. Note that the depth value is scattered to the impacted depth values, and this is equivalent to retaining the scattered element for one or more pixels in the interpolated frame having the nearest depth).
Regarding claim 18, Luo and Petersson teach all the features with respect to claim 17 as outlined above. Further, Luo teaches the computing device of claim 17, wherein creating the first interpolated optical flow data or creating the first interpolated motion vector data, or a combination thereof, further comprises filling any unfilled pixels with a pixel value having the nearest depth from a mask area comprising one or more pixels near the unfilled pixel (See Luo: Fig. 16, and [0202], “In at least one embodiment, at step 1614 of example process 1600, a filter size is determined based, at least in part, on depth and location of neighboring pixels. In at least one embodiment, for example, a filter size is calculated using a maximum distance to neighbors (e.g., max distance to neighbors 1410) as described above. In at least one embodiment, for example, a filter size is calculated using a maximum distance to neighbors that discounts neighbors that are pixels of different objects (e.g., max distance to neighbors 1512) as described above. In at least one embodiment, after step 1614, example process 1600 continues at step 1616”; and [0214], “In at least one embodiment, at step 1618 of example process 1600, a filter generated at step 1616 is applied to a source frame, to inpaint a hole with valid data. In at least one embodiment, for example, if process 1600 executes while a process such as process 300 is processing intermediate frames (e.g., at step 308), process 1600 can inpaint a hole in motion data with valid motion data from neighboring pixels, as described herein. In at least one embodiment, after step 1618, example process 1600 terminates. In at least one embodiment, after step 1618, example continues at step 1604 to select another pixel. In at least one embodiment, after step 1618, example continues at step 1602 to receive another source frame”. Note that the filter size is mapped to the mask, and inpainting the hole is mapped to filling the unfilled pixels).
Regarding claim 20, Luo and Petersson teach all the features with respect to claim 1 as outlined above. Further, Luo and Petersson teach an article comprising a non-transitory computer-readable medium to store computer-readable hardware description language code for fabrication of a device, the device (See Luo: Fig. 1, and [0096], “FIG. 1 illustrates an example diagram 100 where blending factors for frame motion are generated using a neural network, according to at least one embodiment. In at least one embodiment, a processor 102 executes or otherwise performs one or more instructions to use a neural network 110 to generate blending factors of frame motion, using systems and methods such as those described herein”) comprising:
an optical flow processing unit operable to create first interpolated optical flow data based, at least in part, on an optical flow from a preceding frame, a following frame, or a combination thereof (See Luo: Figs. 4-6, and [0170], “FIG. 6 illustrates an example diagram 600 where optical flow analysis is used to generate intermediate frames, according to at least one embodiment. In at least one embodiment, a current frame 602 (which is a current frame such as current frame 402, described herein at least in connection with FIG. 4) and a previous frame 606 (which is a previous frame such as previous frame 502, described herein at least in connection with FIG. 5) are used as input to optical flow 610. In at least one embodiment, current frame 602 includes a dynamic object 604 (and a shadow) as described herein at least in connection with FIG. 4 and previous frame 606 includes a dynamic object 608 (and a shadow) as described herein at least in connection with FIG. 5. In at least one embodiment, optical flow 610 moves contents of previous frame 606 to previous to current intermediate frame 616 based on flow. In at least one embodiment, optical flow 610 moves contents of current frame 602 to current to previous intermediate frame 624 based on flow”. Note that the optical flow 610 from the previous frame and the current frame are mapped to the optical flow from the previous frame and the optical flow from the following frame; and the previous to current intermediate frame 616 and current to previous intermediate frame 624 are mapped to the first interpolated optical flow data),
the first interpolated optical flow data having a resolution reduced relative to the preceding frame, the following frame, or a combination thereof (See Luo: Fig. 6, and [0106], “In at least one embodiment, processor 102 pre-processes frames 126 to generate one or more pre-processed frames (e.g., performs conversion and downsampling and uses only a luma channel of the YUV color space to generate lower resolution (LR) luma motion warped frames) as described above. In at least one embodiment, pre-processed frames 128 (e.g., converted and downsampled frames) are provided as input to neural network 110, and neural network uses pre-processed frames to generate blending factors 112 and output blending factors 114, as described herein. In at least one embodiment, neural network 110 neural network 110 uses pre-processed frames 128 to generate one or more blending factors 112, using techniques, systems, and methods such as those described herein”. Note that the lower resolution luma motion warped frames are mapped to the first interpolated optical flow data having a resolution reduced);
a motion vector processing unit operable to create first interpolated motion vector data (See Luo: Figs. 4-5, and [0166], “In at least one embodiment, a scattered motion vector is used to motion vector warp 410 dynamic object 404 to a current to previous intermediate frame 412 based on motion. In at least one embodiment, a motion vector warp 410 of a dynamic object to an intermediate frame such as current to previous intermediate frame 412 based on motion, transforms dynamic object 404 to a position in current to previous intermediate frame 412 based on motion, by applying one or more motion vectors to dynamic object 404”; and [0169], “In at least one embodiment, a scattered forward motion vector is used to motion vector warp 512 dynamic object 504 to a previous to current intermediate frame 514 based on motion”. Note that the current to previous intermediate frame 412 motion and the previous to current intermediate frame 514 motion are mapped to the first interpolated motion vector data) based, at least in part,
on motion vectors from a preceding frame (See Luo: Figs. 4-5, and [0168], “In at least one embodiment, forward motion vectors 508 are calculated, using systems and methods such as those described herein. In at least one embodiment, forward motion vectors 508 are calculated based on one or more current frame motion vectors 506”. Note that the forward motion vectors 508 are mapped to the motion vectors from the preceding frame),
a following frame, or a combination thereof (See Luo: Fig. 4, and [0163], “In at least one embodiment, one or more current frame motion vectors 406 describe motion of objects such as dynamic object 404. In at least one embodiment, current frame motion vectors 406 describe forward motion (e.g., motion from a previous frame) of dynamic objects such as dynamic object 404, as described herein”. Note that the current motion vector 406 is mapped to the motion vector of the following frame),
the first interpolated motion vector data having a resolution reduced relative to the preceding frame, the following frame, or the combination thereof (See Luo: Figs. 1-5, and [0086], “In at least one embodiment, said processor or other processor performs conversion and downsampling and uses only a luma channel of the YUV color space to generate lower resolution (LR) luma motion warped frames, where an LR luma motion warped frame (e.g., an LR frame with only luma values from the YUV color space). In at least one embodiment, this or other processor performs said downsampling to match a resolution of frames output by said game engine or other video provider”. Note that the lower resolution luma motion warped frames are mapped to the first interpolated motion vector data having a resolution reduced);
a scatter processing unit operable to determine a motion vector nearest in depth from among the first interpolated motion vector data (See Petersson: Fig. 4, and [0054], “In embodiments, dilated current depth values 415 and dilated motion vectors 455 each include data emphasizing the edges of geometry (e.g., images, graphics objects) in the current rendered frame as represented by current depth values 435 as stored in one or more respective depth buffers 222. These edges of geometry, for example, often introduce discontinuities into a contiguous series of depth values. Therefore, as the depth values and motion vectors are dilated, they naturally follow the contours of the geometric edges present in current depth values 435 as stored in one or more respective depth buffers 222. According to embodiments, reconstruct and dilate circuitry 432 is configured to compute dilated current depth values 415 and dilated motion vectors 455 by considering the depth values of a respective pixel neighborhood around each pixel of the current frame as indicated by current depth values 435. Such a pixel neighborhood, for example, includes a first number of pixels in a first direction (e.g., 3) and a second number of pixels in a second direction (e.g., 3) with a corresponding pixel being in the center (e.g., the pixel around which the pixel neighborhood is being considered). Within the depth values of the pixels in a pixel neighborhood, reconstruct and dilate circuitry 432 selects the depth values and corresponding motion vectors of the pixel where the depth value is nearest (e.g., appears closest) to a viewpoint of the scene represented by the current rendered frame. Reconstruct and dilate circuitry 432 then updates the pixel in the center of the pixel neighborhood with the selected depth value and selects the corresponding motion vector 103”) or an optical flow nearest in depth from among the first interpolated optical flow data, or a combination thereof, for each pixel of an interpolated frame (See Luo: Fig. 15-16, and [0204], “In at least one embodiment, if a depth difference of a neighbor 1508 is greater than a threshold value (e.g., a depth difference greater than 1, in FIG. 15), said neighbor is ignored when determining a max distance to neighbors, as described herein. In at least one embodiment, for example, in destination frame 1510, after motion, neighbor “A” is (4,0) away from a pixel, neighbor “D” is (4,1) away from said pixel, and neighbor “F” is (4,2) away from said pixel. In at least one embodiment, max distance to neighbors 1512 is (1,1) because neighbor “A”, neighbor “D”, and neighbor “F” are ignored when determining max distance to neighbors 1512 is (1,1) because they are at a different depth (e.g., that is greater than a threshold). In at least one embodiment, not illustrated in FIG. 15, if a threshold of 10 is chosen, then max distance to neighbors is (4,2). In at least one embodiment, different threshold values can be selected based, at least in part, on contents of source frames”; [0211], “In at least one embodiment, at step 1612 of example process 1600, calculates depth for a pixel selected at step 1604 that does not have valid data and that also does not have valid depth data. In at least one embodiment, at step 1612, depth is calculated using one or more neighbor pixels. In at least one embodiment, depth is calculated as a heuristic (e.g., estimated). In at least one embodiment, depth is given a default value (e.g., a very high depth value) in instances when depth cannot be calculated. 
In at least one embodiment, after step 1612, example process 1600 continues at step 1614”; and [0082], “In at least one embodiment, to calculate a value of a pixel in the intermediate image, values of one or more nearby pixels with similar depth (e.g., depth within a threshold of a depth value corresponding to a pixel location to be filled) is used. In at least one embodiment, for example, pixel values can be averaged or summed with a weighted average where weights of a sum depend on how far the depth is from the depth of the pixel whose value is being calculated”. Note that the depth is calculated and compared to a threshold, and the pixel is discarded if the threshold is exceeded; this could be broadly mapped to the nearest-in-depth optical flow and the nearest-in-depth motion vector, but secondary art will be searched and cited); and
a gather processing unit operable to selectively gather one or more color signal values for at least some pixels in the interpolated frame from the preceding frame or the following frame, or a combination thereof, based, at least in part, on the determined at least one nearest in depth motion vector, or at least one nearest in depth optical flow, or a combination thereof (See Luo: Fig. 3, and [0156], “In at least one embodiment, at step 314 of example process 300, one or more intermediate frames are blended to generate one or more interpolated frames using systems and methods such as those described herein at least in connection with FIG. 2. In at least one embodiment, at step 314, one or more interpolated frames are generated by, for example, blending contents of one or more post-processed frames (e.g., frames post-processed at step 312). In at least one embodiment, for example, if there are two frames generated at step 312, at step 314, an interpolated frame is generated by combining pixels from a first frame generated at step 312 with pixels of a second frame generated at step 312 (e.g., pixels of an interpolated frame will be generated by blending colors and/or other information from frames generated at step 312). In at least one embodiment, not illustrated in FIG. 3, an interpolated frame is generated based, at least in part, on one or more blending weights such as those described herein. In at least one embodiment, after step 314, example process 300 continues at step 316”. Note that the pixel values of the interpolated frame are obtained by blending the intermediate frames, and this is mapped to the cited claim limitations of “selectively gathering one or more color signal values for at least some pixels in the interpolated frame from the preceding frame or the following frame, or a combination thereof, based, at least in part, on the determined at least one nearest in depth motion vector, or at least one nearest in depth optical flow, or a combination thereof”; an illustrative sketch of this scatter/gather pipeline follows).
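The following minimal sketch ties the scatter and gather steps mapped above together, assuming per-pixel motion vectors have already been resolved to the nearest depth (e.g., by a routine like the scatter_nearest_depth sketch above); the name gather_colors, the fixed blend weights, and the boundary clamping are hypothetical simplifications that omit the blending factors and occlusion handling the references describe.

```python
import numpy as np

def gather_colors(prev_rgb, next_rgb, mv, t=0.5):
    """For each pixel of the interpolated frame, follow its (nearest-in-depth)
    motion vector back into the preceding frame and forward into the following
    frame, then blend the two gathered colors. Hypothetical sketch; mv is an
    (H, W, 2) array of the full preceding-to-following (dx, dy) motion per pixel."""
    h, w, _ = prev_rgb.shape
    out = np.empty((h, w, 3), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            dxf, dyf = mv[y, x]
            # Gather from the preceding frame, backward along the vector ...
            py = int(np.clip(np.rint(y - t * dyf), 0, h - 1))
            px = int(np.clip(np.rint(x - t * dxf), 0, w - 1))
            # ... and from the following frame, forward along the remainder.
            ny = int(np.clip(np.rint(y + (1 - t) * dyf), 0, h - 1))
            nx = int(np.clip(np.rint(x + (1 - t) * dxf), 0, w - 1))
            out[y, x] = (1 - t) * prev_rgb[py, px] + t * next_rgb[ny, nx]
    return out
```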
Claims 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (US 20250106355 A1) in view of Petersson (US 20250069319 A1), and further in view of Jiang et al. (US 20190138889 A1).
Regarding claim 10, Luo and Petersson teach all the features with respect to claim 1 as outlined above. Further, Luo teaches the method of claim 1, further comprising:
creating second interpolated motion vector data based, at least in part, on motion vectors from a preceding frame or a following frame such that one of the first interpolated motion vector data and the second interpolated motion vector data are based on the preceding frame and the other of the first interpolated motion vector data and second interpolated motion vector data are based on the following frame, the second interpolated motion vector data having a resolution reduced relative to the preceding frame or the following frame (See Luo: Figs. 1-2, and [0084], “In at least one embodiment, said processor or another processor generates, from upsampled current frame and from upsampled previous frame, a first plurality of frames and a second plurality of frames that have the same resolution as said upsampled previous and current frame and upsampled previous frame (e.g., 4K or 8K). In at least one embodiment. These frames of the first plurality of frames and second plurality of frames can be referred to as motion warped color frames (or high resolution (HR) motion warped color frames or otherwise) and these frames may have pixel values in an RGB or other color space. It should be noted that, despite this name of “motion warped,” one or more of these motion warped color frames may lack any motion warping, such as described in the next paragraph”. Note that the second motion warped color frames are mapped to the second interpolated motion vector data).
However, Luo, as modified by Petersson, fails to explicitly disclose the method of claim 1, further comprising: creating second interpolated optical flow data based, at least in part, on optical flow from a preceding frame or a following frame, such that one of the first interpolated optical flow data and second interpolated optical flow data are based on the preceding frame and the other of the first interpolated optical flow data and the second interpolated optical flow data are based on the following frame, the second interpolated optical flow data having a resolution reduced relative to the preceding frame or the following frame; and using the first interpolated optical flow data, the second interpolated optical flow data, the first interpolated motion vector data, and the second interpolated motion vector data in determining at least one closest motion vector or a closest optical flow, or a combination thereof, for one or more pixels of an interpolated frame and in selectively gathering one or more color signal values for the one or more pixels in the interpolated frame from the preceding frame or the following frame, or a combination thereof, based, at least in part, on the determined at least one closest motion vector or at least one closest optical flow, or a combination thereof.
However, Jiang teaches creating second interpolated optical flow data based, at least in part, on optical flow from a preceding frame or a following frame, such that one of the first interpolated optical flow data and second interpolated optical flow data are based on the preceding frame and the other of the first interpolated optical flow data and the second interpolated optical flow data are based on the following frame, the second interpolated optical flow data having a resolution reduced relative to the preceding frame or the following frame (See Jiang: Figs. 1A-F and 2A-D, and [0030], “Bi-directional optical flows (F.sub.1.fwdarw.0, F.sub.0.fwdarw.1) are computed based on the input frames (I.sub.0, I.sub.1). The bi-directional optical flows are linearly combined to approximate intermediate bi-directional optical flows ({circumflex over (F)}.sub.t.fwdarw.0, {circumflex over (F)}.sub.t.fwdarw.1) for at least one timestep t between the two input frames. The input frames are each warped (backward) by the optical flow warping units 105-0 and 105-1 according to the approximated intermediate bi-directional optical flows for each timestep to produce warped input frames Î.sub.0.fwdarw.t and Î.sub.1.fwdarw.t.”; and [0072], “Furthermore, intermediate frames may be predicted by frame interpolation systems 100, 150, and 200 for a plain video without requiring reference images or supervised training. Finally, the frame interpolation systems 100, 150, and 200 may be extended to predict multiple intermediate frames in parallel. In contrast, conventional frame interpolation techniques rely on recursion to compute more than one intermediate frame, and, therefore, can only predict one intermediate frame at a time”. Note that repeating the same processes in software is not novel. Jiang teaches that multiple interpolated frames between two input frames (the previous frame and the current frame) are generated and inserted into the video, either recursively or at multiple timesteps from the first input frame to the second input frame, using optical flow and motion vectors. Thus, when the first interpolated frame is generated, first optical flow and motion vector data are generated and used to produce it; when the same processes are repeated to generate the second or subsequent interpolated frames, the repeated optical flow and motion vector data are mapped to the second interpolated optical flow data and second interpolated motion vector data); and
using the first interpolated optical flow data, the second interpolated optical flow data, the first interpolated motion vector data, and the second interpolated motion vector data in determining at least one closest motion vector or a closest optical flow, or a combination thereof, for one or more pixels of an interpolated frame and in selectively gathering one or more color signal values for the one or more pixels in the interpolated frame from the preceding frame or the following frame, or a combination thereof, based, at least in part, on the determined at least one closest motion vector or at least one closest optical flow, or a combination thereof (See Jiang: Figs. 1A-F and 2A-D, and [0048], “The soft visibility maps are applied to the warped input frames before the warped input frames are linearly fused by the image prediction unit 155 to produce the intermediate frame for each timestep. Combining the temporal consistency and occlusion reasoning, enables computation of the interpolated frame a”; and [0072], “Furthermore, intermediate frames may be predicted by frame interpolation systems 100, 150, and 200 for a plain video without requiring reference images or supervised training. Finally, the frame interpolation systems 100, 150, and 200 may be extended to predict multiple intermediate frames in parallel. In contrast, conventional frame interpolation techniques rely on recursion to compute more than one intermediate frame, and, therefore, can only predict one intermediate frame at a time”. Note that Jiang teaches that multiple interpolated frames between two input frames are generated, and when the second or more interpolated frames between the two input frames are generated, the same processes and techniques are used; these are mapped to the cited claim limitations).
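For context, the linear combination of bi-directional flows quoted from Jiang's [0030] can be sketched as follows; the combination coefficients shown follow the published Super SloMo formulation that the cited paragraph appears to summarize, and the function name and array layout are assumptions for illustration only.

```python
import numpy as np

def approx_intermediate_flows(flow_0to1, flow_1to0, t):
    """Linearly combine the bi-directional flows F_0->1 and F_1->0 to
    approximate the intermediate flows F_t->0 and F_t->1 at timestep t,
    0 < t < 1. Flow arrays are (H, W, 2); hypothetical sketch."""
    f_t0 = -(1 - t) * t * flow_0to1 + (t ** 2) * flow_1to0
    f_t1 = ((1 - t) ** 2) * flow_0to1 - t * (1 - t) * flow_1to0
    return f_t0, f_t1

# Each timestep t yields its own pair of intermediate flows, which is why the
# cited system can produce as many intermediate frames as needed, in parallel.
```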
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Luo to have the method of claim 1, further comprising: creating second interpolated optical flow data based, at least in part, on optical flow from a preceding frame or a following frame, such that one of the first interpolated optical flow data and second interpolated optical flow data are based on the preceding frame and the other of the first interpolated optical flow data and the second interpolated optical flow data are based on the following frame, the second interpolated optical flow data having a resolution reduced relative to the preceding frame or the following frame; and using the first interpolated optical flow data, the second interpolated optical flow data, the first interpolated motion vector data, and the second interpolated motion vector data in determining at least one closest motion vector or a closest optical flow, or a combination thereof, for one or more pixels of an interpolated frame and in selectively gathering one or more color signal values for the one or more pixels in the interpolated frame from the preceding frame or the following frame, or a combination thereof, based, at least in part, on the determined at least one closest motion vector or at least one closest optical flow, or a combination thereof, as taught by Jiang, in order to enable distorting the first frame according to the forward optical flow data and distorting the second frame according to the backward optical flow data, so that multiple intermediate images of the video interpolation process can be produced in a simple manner (See Jiang: Fig. 1, and [0050], “Since none of parameters (e.g., weights) of the frame interpolation system 150 learned during training are time-dependent, the frame interpolation system 150 is able to produce as many intermediate frames as needed. In an embodiment, the frame interpolation system 150 can generate as many intermediate frames as needed in parallel”). Luo teaches a method and system that may generate interpolated frames based on depth information, optical flow data, and motion vector data, while Jiang teaches a system and method that may generate multiple interpolated frames between two input frames, either with recursive algorithms or with a multiple-timestep technique (multiple timesteps from the first input frame to the second input frame), by calculating the optical flow and motion vectors for these multiple interpolated frames. Therefore, it would be obvious to one of ordinary skill in the art to modify Luo by Jiang to use the same processes (optical flow, motion vectors, and pixel value selection based on the nearest-in-depth optical flow and motion vector) to generate multiple interpolated frames. The motivation to modify Luo by Jiang is “Use of known technique to improve similar devices (methods, or products) in the same way”.
Regarding claim 19, Luo and Petersson teach all the features with respect to claim 11 as outlined above. Further, Luo and Jiang teach the computing device of claim 11, the one or more processors further operable to execute instructions stored in the memory to:
create second interpolated optical flow data based, at least in part, on optical flow from a preceding frame or a following frame such that one of the first interpolated optical flow data and the second interpolated optical flow data are based on the preceding frame and the other of the first interpolated optical flow data and the second interpolated optical flow data are based on the following frame, the second interpolated optical flow data having a resolution reduced relative to the preceding frame or the following frame (See Jiang: Figs. 1A-F and 2A-D, and [0030], “Bi-directional optical flows (F.sub.1.fwdarw.0, F.sub.0.fwdarw.1) are computed based on the input frames (I.sub.0, I.sub.1). The bi-directional optical flows are linearly combined to approximate intermediate bi-directional optical flows ({circumflex over (F)}.sub.t.fwdarw.0, {circumflex over (F)}.sub.t.fwdarw.1) for at least one timestep t between the two input frames. The input frames are each warped (backward) by the optical flow warping units 105-0 and 105-1 according to the approximated intermediate bi-directional optical flows for each timestep to produce warped input frames Î.sub.0.fwdarw.t and Î.sub.1.fwdarw.t.”; and [0072], “Furthermore, intermediate frames may be predicted by frame interpolation systems 100, 150, and 200 for a plain video without requiring reference images or supervised training. Finally, the frame interpolation systems 100, 150, and 200 may be extended to predict multiple intermediate frames in parallel. In contrast, conventional frame interpolation techniques rely on recursion to compute more than one intermediate frame, and, therefore, can only predict one intermediate frame at a time”. Note that repeating the same processes in software is not novel. Jiang teaches that multiple interpolated frames between two input frames (the previous frame and the current frame) are generated and inserted into the video, either recursively or at multiple timesteps from the first input frame to the second input frame, using optical flow and motion vectors. Thus, when the first interpolated frame is generated, first optical flow and motion vector data are generated and used to produce it; when the same processes are repeated to generate the second or subsequent interpolated frames, the repeated optical flow and motion vector data are mapped to the second interpolated optical flow data and second interpolated motion vector data);
create second interpolated motion vector data based, at least in part, on motion vectors from a preceding frame or a following frame such that one of the first interpolated motion vector data and the second interpolated motion vector data are based on the preceding frame and the other of the first interpolated motion vector data and the second interpolated motion vector data are based on the following frame, the second interpolated motion vector data having a resolution reduced relative to the preceding frame or the following frame (See Luo: Figs. 1-2, and [0084], “In at least one embodiment, said processor or another processor generates, from upsampled current frame and from upsampled previous frame, a first plurality of frames and a second plurality of frames that have the same resolution as said upsampled previous and current frame and upsampled previous frame (e.g., 4K or 8K). In at least one embodiment. These frames of the first plurality of frames and second plurality of frames can be referred to as motion warped color frames (or high resolution (HR) motion warped color frames or otherwise) and these frames may have pixel values in an RGB or other color space. It should be noted that, despite this name of “motion warped,” one or more of these motion warped color frames may lack any motion warping, such as described in the next paragraph”. Note that the second motion warped color frames are mapped to the second interpolated motion vector data); and
use the first interpolated optical flow data, the second interpolated optical flow data, the first interpolated motion vector data, and the second interpolated motion vector data to determine at least one closest motion vector or a closest optical flow, or a combination thereof, for one or more pixels of an interpolated frame and in selectively gathering one or more color signal values for the one or more pixels in the interpolated frame from the preceding frame or the following frame, or a combination thereof, based, at least in part, on the determined at least one closest motion vector or at least one closest optical flow, or a combination thereof (See Jiang: Figs. 1A-F and 2A-D, and [0048], “The soft visibility maps are applied to the warped input frames before the warped input frames are linearly fused by the image prediction unit 155 to produce the intermediate frame for each timestep. Combining the temporal consistency and occlusion reasoning, enables computation of the interpolated frame a”; and [0072], “Furthermore, intermediate frames may be predicted by frame interpolation systems 100, 150, and 200 for a plain video without requiring reference images or supervised training. Finally, the frame interpolation systems 100, 150, and 200 may be extended to predict multiple intermediate frames in parallel. In contrast, conventional frame interpolation techniques rely on recursion to compute more than one intermediate frame, and, therefore, can only predict one intermediate frame at a time”. Note that Jiang teaches that multiple interpolated frames between two input frames are generated, and when the second or more interpolated frames between the two input frames are generated, the same processes and techniques are used; these are mapped to the cited claim limitations).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Devona E Faulk can be reached at 571-272-7515. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GORDON G LIU/Primary Examiner, Art Unit 2618