DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation - 35 USC § 101
The limitations “and for at least one of the multiple inference stages, processing image values of a predicted image frame computed in a previous inference stage to compute a prediction of image values of an image frame in the temporal sequence associated with the at least one of the multiple inference stages,” when considered in light of the rest of the limitations in the claim, utilize historical intensity values to provide the practical application of improved temporal anti-aliasing.
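For illustration of the temporal anti-aliasing principle referenced above, the following is a minimal sketch of accumulating historical intensity values across a frame sequence; the function name, shapes, and blend factor alpha are assumptions of this illustration, not taken from the claims or the record.

```python
# Minimal sketch (illustration only): temporal anti-aliasing by exponential
# accumulation of historical intensity values. The function name, shapes, and
# blend factor `alpha` are assumptions of this illustration.
import numpy as np

def accumulate_history(current: np.ndarray, history: np.ndarray,
                       alpha: float = 0.1) -> np.ndarray:
    """Blend current-frame intensity values into the accumulated history;
    aliasing at edges averages out over the temporal sequence."""
    return alpha * current + (1.0 - alpha) * history

frames = [np.random.rand(4, 4) for _ in range(8)]  # stand-in intensity frames
history = frames[0]
for frame in frames[1:]:
    history = accumulate_history(frame, history)
```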
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 21, 23-34, 36, and 38-40 are rejected under 35 U.S.C. 103 as being unpatentable over Shacklett et al. (US 2022/0108421) (hereinafter referred to as Shacklett).
Regarding claim 21, Shacklett teaches A method of training a neural network (In at least one embodiment, an optical flow network can be pre-trained 304 in parallel with an image reconstruction network that is also, but separately, pre-trained 306, although these pre-trainings do not need to be done in parallel in at least one embodiment. In at least one embodiment, these pre-trainings can be performed using respective loss functions with terms relevant to that type of network, and each can have a specified convergence or target loss value to determine when each network has been successfully or adequately pre-trained. In at least one embodiment, after these networks are separately pre-trained, these networks can then be further trained 308 together in a co-training process. See paragraph [0058]), comprising:
executing a training iteration to update parameters of a neural network to predict image signal values of an image frame, wherein executing the training iteration (In at least one embodiment, an optical flow network can be pre-trained 304 in parallel with an image reconstruction network that is also, but separately, pre-trained 306, although these pre-trainings do not need to be done in parallel in at least one embodiment. In at least one embodiment, these pre-trainings can be performed using respective loss functions with terms relevant to that type of network, and each can have a specified convergence or target loss value to determine when each network has been successfully or adequately pre-trained. In at least one embodiment, after these networks are separately pre-trained, these networks can then be further trained 308 together in a co-training process. See paragraph [0058]) comprises:
executing the neural network for multiple inference stages, each inference stage to process image signal values of an associated input image frame in a temporal sequence of image frames to provide a prediction of image values of an associated predicted image frame in the temporal sequence (In at least one embodiment, these networks (which can be separate networks or part of a single, fused network) can be trained 404 using this shared loss function. In at least one embodiment, an image can then be generated 406 using these networks, where that image can be an anti-aliased image reconstructed from a current image and at least one prior image in an image sequence. See paragraph [0060]) (In at least one embodiment, this is represented by a '+' operator in this figure. In at least one embodiment, there will be several residuals predicted, up to a final decoder stage where a final delta is generated for each pixel, or a final refined output obtained for these stages, which account for final details of image reconstruction. In at least one embodiment, these predictions can be refined over multiple stages, and residuals predicted one after another as part of a progressive refinement process. In at least one embodiment, such a process can be used for optical flow as well. See paragraph [0056]) (In at least one embodiment, real-time, temporal reconstruction of an image utilizes information from a prior frame after some warping to align to an image being generated for a current frame. See paragraph [0047]); and
for at least one of the multiple inference stages, processing image values of a predicted image frame computed in a previous inference stage to compute a prediction of image values of an image frame in the temporal sequence associated with the at least one of the multiple inference stages (In at least one embodiment, an optical flow network or image reconstruction network can be an autoencoder network, which can include both encoder and decoder portions. In at least one embodiment, there may be one or more steps within this autoencoder (e.g., in one or more decoder stages) where an image is first upsampled, and then re-upsampled. In at least one embodiment, this is illustrated by configuration 260 of FIG. 2C. In at least one embodiment, this network will then operate at different scales. In at least one embodiment, there may be multiple decoder stages 264 where this network will output a partial position, such as a first rough position. In at least one embodiment, at a next decoder stage a predictor 262 will output a refined position, which can represent a delta over a previous position. In at least one embodiment, this is represented by a '+' operator in this figure. In at least one embodiment, there will be several residuals predicted, up to a final decoder stage where a final delta is generated for each pixel, or a final refined output obtained for these stages, which account for final details of image reconstruction. In at least one embodiment, these predictions can be refined over multiple stages, and residuals predicted one after another as part of a progressive refinement process. In at least one embodiment, such a process can be used for optical flow as well. See paragraph [0056]) (In at least one embodiment, an attempt can be made to remove significant jitter and noise from output of at least a warp or optical flow network. In at least one embodiment, this can be addressed by including one or more terms in a loss function for this optical flow network that minimize error in both spatial and temporal gradients generated by this network. In at least one embodiment, this helps to minimize loss both in space and time. In at least one embodiment, a loss function used for co-training can then include these terms when optimizing warp and image reconstruction networks. See paragraph [0057]) (See figure 2C), but is silent as to image signal intensity. However, because the image signal predicted by Shacklett necessarily includes intensity values as part of the image, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to predict and isolate particular elements, such as intensity, so that the system could correct for artifact issues while upscaling the image.
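For illustration of the claimed arrangement in which one inference stage processes the prediction computed in a previous stage, the following is a minimal PyTorch-style sketch of recurrent, residual refinement over a temporal sequence; RefineNet and all tensor shapes are hypothetical stand-ins of this illustration, not Shacklett's architecture.

```python
# Minimal PyTorch-style sketch of recurrent multi-stage inference in which
# each stage processes the prediction computed in the previous stage together
# with the current input frame and refines it by a predicted residual (the
# "+" operator of paragraph [0056]). `RefineNet` and all shapes are
# hypothetical stand-ins, not Shacklett's architecture.
import torch
import torch.nn as nn

class RefineNet(nn.Module):
    def __init__(self, channels: int = 3):
        super().__init__()
        # Current frame and previous-stage prediction are concatenated.
        self.body = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, frame: torch.Tensor,
                prev_pred: torch.Tensor) -> torch.Tensor:
        # Predict a residual and add it to the previous prediction.
        residual = self.body(torch.cat([frame, prev_pred], dim=1))
        return prev_pred + residual

net = RefineNet()
sequence = [torch.rand(1, 3, 32, 32) for _ in range(4)]  # temporal sequence
pred = torch.zeros_like(sequence[0])                     # initial prediction
for frame in sequence:            # one inference stage per frame in sequence
    pred = net(frame, pred)
```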
Regarding claim 23, Shacklett teaches The method of claim 21, wherein the image frame in the temporal sequence associated with the at least one of the multiple inference stages comprises a sparse image frame, wherein the sparse image frame is at a resolution lower than a resolution of the associated predicted image frame (In at least one embodiment, this upscaled image 110 can be provided as input to an image reconstruction module 112 that can generate a high resolution, anti-aliased output image 116 using upscaled image 110 and previously generated image 122, as may be at least temporarily stored in a history buffer 120 or other such location. In at least one embodiment, this image reconstruction module 112 may include one or more neural networks 114 used as part of an image reconstruction process. See paragraph [0046], see figure 1).
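For illustration of the upscale-and-reconstruct pipeline quoted above (paragraph [0046] of Shacklett), the following sketch upscales a lower-resolution sparse frame to the predicted-frame resolution and blends it with a previously generated image from a history buffer; the reconstruct function is a placeholder for the reference's image reconstruction network, not an actual API of the reference.

```python
# Hypothetical sketch of the pipeline quoted from paragraph [0046]: a sparse,
# lower-resolution frame is upscaled to the predicted-frame resolution and
# reconstructed with the previous output from a history buffer. `reconstruct`
# is a placeholder for the reference's image reconstruction network.
import torch
import torch.nn.functional as F

def reconstruct(upscaled: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
    # Placeholder blend standing in for neural image reconstruction.
    return 0.5 * upscaled + 0.5 * history

low_res = torch.rand(1, 3, 16, 16)          # sparse frame, lower resolution
history_buffer = torch.rand(1, 3, 32, 32)   # previously generated output image
upscaled = F.interpolate(low_res, size=(32, 32), mode="bilinear",
                         align_corners=False)
output = reconstruct(upscaled, history_buffer)
history_buffer = output                      # stored for the next frame
```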
Regarding claim 24, Shacklett teaches The method of claim 21, wherein the neural network is trained to implement a super sampling to increase image resolution (In at least one embodiment, this may include transmitting images of game content for a multiplayer game, where different client devices may display that content at different resolutions, including one or more super-resolutions. See paragraph [0062]).
Regarding claim 25, Shacklett teaches the method of claim 21, wherein the neural network is trained to remove aliased edges (In at least one embodiment, these networks (which can be separate networks or part of a single, fused network) can be trained 404 using this shared loss function. In at least one embodiment, an image can then be generated 406 using these networks, where that image can be an anti-aliased image reconstructed from a current image and at least one prior image in an image sequence. See paragraph [0060]).
Regarding claim 26, Shacklett teaches The method of claim 21, wherein the training iteration further comprises: computing a loss function based, at least in part, on a training set comprising at least a sparse image frame as an input to the neural network and a densely sampled image frame as a ground truth observation (In at least one embodiment, AI-assisted annotation 3110 may be used to aid in generating annotations corresponding to imaging data 3108 to be used as ground truth data for retraining or updating a machine learning model. See paragraph [0347]) (In at least one embodiment, if a current rendered image is lower resolution then this low-resolution rendered image 106 can then be processed using an upscaler 108 to generate an upscaled image 110 that represents content of low resolution rendered image 106 at a resolution that equals (or is at least closer to) a target output resolution. See paragraph [0045]) (In at least one embodiment, this upscaled image 110 can be provided as input to an image reconstruction module 112 that can generate a high resolution, anti-aliased output image 116 using upscaled image 110 and previously generated image 122, as may be at least temporarily stored in a history buffer 120 or other such location. In at least one embodiment, this image reconstruction module 112 may include one or more neural networks 114 used as part of an image reconstruction process. In at least one embodiment, this may include at least a first optical flow network (OFN) for generating motion vectors or other information indicative of movement between adjacent frames in a sequence. In at least one embodiment, this can include an externally recurrent, pre-image reconstruction, unsupervised optical flow network. In at least one embodiment, this may also include at least a first image reconstruction network (RN) to utilize these motion vectors in order to correlate positions in a current image and a previous image and infer an output image from a blending of those images. In at least one embodiment, this blending of a current image with a prior (or historical) image of a sequence can help with temporal convergence to a nice, sharp, high-resolution output image 116, which can then be provided for presentation via a display 118 or other such presentation mechanism. In at least one embodiment, a copy of this high resolution output image 116 can be stored to history buffer 120, or another such storage location, for blending with a subsequently-generated image in this sequence. See paragraph [0046]) (In at least one embodiment, this can include using a single, combined loss function that includes terms for both optical flow or warp, as well as image reconstruction. In at least one embodiment, these networks can be trained together using this common loss function to determine network parameters for both networks that provide optimal performance, or at least minimize loss for this common loss function. In at least one embodiment, a learning rate can be controlled in order to ensure proper convergence. See paragraph [0050]);
and updating weights of the neural network based, at least in part, on an application of backpropagation to the computed loss function (In at least one embodiment, neurons 2202 in second layer 2212 may fan out to neurons 2202 in multiple other layers, including to neurons 2202 in (same) second layer 2212. In at least one embodiment, second layer 2212 may be referred to as a "recurrent layer." See paragraph [0244]).
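For illustration of the training iteration recited in claim 26, the following is a minimal sketch, assuming a PyTorch workflow, of computing a loss over a sparse input frame against a densely sampled ground truth frame and updating weights by backpropagation; the network, optimizer settings, and data are placeholders of this illustration.

```python
# Sketch of one training iteration (illustration only, assuming a PyTorch
# workflow): a sparse frame is the network input, a densely sampled frame is
# the ground truth, and backpropagation of the loss updates the weights. The
# network, optimizer settings, and data are placeholders.
import torch
import torch.nn.functional as F

net = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)   # stand-in network
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)

sparse_frame = torch.rand(1, 3, 32, 32)   # input to the neural network
dense_frame = torch.rand(1, 3, 32, 32)    # ground truth observation

prediction = net(sparse_frame)
loss = F.mse_loss(prediction, dense_frame)  # loss over the training pair
optimizer.zero_grad()
loss.backward()                             # backpropagation
optimizer.step()                            # weight update
```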
Regarding claim 27, Shacklett teaches the method of claim 26, wherein the densely sampled image frame and the sparse image frame are derived from a same rendered image frame (See figure 1, Low resolution rendered image fed to upscaler into image reconstruction for high resolution output image).
Regarding claim 28, Shacklett teaches the method of claim 27, wherein the sparse image frame is derived as a jitter encoding of a sparse sampling of the same rendered image frame (In at least one embodiment, an attempt can be made to remove significant jitter and noise from output of at least a warp or optical flow network. In at least one embodiment, this can be addressed by including one or more terms in a loss function for this optical flow network that minimize error in both spatial and temporal gradients generated by this network. In at least one embodiment, this helps to minimize loss both in space and time. In at least one embodiment, a loss function used for co-training can then include these terms when optimizing warp and image reconstruction networks. See paragraph [0057]).
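For illustration of deriving a sparse image frame as a jittered sparse sampling of the same rendered frame, the following sketch takes one sample per 2x2 pixel block at a per-frame jitter offset; the block size and offset scheme are assumptions of this illustration, not taken from the record.

```python
# Hypothetical sketch of a jittered sparse sampling: one sample per 2x2 pixel
# block of the same rendered frame, offset by a per-frame jitter. The block
# size and offset scheme are assumptions of this illustration.
import numpy as np

def jittered_sparse_sample(rendered: np.ndarray,
                           jitter: tuple[int, int]) -> np.ndarray:
    dy, dx = jitter                # per-frame sub-block offset, each in {0, 1}
    return rendered[dy::2, dx::2]  # one sample per 2x2 block

rendered = np.random.rand(64, 64)  # densely sampled rendered frame
sparse = jittered_sparse_sample(rendered, jitter=(0, 1))  # 32x32 sparse frame
```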
Regarding claim 29, Shacklett teaches The method of claim 28, wherein the jitter encoding of the sparse sampling comprises an encoding of one or more non-zero image signal intensity values in the sparse sampling according to pixel locations and magnitudes (In at least one embodiment, however, proper image warping utilizes not only information from a prior frame, but also additional information about how objects move between these frames. In at least one embodiment, this can include computer vision or optical flow data, which may be represented by a set of motion vectors. In at least one embodiment, this may include motion vectors for each pixel location, or at least pixel locations for which there is movement. See paragraph [0047]) (In at least one embodiment, U corresponds to a 2D vector for each pixel in a current view or frame, with both a magnitude and a direction (in two dimensions for a 2D image). See paragraph [0051]).
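For illustration of encoding non-zero image signal intensity values according to pixel locations and magnitudes, the following sketch uses a coordinate-list representation; this format is an assumption of the illustration, not a disclosure of the record.

```python
# Illustrative coordinate-list encoding of the non-zero intensity values of a
# sparse sampling by pixel location and magnitude; this format is an
# assumption of the illustration, not a disclosure of the record.
import numpy as np

sparse_frame = np.zeros((8, 8))
sparse_frame[2, 5] = 0.8
sparse_frame[6, 1] = 0.3

rows, cols = np.nonzero(sparse_frame)   # pixel locations of non-zero values
magnitudes = sparse_frame[rows, cols]   # corresponding intensity magnitudes
encoding = list(zip(rows.tolist(), cols.tolist(), magnitudes.tolist()))
# encoding == [(2, 5, 0.8), (6, 1, 0.3)]
```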
Regarding claim 30, Shacklett teaches A computing device (FIG. 8 is a block diagram illustrating an exemplary computer system, which may be a system with interconnected devices and components, a system-on-a-chip (SOC) or some combination thereof 800 formed with a processor that may include execution units to execute an instruction, according to at least one embodiment. In at least one embodiment, computer system 800 may include, without limitation, a component, such as a processor 802 to employ execution units including logic to perform algorithms for process data, in accordance with present disclosure, such as in embodiment described herein. In at least one embodiment, computer system 800 may include processors, such as PENTIUM® Processor family, Xeon™, Itanium®, XScale™ and/or StrongARM™, Intel® Core™, or Intel® Nervana™ microprocessors available from Intel Corporation of Santa Clara, Calif., although other systems (including PCs having other microprocessors, engineering workstations, set-top boxes and like) may also be used. In at least one embodiment, computer system 800 may execute a version of WINDOWS' operating system available from Microsoft Corporation of Redmond, Wash., although other operating systems (UNIX and Linux for example), embedded software, and/or graphical user interfaces, may also be used. See paragraph [0085]) (In at least one embodiment, computer system 800 may include, without limitation, processor 802 that may include, without limitation, one or more execution units 808 to perform machine learning model training and/or inferencing according to techniques described herein. In at least one embodiment, computer system 800 is a single processor desktop or server system, but in another embodiment computer system 800 may be a multiprocessor system. In at least one embodiment, processor 802 may include, without limitation, a complex instruction set computer ("CISC") microprocessor, a reduced instruction set computing ("RISC") microprocessor, a very long instruction word ("VLIW") microprocessor, a processor implementing a combination of instruction sets, or any other processor device, such as a digital signal processor, for example. In at least one embodiment, processor 802 may be coupled to a processor bus 810 that may transmit data signals between processor 802 and other components in computer system 800. See paragraph [0087]), comprising:
a memory device (In at least one embodiment, processor 802 may include, without limitation, a Level 1 ("L1") internal cache memory ("cache") 804. In at least one embodiment, processor 802 may have a single internal cache or multiple levels of internal cache. In at least one embodiment, cache memory may reside external to processor 802. See paragraph [0088]); and
one or more processors coupled to the memory device (In at least one embodiment, processor 802 may include, without limitation, a Level 1 ("L1") internal cache memory ("cache") 804. In at least one embodiment, processor 802 may have a single internal cache or multiple levels of internal cache. In at least one embodiment, cache memory may reside external to processor 802. See paragraph [0088]) to:
execute a training iteration to update parameters of a neural network to predict image signal values of an image frame, wherein execution of the training iteration (In at least one embodiment, an optical flow network can be pre-trained 304 in parallel with an image reconstruction network that is also, but separately, pre-trained 306, although these pre-trainings do not need to be done in parallel in at least one embodiment. In at least one embodiment, these pre-trainings can be performed using respective loss functions with terms relevant to that type of network, and each can have a specified convergence or target loss value to determine when each network has been successfully or adequately pre-trained. In at least one embodiment, after these networks are separately pre-trained, these networks can then be further trained 308 together in a co-training process. See paragraph [0058]) comprises: execution of the neural network for multiple inference stages, each inference stage to process image signal values of an associated input image frame in a temporal sequence of image frames to provide a prediction of image values of an associated predicted image frame in the temporal sequence (In at least one embodiment, these networks (which can be separate networks or part of a single, fused network) can be trained 404 using this shared loss function. In at least one embodiment, an image can then be generated 406 using these networks, where that image can be an anti-aliased image reconstructed from a current image and at least one prior image in an image sequence. See paragraph [0060]) (In at least one embodiment, this is represented by a '+' operator in this figure. In at least one embodiment, there will be several residuals predicted, up to a final decoder stage where a final delta is generated for each pixel, or a final refined output obtained for these stages, which account for final details of image reconstruction. In at least one embodiment, these predictions can be refined over multiple stages, and residuals predicted one after another as part of a progressive refinement process. In at least one embodiment, such a process can be used for optical flow as well. See paragraph [0056]) (In at least one embodiment, real-time, temporal reconstruction of an image utilizes information from a prior frame after some warping to align to an image being generated for a current frame. See paragraph [0047]); and
for at least one of the multiple inference stages, process image values of a predicted image frame computed in a previous inference stage to compute a prediction of image values of an image frame in the temporal sequence associated with the at least one of the multiple inference stages (In at least one embodiment, an optical flow network or image reconstruction network can be an autoencoder network, which can include both encoder and decoder portions. In at least one embodiment, there may be one or more steps within this autoencoder (e.g., in one or more decoder stages) where an image is first upsampled, and then re-upsampled. In at least one embodiment, this is illustrated by configuration 260 of FIG. 2C. In at least one embodiment, this network will then operate at different scales. In at least one embodiment, there may be multiple decoder stages 264 where this network will output a partial position, such as a first rough position. In at least one embodiment, at a next decoder stage a predictor 262 will output a refined position, which can represent a delta over a previous position. In at least one embodiment, this is represented by a '+' operator in this figure. In at least one embodiment, there will be several residuals predicted, up to a final decoder stage where a final delta is generated for each pixel, or a final refined output obtained for these stages, which account for final details of image reconstruction. In at least one embodiment, these predictions can be refined over multiple stages, and residuals predicted one after another as part of a progressive refinement process. In at least one embodiment, such a process can be used for optical flow as well. See paragraph [0056]) (In at least one embodiment, an attempt can be made to remove significant jitter and noise from output of at least a warp or optical flow network. In at least one embodiment, this can be addressed by including one or more terms in a loss function for this optical flow network that minimize error in both spatial and temporal gradients generated by this network. In at least one embodiment, this helps to minimize loss both in space and time. In at least one embodiment, a loss function used for co-training can then include these terms when optimizing warp and image reconstruction networks. See paragraph [0057]) (See figure 2C), but is silent as to image signal intensity. However, because the image signal predicted by Shacklett necessarily includes intensity values as part of the image, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to predict and isolate particular elements, such as intensity, so that the system could correct for artifact issues while upscaling the image.
Regarding claim 31, Shacklett teaches The computing device of claim 30, wherein the training iteration further comprises: computation of a loss function based, at least in part, on a training set comprising at least a sparse image as an input to the neural network and a densely sampled image frame as a ground truth observation (In at least one embodiment, AI-assisted annotation 3110 may be used to aid in generating annotations corresponding to imaging data 3108 to be used as ground truth data for retraining or updating a machine learning model. See paragraph [0347]) (In at least one embodiment, if a current rendered image is lower resolution then this low-resolution rendered image 106 can then be processed using an upscaler 108 to generate an upscaled image 110 that represents content of low resolution rendered image 106 at a resolution that equals (or is at least closer to) a target output resolution. See paragraph [0045]) (In at least one embodiment, this upscaled image 110 can be provided as input to an image reconstruction module 112 that can generate a high resolution, anti-aliased output image 116 using upscaled image 110 and previously generated image 122, as may be at least temporarily stored in a history buffer 120 or other such location. In at least one embodiment, this image reconstruction module 112 may include one or more neural networks 114 used as part of an image reconstruction process. In at least one embodiment, this may include at least a first optical flow network (OFN) for generating motion vectors or other information indicative of movement between adjacent frames in a sequence. In at least one embodiment, this can include an externally recurrent, pre-image reconstruction, unsupervised optical flow network. In at least one embodiment, this may also include at least a first image reconstruction network (RN) to utilize these motion vectors in order to correlate positions in a current image and a previous image and infer an output image from a blending of those images. In at least one embodiment, this blending of a current image with a prior (or historical) image of a sequence can help with temporal convergence to a nice, sharp, high-resolution output image 116, which can then be provided for presentation via a display 118 or other such presentation mechanism. In at least one embodiment, a copy of this high resolution output image 116 can be stored to history buffer 120, or another such storage location, for blending with a subsequently-generated image in this sequence. See paragraph [0046]) (In at least one embodiment, this can include using a single, combined loss function that includes terms for both optical flow or warp, as well as image reconstruction. In at least one embodiment, these networks can be trained together using this common loss function to determine network parameters for both networks that provide optimal performance, or at least minimize loss for this common loss function. In at least one embodiment, a learning rate can be controlled in order to ensure proper convergence. See paragraph [0050]);
and update weights of the neural network based, at least in part, on an application of backpropagation to the computed loss function (In at least one embodiment, neurons 2202 in second layer 2212 may fan out to neurons 2202 in multiple other layers, including to neurons 2202 in (same) second layer 2212. In at least one embodiment, second layer 2212 may be referred to as a "recurrent layer." See paragraph [0244]).
Regarding claim 32, Shacklett teaches The computing device of claim 31, wherein the densely sampled image frame and the sparse image frame are derived from a same rendered image frame (See figure 1, Low resolution rendered image fed to upscaler into image reconstruction for high resolution output image).
Regarding claim 33, Shacklett teaches The computing device of claim 32, wherein the sparse image frame is derived as a jitter encoding of a sparse sampling of the same rendered image frame (In at least one embodiment, an attempt can be made to remove significant jitter and noise from output of at least a warp or optical flow network. In at least one embodiment, this can be addressed by including one or more terms in a loss function for this optical flow network that minimize error in both spatial and temporal gradients generated by this network. In at least one embodiment, this helps to minimize loss both in space and time. In at least one embodiment, a loss function used for co-training can then include these terms when optimizing warp and image reconstruction networks. See paragraph [0057]).
Regarding claim 34, Shacklett teaches The computing device of claim 33, wherein the jitter encoding of the sparse sampling comprises an encoding of one or more non-zero image signal intensity values in the sparse sampling according to pixel locations and magnitudes (In at least one embodiment, however, proper image warping utilizes not only information from a prior frame, but also additional information about how objects move between these frames. In at least one embodiment, this can include computer vision or optical flow data, which may be represented by a set of motion vectors. In at least one embodiment, this may include motion vectors for each pixel location, or at least pixel locations for which there is movement. See paragraph [0047]) (In at least one embodiment, U corresponds to a 2D vector for each pixel in a current view or frame, with both a magnitude and a direction (in two dimensions for a 2D image). See paragraph [0051]).
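For illustration of warping a prior frame with a per-pixel 2D motion-vector field U, as described in the quoted paragraphs [0047] and [0051], the following is a minimal sketch using a nearest-neighbor gather; the warp convention and all names are assumptions of this illustration.

```python
# Minimal sketch of warping a prior frame with a per-pixel 2D motion-vector
# field U (cf. the quoted paragraphs [0047] and [0051]); the nearest-neighbor
# gather and the vector convention are assumptions of this illustration.
import numpy as np

def warp(prior: np.ndarray, U: np.ndarray) -> np.ndarray:
    """U[y, x] = (dy, dx): offset into the prior frame for each current pixel."""
    h, w = prior.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + U[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + U[..., 1]).astype(int), 0, w - 1)
    return prior[src_y, src_x]

prior = np.random.rand(16, 16)
U = np.zeros((16, 16, 2))
U[..., 1] = 1.0                 # uniform one-pixel horizontal motion
aligned = warp(prior, U)        # prior frame aligned to the current frame
```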
Regarding claim 36, Shacklett teaches An article (Operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. In at least one embodiment, a process such as those processes described herein (or variations and/or combinations thereof) is performed under control of one or more computer systems configured with executable instructions and is implemented as code (e.g., executable instructions, one or more computer programs or one or more applications) executing collectively on one or more processors, by hardware or combinations thereof. In at least one embodiment, code is stored on a computer-readable storage medium, for example, in form of a computer program comprising a plurality of instructions executable by one or more processors. See paragraph [0394]), comprising:
a non-transitory storage medium having stored thereon computer-readable instructions that are executable by one or more processors of a computing device (Operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. In at least one embodiment, a process such as those processes described herein (or variations and/or combinations thereof) is performed under control of one or more computer systems configured with executable instructions and is implemented as code (e.g., executable instructions, one or more computer programs or one or more applications) executing collectively on one or more processors, by hardware or combinations thereof. In at least one embodiment, code is stored on a computer-readable storage medium, for example, in form of a computer program comprising a plurality of instructions executable by one or more processors. See paragraph [0394]) to:
execute a training iteration to update parameters of a neural network to predict image signal intensity values of an image frame, wherein execution of the training iteration (In at least one embodiment, an optical flow network can be pre-trained 304 in parallel with an image reconstruction network that is also, but separately, pre-trained 306, although these pre-trainings do not need to be done in parallel in at least one embodiment. In at least one embodiment, these pre-trainings can be performed using respective loss functions with terms relevant to that type of network, and each can have a specified convergence or target loss value to determine when each network has been successfully or adequately pre-trained. In at least one embodiment, after these networks are separately pre-trained, these networks can then be further trained 308 together in a co-training process. See paragraph [0058]) comprises:
execution of the neural network for multiple inference stages, each inference stage to process image signal intensity values of an associated input image frame in a temporal sequence of image frames to provide a prediction of image values of an associated predicted image frame in the temporal sequence (In at least one embodiment, these networks (which can be separate networks or part of a single, fused network) can be trained 404 using this shared loss function. In at least one embodiment, an image can then be generated 406 using these networks, where that image can be an anti-aliased image reconstructed from a current image and at least one prior image in an image sequence. See paragraph [0060]) (In at least one embodiment, this is represented by a '+' operator in this figure. In at least one embodiment, there will be several residuals predicted, up to a final decoder stage where a final delta is generated for each pixel, or a final refined output obtained for these stages, which account for final details of image reconstruction. In at least one embodiment, these predictions can be refined over multiple stages, and residuals predicted one after another as part of a progressive refinement process. In at least one embodiment, such a process can be used for optical flow as well. See paragraph [0056]) (In at least one embodiment, real-time, temporal reconstruction of an image utilizes information from a prior frame after some warping to align to an image being generated for a current frame. See paragraph [0047]); and
for at least one of the multiple inference stages, process image values of a predicted image frame computed in a previous inference stage to compute a prediction of image values of an image frame in the temporal sequence associated with the at least one of the multiple inference stages (In at least one embodiment, an optical flow network or image reconstruction network can be an autoencoder network, which can include both encoder and decoder portions. In at least one embodiment, there may be one or more steps within this autoencoder (e.g., in one or more decoder stages) where an image is first upsampled, and then re-upsampled. In at least one embodiment, this is illustrated by configuration 260 of FIG. 2C. In at least one embodiment, this network will then operate at different scales. In at least one embodiment, there may be multiple decoder stages 264 where this network will output a partial position, such as a first rough position. In at least one embodiment, at a next decoder stage a predictor 262 will output a refined position, which can represent a delta over a previous position. In at least one embodiment, this is represented by a '+' operator in this figure. In at least one embodiment, there will be several residuals predicted, up to a final decoder stage where a final delta is generated for each pixel, or a final refined output obtained for these stages, which account for final details of image reconstruction. In at least one embodiment, these predictions can be refined over multiple stages, and residuals predicted one after another as part of a progressive refinement process. In at least one embodiment, such a process can be used for optical flow as well. See paragraph [0056]) (In at least one embodiment, an attempt can be made to remove significant jitter and noise from output of at least a warp or optical flow network. In at least one embodiment, this can be addressed by including one or more terms in a loss function for this optical flow network that minimize error in both spatial and temporal gradients generated by this network. In at least one embodiment, this helps to minimize loss both in space and time. In at least one embodiment, a loss function used for co-training can then include these terms when optimizing warp and image reconstruction networks. See paragraph [0057]) (See figure 2C), but is silent as to image signal intensity. However, because the image signal predicted by Shacklett necessarily includes intensity values as part of the image, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to predict and isolate particular elements, such as intensity, so that the system could correct for artifact issues while upscaling the image.
Regarding claim 38, Shacklett teaches The article of claim 36, wherein the frame in the temporal sequence of image frames associated with the at least one of the multiple inference stages comprises a sparse image frame, wherein the sparse image frame is at a resolution lower than a resolution of the associated predicted image frame (In at least one embodiment, this upscaled image 110 can be provided as input to an image reconstruction module 112 that can generate a high resolution, anti-aliased output image 116 using upscaled image 110 and previously generated image 122, as may be at least temporarily stored in a history buffer 120 or other such location. In at least one embodiment, this image reconstruction module 112 may include one or more neural networks 114 used as part of an image reconstruction process. See paragraph [0046], see figure 1).
Regarding claim 39, Shacklett teaches The article of claim 36, wherein the neural network is trained to implement a super sampling to increase image resolution (In at least one embodiment, this may include transmitting images of game content for a multiplayer game, where different client devices may display that content at different resolutions, including one or more super-resolutions. See paragraph [0062]).
Regarding claim 40, Shacklett teaches The article of claim 36, wherein the neural network is trained to remove aliased edges (In at least one embodiment, these networks (which can be separate networks or part of a single, fused network) can be trained 404 using this shared loss function. In at least one embodiment, an image can then be generated 406 using these networks, where that image can be an anti-aliased image reconstructed from a current image and at least one prior image in an image sequence. See paragraph [0060]).
Allowable Subject Matter
Claims 22, 35, 37 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: The prior art of record, alone or in combination, is silent as to the limitations “wherein the prediction of image values comprises a prediction of an array of accumulations of image signal intensity values of the image frame in the temporal sequence associated with the at least one of the multiple inference stages” of claim 22 when read in light of the rest of the limitations in claim 22 and the claims upon which claim 22 depends, and thus claim 22 contains allowable subject matter.
The prior art of record, alone or in combination, is silent as to the limitations “wherein the prediction of image values comprises a prediction of an array of accumulations of image signal intensity values of the image frame in the temporal sequence associated with the at least one of the multiple inference stages” of claim 35 when read in light of the rest of the limitations in claim 35 and the claims upon which claim 35 depends, and thus claim 35 contains allowable subject matter.
The prior art of record, alone or in combination, is silent as to the limitations “wherein the prediction of image values comprises a prediction of an array of accumulations of image signal intensity values of the image frame in the temporal sequence associated with the at least one of the multiple inference stages” of claim 37 when read in light of the rest of the limitations in claim 37 and the claims upon which claim 37 depends, and thus claim 37 contains allowable subject matter.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS R WILSON whose telephone number is (571) 272-0936. The examiner can normally be reached M-F, 7:30 AM-5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571) 272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NICHOLAS R WILSON/Primary Examiner, Art Unit 2611