Prosecution Insights
Last updated: April 19, 2026
Application No. 18/680,438

MACHINE LEARNING REFINEMENT NETWORKS FOR VIDEO POST-PROCESSING SCENARIOS

Status: Final Rejection (§103)
Filed: May 31, 2024
Examiner: DANG, PHILIP
Art Unit: 2488
Tech Center: 2400 — Computer Networks
Assignee: Microsoft Technology Licensing, LLC
OA Round: 2 (Final)

Outlook: Favorable
Grant Probability: 77% (99% with interview)
Expected OA Rounds: 3-4
Expected Time to Grant: 2y 10m

Examiner Intelligence

Career Allow Rate: 77% (363 granted / 470 resolved) — above average, +19.2% vs. TC avg
Interview Lift: +33.2% among resolved cases with an interview — a strong lift
Typical Timeline: 2y 10m average prosecution; 49 applications currently pending
Career History: 519 total applications across all art units

Statute-Specific Performance

§101: 4.5% (-35.5% vs. TC avg)
§103: 48.6% (+8.6% vs. TC avg)
§102: 11.1% (-28.9% vs. TC avg)
§112: 25.5% (-14.5% vs. TC avg)

Comparison baseline: Tech Center average estimate. Based on career data from 470 resolved cases.
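
The headline figures above are straightforward ratios over the examiner's resolved cases. A minimal sketch of how such metrics can be computed, with hypothetical case records (the field names and worked numbers are illustrative, not the underlying dataset):

```python
from dataclasses import dataclass

@dataclass
class Case:
    granted: bool        # resolved by allowance rather than abandonment
    had_interview: bool  # at least one examiner interview of record

def allow_rate(cases: list[Case]) -> float:
    """Share of resolved cases ending in a grant."""
    return sum(c.granted for c in cases) / len(cases)

def interview_lift(cases: list[Case]) -> float:
    """Allow-rate gap between cases with and without an interview."""
    with_iv = [c for c in cases if c.had_interview]
    without_iv = [c for c in cases if not c.had_interview]
    return allow_rate(with_iv) - allow_rate(without_iv)

# Example matching the headline figure: 363 grants / 470 resolved.
cases = [Case(granted=i < 363, had_interview=False) for i in range(470)]
print(f"{allow_rate(cases):.1%}")   # 77.2%
```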

Office Action

§103 — Final Rejection

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Applicant Response to Official Action

The response filed on 11/4/2025 has been entered and made of record.

Acknowledgment

Claims 1, 6, 10 and 19-20, amended on 11/4/2025, are acknowledged by the examiner.

Response to Arguments

Applicant's arguments with respect to claims 1, 19, 20, and their dependent claims have been considered but are moot in view of the new grounds of rejection necessitated by amendments initiated by the applicant. The examiner addresses the main arguments of the Applicant below.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under pre-AIA 35 U.S.C. 103(a) are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).

Claims 1, 3-6, 9-10, 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chou (US Patent 10,979,718 B2) ("Chou"), in view of Vangala et al. (US Patent 9,298,453 B2) ("Vangala"), in view of Rozendaal et al. (US Patent 12,501,050 B2) ("Rozendaal").

Regarding claim 1, Chou meets the claim limitations as follows:

A client computer system (machine learning video processing systems) [Chou: Abstract] comprising a processor system (a processor core complex) [Chou: col. 9, line 4; Fig. 1] and memory (local memory 20, a main memory storage device 22) [Chou: col. 9, line 20-21; Fig. 1], wherein the client computer system is configured to perform operations comprising (the local memory 20 and/or the main memory storage device 22 may be tangible, non-transitory, computer-readable media that store instructions executable by the processor core complex 18 and/or data to be processed by the processor core complex 18) [Chou: col. 9, line 18-23]:

receiving encoded data for a current unit of video ((receive encoded image data) [Chou: col. 9, line 49]; (the processor core complex 18 may retrieve encoded image data from memory) [Chou: col. 10, line 31-32]);

decoding the encoded data ((decode the encoded image data) [Chou: col. 10, line 31-32]; (the electronic device may decode encoded image data and instruct the electronic display to adjust luminance of its display pixels based on the decoded image data) [Chou: col. 1, line 35-38]), thereby producing a decoded current unit (reconstructed image data) [Chou: col. 12, line 18];

retrieving a given previous unit (Additionally, the machine learning block 34 may receive input image data 70 (process block 152). In some embodiments, the input image data 70 may be reconstructed image data previously determined by the reconstruction block 52. Thus, in such embodiments, the machine learning block 34 may receive the input image data 70 from an internal frame buffer or the video encoding pipeline 32. Additionally or alternatively, the input image data 70 may be source image data 36, encoded image data 38, and/or decoded image data 126. Thus, in some embodiments, the machine learning block 34 may receive the input image data 70 from an image data source, such as the processor core complex 18, the main memory storage device 22, the local memory 20, and/or the image sensor 13) [Chou: col. 26, line 50-63];

warping the given previous unit to spatially align sample values of the given previous unit with locations in the decoded current unit, thereby producing a given warped previous unit; providing the given warped previous unit to a machine learning ("ML") refinement network;

with a machine learning ("ML") refinement network (a machine learning block may be implemented (e.g., in a video encoding pipeline and/or a video decoding pipeline) to leverage machine learning techniques, such as convolutional neural network (CNN)) [Chou: col. 2, line 41-45], refining the decoded current unit ((a post-processing block (e.g., after a video decoding pipeline). When implemented as a post-processing block, in some embodiments, video decoding may be performed relatively independent of the filtering process, for example, such that results of the filtering process are not used in the video decoding pipeline. To facilitate further improving encoding efficiency, the transcoder block 56 may entropy encode the prediction residual, the encoding parameters, and/or the filter parameters 74 based at least in part on transcoder parameters 76) [Chou: col. 13, line 44-53] – Note: post-processing such as post-filtering is a refinement technique for removing blocking artifacts) to mitigate compression artifacts (to facilitate improving video quality, decoded image data may be filtered before being used to display an image. In some embodiments, the filter block 54 may determine filter parameters 74 expected to reduce likelihood of perceivable visual artifacts when applied to decoded image data. For example, the filter block 54 may determine filter parameters 74 expected to reduce likelihood of producing perceivable block artifacts. Additionally or alternatively, the filter block 54 may determine filter parameters 74 expected to reduce likelihood of producing other types of perceivable visual artifacts, such as color fringing artifacts and/or ringing artifacts. In some embodiments, filter parameters 74 expected to reduce likelihood of producing different types of perceivable visual artifacts may be determined in parallel) [Chou: col. 13, line 16-30], thereby producing a refined current unit ((Additionally or alternatively, the machine learning block 34 may be trained such that, when the machine learning block 34 analyzes input image data 70 (e.g., reconstructed image data), the filter parameters 74 determined based on resulting feature metrics 72 are expected to reduce likelihood of displaying perceivable visual artifacts) [Chou: col. 17, line 57-63] – Note: reducing the likelihood of displaying perceivable visual artifacts is a refinement technique in video processing) (In other words, utilizing the machine learning block, the video encoding pipeline may determine encoding parameters in a content dependent manner, which at least in some instances may facilitate improving the degree of matching between the prediction sample and the input image data (e.g., a coding group). As described above, in some embodiments, improving the matching degree may facilitate improving encoding efficiency, for example, by reducing number of bits included in encoded image data to indicate a prediction residual) [Chou: col. 7, line 58-67]), wherein, as part of a temporal feedback loop, the refining the decoded current unit is based at least in part on the given warped previous unit ((a post-processing block (e.g., after a video decoding pipeline). When implemented as a post-processing block, in some embodiments, video decoding may be performed relatively independent of the filtering process, for example, such that results of the filtering process are not used in the video decoding pipeline. To facilitate further improving encoding efficiency, the transcoder block 56 may entropy encode the prediction residual, the encoding parameters, and/or the filter parameters 74 based at least in part on transcoder parameters 76) [Chou: col. 13, line 44-53] – Note: post-processing such as post-filtering is a refinement technique for removing blocking artifacts).

In the same field of endeavor, Vangala further discloses the claim limitation as follows: refining the decoded current unit to mitigate compression artifacts ((The source code analytics platform may execute a program analysis module 510 to refine the node artifacts and a machine learning module 512 to correlate the node artifacts) [Vangala: col. 6, line 13-16]; (executing a program analysis module to perform at least one of a code analysis of the source code set and a metadata analysis of the code production data set to refine the node artifact) [Vangala: col. 10, line 10-13]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou with Vangala to program the system to implement Vangala's method. Therefore, the combination of Chou with Vangala will enable the source code analytics platform of the system to refine the node artifacts and a machine learning module to correlate the node artifacts [Vangala: col. 6, line 13-16].

Chou and Vangala do not explicitly disclose the warping operation using the ML neural network: warping the given previous unit to spatially align sample values of the given previous unit with locations in the decoded current unit, thereby producing a given warped previous unit; providing the given warped previous unit to a machine learning ("ML") refinement network; and wherein, as part of a temporal feedback loop, the refining the decoded current unit is based at least in part on the given warped previous unit.

However, in the same field of endeavor, Rozendaal further discloses the claim limitations and the deficient claim limitations as follows: retrieving a given previous unit ((i.e., Video decoder may use reconstructed frame 510 as input to a warping function (warp) 512 for use with a future decoded P-frame) [Rozendaal: col. 15, line 44-46]; (i.e., Video encoder 200 or video decoder 300 may include FINT kernel 432, which may be implemented in NSP(s) 430, to perform block-based warping 612, which may be overlapped block-based warping, of reconstructed previous frame Xt-1 using the motion vector ft) [Rozendaal: col. 16, line 42-47]); warping the given previous unit to spatially align sample values of the given previous unit with locations in the decoded current unit, thereby producing a given warped previous unit (i.e., warp the previous reconstructed video data with an overlapped block-based warp function using the block-based motion vector to generate predicted current video data; and sum the predicted current video data with a residual block to generate the current reconstructed video data) [Rozendaal: col. 15, line 11-16; Figs. 3, 17]; providing the given warped previous unit to a machine learning ("ML") refinement network ((i.e., This disclosure describes techniques for encoding and decoding media data (e.g., images or videos) using neural network-based media coding techniques. In particular, this disclosure describes techniques for using warping to decode encoded media data. In particular, this disclosure describes techniques for neural-network-based media coding that uses block-based warping. Example techniques of this disclosure include a 1080p YUV420 architecture, predictive modeling to improve compression performance, quantization-aware training, parallel entropy coding (for example, on a GPU), and/or pipelined inferencing. The techniques of this disclosure may improve the performance of a neural-network based media coder. Such an improved neural-network-based media coder may be utilized in a battery powered device such as a mobile device (e.g., a smartphone)) [Rozendaal: col. 4, line 12-24]; (i.e., Many neural video coders assume availability of pixel-based or feature-based warping operations) [Rozendaal: col. 5, line 8-10]); and wherein, as part of a temporal feedback loop (i.e., "Feedback Recurrent Autoencoder for Video Compression") [Rozendaal: Other Publications], the refining the decoded current unit is based at least in part on the given warped previous unit (i.e., "Feedback Recurrent Autoencoder for Video Compression") [Rozendaal: Other Publications; Figs. 4-5, 16 – Note: see the feedback loop in Figs. 4 and 16, where the given warped previous unit is used in refining the decoded current unit]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou and Vangala with Rozendaal to program the system to implement Rozendaal's method. Therefore, the combination of Chou and Vangala with Rozendaal will enable the system to improve the performance of a neural-network-based media coder [Rozendaal: col. 4, line 12-24].
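
For context on the disputed limitations: claim 1 recites a decode-warp-refine pipeline in which the refined output of one unit feeds back into processing of the next. A minimal sketch of that pipeline, assuming a PyTorch-style implementation; the module, the bilinear warp, and the dummy stream are illustrative assumptions, not the claimed design or any cited reference's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementNet(nn.Module):
    """Toy CNN that refines a decoded unit given the warped previous output."""
    def __init__(self, channels: int = 3):
        super().__init__()
        # Input: decoded current unit concatenated with the warped previous unit.
        self.body = nn.Sequential(
            nn.Conv2d(2 * channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, decoded, warped_prev):
        # Predict a residual correction to mitigate compression artifacts.
        return decoded + self.body(torch.cat([decoded, warped_prev], dim=1))

def warp(prev, flow):
    """Bilinearly warp the previous unit so its samples align with the current unit."""
    _, _, h, w = prev.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float() + flow.permute(0, 2, 3, 1)
    grid[..., 0] = 2 * grid[..., 0] / (w - 1) - 1   # normalize x to [-1, 1]
    grid[..., 1] = 2 * grid[..., 1] / (h - 1) - 1   # normalize y to [-1, 1]
    return F.grid_sample(prev, grid, align_corners=True)

# Dummy stream of (decoded unit, motion field) pairs standing in for a real decoder.
stream = [(torch.rand(1, 3, 64, 64), torch.zeros(1, 2, 64, 64)) for _ in range(3)]

net = RefinementNet()
prev_refined = None
for decoded, flow in stream:
    if prev_refined is None:
        refined = decoded                     # first unit: nothing to warp yet
    else:
        refined = net(decoded, warp(prev_refined, flow))
    prev_refined = refined.detach()           # buffer output: temporal feedback
```

Buffering the refined output and warping it toward the next decoded unit is what makes the loop "temporal feedback" rather than purely per-frame filtering; at inference time the detach simply marks where one unit's processing ends and the next begins.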
Regarding claim 3, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: wherein the compression artifacts include blocking artifacts, blurring artifacts, banding artifacts, and/or ringing artifacts (To facilitate further improving video quality, decoded image data may be filtered before being used to display a corresponding image, for example, to reduce likelihood of the image being displayed with perceivable ringing, blocking, color fringing, and/or other visual artifacts) [Chou: col. 8, line 5-9].

Regarding claim 4, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: wherein the current unit of video is a frame, a slice, or a tile ((the electronic display 12 may present visual representations of information by display image frames) [Chou: col. 10, line 10-12]; (To facilitate encoding, in some embodiments, an image may be divided into one or more coding groups. As used herein, a "coding group" is intended to describe a sample (e.g., block) from an image that is encoded as a unit and, thus, may be a coding tree unit (CTU), a coding unit (CU), a macro block, a slice, a picture, a Group of Pictures (GOP), or the like. In this manner, the image may be encoded by successively encoding source image data corresponding to each coding group in the image) [Chou: col. 5, line 30-38]).

Regarding claim 5, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: storing, in a decoded video buffer (Once generated or received, the encoded image data may be stored in local memory 20 and/or the main memory storage device 22) [Chou: col. 10, line 27-28], the decoded current unit for use in providing temporal feedback to the ML refinement network ((Accordingly, to display image frames, the processor core complex 18 may retrieve encoded image data from memory, decode the encoded image data, and instruct the electronic display 12 to display image frames based on the decoded image data) [Chou: col. 10, line 30-33]; (Additionally, the inter prediction block 48 may determine one or more candidate inter prediction modes (e.g., motion vector and reference index) and implement associated inter parameters 58 to determine corresponding prediction samples. In other words, the inter parameters 58 may indicate operational parameters that the inter prediction block 46 implements to determine a prediction sample based at least in part on image data (e.g., reconstructed image data) corresponding with a different image. Furthermore, the mode decision block 50 may either select one or more prediction modes from the candidate prediction modes or a skip mode, for example, based at least in part on corresponding rate-distortion metrics. In some embodiments, the mode decision block 50 may determine a rate-distortion metric based at least in part on a weighting factor (e.g., A) indicated by mode decision parameters 61) [Chou: col. 12, line 41-56]).

Regarding claim 6, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: wherein the given previous unit is a given decoded previous unit retrieved from a decoded video buffer (Accordingly, to display image frames, the processor core complex 18 may retrieve encoded image data from memory, decode the encoded image data, and instruct the electronic display 12 to display image frames based on the decoded image data) [Chou: col. 10, line 30-33], and wherein the given warped previous unit is a given warped, decoded previous unit (Additionally, the machine learning block 34 may receive input image data 70 (process block 152). In some embodiments, the input image data 70 may be reconstructed image data previously determined by the reconstruction block 52. Thus, in such embodiments, the machine learning block 34 may receive the input image data 70 from an internal frame buffer or the video encoding pipeline 32. Additionally or alternatively, the input image data 70 may be source image data 36, encoded image data 38, and/or decoded image data 126. Thus, in some embodiments, the machine learning block 34 may receive the input image data 70 from an image data source, such as the processor core complex 18, the main memory storage device 22, the local memory 20, and/or the image sensor 13) [Chou: col. 26, line 50-63]. Chou and Vangala do not explicitly disclose the following limitation: wherein the given warped previous unit is a given warped, decoded previous unit. However, in the same field of endeavor, Rozendaal further discloses the claim limitations and the deficient claim limitations as follows: wherein the given warped previous unit is a given warped, decoded previous unit ((i.e., Video decoder may use reconstructed frame 510 as input to a warping function (warp) 512 for use with a future decoded P-frame) [Rozendaal: col. 15, line 44-46]; (i.e., Video encoder 200 or video decoder 300 may include FINT kernel 432, which may be implemented in NSP(s) 430, to perform block-based warping 612, which may be overlapped block-based warping, of reconstructed previous frame Xt-1 using the motion vector ft) [Rozendaal: col. 16, line 42-47]; (i.e., warp the previous reconstructed video data with an overlapped block-based warp function using the block-based motion vector to generate predicted current video data; and sum the predicted current video data with a residual block to generate the current reconstructed video data) [Rozendaal: col. 15, line 11-16]; (i.e., Many neural video coders assume availability of pixel-based or feature-based warping operations) [Rozendaal: col. 5, line 8-10]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou and Vangala with Rozendaal to program the system to implement Rozendaal's method. Therefore, the combination of Chou and Vangala with Rozendaal will enable the system to improve the performance of a neural-network-based media coder [Rozendaal: col. 4, line 12-24].

Regarding claim 9, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: storing, in a buffer (Once generated or received, the encoded image data may be stored in local memory 20 and/or the main memory storage device 22) [Chou: col. 10, line 27-28], the refined current unit for use in providing temporal feedback to the ML refinement network ((Accordingly, to display image frames, the processor core complex 18 may retrieve encoded image data from memory, decode the encoded image data, and instruct the electronic display 12 to display image frames based on the decoded image data) [Chou: col. 10, line 30-33]; (Additionally, the inter prediction block 48 may determine one or more candidate inter prediction modes (e.g., motion vector and reference index) and implement associated inter parameters 58 to determine corresponding prediction samples. In other words, the inter parameters 58 may indicate operational parameters that the inter prediction block 46 implements to determine a prediction sample based at least in part on image data (e.g., reconstructed image data) corresponding with a different image. Furthermore, the mode decision block 50 may either select one or more prediction modes from the candidate prediction modes or a skip mode, for example, based at least in part on corresponding rate-distortion metrics. In some embodiments, the mode decision block 50 may determine a rate-distortion metric based at least in part on a weighting factor (e.g., A) indicated by mode decision parameters 61) [Chou: col. 12, line 41-56]).

Regarding claim 10, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: wherein the given previous unit is a given refined previous unit retrieved from a buffer ((Accordingly, to display image frames, the processor core complex 18 may retrieve encoded image data from memory, decode the encoded image data, and instruct the electronic display 12 to display image frames based on the decoded image data) [Chou: col. 10, line 30-33]; (When the filter block 54 is implemented in the video encoding pipeline 32, the filter parameters 74 may be communicated along with the encoded image data 38. As will be described in more detail below, in some embodiments, the filter block 54 may additionally or alternatively be implemented on the decoding side, for example, in a video decoding pipeline. In other words, in some embodiments, a filter block 54 may be implemented in loop to enable results of the filtering process to be used in subsequent processing, for example, by the video encoding pipeline 32 or a video decoding pipeline) [Chou: col. 13, line 31-41]), and wherein the given warped previous unit is a given warped, refined previous unit ((Additionally, the machine learning block 34 may receive input image data 70 (process block 152). In some embodiments, the input image data 70 may be reconstructed image data previously determined by the reconstruction block 52. Thus, in such embodiments, the machine learning block 34 may receive the input image data 70 from an internal frame buffer or the video encoding pipeline 32. Additionally or alternatively, the input image data 70 may be source image data 36, encoded image data 38, and/or decoded image data 126. Thus, in some embodiments, the machine learning block 34 may receive the input image data 70 from an image data source, such as the processor core complex 18, the main memory storage device 22, the local memory 20, and/or the image sensor 13) [Chou: col. 26, line 50-63]; (When the filter block 54 is implemented in the video encoding pipeline 32, the filter parameters 74 may be communicated along with the encoded image data 38. As will be described in more detail below, in some embodiments, the filter block 54 may additionally or alternatively be implemented on the decoding side, for example, in a video decoding pipeline. In other words, in some embodiments, a filter block 54 may be implemented in loop to enable results of the filtering process to be used in subsequent processing, for example, by the video encoding pipeline 32 or a video decoding pipeline) [Chou: col. 13, line 31-41]). Chou and Vangala do not explicitly disclose the following limitation: wherein the given warped previous unit is a given warped, refined previous unit. However, in the same field of endeavor, Rozendaal further discloses the claim limitations and the deficient claim limitations as follows: wherein the given warped previous unit is a given warped, refined previous unit ((i.e., Video decoder may use reconstructed frame 510 as input to a warping function (warp) 512 for use with a future decoded P-frame) [Rozendaal: col. 15, line 44-46]; (i.e., Video encoder 200 or video decoder 300 may include FINT kernel 432, which may be implemented in NSP(s) 430, to perform block-based warping 612, which may be overlapped block-based warping, of reconstructed previous frame Xt-1 using the motion vector ft) [Rozendaal: col. 16, line 42-47]; (i.e., warp the previous reconstructed video data with an overlapped block-based warp function using the block-based motion vector to generate predicted current video data; and sum the predicted current video data with a residual block to generate the current reconstructed video data) [Rozendaal: col. 15, line 11-16]; (i.e., Many neural video coders assume availability of pixel-based or feature-based warping operations) [Rozendaal: col. 5, line 8-10]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou and Vangala with Rozendaal to program the system to implement Rozendaal's method. Therefore, the combination of Chou and Vangala with Rozendaal will enable the system to improve the performance of a neural-network-based media coder [Rozendaal: col. 4, line 12-24].

Regarding claim 13, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: for each of one or more subsequent units as the current unit, repeating the receiving, the decoding, and the refining (When the filter block 54 is implemented in the video encoding pipeline 32, the filter parameters 74 may be communicated along with the encoded image data 38. As will be described in more detail below, in some embodiments, the filter block 54 may additionally or alternatively be implemented on the decoding side, for example, in a video decoding pipeline. In other words, in some embodiments, a filter block 54 may be implemented in loop to enable results of the filtering process to be used in subsequent processing, for example, by the video encoding pipeline 32 or a video decoding pipeline.) [Chou: col. 13, line 31-41] – Note: in-loop filtering requires repeating the receiving, the decoding, the refining, the processing, and the outputting of coding data). In the same field of endeavor, Vangala further discloses the claim limitation as follows: repeating the receiving, the decoding, and the refining ((A value object 204 may be a constant or a variable that may be passed to other functions or classes. The class object 202 may receive the value object 204 from the application that activated the class object 202, self generate the value object 204, or return the value object 204 to the activating application. The code line 206 may describe an operation performed on a value 204, execute a branching instruction, or describe a loop. A function 208 is a subroutine that may perform a series of operations on a value 204 passed to the function 208) [Vangala: col. 4, line 12-21]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou with Vangala to program the system to implement Vangala's method. Therefore, the combination of Chou with Vangala will enable the source code analytics platform of the system to refine the node artifacts and a machine learning module to correlate the node artifacts [Vangala: col. 6, line 13-16].

Regarding claim 14, Chou meets the claim limitations as set forth in claim 13. Chou further meets the claim limitations as follows: wherein at least some operations for the decoding and the refining are performed ((a post-processing block (e.g., after a video decoding pipeline). When implemented as a post-processing block, in some embodiments, video decoding may be performed relatively independent of the filtering process, for example, such that results of the filtering process are not used in the video decoding pipeline. To facilitate further improving encoding efficiency, the transcoder block 56 may entropy encode the prediction residual, the encoding parameters, and/or the filter parameters 74 based at least in part on transcoder parameters 76) [Chou: col. 13, line 44-53] – Note: post-processing such as post-filtering is a refinement technique for removing blocking artifacts) in parallel for different units (In some embodiments, the pipeline 142 may be included in a video decoding pipeline 124, for example, in the main decoding pipeline 130 or parallel to the main decoding pipeline 130) [Chou: col. 26, line 5-8]).

Regarding claim 15, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: wherein the current unit of video is a group of pictures or a sequence (To facilitate encoding, in some embodiments, an image may be divided into one or more coding groups. As used herein, a "coding group" is intended to describe a sample (e.g., block) from an image that is encoded as a unit and, thus, may be a coding tree unit (CTU), a coding unit (CU), a macro block, a slice, a picture, a Group of Pictures (GOP), or the like. In this manner, the image may be encoded by successively encoding source image data corresponding to each coding group in the image) [Chou: col. 5, line 30-38].

Regarding claim 16, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: wherein the decoding is performed using a video decoder (implemented in the main encoding or decoding pipeline) [Chou: col. 30, line 4-5; Fig. 1] for a codec standard or format (In some of the above embodiments, the machine learning techniques may be implemented to facilitate improving video processing, for example, by using identified feature metrics in the coding decision process for a codec, such as AVC, HEVC, or the like. For example, with regard to intra prediction, a machine learning block may select an intra prediction mode from among multiple candidate intra prediction modes provided by a codec. Similarly, with regard to inter prediction, the machine learning block may select an inter prediction mode from among multiple candidate intra prediction modes provided by the codec) [Chou: col. 29, line 38-48; Fig. 1], and wherein the ML refinement network has been trained for the codec standard or format (In fact, in some embodiments, a machine learning system (e.g., block) may implement machine learning techniques to tone map content, for example, from a standard dynamic range (SDR) to a high dynamic range (HDR). As described above, machine learning techniques may be trained before deployment. Thus, to facilitate tone mapping content using machine learning techniques, the machine learning system may be trained based on SDR versions of previously graded content and corresponding HDR versions of the content) [Chou: col. 27, line 65 - col. 28, line 6].

Regarding claim 17, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: wherein the ML refinement network has been trained for a target level of quality and/or bitrate (To facilitate improving transcoding, it is desirable to transcode a bit stream using a lower bitrate, possible lower resolution and lower frame rate. By analyzing using machine learning techniques, characteristics of a bit stream may be determined and, for example, indicated in a metadata file and/or derived at the decoder stage of the transcoder.) [Chou: col. 29, line 8-27; Fig. 1].

Regarding claim 18, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: processing the refined current unit for display (To facilitate improving transcoding, it is desirable to transcode a bit stream using a lower bitrate, possible lower resolution and lower frame rate. By analyzing using machine learning techniques, characteristics of a bit stream may be determined and, for example, indicated in a metadata file and/or derived at the decoder stage of the transcoder. Given analysis of the bit stream and/or other relevant conditions (e.g., network information, display/device capabilities, current power usage, resource information, applications running on the device, and/or expected user streaming behaving), a convolutional neural network may determine "coding control" parameters for encoding the video. These could include QP for the frame or regions, GOP structure, intra frames, lambda parameters for mode decision and motion estimation (e.g., if used by transcoder), downscaling ratio as well as filters for the downscaling. In some embodiments, a convolutional neural network may be implemented to facilitate performing motion compensated temporal filtering, which at least in some instances may further assist with compression through pre-filtering) [Chou: col. 29, line 8-27; Fig. 1], and outputting results of the processing the refined current unit (To help illustrate, an example of a video decoding pipeline 124 including a machine learning block 34, which may implement machine learning techniques, is shown in FIG. 12. In operation, the video decoding pipeline 124 may process encoded image data 38 and output decoded image data 126) [Chou: col. 23, line 36-41; Fig. 12] for display (To facilitate improving transcoding, it is desirable to transcode a bit stream using a lower bitrate, possible lower resolution and lower frame rate. By analyzing using machine learning techniques, characteristics of a bit stream may be determined and, for example, indicated in a metadata file and/or derived at the decoder stage of the transcoder. Given analysis of the bit stream and/or other relevant conditions (e.g., network information, display/device capabilities, current power usage, resource information, applications running on the device, and/or expected user streaming behaving), a convolutional neural network may determine "coding control" parameters for encoding the video. These could include QP for the frame or regions, GOP structure, intra frames, lambda parameters for mode decision and motion estimation (e.g., if used by transcoder), downscaling ratio as well as filters for the downscaling. In some embodiments, a convolutional neural network may be implemented to facilitate performing motion compensated temporal filtering, which at least in some instances may further assist with compression through pre-filtering) [Chou: col. 29, line 8-27; Fig. 1].

Regarding claim 19, Chou meets the claim limitations as follows: One or more computer-readable media (local memory 20, a main memory storage device 22) [Chou: col. 9, line 20-21; Fig. 1] having stored thereon computer-executable instructions for causing a processor system, when programmed thereby, to perform operations comprising (the local memory 20 and/or the main memory storage device 22 may be tangible, non-transitory, computer-readable media that store instructions executable by the processor core complex 18 and/or data to be processed by the processor core complex 18) [Chou: col. 9, line 18-23]: receiving encoded data for a current unit of video ((receive encoded image data) [Chou: col. 9, line 49]; (the processor core complex 18 may retrieve encoded image data from memory) [Chou: col. 10, line 31-32]); decoding the encoded data ((decode the encoded image data) [Chou: col. 10, line 31-32]; (the electronic device may decode encoded image data and instruct the electronic display to adjust luminance of its display pixels based on the decoded image data) [Chou: col. 1, line 35-38]), thereby producing a decoded current unit (reconstructed image data) [Chou: col. 12, line 18]; retrieving a given previous unit (Additionally, the machine learning block 34 may receive input image data 70 (process block 152). In some embodiments, the input image data 70 may be reconstructed image data previously determined by the reconstruction block 52. Thus, in such embodiments, the machine learning block 34 may receive the input image data 70 from an internal frame buffer or the video encoding pipeline 32. Additionally or alternatively, the input image data 70 may be source image data 36, encoded image data 38, and/or decoded image data 126. Thus, in some embodiments, the machine learning block 34 may receive the input image data 70 from an image data source, such as the processor core complex 18, the main memory storage device 22, the local memory 20, and/or the image sensor 13) [Chou: col. 26, line 50-63]; warping the given previous unit to spatially align sample values of the given previous unit with locations in the decoded current unit, thereby producing a given warped previous unit; providing the given warped previous unit to a machine learning ("ML") refinement network; with the ML refinement network (a machine learning block may be implemented (e.g., in a video encoding pipeline and/or a video decoding pipeline) to leverage machine learning techniques, such as convolutional neural network (CNN)) [Chou: col. 2, line 41-45], refining the decoded current unit ((a post-processing block (e.g., after a video decoding pipeline). When implemented as a post-processing block, in some embodiments, video decoding may be performed relatively independent of the filtering process, for example, such that results of the filtering process are not used in the video decoding pipeline. To facilitate further improving encoding efficiency, the transcoder block 56 may entropy encode the prediction residual, the encoding parameters, and/or the filter parameters 74 based at least in part on transcoder parameters 76) [Chou: col. 13, line 44-53] – Note: post-processing such as post-filtering is a refinement technique for removing blocking artifacts) to mitigate compression artifacts (to facilitate improving video quality, decoded image data may be filtered before being used to display an image. In some embodiments, the filter block 54 may determine filter parameters 74 expected to reduce likelihood of perceivable visual artifacts when applied to decoded image data. For example, the filter block 54 may determine filter parameters 74 expected to reduce likelihood of producing perceivable block artifacts. Additionally or alternatively, the filter block 54 may determine filter parameters 74 expected to reduce likelihood of producing other types of perceivable visual artifacts, such as color fringing artifacts and/or ringing artifacts. In some embodiments, filter parameters 74 expected to reduce likelihood of producing different types of perceivable visual artifacts may be determined in parallel) [Chou: col. 13, line 16-30], thereby producing a refined current unit ((Additionally or alternatively, the machine learning block 34 may be trained such that, when the machine learning block 34 analyzes input image data 70 (e.g., reconstructed image data), the filter parameters 74 determined based on resulting feature metrics 72 are expected to reduce likelihood of displaying perceivable visual artifacts) [Chou: col. 17, line 57-63] – Note: reducing the likelihood of displaying perceivable visual artifacts is a refinement technique in video processing) (In other words, utilizing the machine learning block, the video encoding pipeline may determine encoding parameters in a content dependent manner, which at least in some instances may facilitate improving the degree of matching between the prediction sample and the input image data (e.g., a coding group). As described above, in some embodiments, improving the matching degree may facilitate improving encoding efficiency, for example, by reducing number of bits included in encoded image data to indicate a prediction residual) [Chou: col. 7, line 58-67]), wherein, as part of a temporal feedback loop, the refining the decoded current unit is based at least in part on the given warped previous unit ((a post-processing block (e.g., after a video decoding pipeline). When implemented as a post-processing block, in some embodiments, video decoding may be performed relatively independent of the filtering process, for example, such that results of the filtering process are not used in the video decoding pipeline. To facilitate further improving encoding efficiency, the transcoder block 56 may entropy encode the prediction residual, the encoding parameters, and/or the filter parameters 74 based at least in part on transcoder parameters 76) [Chou: col. 13, line 44-53] – Note: post-processing such as post-filtering is a refinement technique for removing blocking artifacts). In the same field of endeavor, Vangala further discloses the claim limitation as follows: refining the decoded current unit to mitigate compression artifacts ((The source code analytics platform may execute a program analysis module 510 to refine the node artifacts and a machine learning module 512 to correlate the node artifacts) [Vangala: col. 6, line 13-16]; (executing a program analysis module to perform at least one of a code analysis of the source code set and a metadata analysis of the code production data set to refine the node artifact) [Vangala: col. 10, line 10-13]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou with Vangala to program the system to implement Vangala's method. Therefore, the combination of Chou with Vangala will enable the source code analytics platform of the system to refine the node artifacts and a machine learning module to correlate the node artifacts [Vangala: col. 6, line 13-16]. Chou and Vangala do not explicitly disclose the warping operation using the ML neural network: warping the given previous unit to spatially align sample values of the given previous unit with locations in the decoded current unit, thereby producing a given warped previous unit; providing the given warped previous unit to a machine learning ("ML") refinement network; and wherein, as part of a temporal feedback loop, the refining the decoded current unit is based at least in part on the given warped previous unit. However, in the same field of endeavor, Rozendaal further discloses the claim limitations and the deficient claim limitations as follows: retrieving a given previous unit ((i.e., Video decoder may use reconstructed frame 510 as input to a warping function (warp) 512 for use with a future decoded P-frame) [Rozendaal: col. 15, line 44-46]; (i.e., Video encoder 200 or video decoder 300 may include FINT kernel 432, which may be implemented in NSP(s) 430, to perform block-based warping 612, which may be overlapped block-based warping, of reconstructed previous frame Xt-1 using the motion vector ft) [Rozendaal: col. 16, line 42-47]); warping the given previous unit to spatially align sample values of the given previous unit with locations in the decoded current unit, thereby producing a given warped previous unit (i.e., warp the previous reconstructed video data with an overlapped block-based warp function using the block-based motion vector to generate predicted current video data; and sum the predicted current video data with a residual block to generate the current reconstructed video data) [Rozendaal: col. 15, line 11-16; Figs. 3, 17]; providing the given warped previous unit to a machine learning ("ML") refinement network ((i.e., This disclosure describes techniques for encoding and decoding media data (e.g., images or videos) using neural network-based media coding techniques. In particular, this disclosure describes techniques for using warping to decode encoded media data. In particular, this disclosure describes techniques for neural-network-based media coding that uses block-based warping. Example techniques of this disclosure include a 1080p YUV420 architecture, predictive modeling to improve compression performance, quantization-aware training, parallel entropy coding (for example, on a GPU), and/or pipelined inferencing. The techniques of this disclosure may improve the performance of a neural-network based media coder. Such an improved neural-network-based media coder may be utilized in a battery powered device such as a mobile device (e.g., a smartphone)) [Rozendaal: col. 4, line 12-24]; (i.e., Many neural video coders assume availability of pixel-based or feature-based warping operations) [Rozendaal: col. 5, line 8-10]); and wherein, as part of a temporal feedback loop (i.e., "Feedback Recurrent Autoencoder for Video Compression") [Rozendaal: Other Publications], the refining the decoded current unit is based at least in part on the given warped previous unit (i.e., "Feedback Recurrent Autoencoder for Video Compression") [Rozendaal: Other Publications; Figs. 4-5, 16 – Note: see the feedback loop in Figs. 4 and 16, where the given warped previous unit is used in refining the decoded current unit]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou and Vangala with Rozendaal to program the system to implement Rozendaal's method. Therefore, the combination of Chou and Vangala with Rozendaal will enable the system to improve the performance of a neural-network-based media coder [Rozendaal: col. 4, line 12-24].

Regarding claim 20, Chou meets the claim limitations as follows: In a computer system (machine learning video processing systems) [Chou: Abstract], a method (method) [Chou: Abstract] of training a machine learning ("ML") refinement network (a machine learning block may be implemented (e.g., in a video encoding pipeline and/or a video decoding pipeline) to leverage machine learning techniques, such as convolutional neural network (CNN)) [Chou: col. 2, line 41-45] for post-processing of video ((a post-processing block (e.g., after a video decoding pipeline). When implemented as a post-processing block, in some embodiments, video decoding may be performed relatively independent of the filtering process, for example, such that results of the filtering process are not used in the video decoding pipeline. To facilitate further improving encoding efficiency, the transcoder block 56 may entropy encode the prediction residual, the encoding parameters, and/or the filter parameters 74 based at least in part on transcoder parameters 76) [Chou: col. 13, line 44-53] – Note: post-processing such as post-filtering is a refinement technique for removing blocking artifacts), the method comprising (method) [Chou: Abstract]: receiving a current unit of input video (image data may be received from another electronic device and/or stored in the electronic device) [Chou: col. 1, line 30-31]; encoding the current unit of input video (image data may be encoded (e.g., compressed) to reduce size (e.g., number of bits)) [Chou: col. 10, line 31-32], thereby producing encoded data for the current unit of input video (image data may be encoded based at least in part on prediction techniques. For example, image data corresponding with a portion (e.g., block) of an image frame may be encoded based on a prediction sample, which indicates a prediction of at least the portion of the image frame) [Chou: col. 1, line 39-44]; decoding the encoded data ((decode the encoded image data) [Chou: col. 1, line 32-33]; (the electronic device may decode encoded image data and instruct the electronic display to adjust luminance of its display pixels based on the decoded image data) [Chou: col. 1, line 35-38]), thereby producing a decoded current unit (reconstructed image data) [Chou: col. 12, line 18]; retrieving a given previous unit (Additionally, the machine learning block 34 may receive input image data 70 (process block 152). In some embodiments, the input image data 70 may be reconstructed image data previously determined by the reconstruction block 52. Thus, in such embodiments, the machine learning block 34 may receive the input image data 70 from an internal frame buffer or the video encoding pipeline 32. Additionally or alternatively, the input image data 70 may be source image data 36, encoded image data 38, and/or decoded image data 126. Thus, in some embodiments, the machine learning block 34 may receive the input image data 70 from an image data source, such as the processor core complex 18, the main memory storage device 22, the local memory 20, and/or the image sensor 13) [Chou: col. 26, line 50-63]; warping the given previous unit to spatially align sample values of the given previous unit with locations in the decoded current unit, thereby producing a given warped previous unit; providing the given warped previous unit to a machine learning ("ML") refinement network; with the ML refinement network (a machine learning block may be implemented (e.g., in a video encoding pipeline and/or a video decoding pipeline) to leverage machine learning techniques, such as convolutional neural network (CNN)) [Chou: col. 2, line 41-45], refining the decoded current unit ((a post-processing block (e.g., after a video decoding pipeline). When implemented as a post-processing block, in some embodiments, video decoding may be performed relatively independent of the filtering process, for example, such that results of the filtering process are not used in the video decoding pipeline. To facilitate further improving encoding efficiency, the transcoder block 56 may entropy encode the prediction residual, the encoding parameters, and/or the filter parameters 74 based at least in part on transcoder parameters 76) [Chou: col. 13, line 44-53] – Note: post-processing such as post-filtering is a refinement technique for removing blocking artifacts) to mitigate compression artifacts (to facilitate improving video quality, decoded image data may be filtered before being used to display an image. In some embodiments, the filter block 54 may determine filter parameters 74 expected to reduce likelihood of perceivable visual artifacts when applied to decoded image data. For example, the filter block 54 may determine filter parameters 74 expected to reduce likelihood of producing perceivable block artifacts. Additionally or alternatively, the filter block 54 may determine filter parameters 74 expected to reduce likelihood of producing other types of perceivable visual artifacts, such as color fringing artifacts and/or ringing artifacts. In some embodiments, filter parameters 74 expected to reduce likelihood of producing different types of perceivable visual artifacts may be determined in parallel) [Chou: col. 13, line 16-30], thereby producing a refined current unit ((Additionally or alternatively, the machine learning block 34 may be trained such that, when the machine learning block 34 analyzes input image data 70 (e.g., reconstructed image data), the filter parameters 74 determined based on resulting feature metrics 72 are expected to reduce likelihood of displaying perceivable visual artifacts) [Chou: col. 17, line 57-63] – Note: reducing the likelihood of displaying perceivable visual artifacts is a refinement technique in video processing); determining feedback based at least in part on differences between the current unit of input video and the refined current unit ((Based on the prediction sample, a prediction residual, which indicates difference between the prediction sample and the source image data, may be determined) [Chou: col. 2, line 20-23] – Note: per the Application's specification, the feedback is determined based on the differences between the current unit and the refined current unit [para. 004]); and adjusting the ML refinement network ((Additionally or alternatively, the machine learning block 34 may be trained such that, when the machine learning block 34 analyzes input image data 70 (e.g., reconstructed image data), the filter parameters 74 determined based on resulting feature metrics 72 are expected to reduce likelihood of displaying perceivable visual artifacts) [Chou: col. 17, line 57-63]; (In other words, utilizing the machine learning block, the video encoding pipeline may determine encoding parameters in a content dependent manner, which at least in some instances may facilitate improving the degree of matching between the prediction sample and the input image data (e.g., a coding group). As described above, in some embodiments, improving the matching degree may facilitate improving encoding efficiency, for example, by reducing number of bits included in encoded image data to indicate a prediction residual) [Chou: col. 7, line 58-67]) based at least in part on the feedback (After training, the machine learning block may enable the video encoding pipeline to determine encoding parameters based at least in part on content analysis provided by the machine learning block. For example, when a prediction technique is to be implemented and the machine learning block is enabled, the video encoding pipeline may determine a prediction sample by down-sampling (e.g., source image data and/or prediction residuals), applying a forward transform and a forward quantization, applying an inverse quantization and an inverse transform, and upscaling. In some embodiments, the video encoding pipeline may down-scale and/or up-scale based on encoding parameters, for example, which indicate target aspect ratio, target filter (e.g., interpolation) weighting, and/or target filter mode. By leveraging the feature metrics determined by the machine learning block, the video encoding pipeline may adaptively adjust the encoding parameters used to determine the prediction sample. In other words, utilizing the machine learning block, the video encoding pipeline may determine encoding parameters in a content dependent manner, which at least in some instances may facilitate improving the degree of matching between the prediction sample and the input image data (e.g., a coding group). As described above, in some embodiments, improving the matching degree may facilitate improving encoding efficiency, for example, by reducing number of bits included in encoded image data to indicate a prediction residual.) [Chou: col. 7, line 40-67], wherein, as part of a temporal feedback loop, the refining the decoded current unit is based at least in part on the given warped previous unit ((a post-processing block (e.g., after a video decoding pipeline). When implemented as a post-processing block, in some embodiments, video decoding may be performed relatively independent of the filtering process, for example, such that results of the filtering process are not used in the video decoding pipeline. To facilitate further improving encoding efficiency, the transcoder block 56 may entropy encode the prediction residual, the encoding parameters, and/or the filter parameters 74 based at least in part on transcoder parameters 76) [Chou: col. 13, line 44-53] – Note: post-processing such as post-filtering is a refinement technique for removing blocking artifacts). In the same field of endeavor, Vangala further discloses the claim limitation as follows: refining the decoded current unit to mitigate compression artifacts ((The source code analytics platform may execute a program analysis module 510 to refine the node artifacts and a machine learning module 512 to correlate the node artifacts) [Vangala: col. 6, line 13-16]; (executing a program analysis module to perform at least one of a code analysis of the source code set and a metadata analysis of the code production data set to refine the node artifact) [Vangala: col. 10, line 10-13]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou with Vangala to program the system to implement Vangala's method. Therefore, the combination of Chou with Vangala will enable the source code analytics platform of the system to refine the node artifacts and a machine learning module to correlate the node artifacts [Vangala: col. 6, line 13-16]. Chou and Vangala do not explicitly disclose the warping operation using the ML neural network: warping the given previous unit to spatially align sample values of the given previous unit with locations in the decoded current unit, thereby producing a given warped previous unit; providing the given warped previous unit to a machine learning ("ML") refinement network; and wherein, as part of a temporal feedback loop, the refining the decoded current unit is based at least in part on the given warped previous unit. However, in the same field of endeavor, Rozendaal further discloses the claim limitations and the deficient claim limitations as follows: retrieving a given previous unit ((i.e., Video decoder may use reconstructed frame 510 as input to a warping function (warp) 512 for use with a future decoded P-frame) [Rozendaal: col. 15, line 44-46]; (i.e., Video encoder 200 or video decoder 300 may include FINT kernel 432, which may be implemented in NSP(s) 430, to perform block-based warping 612, which may be overlapped block-based warping, of reconstructed previous frame Xt-1 using the motion vector ft) [Rozendaal: col. 16, line 42-47]); warping the given previous unit to spatially align sample values of the given previous unit with locations in the decoded current unit, thereby producing a given warped previous unit (i.e., warp the previous reconstructed video data with an overlapped block-based warp function using the block-based motion vector to generate predicted current video data; and sum the predicted current video data with a residual block to generate the current reconstructed video data) [Rozendaal: col. 15, line 11-16; Figs. 3, 17]; providing the given warped previous unit to a machine learning ("ML") refinement network ((i.e., This disclosure describes techniques for encoding and decoding media data (e.g., images or videos) using neural network-based media coding techniques. In particular, this disclosure describes techniques for using warping to decode encoded media data. In particular, this disclosure describes techniques for neural-network-based media coding that uses block-based warping. Example techniques of this disclosure include a 1080p YUV420 architecture, predictive modeling to improve compression performance, quantization-aware training, parallel entropy coding (for example, on a GPU), and/or pipelined inferencing. The techniques of this disclosure may improve the performance of a neural-network based media coder. Such an improved neural-network-based media coder may be utilized in a battery powered device such as a mobile device (e.g., a smartphone)) [Rozendaal: col. 4, line 12-24]; (i.e., Many neural video coders assume availability of pixel-based or feature-based warping operations) [Rozendaal: col. 5, line 8-10]); and wherein, as part of a temporal feedback loop (i.e., "Feedback Recurrent Autoencoder for Video Compression") [Rozendaal: Other Publications], the refining the decoded current unit is based at least in part on the given warped previous unit (i.e., "Feedback Recurrent Autoencoder for Video Compression") [Rozendaal: Other Publications; Figs. 4-5, 16 – Note: see the feedback loop in Figs. 4 and 16, where the given warped previous unit is used in refining the decoded current unit]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou and Vangala with Rozendaal to program the system to implement Rozendaal's method. Therefore, the combination of Chou and Vangala with Rozendaal will enable the system to improve the performance of a neural-network-based media coder [Rozendaal: col. 4, line 12-24].
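
Claim 20 recites training: feedback derived from differences between the input unit and the refined unit, used to adjust the network. A minimal sketch of such a loop, reusing the RefinementNet and warp helpers sketched after claim 1; training_stream and encode_then_decode are hypothetical stand-ins for a training corpus and a codec round trip, not the applicant's or any reference's actual procedure:

```python
import torch

def encode_then_decode(src):
    # Hypothetical codec round trip; a real setup would encode/decode with the
    # target codec. Additive noise stands in for compression artifacts here.
    return (src + 0.05 * torch.randn_like(src)).clamp(0, 1)

# Hypothetical (source unit, motion field) pairs.
training_stream = [(torch.rand(1, 3, 64, 64), torch.zeros(1, 2, 64, 64))
                   for _ in range(3)]

opt = torch.optim.Adam(net.parameters(), lr=1e-4)
prev_refined = None
for src, flow in training_stream:
    decoded = encode_then_decode(src)
    if prev_refined is None:
        prev_refined = decoded.detach()
    refined = net(decoded, warp(prev_refined, flow))
    loss = ((refined - src) ** 2).mean()   # feedback: input vs. refined difference
    opt.zero_grad()
    loss.backward()
    opt.step()                             # adjust the network based on the feedback
    prev_refined = refined.detach()        # temporal feedback for the next unit
```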
warp the previous reconstructed video data with an overlapped block-based warp function using the block-based motion vector to generate predicted current video data; and sum the predicted current video data with a residual block to generate the current reconstructed video data) [Rozendaal: col. 15, line 11-16; Figs. 3, 17];providing the given warped previous unit to a machine learning ("ML") refinement network ((i.e. This disclosure describes techniques for encoding and decoding media data (e.g., images or videos) using neural network-based media coding techniques. In particular, this disclosure describes techniques for using warping to decode encoded media data. In particular, this disclosure describes techniques for neural-network-based media coding that uses block-based warping. Example techniques of this disclosure include a 1080p YUV420 architecture, predictive modeling to improve compression performance, quantization-aware training, parallel entropy coding (for example, on a GPU), and/or pipelined inferencing. The techniques of this disclosure may improve the performance of a neural-network based media coder. Such an improved neural-network-based media coder may be utilized in a battery powered device such as a mobile device (e.g., a smartphone)) [Rozendaal: col. 4, line 12-24]; (i.e. Many neural video coders assume availability of pixel-based or feature-based warping operations) [Rozendaal: col. 5, line 8-10]). wherein, as part of a temporal feedback loop (i.e. Feedback Recurrent Autoencoder for Video Compression") [Rozendaal: Other Publications], the refining the decoded current unit is based at least in part on the given warped previous unit (i.e. Feedback Recurrent Autoencoder for Video Compression") [Rozendaal: Other Publications; Figs. 4-5, 16 – Note: Please see the feedback loop and the given warped previous unit is used for the refining the decoded current unit in Figs. 4 and 16]. It would have been obvious to one with an ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou and Vangala with Rozendaal to program the system to implement of Rozendaal’s method. Therefore, the combination of Chou and Vangala with Rozendaal will enable the system to improve the performance of a neural-network-based media coder [Rozendaal: col. 4, line 12-24]. Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Chou (US Patent 10,979,718 B2), (“Chou”), in view of Vangala et al. (US Patent 9,298,453 B2), (“Vangala”), in view of Rozendaal et al. (US Patent 12,501,050 B2), (“Rozendaal”), in view of Zhang et al. (US Patent Application Publication US 2024/0386704 A1), (“Zhang”). Regarding claim 2, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follow. wherein the ML refinement network is a convolutional neural network (a machine learning block may be implemented (e.g., in a video encoding pipeline and/or a video decoding pipeline) to leverage machine learning techniques, such as convolutional neural network (CNN)) [Chou: col. 2, line 41-45] having a U-Net architecture. Chou, Vangala and Rozendaal do not explicitly disclose the following claim limitations (Emphasis added). wherein the ML refinement network is a convolutional neural network having a U-Net architecture. 
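To make the limitation chain above concrete, below is a minimal PyTorch sketch of a temporal-feedback refinement loop of the sort claim 1 recites: a previous unit is warped into alignment with the decoded current unit, both are given to a small refinement network, feedback is computed as the difference between the refined output and the input video unit, and the network is adjusted from that feedback. RefineNet, the zero motion field, and the synthetic three-unit stream are hypothetical placeholders for illustration; this is not the applicant's claimed implementation, nor Chou's, Vangala's, or Rozendaal's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineNet(nn.Module):
    """Tiny CNN standing in for the claimed ML refinement network."""
    def __init__(self, ch=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2 * ch, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, ch, 3, padding=1))

    def forward(self, decoded_cur, warped_prev):
        # Predict a correction from the decoded unit plus the aligned previous unit.
        return decoded_cur + self.body(torch.cat([decoded_cur, warped_prev], dim=1))

def warp(prev, flow):
    """Backward-warp prev (N,C,H,W) by a per-pixel flow (N,2,H,W) given in pixels."""
    n, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).to(flow)          # (2,H,W): x coords, then y coords
    coords = base.unsqueeze(0) + flow              # sampling locations in prev
    gx = 2 * coords[:, 0] / (w - 1) - 1            # normalize to [-1, 1] for grid_sample
    gy = 2 * coords[:, 1] / (h - 1) - 1
    return F.grid_sample(prev, torch.stack((gx, gy), dim=-1), align_corners=True)

net = RefineNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
refined_prev = None
for _ in range(3):                                      # stand-in for a stream of video units
    source  = torch.rand(1, 3, 64, 64)                  # current unit of input video
    decoded = source + 0.05 * torch.randn_like(source)  # decoded unit with coding noise
    flow    = torch.zeros(1, 2, 64, 64)                 # motion vectors (zero for the demo)
    prev    = refined_prev if refined_prev is not None else decoded
    refined = net(decoded, warp(prev, flow))            # refine using the warped previous unit
    loss = F.mse_loss(refined, source)                  # feedback: difference vs. input video
    opt.zero_grad(); loss.backward(); opt.step()        # adjust the ML refinement network
    refined_prev = refined.detach()                     # temporal feedback loop
```

Detaching the carried unit keeps the temporal loop from growing the autograd graph across units; a production system might instead backpropagate through a short window of units.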
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Chou (US Patent 10,979,718 B2), ("Chou"), in view of Vangala et al. (US Patent 9,298,453 B2), ("Vangala"), in view of Rozendaal et al. (US Patent 12,501,050 B2), ("Rozendaal"), in view of Zhang et al. (US Patent Application Publication US 2024/0386704 A1), ("Zhang").

Regarding claim 2, Chou meets the claim limitations as set forth in claim 1. Chou further meets the claim limitations as follows: wherein the ML refinement network is a convolutional neural network (a machine learning block may be implemented (e.g., in a video encoding pipeline and/or a video decoding pipeline) to leverage machine learning techniques, such as convolutional neural network (CNN)) [Chou: col. 2, line 41-45] having a U-Net architecture.

Chou, Vangala and Rozendaal do not explicitly disclose the following claim limitations (Emphasis added): wherein the ML refinement network is a convolutional neural network having a U-Net architecture.

However, in the same field of endeavor Zhang further discloses the deficient claim limitations as follows: a convolutional neural network (For example, a convolutional neural network (CNN) can be trained to perform semantic image segmentation by inputting into the CNN many training images and providing a known output (or label) for each training image. In some cases, visual transformers may be utilized to perform semantic image segmentation, among various other machine learning and/or neural network architectures. The known output for each training image can include a groundtruth segmentation mask corresponding to a given training image) [Zhang: para. 0065] having a U-Net architecture (In some cases, one or more image processing (e.g., image2image) machine learning models can be based on Unet (e.g., a fully convolutional neural network implementing a U-shaped encoder-decoder network architecture). For example, one or more (or both) of the low precision model inference associated with block 312 and the high precision model inference associated with block 314 can be performed using an image2image model based on Unet) [Zhang: para. 0082].

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou, Vangala and Rozendaal with Zhang to program the system to implement Zhang's method. Therefore, the combination of Chou, Vangala and Rozendaal with Zhang will enable the system to support image/video quality enhancements [Zhang: para. 0030].
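For readers unfamiliar with the Unet architecture Zhang cites, below is a minimal sketch of a U-shaped encoder-decoder CNN: features are pooled down two levels and upsampled back, with each encoder activation concatenated into the decoder as a skip connection. TinyUNet, its depth, and its layer widths are hypothetical illustration choices, not Zhang's image2image model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def block(cin, cout):
    """Two 3x3 convolutions with ReLU, the usual U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    """Two-level U-shaped encoder-decoder with skip connections."""
    def __init__(self, ch=3, width=32):
        super().__init__()
        self.enc1 = block(ch, width)
        self.enc2 = block(width, 2 * width)
        self.mid  = block(2 * width, 2 * width)
        self.dec2 = block(4 * width, width)   # concatenated skip doubles the channels
        self.dec1 = block(2 * width, width)
        self.out  = nn.Conv2d(width, ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)                      # full resolution
        e2 = self.enc2(F.max_pool2d(e1, 2))    # 1/2 resolution
        m  = self.mid(F.max_pool2d(e2, 2))     # 1/4 resolution (bottleneck)
        d2 = self.dec2(torch.cat([F.interpolate(m, scale_factor=2), e2], 1))
        d1 = self.dec1(torch.cat([F.interpolate(d2, scale_factor=2), e1], 1))
        return x + self.out(d1)                # residual output

y = TinyUNet()(torch.rand(1, 3, 64, 64))       # (1,3,64,64) in -> (1,3,64,64) out
```

The skip connections are what make the U shape attractive for refinement-style tasks: low-level detail bypasses the bottleneck, so the network only has to learn the correction.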
Claims 7-8 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Chou (US Patent 10,979,718 B2), ("Chou"), in view of Vangala et al. (US Patent 9,298,453 B2), ("Vangala"), in view of Rozendaal et al. (US Patent 12,501,050 B2), ("Rozendaal"), in view of Mao et al. (US Patent 11,809,998 B2), ("Mao").

Regarding claim 7, Chou meets the claim limitations as set forth in claim 6. Chou further meets the claim limitations as follows: wherein the warping uses motion estimation and/or forward projection of motion (In some embodiments, the inter parameters 58 may indicate whether to determine the prediction sample by implementing base inter prediction techniques (e.g., motion estimation) or by processing (e.g., down-scale, FTQ, ITQ, and up-scale) image data corresponding to a different image using the machine learning block 34.) [Chou: col. 16, line 25-30] from the given decoded previous unit ((Accordingly, to display image frames, the processor core complex 18 may retrieve encoded image data from memory, decode the encoded image data, and instruct the electronic display 12 to display image frames based on the decoded image data) [Chou: col. 10, line 30-33]; (Generally, the convolutional neural network block 34A includes one or more convolution layers 66, which each implements convolution weights 68, connected via layer interconnections 71. For example, in the depicted embodiment, the convolutional neural network block 34A includes a first convolution layer 66A that implements first convolution weights 68A, a second convolution layer 66B that implements second convolution weights 68B, and so on. Additionally, outputs of the first convolution layer 66A may be connected to inputs of the second convolution layer 66B via one or more layer interconnections 71. It should be appreciated that the depicted convolutional neural network block 34A is merely intended to be illustrative and not limiting. In particular, the machine learning parameters 64 of the convolutional neural network block 34A may be adjusted by adjusting the number of convolution layers 66, associated convolution weights 68, and/or configuration (e.g., number and/or interconnected nodes) of the layer interconnections 71. In other words, a convolutional neural network block 34A may include any suitable number of convolution layers 66, for example, one convolution layer 66, two convolution layers 66, or more. Additionally, in some embodiments, a convolutional neural network block 34A may include additional layers, such as one or more pooling layers (not depicted).) [Chou: col. 15, line 1-26]).

Chou and Vangala do not explicitly disclose the following claim limitations (Emphasis added): wherein the warping uses motion estimation and/or forward projection of motion from the given decoded previous unit.

However, in the same field of endeavor Rozendaal further discloses the claim limitations and the deficient claim limitations as follows: the warping ((i.e. Video decoder may use reconstructed frame 510 as input to a warping function (warp) 512 for use with a future decoded P-frame) [Rozendaal: col. 15, line 44-46]; (i.e. Video encoder 200 or video decoder 300 may include FINT kernel 432, which may be implemented in NSP(s) 430, to perform block-based warping 612, which may be overlapped block-based warping, of reconstructed previous frame Xt-1 using the motion vector ft) [Rozendaal: col. 16, line 42-47]; (i.e. warp the previous reconstructed video data with an overlapped block-based warp function using the block-based motion vector to generate predicted current video data; and sum the predicted current video data with a residual block to generate the current reconstructed video data) [Rozendaal: col. 15, line 11-16]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou and Vangala with Rozendaal to program the system to implement Rozendaal's method. Therefore, the combination of Chou and Vangala with Rozendaal will enable the system to improve the performance of a neural-network-based media coder [Rozendaal: col. 4, line 12-24].

In the same field of endeavor Mao further discloses the claim limitations as follows: wherein the warping uses motion estimation and/or forward projection of motion from the given decoded previous unit ((In some examples, the system 2400 can warp each region of pixels (e.g., each 8x8 pixel region) in an input image. The region can be considered as one pixel after the warping, in which case each region of pixels can be represented by a particular pixel in a feature map with 64 channels, followed by one dustbin channel. If there is no point of interest (e.g., keypoint) detected in a particular 8x8 region, the dustbin can have a high activation. If a keypoint is detected in an 8x8 region, the 64 other channels can pass through a softmax architecture to find the key point in the 8x8 region. In some cases, the system 2400 can compute 2D point of interest locations and descriptors in a single forward pass and can run at 70 frames per second (fps) on 480x640 images with a Titan X graphics processing unit (GPU)) [Mao: col. 56, line 30-43]; (The image stabilization process includes tracking one or more feature points between two consecutive frames. The tracked features allow the system to estimate the motion between frames and compensate for the motion. An input frame sequence 1202 including a sequence of frames is provided as input to the process 1200. The input frame sequence 1202 can include the output frames 814. At block 1204, the process 1200 includes performing saliency points detection using optical flow. The saliency detection is performed to determine feature points in a current frame. Any suitable type of optical flow technique or algorithm can be used at block 1204. The optical flow motion estimation can be performed on a pixel-by-pixel basis in some cases. For instance, for each pixel in a current frame y, the motion estimation f defines the location of the corresponding pixel in the previous frame x. The motion estimation f for each pixel can include an optical flow vector that indicates a movement of the pixel between the frames. In some cases, the optical flow vector for a pixel can be a displacement vector (e.g., indicating horizontal and vertical displacements, such as x- and y-displacements) showing the movement of a pixel from a first frame to a second frame) [Mao: col. 43, line 36-53]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou, Vangala and Rozendaal with Mao to program the system to implement Mao's method. Therefore, the combination of Chou, Vangala and Rozendaal with Mao will enable the system to scale the portion of the second frame based on the size of the object in the first frame [Mao: col. 2, line 2-4], and reduce labor cost [Mao: col. 14, line 46-49].

Regarding claim 8, Chou meets the claim limitations as set forth in claim 6. Chou further meets the claim limitations as follows: for each of one or more subsequent units as the current unit, repeating the receiving, the decoding, the refining, the processing, and the outputting ((When the filter block 54 is implemented in the video encoding pipeline 32, the filter parameters 74 may be communicated along with the encoded image data 38. As will be described in more detail below, in some embodiments, the filter block 54 may additionally or alternatively be implemented on the decoding side, for example, in a video decoding pipeline. In other words, in some embodiments, a filter block 54 may be implemented in loop to enable results of the filtering process to be used in subsequent processing, for example, by the video encoding pipeline 32 or a video decoding pipeline.) [Chou: col. 13, line 31-41] – Note: In-loop filtering requires repeating the receiving, the decoding, the refining, the processing, and the outputting of coding data).

In the same field of endeavor, Vangala further discloses the claim limitation as follows: repeating the receiving, the decoding, the refining, the processing, and the outputting ((A value object 204 may be a constant or a variable that may be passed to other functions or classes. The class object 202 may receive the value object 204 from the application that activated the class object 202, self generate the value object 204, or return the value object 204 to the activating application. The code line 206 may describe an operation performed on a value 204, execute a branching instruction, or describe a loop. A function 208 is a subroutine that may perform a series of operations on a value 204 passed to the function 208) [Vangala: col. 4, line 12-21]).
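Mao's per-pixel motion estimation, as quoted above, reduces to a backward warp: for each pixel in the current frame, an optical-flow displacement vector gives the location of the corresponding pixel in the previous frame. Below is a minimal NumPy sketch, with hypothetical names and nearest-neighbor sampling for brevity (a real coder would interpolate):

```python
import numpy as np

def warp_with_flow(prev, flow):
    """For each pixel p of the current frame, sample the previous frame at p + flow[p].

    prev: (H, W) or (H, W, C) previous frame; flow: (H, W, 2) with (dx, dy)
    displacement vectors, in the spirit of Mao's pixel-by-pixel motion estimation.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]                                  # pixel grid of the frame
    src_x = np.clip(np.rint(xs + flow[..., 0]), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(ys + flow[..., 1]), 0, h - 1).astype(int)
    return prev[src_y, src_x]                                    # gather from previous frame

# Toy example: a uniform 1-pixel shift, so each pixel looks one column left in prev.
prev = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2)); flow[..., 0] = -1.0
cur_pred = warp_with_flow(prev, flow)
```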
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou with Vangala to program the system to implement Vangala's method. Therefore, the combination of Chou with Vangala will enable the source code analytics platform of the system to refine the node artifacts and a machine learning module to correlate the node artifacts [Vangala: col. 6, line 13-16].

In the same field of endeavor Mao further discloses the deficient claim limitations as follows: repeating the receiving, the decoding, the refining, the processing, and the outputting ((For example, the object detection and tracking system can extract points of interest from one or more input frames. The points of interest can include two-dimensional (2D) locations in a frame that are stable and repeatable from different lighting conditions and viewpoints. The points of interest can also be referred to as keypoints or landmarks) [Mao: col. 55, line 47-53]; (In some cases, the neural network 4200 can adjust the weights of the nodes using a training process called backpropagation. Backpropagation can include a forward pass, a loss function, a backward pass, and a weight update. The forward pass, loss function, backward pass, and parameter update is performed for one training iteration. The process can be repeated for a certain number of iterations for each set of training images until the neural network 4200 is trained well enough so that the weights of the layers are accurately tuned) [Mao: col. 64, line 19-28]; (Repeat operations 3 and 4 until center xy=(w/2, h/2)) [Mao: col. 59, line 33-34]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou, Vangala and Rozendaal with Mao to program the system to implement Mao's method. Therefore, the combination of Chou, Vangala and Rozendaal with Mao will enable the system to scale the portion of the second frame based on the size of the object in the first frame [Mao: col. 2, line 2-4], and reduce labor cost [Mao: col. 14, line 46-49].
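The repeating limitation of claims 8 and 12 amounts to a per-unit driver loop. A schematic Python sketch follows; decode, refine, post_process, and output are hypothetical stubs standing in for the claimed receiving, decoding, refining, processing, and outputting stages, not functions from any cited reference:

```python
def decode(unit): return unit               # stub: entropy decode + reconstruction
def refine(decoded, prev): return decoded   # stub: ML refinement (see the claim 1 sketch)
def post_process(unit): return unit         # stub: e.g., color conversion, scaling
def output(unit): print("emit", unit)       # stub: hand off for display

def run_pipeline(encoded_units):
    refined_prev = None
    for encoded in encoded_units:           # for each subsequent unit as the current unit:
        decoded = decode(encoded)           # repeat the receiving and the decoding,
        refined = refine(decoded, refined_prev)  # the refining,
        frame = post_process(refined)       # the processing,
        output(frame)                       # and the outputting.
        refined_prev = refined              # refined result is available to the next unit

run_pipeline(["unit0", "unit1", "unit2"])
```

Carrying refined_prev forward is what makes the filtering "in loop" in Chou's sense: each unit's refined result feeds the processing of subsequent units.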
Regarding claim 11, Chou meets the claim limitations as set forth in claim 10. Chou further meets the claim limitations as follows: wherein the warping uses motion estimation and/or forward projection of motion (In some embodiments, the inter parameters 58 may indicate whether to determine the prediction sample by implementing base inter prediction techniques (e.g., motion estimation) or by processing (e.g., down-scale, FTQ, ITQ, and up-scale) image data corresponding to a different image using the machine learning block 34.) [Chou: col. 16, line 25-30] from the given refined previous unit ((Accordingly, to display image frames, the processor core complex 18 may retrieve encoded image data from memory, decode the encoded image data, and instruct the electronic display 12 to display image frames based on the decoded image data) [Chou: col. 10, line 30-33]; (Generally, the convolutional neural network block 34A includes one or more convolution layers 66, which each implements convolution weights 68, connected via layer interconnections 71. For example, in the depicted embodiment, the convolutional neural network block 34A includes a first convolution layer 66A that implements first convolution weights 68A, a second convolution layer 66B that implements second convolution weights 68B, and so on. Additionally, outputs of the first convolution layer 66A may be connected to inputs of the second convolution layer 66B via one or more layer interconnections 71. It should be appreciated that the depicted convolutional neural network block 34A is merely intended to be illustrative and not limiting. In particular, the machine learning parameters 64 of the convolutional neural network block 34A may be adjusted by adjusting the number of convolution layers 66, associated convolution weights 68, and/or configuration (e.g., number and/or interconnected nodes) of the layer interconnections 71. In other words, a convolutional neural network block 34A may include any suitable number of convolution layers 66, for example, one convolution layer 66, two convolution layers 66, or more. Additionally, in some embodiments, a convolutional neural network block 34A may include additional layers, such as one or more pooling layers (not depicted).) [Chou: col. 15, line 1-26]).

Chou and Vangala do not explicitly disclose the following claim limitations (Emphasis added): wherein the warping uses motion estimation and/or forward projection of motion from the given decoded previous unit.

However, in the same field of endeavor Rozendaal further discloses the claim limitations and the deficient claim limitations as follows: the warping ((i.e. Video decoder may use reconstructed frame 510 as input to a warping function (warp) 512 for use with a future decoded P-frame) [Rozendaal: col. 15, line 44-46]; (i.e. Video encoder 200 or video decoder 300 may include FINT kernel 432, which may be implemented in NSP(s) 430, to perform block-based warping 612, which may be overlapped block-based warping, of reconstructed previous frame Xt-1 using the motion vector ft) [Rozendaal: col. 16, line 42-47]; (i.e. warp the previous reconstructed video data with an overlapped block-based warp function using the block-based motion vector to generate predicted current video data; and sum the predicted current video data with a residual block to generate the current reconstructed video data) [Rozendaal: col. 15, line 11-16]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou and Vangala with Rozendaal to program the system to implement Rozendaal's method. Therefore, the combination of Chou and Vangala with Rozendaal will enable the system to improve the performance of a neural-network-based media coder [Rozendaal: col. 4, line 12-24].

In the same field of endeavor Mao further discloses the claim limitations as follows: wherein the warping uses motion estimation and/or forward projection of motion from the given decoded previous unit ((In some examples, the system 2400 can warp each region of pixels (e.g., each 8x8 pixel region) in an input image. The region can be considered as one pixel after the warping, in which case each region of pixels can be represented by a particular pixel in a feature map with 64 channels, followed by one dustbin channel. If there is no point of interest (e.g., keypoint) detected in a particular 8x8 region, the dustbin can have a high activation. If a keypoint is detected in an 8x8 region, the 64 other channels can pass through a softmax architecture to find the key point in the 8x8 region. In some cases, the system 2400 can compute 2D point of interest locations and descriptors in a single forward pass and can run at 70 frames per second (fps) on 480x640 images with a Titan X graphics processing unit (GPU)) [Mao: col. 56, line 30-43]; (The image stabilization process includes tracking one or more feature points between two consecutive frames. The tracked features allow the system to estimate the motion between frames and compensate for the motion. An input frame sequence 1202 including a sequence of frames is provided as input to the process 1200. The input frame sequence 1202 can include the output frames 814. At block 1204, the process 1200 includes performing saliency points detection using optical flow. The saliency detection is performed to determine feature points in a current frame. Any suitable type of optical flow technique or algorithm can be used at block 1204. The optical flow motion estimation can be performed on a pixel-by-pixel basis in some cases. For instance, for each pixel in a current frame y, the motion estimation f defines the location of the corresponding pixel in the previous frame x. The motion estimation f for each pixel can include an optical flow vector that indicates a movement of the pixel between the frames. In some cases, the optical flow vector for a pixel can be a displacement vector (e.g., indicating horizontal and vertical displacements, such as x- and y-displacements) showing the movement of a pixel from a first frame to a second frame) [Mao: col. 43, line 36-53]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou, Vangala and Rozendaal with Mao to program the system to implement Mao's method. Therefore, the combination of Chou, Vangala and Rozendaal with Mao will enable the system to scale the portion of the second frame based on the size of the object in the first frame [Mao: col. 2, line 2-4], and reduce labor cost [Mao: col. 14, line 46-49].
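Rozendaal's block-based warping, relied on for claims 7 and 11, can be illustrated as follows: the predicted current frame is assembled block by block from the previous reconstruction, each block fetched at an offset given by its motion vector, and the prediction is then summed with a residual. The sketch below is a non-overlapped simplification (Rozendaal's overlapped variant blends neighboring block predictions); block_warp and the toy inputs are hypothetical:

```python
import numpy as np

def block_warp(prev, mvs, block=8):
    """Non-overlapped block-based warp: each block of the predicted frame is copied
    from prev at the offset (dx, dy) given by its motion vector, clamped to bounds."""
    h, w = prev.shape[:2]
    pred = np.zeros_like(prev)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dx, dy = mvs[by // block, bx // block]
            sy = min(max(by + dy, 0), h - block)      # clamp source block to the frame
            sx = min(max(bx + dx, 0), w - block)
            pred[by:by + block, bx:bx + block] = prev[sy:sy + block, sx:sx + block]
    return pred

prev = np.random.rand(32, 32)                          # previous reconstructed frame
mvs = np.zeros((4, 4, 2), dtype=int)
mvs[0, 0] = (8, 0)                                     # top-left block reads 8 px to the right
residual = np.zeros_like(prev)                         # decoded residual (zero for the demo)
recon = block_warp(prev, mvs) + residual               # sum prediction with residual
```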
Regarding claim 12, Chou meets the claim limitations as set forth in claim 10. Chou further meets the claim limitations as follows: for each of one or more subsequent units as the current unit, repeating the receiving, the decoding, the refining, the processing, and the outputting ((When the filter block 54 is implemented in the video encoding pipeline 32, the filter parameters 74 may be communicated along with the encoded image data 38. As will be described in more detail below, in some embodiments, the filter block 54 may additionally or alternatively be implemented on the decoding side, for example, in a video decoding pipeline. In other words, in some embodiments, a filter block 54 may be implemented in loop to enable results of the filtering process to be used in subsequent processing, for example, by the video encoding pipeline 32 or a video decoding pipeline.) [Chou: col. 13, line 31-41] – Note: In-loop filtering requires repeating the receiving, the decoding, the refining, the processing, and the outputting of coding data).

In the same field of endeavor, Vangala further discloses the claim limitation as follows: repeating the receiving, the decoding, the refining, the processing, and the outputting ((A value object 204 may be a constant or a variable that may be passed to other functions or classes. The class object 202 may receive the value object 204 from the application that activated the class object 202, self generate the value object 204, or return the value object 204 to the activating application. The code line 206 may describe an operation performed on a value 204, execute a branching instruction, or describe a loop. A function 208 is a subroutine that may perform a series of operations on a value 204 passed to the function 208) [Vangala: col. 4, line 12-21]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou with Vangala to program the system to implement Vangala's method. Therefore, the combination of Chou with Vangala will enable the source code analytics platform of the system to refine the node artifacts and a machine learning module to correlate the node artifacts [Vangala: col. 6, line 13-16].

In the same field of endeavor Mao further discloses the deficient claim limitations as follows: repeating the receiving, the decoding, the refining, the processing, and the outputting ((For example, the object detection and tracking system can extract points of interest from one or more input frames. The points of interest can include two-dimensional (2D) locations in a frame that are stable and repeatable from different lighting conditions and viewpoints. The points of interest can also be referred to as keypoints or landmarks) [Mao: col. 55, line 47-53]; (In some cases, the neural network 4200 can adjust the weights of the nodes using a training process called backpropagation. Backpropagation can include a forward pass, a loss function, a backward pass, and a weight update. The forward pass, loss function, backward pass, and parameter update is performed for one training iteration. The process can be repeated for a certain number of iterations for each set of training images until the neural network 4200 is trained well enough so that the weights of the layers are accurately tuned) [Mao: col. 64, line 19-28]; (Repeat operations 3 and 4 until center xy=(w/2, h/2)) [Mao: col. 59, line 33-34]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chou, Vangala and Rozendaal with Mao to program the system to implement Mao's method. Therefore, the combination of Chou, Vangala and Rozendaal with Mao will enable the system to scale the portion of the second frame based on the size of the object in the first frame [Mao: col. 2, line 2-4], and reduce labor cost [Mao: col. 14, line 46-49].

Reference Notice

Additional prior art, included in the Notice of References Cited, made of record and not relied upon, is considered pertinent to applicant's disclosure.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Philip Dang, whose telephone number is (408) 918-7529. The examiner can normally be reached Monday-Thursday between 8:30 am - 5:00 pm (PST).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Sath Perungavoor, can be reached at 571-272-7455. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Philip P. Dang/
Primary Examiner, Art Unit 2488

Prosecution Timeline

May 31, 2024
Application Filed
Jun 02, 2025
Non-Final Rejection — §103
Nov 04, 2025
Response Filed
Jan 07, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602837
ON SUB-DIVISION OF MESH SEQUENCES
2y 5m to grant · Granted Apr 14, 2026
Patent 12593116
IMAGING MEASUREMENT DEVICE USING GAS ABSORPTION IN THE MID-INFRARED BAND AND OPERATING METHOD OF IMAGING MEASUREMENT DEVICE
2y 5m to grant · Granted Mar 31, 2026
Patent 12581069
METHOD FOR ENCODING/DECODING VIDEO SIGNAL, AND APPARATUS THEREFOR
2y 5m to grant · Granted Mar 17, 2026
Patent 12581106
IMAGE DECODING METHOD AND DEVICE THEREFOR
2y 5m to grant · Granted Mar 17, 2026
Patent 12574557
SCALABLE VIDEO CODING USING BASE-LAYER HINTS FOR ENHANCEMENT LAYER MOTION PARAMETERS
2y 5m to grant · Granted Mar 10, 2026
Based on the 5 most recent grants by this examiner in similar technology.


Prosecution Projections

3-4
Expected OA Rounds
77%
Grant Probability
99%
With Interview (+33.2%)
2y 10m
Median Time to Grant
Moderate
PTA Risk
Based on 470 resolved cases by this examiner. Grant probability derived from career allow rate.
