DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Status
2. Claims 1-20 are currently pending.
Claims 1 and 18-20 have been amended.
Response to Arguments
3. Applicant’s arguments with respect to the rejection(s) of claims 1-20 have been fully considered but are found unpersuasive.
However, the Examiner considers it pertinent to emphasize the rationale for this decision, in which it is determined that Chou teaches the claimed "filtering the current video block based on first information associated with one ….. previously coded frames of the video" and Cho teaches the alternative recited portion, "filtering the current video block based on first information associated with …. multiple previously coded frames of the video," as originally mapped for Claim 1.
3.1 Applicant's Argument.
The argument in chief, addressing both the Chou and Cho references, alleges: "In this event, Cho also at least fails to disclose, teach or suggest the feature "filtering the current video block based on first information associated with one or multiple previously coded frames of the video" as recited…".
3.2 Examiner’s Rebuttal
It is remarked that amended claim 1 recites:
“1. (Currently Amended) A method for video processing, comprising:
filtering a current video block of a video[[,]] according to a machine learning model during a conversion between [[a]]the current video block of a video and a bitstream of the video,
the filtering of the current video block being performed based on first information associated with one or multiple previously coded frames of the video; and
performing the conversion based on the filtered current video block.”
An analysis of Claim 1 identifies a “filtering” method (by pre-processing or post-processing) of the reconstructed video data (id., previously coded frames) by using “a machine learning model” (i.e., neural networks), performed during a “conversion” (presumably a coding method of prediction and reconstruction of video data) based on “first information associated …. with previously coded frames” stored as a “reference picture list” or “reference picture set”, indicating the neural network used in the prediction and filtering based on the recited “….. one or multiple previously coded frames of the video;”.
The Examiner reserves the choice to select either the “one” or the “multiple previously coded frames” alternative for examination, as recited in Claim 1.
It is to be remarked, from the Specification, that “filtering” by “machine learning” methods is known in the art, as disclosed by Chou and Cho, under the premise that the recited “first information associated …. with previously coded frames” remains vague and open to interpretation by one skilled in the art.
No clear determination of the “first information” can be made, because the terminology used points to multiple coding processes and methods, as determined from the Specification at Pars. [0204, 0229, 0230-0259], Fig.16, where the claimed “information” extends its applicability to numerous embodiments of the video coding process, including block prediction and reconstruction at multiple levels: partitioning into different slice types, block regions, intra/inter prediction modes, index attributes, collocated blocks, block matching, distortion [0242], motion estimation at different precisions, etc., to which the filtering is applied.
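For purposes of illustrating the claim interpretation above, the following is a minimal, hypothetical sketch (in Python) of the generic pipeline that the claim language is read to encompass: a learned model filters the current block using information drawn from one or multiple previously coded (reconstructed) frames, after which the filtered block is used in the conversion (encoding or decoding). The sketch is not taken from the Applicant's disclosure or from any cited reference; the function names and the trivial blending stand-in for the machine learning model are assumptions for illustration only.

import numpy as np

def ml_filter(block: np.ndarray, ref_info: np.ndarray) -> np.ndarray:
    # Stand-in for any machine learning model (e.g., a CNN in-loop filter):
    # here, a trivial blend of the current block with the reference information.
    return 0.5 * block + 0.5 * ref_info

def process_block(current_block: np.ndarray, reference_frames: list[np.ndarray]) -> np.ndarray:
    # "first information associated with one or multiple previously coded frames":
    # modeled here as co-located samples averaged over the available reference frames.
    first_info = np.mean(np.stack(reference_frames), axis=0)
    filtered = ml_filter(current_block, first_info)  # the claimed filtering step
    return filtered  # the filtered block is then used in performing the conversion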
In this regard, in response to applicant's argument, the fact that applicant has recognized another advantage which would flow naturally from following the suggestion of the prior art cannot be the basis for patentability when the differences would otherwise be obvious. See Ex parte Obiaya, 227 USPQ 58, 60 (Bd. Pat. App. & Inter. 1985).
In conclusion of this rebuttal, the Examiner acknowledges the lack of a specific reference to multiple reconstructed frames in the trained filtering process; however, contrary to the interpretation alleged in the Remarks, it is relevant that Cho teaches the limitation:
filtering a current video block of a video[[,]] according to a machine learning model (filtering a block-based prediction based on a convolution filter operation at Sub.(2-2) Col.69 Lin.31-37, per Fig.40 at Col.69 Lin.38-45 etc.) during a conversion between [[a]]the current video block of a video and a bitstream of the video (during the process of motion prediction and compensation for a block prediction, Sub.(2-2) Col.69 Lin.31-37),
the filtering of the current video block being performed (the current block prediction being based on vector information from encoding apparatus 1600 associated with the reconstructed, i.e., previously coded, frames, hat-Xn, Col.69 Lin.9-12, or predicting the current video block per information at Col.70 Lin.10-27) based on first information associated with one or multiple previously coded frames of the video (filtering the current block based on referencing multiple reconstructed frames where, according to step 3710 in Fig.37, citing: “the processing unit may select a video generation network for inter prediction from among multiple video generation networks….”, by using interpolation and extrapolation of the current frame and frames previous to the current frame, among video frames, Col.66 Lin.24-67, generated for a block, Col.67 Lin.1-5, and applied to the CNN filtering process per Col.68 Lin.1-18 using buffered reference pictures, to which filtering is applied by considering multiple frames xn-1, xn-2, etc., per Fig.38, Col.68 Lin.19-24, as earlier being associated with previously decoded frames Xn-1-hat, Col.69 Lin.12-15, and where the filtering is performed, etc.); and
performing the conversion based on the filtered current video block (performing video coding i.e., conversion, by using convolution filter-based prediction, Col.70 Lin.11-21, etc.).
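To further illustrate the technique the Examiner reads Cho to teach, namely convolutional filtering/prediction whose input draws on multiple previously reconstructed frames (cf. the frames xn-1, xn-2, etc., of Fig.38), the following hypothetical sketch stacks the current reconstruction with co-located samples of previously decoded frames as input channels of a single learned convolution. All names, shapes, and the single-layer structure are assumptions for illustration only and are not drawn from Cho's disclosure.

import numpy as np

def conv3x3(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # Single 3x3 convolution over a (C, H, W) input with zero padding;
    # kernel has shape (C, 3, 3) and produces one output channel.
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + 3, j:j + 3] * kernel)
    return out

def cnn_filter(current: np.ndarray, prev_frames: list[np.ndarray], kernel: np.ndarray) -> np.ndarray:
    # Stack the current block with co-located blocks from previously coded frames
    # as input channels, then apply the learned convolution (a single layer here).
    stacked = np.stack([current, *prev_frames])  # shape (1 + N, H, W)
    return conv3x3(stacked, kernel)

# Example usage (hypothetical dimensions): two previously decoded frames feed the filter.
# out = cnn_filter(np.zeros((8, 8)), [np.ones((8, 8)), np.ones((8, 8))], np.random.randn(3, 3, 3) * 0.1)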
Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
In response, the Examiner reiterates the obviousness rejection over Chou in view of Cho, along with the new art resulting from a subsequent search and consideration.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application does not currently name joint inventors.
4. Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Jim Chou et al. (US 11,616,960; hereinafter Chou) in view of Seung-Hyun Cho et al. (US 10,841,577; hereinafter Cho), and further in view of Tomohiro Ikai et al. (US 11,889,070; hereinafter Ikai).
Re Claim 1. (Currently Amended) Chou discloses a method for video processing, comprising:
filtering (where filter parameters may be determined by leveraging machine learning techniques, Col.3 Lin.22-34, 48-61 etc.),
according to a machine learning model (a “machine learning block model based on feature metric indicative of resulting distortion.”, Col.30 Lin.1-7, Col.30 Lin.16-25) during a conversion between a current video block of a video and a bitstream of the video, the current video block based on first information (considering the “conversion” as representing a video coding process performed based on, and according to, machine learning model information related to the coded frames, as explained at least at Col.28 Lin.58-67, “provide information” for video quality, Col.29 Lin.1-37 or Lin.59-66, and obviating the machine learning filtering by leveraging machine learning at higher performance, Col.30 Lin.16-25, through pre-filtering, out-of-loop or in-loop, as pre-processing or post-processing in some embodiments, at Col.13 Lin.36-55, according to filter parameters per Col.13 Lin.22-35, and filtering by applying machine learning using target filter weights or a filter mode, Col. Lin.45-51.
In consideration of the difference in interpreting the “filter” notion as applying equally to neural-network decision-type filtering or to filtering the reconstructed video data for artifacts as claimed, it is determined that Chou teaches the claimed filtering matter as disclosed at least at Col.17 Lin.55-67 to Col.18 Lin.1-3)
associated with one or multiple previously coded frames of the video (during the coding process, filtering according to previously coded video image data based on video information, Col.5 Lin.6-10, type of data, Col.5 Lin.36-44, or previously encoded images, Lin.51-55, Col.14 Lin.59-63, previously decoded image data 126, Col.25 Lin.29-31) of the video (as applied to the cited “filter parameters 74 may be determined using machine learning techniques…”, Col.26 Lin.14-15, where a machine learning block 34, Fig.14, is enabled to determine target filter parameters, i.e., weights or coefficients, and to generate a target filter mode or algorithm by adjusting inter/intra parameters based on feature-metric information associated with video data, Col.16 Lin.45-67 and Col.17 Lin.1-21); and
performing the conversion based on the filtered current video block (performing coding, i.e., conversion, of the filtered video block, Col.17 Lin.1-6, based on the filtered image data per Fig.13, Col.24 Lin.61-67 to Col.25 Lin.1-24, etc.).
Chou does not expressly teach the use of multiple reference frames within the CNN filtering process.
However, the analogous art to Cho teaches a method and apparatus applying filtering to the coded data, e.g., at the encoder and decoder, based on multiple previously coded frames of video data, reciting:
filtering a current video block of a video[[,]] according to a machine learning model (filtering a block-based prediction based on a convolution filter operation at Sub.(2-2) Col.69 Lin.31-37, per Fig.40 at Col.69 Lin.38-45 etc.) during a conversion between [[a]]the current video block of a video and a bitstream of the video (during the process of motion prediction and compensation for a block prediction, Sub.(2-2) Col.69 Lin.31-37),
the filtering of the current video block being performed (the current block prediction being based on vector information from encoding apparatus 1600 associated with the reconstructed, i.e., previously coded, frames, hat-Xn, Col.69 Lin.9-12, or predicting the current video block per information at Col.70 Lin.10-27) based on first information associated with one or multiple previously coded frames of the video (filtering the current block based on referencing multiple reconstructed frames where, according to step 3710 in Fig.37, citing: “the processing unit may select a video generation network for inter prediction from among multiple video generation networks….”, by using interpolation and extrapolation of the current frame and frames previous to the current frame, among video frames, Col.66 Lin.24-67, generated for a block, Col.67 Lin.1-5, and applied to the CNN filtering process per Col.68 Lin.1-18 using buffered reference pictures, to which filtering is applied by considering multiple frames xn-1, xn-2, etc., per Fig.38, Col.68 Lin.19-24, as earlier being associated with previously decoded frames Xn-1-hat, Col.69 Lin.12-15, and where the filtering is performed, etc.); and
performing the conversion based on the filtered current video block (performing video coding i.e., conversion, by using convolution filter-based prediction, Col.70 Lin.11-21, etc.).
Based on the common coding methods taught by Chou and Cho, where in both cases the neural-network-trained machine is used in filtering the previously coded video frames in order to obtain a more efficient prediction by improving the subjective image quality and coding efficiency (Chou: Col.12 Lin.32, Col.15 Lin.25-26), one of ordinary skill in the art would have found it obvious, before the effective filing date of the claimed invention, to consider the known methods of prediction and machine learning filtering as being commonly applied, hence deeming the combination predictable.
Furthermore, in order to emphasize the application of the neural network filtering process based on multiple previously coded frames, the art to Ikai is identified as teaching:
filtering a current video block of a video[[,]] (filtering an image, Abstract) according to a machine learning model (by using machine training with a neural network, Col.2 Lin.34-45, per unit 3051 in Fig.6 including neural networks, e.g., convolutional neural network (CNN) or deep neural network (DNN) filtering, Col.13 Lin.1-37, or Fig.15, Col.22 Lin.31-67, as detailed at Col.23 Lin.1-24, etc.) during a conversion between [[a]]the current video block of a video and a bitstream of the video (within the inter-prediction image processing of a video bitstream, per Figs. 4 and 5, according to signaled syntax for multiple layers at Col.5 Lin.25-65, and utilizing multi-frame flags, i.e., predFlagL0 and predFlagL1, Col.7 Lin.59-67, along with reference picture list indices refIdxL0, refIdxL1, derived from a Reference Picture List, Col.8 Lin.1-20),
the filtering of the current video block (filtering the current block of a PU, based on coefficients, depicted in Fig.8, Col.15 Lin.1-13, or at Fig.14 using deep neural networks (DNN) per Col.20 Lin.17-44, according to DNN filters, Col.23 Lin.4- Col.24 Lin.8-20) being performed based on first information associated with one (the first information being associated with a uni-prediction derived from the inter_pred_idc, which indicates the number of reference pictures, e.g., indicating the use of one reference picture, L0 or L1, at Col.8 Lin.61-67) or multiple previously coded frames of the video (using multiple previously coded reference picture frames, in a bi-prediction BiPred, derived from both the L0 and L1 reference picture lists, Col.9 Lin.1-2);
According to the common coding methods taught by Chou and Cho, where in both cases the neural-network-trained machine is used in filtering the previously coded video frames in order to obtain a more efficient prediction by improving the subjective image quality and coding efficiency (Chou: Col.12 Lin.32, Col.15 Lin.25-26), one of ordinary skill in the art would have had the incentive to add the teaching of Ikai, which details the advantage of a neural-network-based filtering method leading to enhanced image quality (Col.2 Lin.48-50), and would have found it obvious, before the effective filing date of the claimed invention, to consider the previously applied methods of prediction and machine learning filtering as being commonly applied, hence deeming the combination predictable.
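As an illustration of how the combined teachings are understood, the following hypothetical sketch shows a codec-style reference selection of the kind Ikai describes (prediction flags predFlagL0/predFlagL1 and reference indices refIdxL0/refIdxL1), which yields either one or multiple previously coded frames that can then be supplied, together with the current block, to a learned filter such as the one sketched earlier. The code is a stand-in for illustration only and is not taken from any cited reference.

import numpy as np

def select_reference_frames(list0: list[np.ndarray], list1: list[np.ndarray],
                            pred_flag_l0: bool, pred_flag_l1: bool,
                            ref_idx_l0: int = 0, ref_idx_l1: int = 0) -> list[np.ndarray]:
    # Uni-prediction uses one reference frame (from list 0 or list 1);
    # bi-prediction uses one frame from each list, i.e., multiple frames.
    refs = []
    if pred_flag_l0:
        refs.append(list0[ref_idx_l0])
    if pred_flag_l1:
        refs.append(list1[ref_idx_l1])
    return refs

# The selected frames would then serve as the "one or multiple previously coded frames"
# whose information is fed to the machine-learning-based filtering of the current block.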
Re Claim 2. (Original) Chou, Cho and Ikai disclose, the method of claim 1, wherein the one or multiple previously coded frames comprise a reference frame in at least one of:
Cho teaches about, a reference picture list (RPL) associated with the current video block (Col.9 Lin.4-8),
a RPL associated with a current slice comprising the current video block (associated with the current slice, Col.42 Lin.55-57),
a RPL associated with a current frame comprising the current video block (associated with the current frame, Col.54 Lin.15),
a reference picture set (RPS) associated with the current video block (associated with the current block, Col.10 Lin.44-46, Col.36 Lin.39-42),
a RPS associated with the current slice (associated with the current slice, Col.42 Lin.55-57), or
a RPS associated with the current frame (associated with the current frame, Col.54 Lin.15); and
Cho teaches, wherein the one or multiple previously coded frames comprise at least one of:
a short-term reference frame of the current video block,
a short-term reference frame of the current slice, or
a short-term reference frame of the current frame; or
wherein the one or multiple previously coded frames comprise at least one of:
a long-term reference frame of the current video block,
a long-term reference frame of the current slice, or
a long-term reference frame of the current frame (the short-term and long-term storage, Col.57 Lin.28-30, regarding a slice, Col.26 Lin.46-48).
Ikai teaches the application of the Reference Picture List indices in the uni-prediction and bi-prediction modes (Col.8 Lin.61-67 and Col.9 Lin.1-2).
Re Claim 3. (Original) Chou, Cho and Ikai disclose, the method of claim 1, wherein the one or multiple previously coded frames comprise a frame stored in a decoded picture buffer (DPB) that is not a reference frame;
Cho teaches, wherein at least one indicator is indicated in the bitstream to indicate the one or multiple previously coded frames;
wherein the method further comprising:
determining the one or multiple previously coded frames for the current video block (determining the coded frames from a decoded picture buffer (DPB), Col.68 Lin.16-18).
Re Claim 4. (Original) Chou, Cho and Ikai disclose, the method of claim 3,
Cho teaches, wherein the at least one indicator comprises an indicator to indicate a reference picture list comprising the one or multiple previously coded frames; or wherein the at least one indicator is indicated in the bitstream based on a condition; and wherein the condition comprises at least one of:
the number of reference pictures included in a RPL associated with the current video block, the number of reference pictures included in a RPL associated with a current slice comprising the current video block, the number of reference pictures included in a RPL associated with a current frame comprising the current video block, the number of reference pictures included in a RPS associated with the current video block, the number of reference pictures included in a RPS associated with the current slice, or the number of reference pictures included in a RPS associated with the current frame; or
wherein the condition comprises the number of decoded pictures included on a DPB (indicating the number of reference images in the RPL list, used to generate a prediction unit, Col.10 Lin.55-60, stored in the buffer, 190, DPB, Col.68 Lin.12-18).
Re Claim 5. (Original) Chou, Cho and Ikai disclose, the method of claim 3, wherein determining the one or multiple previously coded frames comprises:
Cho teaches, determining the one or multiple previously coded frames from at least one previously coded frame in a DPB; wherein determining the one or multiple previously coded frames comprises: determining the one or multiple previously coded frames from at least one reference frame in list 0;
wherein determining the one or multiple previously coded frames comprises: determining the one or multiple previously coded frames from at least one reference frame in list 1;
wherein determining the one or multiple previously coded frames comprises:
determining the one or multiple previously coded frames from reference frames in both list 0 and list 1 (the processing unit uses the reference list L0, or the list L1 or both L0 and L1, in prediction, Col.63 Lin.4-9);
wherein determining the one or multiple previously coded frames comprises: determining the one or multiple previously coded frames from a reference frame closest to a current frame comprising the current video block (one of ordinary skill in the art would have found it obvious to consider the reference image in L0 or L1 closest to the current block, for the lowest rate-distortion in the prediction process, Col.9 Lin.48-67, Eq.[1], or Col.10 Lin.1-12); or
wherein determining the one or multiple previously coded frames comprises: determining the one or multiple previously coded frames from a collocated frame (collocated reconstructed pictures are used, Col.8 Lin.66-67 and Col.9 Lin.1-8, Col.33 Lin.13-15).
Re Claim 6. (Original) Chou, Cho and Ikai disclose, the method of claim 5, wherein determining the one or multiple previously coded frames comprises:
Cho teaches, determining the one or multiple previously coded frames from a reference frame with a reference index equal to K in a reference list; and wherein the value of K is predefined; or wherein the value of K is determined based on reference picture information (indicating an index for the reference picture specific to the information in a reference picture list Col.10 Lin.61-63).
Re Claim 7. (Original) Chou, Cho and Ikai disclose, the method of claim 5, wherein determining the one or multiple previously coded frames comprises:
Cho teaches, determining the one or multiple previously coded frames based on decoded information; wherein determining the one or multiple previously coded frames based on decoded information comprises:
determining the one or multiple previously coded frames as the top N most-frequently used reference frames for samples within at least one of:
a current slice comprising the current video block, or
a current frame comprising the current video block,
wherein N is a positive integer; wherein determining the one or multiple previously coded frames based on decoded information comprises: determining the one or multiple previously coded frames as the top N most-frequently used reference frames of each reference picture list for samples within at least one of:
a current slice comprising the current video block, or
a current frame comprising the current video block,
wherein N is a positive integer (as part of the entropy coding, the most frequently occurring symbols are assigned short code words in the inter or intra prediction process, for predicting the target picture, based on the reference frames of each picture list, Col.1 Lin.41-56); or
wherein determining the one or multiple previously coded frames based on decoded information comprises: determining the one or multiple previously coded frames as frames with top N smallest picture order count (POC) distances or absolute POC distances relative to a current frame comprising the current video block, wherein N is a positive integer (same rationale applied as above, where the reference picture lists are orderly addressed according to a picture order count, i.e., POC, per Col.2 Lin.55-65).
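As a brief illustration of the selection criteria recited in this claim (e.g., frames with the top N smallest POC distances relative to the current frame), the following hypothetical helper ranks candidate frames by absolute picture order count (POC) distance; the function name and data layout are assumptions for illustration only and are not taken from any cited reference.

def select_by_poc_distance(candidate_pocs: list[int], current_poc: int, n: int) -> list[int]:
    # Return the POCs of the N candidate frames closest to the current frame,
    # ranked by absolute POC distance.
    return sorted(candidate_pocs, key=lambda poc: abs(poc - current_poc))[:n]

# Example: select_by_poc_distance([0, 4, 8, 16], current_poc=9, n=2) returns [8, 4].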
Re Claim 8. (Original) Chou, Cho and Ikai disclose, the method of claim 1, wherein whether the first information is used to filter the current video block depends on decoded information of at least one region of the current video block; or
Cho teaches, wherein the first information comprises at least one of:
reconstruction samples in the one or multiple previously coded frames, or
motion information associated with the one or multiple previously coded frames (the prediction is based on motion information, Col.9 Lin.48-53, or Col.10 Lin.5, or Col.11 Lin.33-35).
Re Claim 9. (Original) Chou, Cho and Ikai disclose, the method of claim 8, wherein whether the first information is used to filter the current video block depends on at least one of:
Cho teaches, a type of a current slice comprising the current video block, or a type of a current frame comprising the current video block; wherein whether the first information is used to filter the current video block depends on an availability of reference frames for the current video block (the neural network filter per Fig.23, Fig.29 or Fig.40 filtering the current block of a slice etc., is based on partition or mode coding information for reference sample filtering, Col.16 Lin.1-28… of a slice type, Lin.53-59);
wherein whether the first information is used to filter the current video block depends on at least one of:
reference picture information, or picture information in a DPB; wherein whether the first information is used to filter the current video block depends on a temporal layer index associated with the current video block (filtering depending on the reference picture of temporal layers Col.8 Lin.63-65);
wherein the first information is used to filter the current video block if the current video block does not comprise a sample coded in a non-inter mode (coding in intra prediction mode i.e., a non-inter mode, Fig.8 Col.12 Lin.47-62); or
wherein whether the first information is used to filter the current video block depends on at least one of:
a distortion between the current video block and a matching block for the current video block, or a distortion between the current video block and a collocated block in a previously coded frame of the video (based on rate-distortion Col.9 Lin.54-67 and Col.10 Lin.1-14).
Re Claim 10. (Original) Chou, Cho and Ikai disclose, the method of claim 9, wherein the first information is used to filter the current video block if at least one of the following is met:
Chou teaches, the type of the current slice indicates an inter-coded slice (current slice type is inter-coded Col.27 Lin.48-50), or
Cho teaches about, the type of the current frame indicates an inter-coded frame; wherein the first information is used to filter the current video block if a smallest POC distance associated with the current video block is not greater than a threshold; wherein the first information is used to filter the current video block if the current video block has a given temporal layer index (the filtering is applied to, or based on, the reference slices/frames/blocks according to the inter-predicted sample mode, listed as L0, L1, or L0 and L1, and an index indicated for the temporal layer, as the index for the reference picture specific to the information in a reference picture list, Col.10 Lin.61-63);
wherein the non-inter mode comprises an intramode; wherein the non-inter mode comprises at least one of a set of coding modes consisting of: an intramode, an intra block copy (IBC) mode, or a Palette mode; wherein the method further comprises: performing motion estimation to determine the matching block from at least one previously coded frame of the video; or
wherein the first information is used to filter the current video block if the distortion is not larger than a threshold (the same filtering process is applied to the intra-predicted samples, based on the minimum rate-distortion, Col.22 Lin.7-9 or Lin.54-67, where the minimum value is obviously compared to a threshold value in order to be evaluated as “minimum”).
Re Claim 11. (Original) Chou, Cho and Ikai disclose, the method of claim 8,
Cho teaches about, wherein the reconstruction samples comprise at least one of:
samples in at least one reference block for the current video block, or
samples in at least one collocated block for the current video block; or wherein the reconstruction samples comprise samples in a region pointed by a motion vector; and wherein the motion vector is different from a decoded motion vector associated with the current video block (the video block reconstruction is based on motion vector points of an affine prediction mode, different than the block motion derivation for a vector, Col.38 Lin.42-43).
Re Claim 12. (Original) Chou, Cho and Ikai disclose, the method of claim 11,
Cho teaches about, wherein a center of a collocated block of the at least one collocated block is located at the same horizontal and vertical position in a previously coded frame as that of the current video block in a current frame; wherein the at least one reference block is determined by motion estimation;
wherein a reference block of the at least one reference block is determined by reusing at least one motion vector included in the current video block;
wherein at least one block of the at least one reference block and/or the at least one collocated block is the same size as the current video block; or
wherein at least one block of the at least one reference block and/or the at least one collocated block is larger than the current video block (the same process is applied to the filtering process including collocated reconstructed pictures as used, at Col.8 Lin.66-67 and Col.9 Lin.1-8, Col.33 Lin.13-15).
Re Claim 13. (Original) Chou, Cho and Ikai disclose, the method of claim 12,
Chou teaches about, wherein the motion estimation is performed at an integer precision;
wherein the at least one motion vector is rounded to an integer precision (per interpolation at integer precision, Col.30 Lin.18-22);
Cho teaches about, wherein the reference block is located by adding an offset to the position of the current video block, wherein the offset is determined by the at least one motion vector; wherein the at least one motion vector points to a previously coded frame comprising the reference block (applying an offset between the target image and the reference image, Col.10 Lin.64-67);
wherein the at least one motion vector is scaled to a previously coded frame comprising the reference block;
wherein the at least one block with the same size as the current video block is rounded and extended at at least one boundary to include more samples from a previously coded frame (applying the boundary filtering method, Col.16 Lin.10-14); or wherein a size of the extended area is indicated in the bitstream or is derived during decoding the current video block from the bitstream (applying scaling, Col.11 Lin.50-51 or Col.12 Lin.33-34).
Re Claim 14. (Currently Amended) Chou, Cho and Ikai disclose, the method of claim[[s]] 1, wherein the first information comprises at least one of:
Cho teaches about, two reference blocks for the current video block with one of the two reference blocks from the first reference frame in list 0 and the other one from the first reference frame in list 1 (using List0 and List1, Col.10 Lin.44-51), or
two collocated blocks for the current video block with one of the two collocated blocks from the first reference frame in list 0 and the other one from the first reference frame in list 1 (as the previously reconstructed co-located picture Col.9 Lin1-8);
wherein the current video block is filtered further based on second information different from the first information, and the first and second information is fed to the machine learning model together or separately (the current block is filtered based on first, second, etc., information fed to the input layer of the machine learning model, Col.10 Lin.44-67); or
wherein filtering the current video block is used for at least one of:
compression, super-resolution, inter prediction, or virtual reference frame generation.
Re Claim 15. (Original) Chou, Cho and Ikai disclose, the method of claim 14,
Cho teaches about, wherein the first and second information is organized to have the same size and concatenated together to be fed to the machine learning model;
wherein features are extracted from the first information through a separate convolutional branch of the machine learning model and the extracted features are combined with the second information or features extracted from the second information (the features are extracted and concatenated at the last convolution layer to determine the feature metrics at element 72 in Fig.7, and fed to the machine learning in Fig.8 and Fig.9 to be further filtered per Fig.13, or Fig.26, Col.57 Lin.1-17);
wherein the first information comprises at least one reference block and/or at least one collocated block for the current video block in the one or multiple previously coded frames, and the at least one reference block and/or at least one collocated block have a spatial dimension different from the second information (as the previously reconstructed co-located picture reference blocks, Col.8 Lin.63-67 and Col.9 Lin1-8, having different block sizes according to the partition level and geometry, Col.7 Lin.25-67 etc.);
wherein the machine learning model has a separate convolutional branch for extracting, from the at least one reference block and/or at least one collocated block, features with the same spatial dimension as the second information (separate convolution layers are set for each block per Fig.26 Col.57 Lin.3-11 or Fig.31 Col.59 Lin.26-67);
wherein the current video block together with at least one reference block and/or at least one collocated block in the one or multiple previously coded frames are fed to a motion alignment branch of the machine learning model and an output of the motion alignment branch is combined with the second information (combining the convolution layer outputs at summer (+) generating the feature vector in Fig.20 Col.54 Lin.34-63); or
wherein the current video block is super-resolved by using the machine learning model (resolving the video by using hyper-parameters of the first and second CNNs, at convolution layers, pooling layers and ReLU layers, Col.57 Lin.13-17).
Re Claim 16. (Original) Chou, Cho and Ikai disclose, the method of claim 1, wherein usage of the first information by the machine learning model is indicated in the bitstream;
Cho teaches about, wherein usage of the first information by the machine learning model depends on coding information (learning model depends on the coding mode information inter/intra prediction Col.1 Lin.41-56);
wherein the machine learning model comprises a neural network (machine learning using neural network(s), Summary, Col.1 Lin.60-67 and Col.2 Lin.1-16, Figs.26 or 31, etc.);
wherein the conversion includes encoding the target video block into the bitstream (encoder at Figs.18, 19, 22, 26 Col.6 Lin.54-55); or
wherein the conversion includes decoding the target video block from the bitstream (decoder at Col.6 Lin.56-57 and Fig.2 Col.18 Lin.42-62).
Re Claim 17. (Previously Presented) Chou, Cho and Ikai disclose, the method of claim 16, wherein usage of the first information by the machine learning model is indicated in at least one of:
Cho teaches about, sequence parameter set (SPS), picture parameter set (PPS), adaptation parameter set (APS), slice header, picture header, coding tree unit (CTU), or coding unit (CU); wherein the first information is applied to a luma component of the current video block by the machine learning model without being applied to a chroma component; or wherein the first information is applied to both a luma component and a chroma component of the current video block by the machine learning model (Col.62 Lin.23-39).
Re Claim 18. (Currently Amended) This claim represents the apparatus for processing video data comprising a processor and a non-transitory memory (Cho: processor, Col.51 Lin.14-24, and memory at Lin.16), implementing each and every limitation of the method of claim 1; hence it is rejected on the same evidentiary basis, mutatis mutandis.
Re Claim 19. (Currently Amended) This claim represents the non-transitory computer-readable storage medium storing instructions that cause a processor to perform a method performed by a video processing apparatus (Cho: an autoencoder per Fig.18), implementing along with the processor each and every limitation of claim 1; hence it is rejected on the same evidentiary basis, mutatis mutandis.
Re Claim 20. (Currently Amended) This claim represents the non-transitory computer-readable storage medium storing instructions that cause a processor to perform a method performed by a video processing apparatus (Cho: an autoencoder per Fig.18), implementing along with the processor each and every limitation of the apparatus claim 18; hence it is rejected on the same evidentiary basis, mutatis mutandis.
Conclusion
5. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Other art considered:
US-11,631,199; US-2021/0099710; US-2024/0031611.
See PTO-892 form. Applicant is required under 37 C.F.R. 1.111(c) to consider these references when responding to this action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DRAMOS KALAPODAS whose telephone number is (571)272-4622. The examiner can normally be reached on Monday-Friday 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Czekaj can be reached on 571-272-7327. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DRAMOS KALAPODAS/Primary Examiner, Art Unit 2487