DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 22 December 2025 has been entered.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-3, 5-8, 10-20, and 22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
On pages 9-10, applicant argues that the art of record does not include a coding parameter input that includes a partitioning scheme for a video unit, a prediction mode of the video unit, or a boundary strength parameter for a boundary of the video unit, as currently amended, because Wang does not teach that the NN filter model is configured to obtain an attention based on a coding parameter input, and because Zhu teaches the use of a quantization parameter input rather than a coding parameter input including a partitioning scheme for a video unit, a prediction mode of the video unit, or a boundary strength parameter for a boundary of the video unit, as currently claimed in the amended claims. While applicant’s arguments are understood, the examiner respectfully disagrees. The examiner relies on the combination of Wang and Zhu in rejecting the newly amended claims.
One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., Inc., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Rather, “the test for obviousness is what the combined teachings of the references would have suggested to [a PHOSITA].” In re Mouttet, 686 F.3d 1322, 1333, 103 USPQ2d 1219, 1226 (Fed. Cir. 2012). At present, the combined teachings of Wang and Zhu reasonably suggest to a person having ordinary skill in the art a coding parameter input including a partitioning scheme for a video unit, a prediction mode of the video unit, or a boundary strength parameter for a boundary of the video unit as currently claimed in the amended claims.
Wang first teaches applying a neural network (NN) filter to an unfiltered sample of a video unit to generate a filtered sample. See, Wang, e.g. Fig. 10 and pars. 163 – 167: depicting and describing that the system applies a NN filter to a sample of video data to generate a filtered sample. Wang next teaches that the NN filter obtains coding information of the sample to be filtered. See, Wang, e.g. Fig. 10 and pars. 163 – 168: depicting and describing that the NN filter obtains additional information, wherein the additional information is the equivalent of the coding information. Wang then teaches that this obtained coding information is used as input by the NN filter to modify or adjust filtering of the sample to be filtered. See, Wang, e.g. par. 91: describing that the NN filter uses the additional information as additional input into the NN filter for filtering the decoded sample, wherein the additional information is the equivalent of the coding information. Wang finally teaches that the coding information includes a partitioning scheme for the video unit, a prediction mode of the video unit, or a boundary strength parameter for a boundary of the video unit. See, Wang, e.g. par. 167: describing that the additional data includes one or more of partitioning data, prediction data, or boundary strength values, wherein the partitioning data reasonably suggests a partitioning scheme for the video unit, wherein the prediction data reasonably suggests a prediction mode of the video unit, wherein the boundary strength values reasonably suggest the boundary strength parameter for a boundary of the video unit, and wherein the additional data is the equivalent of the coding information. Wang does not explicitly teach wherein the NN filter model is configured to obtain an attention based on the coding parameter input, the attention obtained by concatenating the coding parameter input with an intermediate feature map to provide a concatenated result and feeding the concatenated result into convolutional layers of the NN filter. Zhu, however, teaches this at least at Figs. 9 and 12, and pars. 97 – 103 and 107 – 117. There, Zhu teaches that the system extracts coding information from the reconstructed sample and takes it as input into a NN filter, generates a feature map of the reconstructed sample, fuses the feature map with the input coding information, and then feeds the fused feature map into the remainder of the convolutional layers of the NN filter, wherein the feature map generated using the NN filter is the equivalent of the intermediate feature map of the NN filter model, wherein fusing the generated feature map with the coding information is the equivalent of concatenating the coding parameter input with the intermediate feature map, and wherein the fused feature map is the equivalent of the concatenated result. The mere exemplification of a quantization parameter being the coding parameter does not take away from this broader teaching. See In re Susi, 440 F.2d 442, 169 USPQ 423 (CCPA 1971) (determining that disclosed examples and preferred embodiments do not constitute a teaching away from a broader disclosure). The combined teachings of Wang and Zhu therefore reasonably suggest to a person having ordinary skill in the art a coding parameter input including a partitioning scheme for a video unit, a prediction mode of the video unit, or a boundary strength parameter for a boundary of the video unit, as currently claimed in the amended claims.
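For orientation only, the following is a minimal sketch of the concatenation-based attention mechanism described in the preceding paragraph, written in a PyTorch style. It is illustrative: the channel counts, kernel sizes, and sigmoid gating are assumptions for demonstration and are not drawn from Wang or Zhu.

```python
# Illustrative sketch only -- channel counts, kernel sizes, and the sigmoid
# gating are assumptions; this is not code from Wang or Zhu.
import torch
import torch.nn as nn

class CodingParamAttention(nn.Module):
    """Obtain an attention from a coding parameter input by concatenating
    the input with an intermediate feature map and feeding the concatenated
    result into convolutional layers."""

    def __init__(self, feat_channels=64, param_channels=1):
        super().__init__()
        # Convolutional layers fed with the concatenated result.
        self.conv = nn.Sequential(
            nn.Conv2d(feat_channels + param_channels, feat_channels,
                      kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, padding=1),
        )

    def forward(self, feature_map, coding_params):
        # feature_map:   (N, feat_channels, H, W) intermediate feature map
        # coding_params: (N, param_channels, H, W) plane(s) built from, e.g.,
        #                partitioning, prediction mode, or boundary strength
        concatenated = torch.cat([feature_map, coding_params], dim=1)
        attention = torch.sigmoid(self.conv(concatenated))
        # Recalibrate the intermediate feature map with the attention.
        return feature_map * attention
```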
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1, 2, 5-8, 10-12, 16-20, and 22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 2022/0215593) (hereinafter Wang) in view of Zhang et al. (US 2018/0192050) (hereinafter Zhang) in view of Zhu et al. (CN 111711824) (hereinafter Zhu) in view of Zhang et al. (US 2023/0336788) (hereinafter Zhang 2).
Regarding claims 1, 19, and 20, Wang teaches a method implemented by a video coding apparatus, an apparatus for coding video data comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor cause the processor to perform the method, and a non-transitory computer readable medium storing a bitstream of a video that is generated by the method performed by the video processing apparatus, the method comprising:
applying a neural network (NN) filter to an unfiltered sample of a video unit to generate a filtered sample, wherein the NN filter includes an NN filter model generated based on partitioning information of the video unit (Fig. 10 and pars. 163 – 167: depicting and describing that the system applies a neural network filter to a sample of video data to generate a filtered sample, the NN filter determining a NN filter model based on partitioning information of the video data); and
performing a conversion between a video media file and a bitstream based on the filtered sample (e.g., Figs. 6 and 7, and pars. 129 – 133 and 152 – 157: depicting and describing that the system performs a conversion between a video bitstream and reconstructed video based on the filtered sample),
wherein the partitioning information includes sample boundary values, a sample at a block boundary is represented by a first sample boundary value, and a sample not at the block boundary is represented by a second sample boundary value, the first sample boundary value being equal to zero and the second sample boundary value being equal to one (e.g. Fig. 5 and pars. 93 – 94: depicting and describing that a NN filter is generated using partitioning information, the partitioning information including boundary sample values of N and M, where N indicates that a sample is at a boundary and M indicates that the sample is not at a boundary [internal sample], the values of N and M being any values as long as M and N are different from each other, and the values of N and M being exemplified as 1 or 0; an illustrative sketch of such a boundary-value plane follows this rejection), and
wherein the NN filter model is configured to obtain a coding parameter input, the coding parameter input comprises one or more selected from a group consisting of: a partitioning scheme for the video unit, a prediction mode of the video unit, or a boundary strength parameter for a boundary of the video unit (e.g. Fig. 10, and pars. 91 and 163 – 168: depicting and describing that the selected NN filter model obtains additional data as input to filter a sample, the additional data including one or more of partitioning data, prediction data, and boundary strength values, wherein the additional data is the equivalent of the coding parameter input, wherein the partitioning data reasonably suggests a partitioning scheme for the video unit, wherein the prediction data reasonably suggests a prediction mode of the video unit, and wherein the boundary strength values are the equivalent of the boundary strength parameter for a boundary of the video unit).
Wang does not explicitly teach:
wherein the NN filter model is further based on a temporal layer identifier of the video unit,
wherein the NN filter model is configured to obtain an attention based on a coding parameter input, wherein an intermediate feature map of the NN filter model is to be recalibrated by the attention, and wherein the attention is obtained by concatenating the coding parameter input with the intermediate feature map to provide a concatenated result and feeding the concatenated result into convolutional layers of the NN filter, and
wherein an indication of the NN filter model is binarized based on a slice type of the video unit.
Zhang, however, teaches a method, apparatus, and non-transitory computer readable medium:
wherein the NN filter model is further based on a temporal layer identifier of the video unit (e.g., par. 146: describing that NN filter model is determined based on the temporal layer index).
Zhu, however, teaches a method, apparatus, and non-transitory computer readable medium:
wherein the NN filter model is configured to obtain an attention based on a coding parameter input, wherein an intermediate feature map of the NN filter model is to be recalibrated by the attention, and wherein the attention is obtained by concatenating the coding parameter input with the intermediate feature map to provide a concatenated result and feeding the concatenated result into convolutional layers of the NN filter (e.g., Figs. 9 and 12, and pars. 97 – 103 and 107 – 117 [Machine Translation]: depicting and describing that the system extracts quantization parameter [QP] information from the reconstructed frame, generates a feature map of the reconstructed frame using the neural network, fuses the feature map of the reconstructed frame with the QP information, and then feeds the fused feature map into the remainder of the convolutional layers of the neural network filter, wherein the QP information is the equivalent of the coding parameter input, wherein the feature map generated using the neural network is the equivalent of the intermediate feature map of the NN filter model, wherein fusing the generated feature map with the QP information is the equivalent of concatenating the coding parameter input with the intermediate feature map [see, e.g. par. 80: describing that fusing is achieved by concatenation], and wherein the fused feature map is the equivalent of the concatenated result).
Zhang 2, however, teaches a method:
wherein an indication of the NN filter model is binarized based on a slice type of the video unit (e.g., pars. 171-172: describing that the filter index is binarized based on the slice type, wherein the filter index is the equivalent of the indication of the NN filter model).
It therefore would have been obvious to one of ordinary skill in the art to modify the teachings of Wang by adding the teachings of Zhang so that the NN filter model is further based on a temporal layer identifier of the video unit, by adding the teachings of Zhu so that the NN filter model is configured to obtain an attention based on a coding parameter input, wherein an intermediate feature map of the NN filter model is to be recalibrated by the attention, and wherein the attention is obtained by concatenating the coding parameter input with the intermediate feature map to provide a concatenated result and feeding the concatenated result into convolutional layers of the NN filter, and by adding the teachings of Zhang 2 so that an indication of the NN filter model is binarized based on a slice type of the video unit. One of ordinary skill in the art would have been motivated to make such modifications because the modification provides techniques for signaling filter parameter information without being dependent on a picture in a temporal layer with a higher temporal identifier than the temporal layer of the current picture (Zhang, e.g., par. 30: describing a desire to provide techniques for filter information without being dependent on a picture in a temporal layer with a higher temporal identifier than the temporal layer of the current picture), because the modification allows neural network filtering models to consider quantization loss of different degrees to improve the filter quality of the model (Zhu, e.g. par. 32 [Machine Translation]: describing a desire to consider quantization loss of different degrees to improve the filter quality of the model), and because the modification improves coding efficiency.
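As a purely illustrative aid to the sample boundary value limitation mapped above (a first value of zero for a sample at a block boundary, a second value of one for an internal sample), the following is a minimal Python sketch. The block size, unit dimensions, and the choice to treat both edges of each block as boundary samples are assumptions for demonstration, not a characterization of Wang.

```python
# Illustrative sketch only -- block size and which samples count as
# boundary samples are assumptions for demonstration.
import numpy as np

def boundary_plane(height=16, width=16, block=8):
    """Plane of sample boundary values: 0 for a sample at a block
    boundary (first value), 1 for an internal sample (second value)."""
    plane = np.ones((height, width), dtype=np.uint8)
    for y in range(height):
        for x in range(width):
            if (y % block in (0, block - 1)
                    or x % block in (0, block - 1)):
                plane[y, x] = 0  # sample sits on a block boundary
    return plane
```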
Turning to claim 2, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang further teaches:
wherein the NN filter model is configured to obtain an attention based on the partitioning information (e.g., par. 95: describing that the system determines input planes for the NN filter based on the partitioning information, wherein determining input planes for the NN filter based on the partitioning information is the equivalent of obtaining an attention based on the partitioning information).
Regarding claim 5, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang further teaches:
wherein the partitioning information comprises pixel values for each of one or more coding units of the video unit, wherein a value of a pixel in a coding unit is based on a proximity of the pixel to a boundary of the coding unit (e.g., Fig. 5 and pars. 93 – 94: depicting and describing that the partitioning information includes pixel values of each coding unit of the video unit, the value of a pixel in a coding unit based on whether the pixel is a boundary pixel or a non-boundary pixel).
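As a purely illustrative aid, the following sketch assigns each pixel of a coding unit a value based on its proximity to the nearest coding-unit boundary; the binary boundary/non-boundary case cited from Wang is the special case in which the distance is clipped to 0 or 1. The distance-to-value mapping and block size are assumptions for demonstration only.

```python
# Illustrative sketch only -- the proximity-to-value mapping is an assumption.
import numpy as np

def proximity_plane(height=16, width=16, block=8):
    """Pixel values based on each pixel's distance to the nearest
    coding-unit boundary: 0 on the boundary, growing toward the center."""
    plane = np.zeros((height, width), dtype=np.int32)
    for y in range(height):
        for x in range(width):
            plane[y, x] = min(y % block, block - 1 - (y % block),
                              x % block, block - 1 - (x % block))
    return plane
```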
Turning to claim 6, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang further teaches:
wherein the partitioning information comprises an MxN array of pixel values, wherein M is a width of the video unit to be coded, and wherein N is a height of the video unit to be coded (e.g., Fig. 5 and pars. 93 – 94: depicting and describing that the partitioning information includes an array of pixel values, the array of pixel values having a width and a height of the video unit to be coded).
Regarding claim 7, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang further teaches:
wherein the partitioning information comprises an MxN array of pixel values, wherein M is a number of columns of the video unit to be coded, and wherein N is a number of rows of the video unit to be coded (e.g., Fig. 5, and pars. 93 – 94: depicting and describing that the partitioning information includes an array of pixel values, the array of pixel values having a number of columns and a number of rows of the video data to be coded).
Turning to claim 8, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang further teaches:
wherein the unfiltered sample comprises a luma component and a chroma component, and wherein the NN filter model is generated based on the luma component and the chroma component (e.g., par. 95: describing that the video data includes chroma and luma components, and that the NN filter model is determined based on luma and chroma component partitioning information).
Regarding claim 10, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang further teaches:
wherein the partitioning information includes luma component values, color component values, or combinations thereof (e.g. pars. 93 – 96: describing that the partitioning information includes partitioning values for the chroma components and the luma component, wherein the partitioning values for the chroma components are the equivalent of color component values).
Turning to claim 11, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang further teaches:
wherein the NN filter model is generated based on: a quantization parameter (QP) of the video unit, a boundary strength of the video unit, a motion vector of the video unit, a prediction mode of the video unit, an intra-prediction mode of the video unit, a scaling factor of the video unit, or combinations thereof (e.g. par. 89: describing that the system determines the NN filter model based on one or more of quantization parameter of the video unit, boundary strength of the video unit, motion information of the video unit, inter prediction mode information, intra prediction mode information, and distance information between the current picture and reference picture; par. 92: describing that the NN filter model also includes scaling information).
Regarding claim 12, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claims 1 and 11, as discussed above. Wang further teaches:
wherein the scaling factor of the video unit comprises a factor that scales a difference between a reconstruction frame and an output of the NN filter model (e.g., par. 92: describing that the scaling information scales a difference between the reconstructed frame and the output of the NN filter model).
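For orientation, the scaling relationship described in the cited portion of Wang can be summarized by the following one-line sketch; the function and variable names are assumptions for illustration only.

```python
# Illustrative sketch only -- names are assumptions for demonstration.
def apply_scaling(recon, nn_output, scale):
    """Scale the difference between the reconstruction frame and the
    NN filter model output: filtered = recon + scale * (nn_output - recon)."""
    return recon + scale * (nn_output - recon)
```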
Turning to claim 16, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang further teaches:
wherein the NN filter is implemented in an adaptive loop filter, a deblocking filter, a sample adaptive offset filter, or combinations thereof (e.g., Fig. 2, element 148, and pars. 70 – 74 and 82: depicting and describing that the NN filter is implemented in an adaptive loop filter, a deblocking filter, and/or a sample adaptive offset filter).
Regarding claim 17, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang further teaches:
wherein the conversion comprises generating the bitstream according to the video media file (e.g., Fig. 6: depicting that the conversion includes generating a bitstream according to input video data).
Turning to claim 18, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang further teaches:
wherein the conversion comprises parsing the bitstream to obtain the video media file (e.g., Fig. 7: depicting that the conversion includes parsing a bitstream to obtain the video data).
Regarding claim 22, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang does not explicitly teach:
wherein the indication of the NN filter model is binarized based on the temporal layer identifier of the video unit.
Zhang, however, teaches a method:
wherein the indication of the NN filter model is binarized based on the temporal layer identifier of the video unit (e.g., par. 146: describing that the system binarizes the filter index based on the temporal layer index, wherein the filter index is the equivalent of the indication of the NN filter model, and wherein the temporal layer index is the equivalent of the temporal layer identifier).
It therefore would have been obvious to one of ordinary skill in the art to modify the teachings of Wang by adding the teachings of Zhang so that the indication of the NN filter model is binarized based on the temporal layer identifier of the video unit. One of ordinary skill in the art would have been motivated to make such a modification because the modification provides techniques for signaling filter parameter information without being dependent on a picture in a temporal layer with a higher temporal identifier than the temporal layer of the current picture (Zhang, e.g. par. 30: describing a desire to provide techniques for filter information without being dependent on a picture in a temporal layer with a higher temporal identifier than the temporal layer of the current picture).
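As a purely illustrative aid to the binarization limitations discussed for claims 1 and 22, the following sketch shows a truncated-unary binarization of a filter-model index whose maximum codeword length depends on the temporal layer identifier. Tying the number of candidate models to the temporal layer is an assumption for demonstration; this is not code or a scheme quoted from Zhang or Zhang 2.

```python
# Illustrative sketch only -- the per-layer model counts are assumptions.
def binarize_filter_index(index, temporal_layer_id, models_per_layer=(4, 3, 2)):
    """Truncated-unary binarization of an NN filter model indication,
    with the maximum codeword length set by the temporal layer identifier."""
    layer = min(temporal_layer_id, len(models_per_layer) - 1)
    c_max = models_per_layer[layer] - 1  # largest index for this layer
    bins = [1] * index
    if index < c_max:
        bins.append(0)  # terminating bin omitted only when index == c_max
    return bins
```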
Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 2022/0215593) (hereinafter Wang) in view of Zhang et al. (US 2018/0192050) (hereinafter Zhang) in view of Zhu et al. (CN 111711824) (hereinafter Zhu) in view of Zhang et al. (US 2023/0336788) (hereinafter Zhang 2) as applied to claim 1 above, and further in view of Kisilev et al. (US 2011/0091127) (hereinafter Kisilev).
Regarding claim 3, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang does not explicitly teach:
wherein the partitioning information comprises a mean pixel value for each of one or more coding units of the video unit.
Kisilev, however, teaches a method implemented by a video coding apparatus:
wherein the partitioning information comprises a mean pixel value for each of one or more coding units of the video unit (e.g., par. 49: describing that the partition information includes an average grayscale value of each block).
It therefore would have been obvious to one of ordinary skill in the art to modify the teachings of Wang by adding the teachings of Kisilev so that the partitioning information includes a mean pixel value for each of one or more coding units of the video unit. One of ordinary skill in the art would have been motivated to make such a modification because the modification provides efficient denoising, sharpening, contrast enhancement, deblurring, and other spatial and temporal processing of a stream of video frames (Kisilev, e.g. par. 4: describing the desire to provide efficient video processing methods and systems for computationally efficient denoising, sharpening, contrast enhancement, deblurring, and other spatial and temporal processing of a stream of video frames).
Claim(s) 13-15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 2022/0215593) (hereinafter Wang) in view of Zhang et al. (US 2018/0192050) (hereinafter Zhang) in view of Zhu et al. (CN 111711824) (hereinafter Zhu) in view of Zhang et al. (US 2023/0336788) (hereinafter Zhang 2) as applied to claim 1 above, and further in view of Wang et al. (US 2022/0103864) (hereinafter Wang 2).
Regarding claim 13, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang does not explicitly teach:
deriving a NN filter model granularity that specifies a size of a video unit to which the NN filter model can be applied.
Wang 2, however, teaches a method implemented by a video coding apparatus:
deriving a NN filter model granularity that specifies a size of a video unit to which the NN filter model can be applied (e.g., par. 86: describing that the system specifies a size of a video unit to which the NN filter model can be applied).
It therefore would have been obvious to one of ordinary skill in the art to modify the teachings of Wang by adding the teachings of Wang 2 in order to derive a NN filter model granularity that specifies a size of a video unit to which the NN filter model can be applied. One of ordinary skill in the art would have been motivated to make such a modification because the modification improves coding efficiency.
Turning to claim 14, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang does not explicitly teach:
signaling a NN filter model granularity in the bitstream within a sequence header, a picture header, a slice header, a sequence parameter set (SPS), a picture parameter set (PPS), an adaptation parameter set (APS), or combinations thereof, wherein the NN filter model granularity specifies a size of a video unit to which the NN filter model can be applied.
Wang 2, however, teaches a method implemented by a video coding apparatus:
signaling a NN filter model granularity in the bitstream within a sequence header, a picture header, a slice header, a sequence parameter set (SPS), a picture parameter set (PPS), an adaptation parameter set (APS), or combinations thereof, wherein the NN filter model granularity specifies a size of a video unit to which the NN filter model can be applied (e.g., pars. 86 and 88: describing that the system signals the NN filter model granularity in the bitstream in at least one of a sequence parameter set, a picture level header, a slice level header, a picture parameter set, and an adaptation parameter set, the granularity specifying a size of a video unit to which the NN filter model can be applied).
It therefore would have been obvious to one of ordinary skill in the art to modify the teachings of Wang by adding the teachings of Wang 2 in order to signal a NN filter model granularity in the bitstream within a sequence header, a picture header, a slice header, a sequence parameter set (SPS), a picture parameter set (PPS), an adaptation parameter set (APS), or combinations thereof, wherein the NN filter model granularity specifies a size of a video unit to which the NN filter model can be applied. One of ordinary skill in the art would have been motivated to make such a modification because the modification improves coding efficiency.
Regarding claim 15, Wang, Zhang, Zhu, and Zhang 2 teach all of the limitations of claim 1, as discussed above. Wang does not explicitly teach:
wherein the indication of the NN filter model is binarized based on a slice type of the video unit, a color component of the video unit, a temporal layer identifier of the video unit, or combinations thereof.
Wang 2, however, teaches a method implemented by a coding apparatus:
wherein the indication of the NN filter model is binarized based on the slice type of the video unit, a color component of the video unit, the temporal layer identifier of the video unit, or combinations thereof (e.g., par. 101: describing that an indication of the NN filter model is based on a color component of the video unit).
It therefore would have been obvious to one of ordinary skill in the art to modify the teachings of Wang by adding the teachings of Wang 2 so that an indication of the NN filter model is binarized based on a slice type of the video unit, a color component of the video unit, a temporal layer identifier of the video unit, or combinations thereof. One of ordinary skill in the art would have been motivated to make such a modification because the modification improves coding efficiency.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHANIKA M BRUMFIELD whose telephone number is (571)270-3700. The examiner can normally be reached M-F 8:30 - 5 PM AWS.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Czekaj can be reached at 571-272-7327. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
SHANIKA M. BRUMFIELD
Examiner
Art Unit 2487
/SHANIKA M BRUMFIELD/Examiner, Art Unit 2487
/Dave Czekaj/Supervisory Patent Examiner, Art Unit 2487