DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Acknowledgment of Amendments
Applicant’s amendments filed 10/16/2025 overcome the following objection(s)/rejection(s):
The objection to the specification has been withdrawn in view of Applicant’s amendment.
Response to Arguments
Applicant’s arguments with respect to claims 1 and 18-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 7-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Newly amended claim 7 recites the limitation “… wherein the second distortion first distortion is determined with based on one of: …”. It is unclear what is to be considered the second distortion first distortion.
Newly amended claim 9 recites the limitation “… wherein the second distortion cost of the target video block based on the third distortion first distortion comprises: determining the third distortion first distortion as the second distortion cost …”. It is unclear what is to be considered the third distortion first distortion.
Claims 8 and 10 are rejected based upon claim dependency.
Claim 9 recites the limitation “… the third distortion first distortion” in lines 1-3.
There is insufficient antecedent basis for this limitation in the claim.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim 20 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Galpin et al. (U.S. Pub. No. 2020/0244997 A1).
Regarding claim 20, the recitation of “a non-transitory computer-readable recording medium storing a bitstream of a video which is generated by a method …” is a product-by-process claim limitation, where the product is the bitstream and the process is the video processing. MPEP 2113 recites, “Product-by-process claims are not limited to the manipulations of the recited steps, only the structure implied by the steps.” Thus, the scope of the claim is the storage medium storing the bitstream (with the structure implied by the method performed by a video processing apparatus). The structure includes the data in compressed form manipulated by the steps. “To be given patentable weight, the printed matter and associated product must be in a functional relationship. A functional relationship can be found where the printed matter performs some function with respect to the product to which it is associated.” MPEP 2111.05(I)(A). When a claimed non-transitory computer-readable recording medium merely serves as a support for information or data, no functional relationship exists. MPEP 2111.05(III). The non-transitory computer-readable recording medium storing a bitstream as recited in claim 20 merely serves as support for the storage of the bitstream and provides no functional relationship between the stored bitstream and the storage medium. Therefore, the bitstream, whose scope is implied by the method steps, is non-functional descriptive material and is given no patentable weight. MPEP 2111.05(III). Thus, the scope of claim 20 is just a storage medium storing data and is anticipated by Galpin et al. (U.S. Pub. No. 2020/0244997 A1), para. [0015]: “… a computer readable storage medium having stored thereon a bitstream generated according to the methods described above.”
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 5-7, 11, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Joshi et al. (U.S. Pub. No. 2020/0186808 A1) in view of Karczewicz et al. (U.S. Pub. No. 2022/0103816 A1).
As per claim 1, Joshi teaches a method for video processing, comprising: determining, during a conversion between a target video block of a video and a bitstream of the video, a target coding tool for the target video block by using a machine learning model (fig. 10 el. 1002-1028 “at 1028, the process 1000 selects, based on the respective encoding cost of the at least some encoding modes, a best mode for encoding the block”, [0049], [0056-0057]); and performing the conversion by using the target coding tool (fig. 10 el. 1030 and [0163], “the process 1000, encodes in a compressed bitstream, the block using the best mode”). Joshi does not explicitly disclose wherein determining the target coding tool by using the machine learning model comprises: determining, based on reconstruction samples of the target video block, first filtered reconstruction samples of the target video block by using the machine learning model; determining a first distortion between the first filtered reconstruction samples and original samples of the target video block; and determining the target coding tool based on the first distortion, as recited in claim 1.
However, Karczewicz teaches wherein determining the target coding tool by using the machine learning model comprises: determining, based on reconstruction samples of the target video block, first filtered reconstruction samples of the target video block by using the machine learning model ([0182-0183], “.. filter unit 216 may be configured to apply at least one of a neural network-based filter, a neural network-based loop filter, a neural network-based post loop filter, an adaptive in-loop filter, or a pre-defined adaptive in-loop filter to a decoded block of video data to form one or more filtered decoded blocks”); determining a first distortion between the first filtered reconstruction samples and original samples of the target video block ([0119], [0183], [0223-0224], “… video encoder 200 may apply each of the scaling factors to the filtered decoded block and compare the resulting refined filtered decoded block to an original, uncoded block to calculate an RDO value.”); and determining the target coding tool based on the first distortion ([0169-0170], [0183-0184] “… Mode selection unit 202 may ultimately select the combination of encoding parameters having rate-distortion values that are better than the other tested combinations”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Karczewicz with Joshi for the benefit of providing improved filtering results.
As per claim 2, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein the machine learning model is used for neural network (NN) filtering during the determination of the target coding tool, wherein the machine learning model is obtained by an encoder used during the conversion, wherein determining the target coding tool comprises: applying the machine learning model in a rate-distortion optimization (RDO) process on the target video block to obtain the coding tool ([0036], [0044], [0046], [0048], [0056-0059], [0118], [0122] and figs. 8-10), wherein the machine learning model is not obtained by a decoder used during the conversion, or wherein the machine learning model comprises at least one of: a neural network (NN) model ([0049], “in an example, the machine-learning model can be a neural-network model”), a convolution neural network (CNN) model ([0049], “…which can be a convolution neural-network (CNN) model”), or a non-NN based model (abstract, [0004-0005], [0026]).
As per claim 3, Joshi does not explicitly disclose wherein a further model different from the machine learning model is obtained by an encoder used during the conversion, wherein the machine learning model is combined with the further model, wherein the further model comprises at least one of: a convolution neural network (CNN) model, a deblocking filter, a sample adaptive offset (SAO) filter, an adaptive loop filter (ALF), a cross-component SAO (CCSAO) filter, or a cross-component ALF (CCALF).
However Karczewicz teaches wherein a further model different from the machine learning model is obtained by an encoder during the conversion (fig. 2; [0023], [0085], [0087]), wherein the further model comprises at least one of: a convolutional neural network (CNN) model, a deblocking filter, a sample adaptive offset (SAO) filter, an adaptive loop filter (ALF), a cross-component SAO (CCSAO) filter, or a cross-component ALF (CCALF) ([0023],[0085], [0087], [0182]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Karczewicz with Joshi for the benefit of providing improved filtering results.
As per claim 5, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein determining the target coding tool comprises: determining filtered reconstruction information of the target video block by using the machine learning model (fig. 4 el. 414, 416) and determining the target coding tool based on the filtered reconstruction information (fig. 4).
As per claim 6, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein determining the target coding tool by using the machine learning model comprises at least one of: determining a target intra mode by using the machine learning model ([0108], [0155]); determining a target coded intra tool by using the machine learning model; determining a target inter mode by using the machine learning model ([0155]); determining a target coded inter tool by using the machine learning model; determining a target partitioning mode by using the machine learning model ([0027]); determining a target transform core by using the machine learning model; or determining a target coded tool by using the machine learning model, wherein determining the target partitioning mode comprises: determining the target partitioning mode from a quad-tree (QT) partitioning mode, a binary-tree (BT) partitioning mode, a ternary-tree (TT) partitioning mode, or a non-split mode.
As per claim 7, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein a cost of the target video block is determined based on the first distortion, and/or wherein determining the target coding tool by using the machine learning model comprises: determining a second distortion of the target video block based at least in part on the machine learning model; and determining the target coding tool based on the second distortion, wherein the second distortion comprises a cost of the target video block ([0134-0137]), or wherein the second distortion first distortion is determined with based on one of: a sum of square error (SSE) matrix ([0136]), a mean square error (MSE) matrix ([0138]), a structural similarity (SSIM) matrix, a multi-scale structural similarity (MS-SSIM) matrix, or an information content weighted SSIM (IW-SSIM) matrix.
As per claim 11, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein a filtering process is applied to reconstruction samples of the target video block by using the machine learning model during the determination of the target coding tool (fig. 4, fig. 9), wherein the filtering process is different from an in-loop filtering process or a post-processing process applied to the target video block, wherein the machine learning model used in the filtering process is different from a further filtering model used in the in-loop filtering process or the post-processing process, wherein a first network structure of the machine learning model is different from a second network structure of the further filtering model, wherein the filtering process is applied to a sub-region of the target video block, wherein the sub-region of the target video block comprises at least one of: boundary samples of the target video block, or inner samples of the target video block, or wherein the filtering process is applied to a down-sampled version of the target video.
As per claim 14, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches determining second information regarding the machine learning model based on coding information of the target video block ([0059], [0152], “parameters of the ML model are generated such that, for at least some of the training data 1012, the ML model can infer, for a training datum, the corresponding encoding coding cost”), or wherein the second information comprises at least one of where to use the machine learning model in the determination of the target coding tool, or how to use the machine learning model in the determination of the target coding tool.
As per claim 15, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 14. In addition, Joshi teaches wherein the coding information comprises at least one of: a coding mode of the target video block (abstract, [0049], [0055-0056], [0120] and figs. 8-10); or wherein the coding information comprises at least one of a prediction mode of the target video block ([0026], [0034], [0107-0108], [0155]), a quantization parameter (QP) of the target video block (fig. 9 el. 910), a temporal layer of the target video block, a slice type of the target video block, a block size of the target video block ([0073], [0097], [0135] and fig. 7), a color component of the target video block ([0042], [0087]), or a rate distortion cost of the target video block without using the machine learning model (fig. 8).
As per claim 16, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein the conversion includes encoding the target video block into the bitstream (fig. 4, 8-10).
As per claim 17, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein the conversion includes decoding the target video block from the bitstream (fig. 5).
As per claim 18, which is the corresponding apparatus for processing video data, comprising a processor and a non-transitory memory, with the limitations of the method as recited in claim 1, the rejection and analysis made for claim 1 also apply here.
As per claim 19, which is the corresponding non-transitory computer-readable storage medium storing instructions that cause a processor to perform a method with the limitations of the method as recited in claim 1, the rejection and analysis made for claim 1 also apply here.
As per claim 20, which is the corresponding non-transitory computer-readable medium with the limitations of the method as recited in claim 1, the rejection and analysis made for claim 1 also apply here.
Claims 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Joshi et al. (U.S. Pub. No. 2020/0186808 A1) in view of Karczewicz et al. (U.S. Pub. No. 2022/0103816 A1), and further in view of Galpin et al. (U.S. Pub. No. 2020/0244997 A1).
As per claim 12, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. Joshi does not explicitly disclose wherein the machine learning model is the same as a further machine learning model obtained by a decoder used during the conversion, wherein a first number of residual blocks of the machine learning model is the same as a second number of residual blocks of the further machine learning model.
However, Galpin teaches wherein the machine learning model is the same as a further machine learning model obtained by a decoder used during the conversion ([0009], [0058], “symmetrically, the decoder as shown in FIG. 6C receives the bitstream, reconstructs the images and restores the images using the same CNN”), wherein a first number of residual blocks of the machine learning model is the same as a second number of residual blocks of the further machine learning model ([0009], [0058] and fig. 6C).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Galpin with Joshi (modified by Karczewicz) in order to improve image quality and improve coding efficiency.
As per claim 13, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. Joshi does not explicitly disclose wherein the machine learning model is different from a further machine learning model obtained by a decoder used during the conversion, wherein the machine learning model is simpler than the further machine learning model, or wherein a first depth of the machine learning model is different from a second depth of the further machine learning model, or wherein the first depth is shallower than the second depth, or wherein a first feature map of the machine learning model is different from a second feature map of the further machine learning model, or wherein a first number of feature maps of the machine learning model is less than a second number of feature maps of the further machine learning model, or wherein a first number of residual blocks of the machine learning model is different from a second number of residual blocks of the further machine learning model, or wherein the first number of residual blocks of the machine learning model is less than the second number of residual blocks of the further machine learning model, or wherein a first convolution kernel of the machine learning model is different from a second convolution kernel of the further machine learning model.
However, Galpin teaches wherein the machine learning model is different from a further machine learning model obtained by a decoder used during the conversion (figs. 7A-7C and [0075]), wherein the machine learning model is simpler than the further machine learning model ([0072], [0075], [0078], [0080]; “The best branch (761, 762), for example, according to a rate-distortion (RD) metric (760, 765), is selected (770) and the branch index i is encoded (725) in the bitstream per CU. It should be noted that the selector (770) in the encoder may not be the same as the selector (740) used during training. During training, we select the best branch based only on the MSE, while during encoding, we may use a RD cost (e.g., MSE+coding cost of the index)”), or wherein a first depth of the machine learning model is different from a second depth of the further machine learning model, or wherein the first depth is shallower than the second depth, or wherein a first feature map of the machine learning model is different from a second feature map of the further machine learning model, or wherein a first number of feature maps of the machine learning model is less than a second number of feature maps of the further machine learning model, or wherein a first number of residual blocks of the machine learning model is different from a second number of residual blocks of the further machine learning model, or wherein the first number of residual blocks of the machine learning model is less than the second number of residual blocks of the further machine learning model, or wherein a first convolution kernel of the machine learning model is different from a second convolution kernel of the further machine learning model.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Galpin with Joshi (modified by Karczewicz) in order to improve image quality and improve coding efficiency.
Allowable Subject Matter
Claims 4, 8-10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Coelho et al. (U.S. Pub. No. 2021/0051322 A1), “Receptive-Field-Conforming Convolutional Models for Video Coding”
Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA PRINCE whose telephone number is (571) 270-1821. The examiner can normally be reached M-F 7:30-3:30 P.M.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jamie Atala can be reached at 571-272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JESSICA PRINCE
Examiner
Art Unit 2486
/JESSICA M PRINCE/ Primary Examiner, Art Unit 2486