Prosecution Insights
Last updated: April 19, 2026
Application No. 18/622,793

DISTORTION-BASED CODING TOOL DETERMINATION

Non-Final OA — §102, §103, §112
Filed
Mar 29, 2024
Examiner
PRINCE, JESSICA MARIE
Art Unit
2486
Tech Center
2400 — Computer Networks
Assignee
Bytedance Inc.
OA Round
2 (Non-Final)
76%
Grant Probability
Favorable
2-3
OA Rounds
3y 1m
To Grant
93%
With Interview

Examiner Intelligence

Grants 76% — above average
76%
Career Allow Rate
535 granted / 700 resolved
+18.4% vs TC avg
Strong +16% interview lift
+16.2%
Interview Lift
across resolved cases with an interview vs. without
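For the curious, here is a minimal sketch of how an interview-lift statistic like the +16.2% above could be derived; the ResolvedCase record is a hypothetical data layout, not this tool's actual schema.

```python
from dataclasses import dataclass

@dataclass
class ResolvedCase:
    granted: bool
    had_interview: bool

def allow_rate(cases: list[ResolvedCase]) -> float:
    """Fraction of resolved cases that were granted."""
    return sum(c.granted for c in cases) / len(cases)

def interview_lift(cases: list[ResolvedCase]) -> float:
    """Allowance-rate difference: cases with an interview minus those without."""
    with_iv = [c for c in cases if c.had_interview]
    without_iv = [c for c in cases if not c.had_interview]
    return allow_rate(with_iv) - allow_rate(without_iv)
```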
Typical timeline
3y 1m
Avg Prosecution
37 currently pending
Career history
737
Total Applications
across all art units

Statute-Specific Performance

§101 — 6.0% (-34.0% vs TC avg)
§103 — 45.8% (+5.8% vs TC avg)
§102 — 14.5% (-25.5% vs TC avg)
§112 — 17.5% (-22.5% vs TC avg)
Deltas are relative to Tech Center average estimates • Based on career data from 700 resolved cases
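As a sanity check, the deltas above are plain subtractions; the sketch below assumes the percentages are per-statute rates and back-solves the implied Tech Center averages (which come out to 40.0% for every statute here).

```python
examiner = {"101": 6.0, "103": 45.8, "102": 14.5, "112": 17.5}
delta = {"101": -34.0, "103": 5.8, "102": -25.5, "112": -22.5}

# Implied TC average = examiner rate minus the reported delta.
tc_avg = {s: round(examiner[s] - delta[s], 1) for s in examiner}
print(tc_avg)  # {'101': 40.0, '103': 40.0, '102': 40.0, '112': 40.0}
```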

Office Action

§102 §103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Acknowledgment of Amendments

Applicant's amendments filed 10/16/2025 overcome the following objection(s)/rejection(s): the objection to the specification has been withdrawn in view of Applicant's amendment.

Response to Arguments

Applicant's arguments with respect to claims 1 and 18-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 7-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding newly amended claim 7, which recites the limitation "... wherein the second distortion first distortion is determined with based on one of: ...", it is unclear what is to be considered as a second distortion first distortion.

Regarding newly amended claim 9, which recites the limitation "... wherein the second distortion cost of the target video block based on the third distortion first distortion comprises: determining the third distortion first distortion as the second distortion cost ...", it is unclear what is to be considered as a third distortion first distortion.

Claims 8 and 10 are rejected based upon claim dependency.

Claim 9 recites the limitation "... the third distortion first distortion" in claim 9, lines 1-3. There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim 20 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Galpin et al. (U.S. Pub. No. 2020/0244997 A1).

Regarding claim 20, the recitation of "a non-transitory computer-readable recording medium storing a bitstream of a video which is generated by a method ..." is a product-by-process claim limitation, where the product is the bitstream and the process is the video processing. MPEP 2113 recites "Product-by-Process claims are not limited to the manipulations of the recited steps, only the structure implied by the steps." Thus, the scope of the claim is the storage medium storing the bitstream (with the structure implied by the method performed by a video processing apparatus). The structure includes the data in compressed form manipulated by the steps. "To be given patentable weight, the printed matter and associated product must be in a functional relationship. A functional relationship can be found where the printed matter performs some function with respect to the product to which it is associated." MPEP 2111.05(I)(A). When a claimed "non-transitory computer readable recording medium" merely serves as a support for information or data, no functional relationship exists. MPEP 2111.05(III). The non-transitory computer-readable recording medium storing a bitstream as recited in claim 20 merely serves as support for the storage of the bitstream and provides no functional relationship between the stored bitstream and the storage medium. Therefore, the bitstream, whose scope is implied by the method steps, is non-functional descriptive material and given no patentable weight. MPEP 2111.05(III). Thus, the scope of claim 20 is just a storage medium storing data and is anticipated by Galpin et al. (U.S. Pub. No. 2020/0244997 A1), para. [0015]: "... a computer readable storage medium having stored thereon a bitstream generated according to the methods described above."

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 5-7, 11, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Joshi et al. (U.S. Pub. No. 2020/0186808 A1) in view of Karczewicz et al. (U.S. Pub. No. 2022/0103816 A1).

As per claim 1, Joshi teaches a method for video processing, comprising: determining, during a conversion between a target video block of a video and a bitstream of the video, a target coding tool for the target video block by using a machine learning model (fig. 10, el. 1002-1028, "at 1028, the process 1000 selects, based on the respective encoding cost of the at least some encoding modes, a best mode for encoding the block"; [0049], [0056-0057]); and performing the conversion by using the target coding tool (fig. 10, el. 1030 and [0163], "the process 1000 encodes, in a compressed bitstream, the block using the best mode").

Joshi does not explicitly disclose wherein determining the target coding tool by using the machine learning model comprises: determining, based on reconstruction samples of the target video block, first filtered reconstruction samples of the target video block by using the machine learning model; determining a first distortion between the first filtered reconstruction samples and original samples of the target video block; and determining the target coding tool based on the first distortion, as recited in claim 1. However, Karczewicz teaches wherein determining the target coding tool by using the machine learning model comprises: determining, based on reconstruction samples of the target video block, first filtered reconstruction samples of the target video block by using the machine learning model ([0182-0183], "... filter unit 216 may be configured to apply at least one of a neural network-based filter, a neural network-based loop filter, a neural network-based post loop filter, an adaptive in-loop filter, or a pre-defined adaptive in-loop filter to a decoded block of video data to form one or more filtered decoded blocks"); determining a first distortion between the first filtered reconstruction samples and original samples of the target video block ([0119], [0183], [0223-0224], "... video encoder 200 may apply each of the scaling factors to the filtered decoded block and compare the resulting refined filtered decoded block to an original, uncoded block to calculate an RDO value."); and determining the target coding tool based on the first distortion ([0169-0170], [0183-0184], "... Mode selection unit 202 may ultimately select the combination of encoding parameters having rate-distortion values that are better than the other tested combinations"). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Karczewicz with Joshi for the benefit of providing improved filtering results.

As per claim 2, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein the machine learning model is used for neural network (NN) filtering during the determination of the target coding tool, wherein the machine learning model is obtained by an encoder used during the conversion, wherein determining the target coding tool comprises: applying the machine learning model in a rate-distortion optimization (RDO) process on the target video block to obtain the coding tool ([0036], [0044], [0046], [0048], [0056-0059], [0118], [0122] and figs. 8-10), wherein the machine learning model is not obtained by a decoder used during the conversion, or wherein the machine learning model comprises at least one of: a neural network (NN) model ([0049], "in an example, the machine-learning model can be a neural-network model"), a convolution neural network (CNN) model ([0049], "... which can be a convolution neural-network (CNN) model"), or a non-NN based model (abstract, [0004-0005], [0026]).

As per claim 3, Joshi does not explicitly disclose wherein a further model different from the machine learning model is obtained by an encoder used during the conversion, wherein the machine learning model is combined with the further model, wherein the further model comprises at least one of: a convolution neural network (CNN) model, a deblocking filter, a sample adaptive offset (SAO) filter, an adaptive loop filter (ALF), a cross-component SAO (CCSAO) filter, or a cross-component ALF (CCALF). However, Karczewicz teaches wherein a further model different from the machine learning model is obtained by an encoder during the conversion (fig. 2; [0023], [0085], [0087]), wherein the further model comprises at least one of: a convolutional neural network (CNN) model, a deblocking filter, a sample adaptive offset (SAO) filter, an adaptive loop filter (ALF), a cross-component SAO (CCSAO) filter, or a cross-component ALF (CCALF) ([0023], [0085], [0087], [0182]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Karczewicz with Joshi for the benefit of providing improved filtering results.

As per claim 5, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein determining the target coding tool comprises: determining filtered reconstruction information of the target video block by using the machine learning model (fig. 4, el. 414, 416) and determining the target coding tool based on the filtered reconstruction information (fig. 4).

As per claim 6, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein determining the target coding tool by using the machine learning model comprises at least one of: determining a target intra mode by using the machine learning model ([0108], [0155]); determining a target coded intra tool by using the machine learning model; determining a target inter mode by using the machine learning model ([0155]); determining a target coded inter tool by using the machine learning model; determining a target partitioning mode by using the machine learning model ([0027]); determining a target transform core by using the machine learning model; or determining a target coded tool by using the machine learning model, wherein determining the target partitioning mode comprises: determining the target partitioning mode from a quad-tree (QT) partitioning mode, a binary-tree (BT) partitioning mode, a ternary-tree (TT) partitioning mode, or a non-split mode.
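The §103 ground above maps claim 1's flow onto Joshi and Karczewicz: an ML model filters the reconstruction, a "first distortion" is measured against the original samples, and the candidate with the best rate-distortion cost J = D + λ·R is selected. Below is a minimal sketch of that flow; every name (encode_with, nn_filter, candidates) is a hypothetical placeholder, not an API from either reference.

```python
import numpy as np

def select_coding_tool(original, candidates, encode_with, nn_filter, lam=0.1):
    """Pick the candidate tool with the lowest rate-distortion cost."""
    best_tool, best_cost = None, float("inf")
    for tool in candidates:
        recon, bits = encode_with(tool, original)   # reconstruction samples + rate
        filtered = nn_filter(recon)                 # "first filtered reconstruction samples"
        distortion = float(np.mean((filtered - original) ** 2))  # "first distortion" (MSE here)
        cost = distortion + lam * bits              # RD cost J = D + lambda * R
        if cost < best_cost:
            best_tool, best_cost = tool, cost
    return best_tool
```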
As per claim 7, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein a cost of the target video block is determined based on the first distortion, and/or wherein determining the target coding tool by using the machine learning model comprises: determining a second distortion of the target video block based at least in part on the machine learning model; and determining the target coding tool based on the second distortion, wherein the second distortion comprises a cost of the target video block ([0134-0137]), or wherein the second distortion first distortion is determined with based on one of: a sum of square error (SSE) matrix ([0136]), a mean square error (MSE) matrix ([0138]), a structural similarity (SSIM) matrix, a multi-scale structural similarity (MS-SSIM) matrix, or an information content weighted SSIM (IW-SSIM) matrix.

As per claim 11, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein a filtering process is applied to reconstruction samples of the target video block by using the machine learning model during the determination of the target coding tool (fig. 4, fig. 9), wherein the filtering process is different from an in-loop filtering process or a post-processing process applied to the target video block, wherein the machine learning model used in the filtering process is different from a further filtering model used in the in-loop filtering process or the post-processing process, wherein a first network structure of the machine learning model is different from a second network structure of the further filtering model, wherein the filtering process is applied to a sub-region of the target video block, wherein the sub-region of the target video block comprises at least one of: boundary samples of the target video block, or inner samples of the target video block, or wherein the filtering process is applied to a down-sampled version of the target video.

As per claim 14, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches determining second information regarding the machine learning model based on coding information of the target video block ([0059], [0152], "parameters of the ML model are generated such that, for at least some of the training data 1012, the ML model can infer, for a training datum, the corresponding encoding cost"), or wherein the second information comprises at least one of where to use the machine learning model in the determination of the target coding tool, or how to use the machine learning model in the determination of the target coding tool.

As per claim 15, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 14. In addition, Joshi teaches wherein the coding information comprises at least one of: a coding mode of the target video block (abstract, [0049], [0055-0056], [0120] and figs. 8-10); or wherein the coding information comprises at least one of a prediction mode of the target video block ([0026], [0034], [0107-0108], [0155]), a quantization parameter (QP) of the target video block (fig. 9, el. 910), a temporal layer of the target video block, a slice type of the target video block, a block size of the target video block ([0073], [0097], [0135] and fig. 7), a color component of the target video block ([0042], [0087]), or a rate-distortion cost of the target video block without using the machine learning model (fig. 8).
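The distortion measures claim 7 enumerates above are standard image-quality metrics. A minimal sketch of the two simplest, SSE and MSE, follows; the SSIM family would typically come from a library such as scikit-image (skimage.metrics.structural_similarity) rather than being written by hand.

```python
import numpy as np

def sse(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared errors between two equally shaped sample arrays."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error: SSE normalized by the sample count."""
    return sse(a, b) / a.size
```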
As per claim 16, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein the conversion includes encoding the target video block into the bitstream (figs. 4, 8-10).

As per claim 17, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. In addition, Joshi teaches wherein the conversion includes decoding the target video block from the bitstream (fig. 5).

As per claim 18, which is the corresponding apparatus for processing video data, comprising a processor and a non-transitory memory, with the limitations of the method as recited in claim 1, the rejection and analysis made for claim 1 also apply here.

As per claim 19, which is the corresponding non-transitory computer-readable storage medium storing instructions that cause a processor to perform a method with the limitations of the method as recited in claim 1, the rejection and analysis made for claim 1 also apply here.

As per claim 20, which is the corresponding non-transitory computer-readable medium with the limitations of the method as recited in claim 1, the rejection and analysis made for claim 1 also apply here.

Claims 3 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Joshi et al. (U.S. Pub. No. 2020/0186808 A1) in view of Karczewicz et al. (U.S. Pub. No. 2022/0103816 A1) and further in view of Galpin et al. (U.S. Pub. No. 2020/0244997 A1).

As per claim 12, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1. Joshi does not explicitly disclose wherein the machine learning model is the same as a further machine learning model obtained by a decoder used during the conversion, wherein a first number of residual blocks of the machine learning model is the same as a second number of residual blocks of the further machine learning model. However, Galpin teaches wherein the machine learning model is the same as a further machine learning model obtained by a decoder used during the conversion ([0009], [0058], "symmetrically, the decoder as shown in FIG. 6C receives the bitstream, reconstructs the images and restores the images using the same CNN"), wherein a first number of residual blocks of the machine learning model is the same as a second number of residual blocks of the further machine learning model ([0009], [0058] and fig. 6C). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Galpin with Joshi (modified by Karczewicz) in order to improve image quality and improve coding efficiency.

As per claim 13, Joshi (modified by Karczewicz) as a whole teaches everything as claimed above, see claim 1.
Joshi does not explicitly disclose wherein the machine learning model is different from a further machine learning model obtained by a decoder used during the conversion, wherein the machine learning model is simpler than the further machine learning model, or wherein a first depth of the machine learning model is different from a second depth of the further machine learning model, or wherein the first depth is shallower than the second depth, or wherein a first feature map of the machine learning model is different from a second feature map of the further machine learning model, or wherein a first number of feature maps of the machine learning model is less than a second number of feature maps of the further machine learning model, or wherein a first number of residual blocks of the machine learning model is different from a second number of residual blocks of the further machine learning model, or wherein the first number of residual blocks of the machine learning model is less than the second number of residual blocks of the further machine learning model, or wherein a first convolution kernel of the machine learning model is different from a second convolution kernel of the further machine learning model.

However, Galpin teaches wherein the machine learning model is different from a further machine learning model obtained by a decoder used during the conversion (figs. 7A-7C and [0075]), wherein the machine learning model is simpler than the further machine learning model ([0072], [0075], [0078], [0080]; "The best branch (761, 762), for example, according to a rate-distortion (RD) metric (760, 765), is selected (770) and the branch index i is encoded (725) in the bitstream per CU. It should be noted that the selector (770) in the encoder may not be the same as the selector (740) used during training. During training, we select the best branch based only on the MSE, while during encoding, we may use a RD cost (e.g., MSE + coding cost of the index)"), or wherein a first depth of the machine learning model is different from a second depth of the further machine learning model, or wherein the first depth is shallower than the second depth, or wherein a first feature map of the machine learning model is different from a second feature map of the further machine learning model, or wherein a first number of feature maps of the machine learning model is less than a second number of feature maps of the further machine learning model, or wherein a first number of residual blocks of the machine learning model is different from a second number of residual blocks of the further machine learning model, or wherein the first number of residual blocks of the machine learning model is less than the second number of residual blocks of the further machine learning model, or wherein a first convolution kernel of the machine learning model is different from a second convolution kernel of the further machine learning model. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Galpin with Joshi (modified by Karczewicz) in order to improve image quality and improve coding efficiency.

Allowable Subject Matter

Claims 4 and 8-10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
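The claim 13 ground above turns on an asymmetry between the encoder-side model and the decoder-side model: shallower depth, fewer feature maps, fewer residual blocks. A toy sketch of that relationship follows; the config fields and numbers are illustrative assumptions, not taken from Galpin.

```python
from dataclasses import dataclass

@dataclass
class FilterModelConfig:
    depth: int            # number of layers
    feature_maps: int     # channels per layer
    residual_blocks: int

# Hypothetical configs: the encoder-side proxy is "simpler" in every respect.
encoder_proxy = FilterModelConfig(depth=4, feature_maps=32, residual_blocks=2)
decoder_model = FilterModelConfig(depth=8, feature_maps=64, residual_blocks=8)

assert encoder_proxy.depth < decoder_model.depth
assert encoder_proxy.feature_maps < decoder_model.feature_maps
assert encoder_proxy.residual_blocks < decoder_model.residual_blocks
```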
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Coelho et al. (U.S. Pub. No. 2021/0051322 A1), "Receptive-Field-Conforming Convolutional Models for Video Coding."

Contact

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA PRINCE, whose telephone number is (571) 270-1821. The examiner can normally be reached M-F, 7:30 A.M.-3:30 P.M.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Jamie Atala, can be reached at 571-272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JESSICA M PRINCE/
Primary Examiner, Art Unit 2486

Prosecution Timeline

Mar 29, 2024
Application Filed
Jul 12, 2025
Non-Final Rejection — §102, §103, §112
Oct 16, 2025
Response Filed
Jan 22, 2026
Non-Final Rejection — §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603998
Configurable Neural Network Model Depth In Neural Network-Based Video Coding
2y 5m to grant • Granted Apr 14, 2026
Patent 12603990
METHOD, APPARATUS, AND MEDIUM FOR VIDEO PROCESSING
2y 5m to grant • Granted Apr 14, 2026
Patent 12598322
IMAGE PROCESSING APPARATUS USING NEURAL NETWORK AND IMAGE PROCESSING METHOD USING THE SAME
2y 5m to grant • Granted Apr 07, 2026
Patent 12598299
LOSSLESS MODE FOR VERSATILE VIDEO CODING
2y 5m to grant • Granted Apr 07, 2026
Patent 12593076
Transform Unit Partition Method for Video Coding
2y 5m to grant • Granted Mar 31, 2026
Study what changed to get past this examiner. Based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

2-3
Expected OA Rounds
76%
Grant Probability
93%
With Interview (+16.2%)
3y 1m
Median Time to Grant
Moderate
PTA Risk
Based on 700 resolved cases by this examiner. Grant probability derived from career allow rate.
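The card's figures are mutually consistent: the baseline comes from the 535/700 resolved grants, and the with-interview figure adds the +16.2-point lift. A quick sketch of that arithmetic; the cap at 100% is our assumption, not documented by the tool.

```python
career_allow_rate = 535 / 700                    # ~0.764 -> shown as 76%
interview_lift = 0.162                           # +16.2 points
with_interview = min(career_allow_rate + interview_lift, 1.0)  # ~0.926 -> shown as 93%
print(f"{career_allow_rate:.0%} baseline, {with_interview:.0%} with interview")
```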
