Prosecution Insights
Last updated: April 19, 2026

Application No.: 19/033,120
Title: NEURAL NETWORK-BASED ADAPTIVE IMAGE AND VIDEO COMPRESSION METHOD WITH VARIABLE RATE
Current Status: Non-Final Office Action (§102, §103)
Filed: Jan 21, 2025
Examiner: HODGES, SUSAN E
Art Unit: 2425
Tech Center: 2400 (Computer Networks)
Assignee: Bytedance Inc.
OA Round: 1 (Non-Final)

Outlook: Favorable
Grant Probability: 67% (81% with examiner interview)
Projected OA Rounds: 1-2
Projected Time to Grant: 2y 4m
Examiner Intelligence

Career Allow Rate: 67% (250 granted / 375 resolved), +8.7% vs Tech Center average (above average)
Interview Lift: +14.4% (moderate), measured across resolved cases with an interview
Avg Prosecution: 2y 4m; 31 applications currently pending
Career History: 406 total applications across all art units

Statute-Specific Performance

§101: 6.0% (-34.0% vs TC avg)
§102: 20.9% (-19.1% vs TC avg)
§103: 48.7% (+8.7% vs TC avg)
§112: 22.6% (-17.4% vs TC avg)

Based on career data from 375 resolved cases; deltas are measured against a Tech Center average estimate.
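The headline figures can be reproduced from the raw counts above; a minimal sketch follows. The roughly 58% Tech Center baseline is a back-calculation from the stated +8.7% delta, not a figure reported in the data:

```python
# Reproduce the examiner-intelligence headline numbers from raw counts.
granted, resolved = 250, 375

allow_rate = 100 * granted / resolved   # career allow rate, percent
print(round(allow_rate, 1))             # -> 66.7, displayed as 67%

# Back-calculate the implied Tech Center average from the +8.7% delta
# (illustrative only; the report does not state this baseline).
tc_avg = allow_rate - 8.7
print(round(tc_avg))                    # -> 58
```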

Office Action (§102 and §103 rejections)
DETAILED ACTION

This Office action is in response to the application filed on January 21, 2025. Claims 1-20 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant's claim for priority based on U.S. provisional application 63/390,614, filed on July 19, 2022.

Information Disclosure Statement

The information disclosure statement (IDS) was submitted on March 19, 2025. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the Examiner.

Specification

Applicant is reminded of the proper language and format for an abstract of the disclosure. The abstract should be in narrative form and generally limited to a single paragraph, on a separate sheet, within the range of 50 to 150 words. The form and legal phraseology often used in patent claims, such as "means" and "said," should be avoided. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details. The language should be clear and concise and should not repeat information given in the title. It should avoid phrases that can be implied, such as "The disclosure concerns," "The disclosure defined by this invention," and "The disclosure describes."

The abstract of the disclosure is objected to because it uses the phrase "A mechanism for processing video data is in a neural network disclosed," which can be implied. Correction is required. See MPEP § 608.01(b).
Drawings

The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5). Figures 1-8 should be designated by a legend such as --Prior Art-- because only that which is old is illustrated. See MPEP § 608.02(g). The description refers to a Fig. 4.4-2 and a Fig. 4.5-2 in paragraph [0252] that are not found in the drawings, and the description does not provide a Fig. No. in paragraph [0178].

Corrected drawings in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. The replacement sheet(s) should be labeled "Replacement Sheet" in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion of the drawing figures. If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

In addition to Replacement Sheets containing the corrected drawing figure(s), applicant is required to submit a marked-up copy of each Replacement Sheet including annotations indicating the changes made to the previous version. The marked-up copy must be clearly labeled as "Annotated Sheets" and must be presented in the amendment or remarks section that explains the change(s) to the drawings. See 37 CFR 1.121(d)(1). Failure to timely submit the corrected drawings and marked-up copy will result in abandonment of the application.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless - (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-5, 8-13, 15 and 18-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ikonin et al. (US 2023/0353764 A1), referred to as Ikonin hereinafter.

Regarding Claim 1, Ikonin discloses a method for visual data processing (Fig. 1, Par. [0090], improving the quality of encoded and decoded picture or video data and/or reducing the amount of data required to represent the encoded picture or video data) comprising: obtaining a quantized residual latent sample for each component of visual data (Par. [0107], FIG. 1 shows the data flow in a typical convolutional neural network. First, the input image (i.e. visual data) is passed through convolutional layers. For the first layer which processes input data, e.g. an image, the number of input channels is normally equal to the number of channels of data representation, for instance 3 channels for RGB or YUV (i.e. each component) representation of images or video. Par. [0109], This image h is usually referred to as code, latent variables, or latent representation. Par. [0118], In FIG. 3A, the encoder 101 maps an input image x into a latent representation (denoted by y) via the function y = f(x). This latent representation may also be referred to as a part of or a point within a "latent space" in the following. The quantizer 102 transforms (i.e. obtaining) the latent representation y into the quantized latent representation ŷ (i.e. quantized latent sample) with (discrete) values by ŷ = Q(y), with Q representing the quantizer function. Par. [0152], The residual information between the original frame and the predicted frame is encoded by the residual encoder network. A highly non-linear neural network is used to transform the residuals to the corresponding latent representation. See Fig. 7A, details of encoder, which shows a residual calculation unit 204, a transform processing unit 206 and a quantization unit 208 to obtain the quantized residual latent sample 209); performing a first processing on the quantized residual latent sample to de-scale the quantized residual latent sample to obtain a processed quantized residual latent sample (Fig. 7A shows an inverse quantization unit 210 that outputs quantized coefficients 209 (i.e. quantized residual latent sample) to the inverse (i.e. de-scaling) transform processing unit 212 (i.e. first processing) for performing a process on the quantized residual latent representation); and acquiring a reconstructed latent sample based on the processed quantized residual latent sample (Fig. 7A illustrates a reconstruction unit 214 that receives the reconstructed residual block 213 (i.e. processed quantized residual latent sample) from the inverse transform processing unit 212 and outputs a reconstructed block 215 (i.e. reconstructed latent sample)).

Regarding Claim 2, Ikonin discloses claim 1. Ikonin further discloses wherein the method is used for decoding the visual data from a bitstream (Par. [0090], provide an efficient selection of information to be signaled from an encoder to a decoder (i.e. used for decoding) by improving the quality of encoded and decoded picture or video data and/or reducing the amount of data required to represent the encoded picture or video data).

Regarding Claim 3, Ikonin discloses claim 1. Ikonin further discloses wherein a residual latent sample is obtained by subtracting a prediction sample of the component (Par. [0152], The residual information (i.e. residual latent sample) between (i.e. subtracting) the original frame and the predicted frame is encoded by the residual encoder network; Fig. 7A, residual calculation unit 204 receives the image latent representation and subtracts the prediction block 265) from a latent sample of the component (Par. [0109], This image h is usually referred to as code, latent variables, or latent representation (i.e. latent sample)), a second processing is performed on the residual latent sample to obtain a processed residual latent sample (Fig. 7A illustrates transform processing unit 206 (i.e. second processing) on the residual block 205 (i.e. residual latent sample) to obtain transformed coefficients 207 (i.e. processed residual latent sample)), and the quantized residual latent sample is obtained based on the processed residual latent sample (Fig. 7A illustrates that the outputs, quantized coefficients 209 (i.e. quantized residual latent sample), are obtained from the quantization unit 208 (i.e. based on) the transform coefficients 207 (i.e. processed residual latent sample)).

Regarding Claim 4, Ikonin discloses claim 3. Ikonin further discloses wherein the first processing is performed by an inverse gain unit (Fig. 7A, inverse transform (i.e. inverse gain unit) processing unit 212 (i.e. first processing)) and the second processing is performed by a gain unit (Fig. 7A illustrates transform (i.e. gain unit) processing unit 206 (i.e. second processing)); wherein the first processing applies an opposite function as the second processing (Fig. 7A illustrates a latent representation sent through a residual calculation unit 204 and then through a transform processing unit 206 (i.e. second processing), which is the opposite of an inverse transform processing unit 212 (i.e. first processing)); wherein the first processing adjusts a magnitude of the quantized residual latent sample; or wherein the second processing adjusts a magnitude of the residual latent sample (Par. [0116], several transforms are used for that purpose, such as discrete cosine and sine transforms (DCT (i.e. gain unit), DST (i.e. inverse gain unit)) (i.e. adjusts a magnitude of the residual sample, the gain or increase/decrease)).

Regarding Claim 5, Ikonin discloses claim 3. Ikonin further discloses wherein the first processing is based on a first processing vector (Fig. 7A, inverse transform processing unit 212 (i.e. first processing on vector T), where Par. [0113], continuous-valued data (such as vectors of image pixel intensities) must be quantized to a finite set of discrete values, and Par. [0115], most existing image compression methods operate by linearly transforming the data vector (i.e. based on a processing vector)), the second processing is based on a second processing vector (Fig. 7A illustrates transform processing unit 206 (i.e. second processing vector R), where Par. [0113], continuous-valued data (such as vectors of image pixel intensities) must be quantized to a finite set of discrete values, and Par. [0115], most existing image compression methods operate by linearly transforming the data vector (i.e. based on a processing vector) into a suitable continuous-valued representation), and the first processing vector and the second processing vector satisfy the following: T = 1/R, where T is the first processing vector and R is the second processing vector (Fig. 7A illustrates a latent representation sent through a residual calculation unit 204 and then through a transform processing unit 206 based on a vector, which is the opposite of an inverse transform processing unit 212 based on another vector, which would satisfy the formula T = 1/R).

Regarding Claim 8, Ikonin discloses claim 1. Ikonin further discloses wherein an indication is included in a bitstream to indicate which vector (Par. [0113], continuous-valued data (such as vectors of image pixel intensities) must be quantized to a finite set of discrete values) in a set of vectors is used in the first processing (Par. [0222], each lower resolution feature map or feature map part has either 0 or 1 assigned (i.e. indication), where a zero (0) is assigned because it is not selected and no motion vectors (feature elements) are signaled. Feature map elements (motion vectors) (i.e. set of vectors) of three of the four parts are signaled (i.e. used), and, correspondingly, the flags are set to 1).

Regarding Claim 9, Ikonin discloses claim 1. Ikonin further discloses further comprising: performing a third processing on a probability parameter to obtain a processed probability parameter; wherein the processed probability parameter is used to derive the quantized residual latent sample (Par. [0142], The encoder then uses the quantized vector ẑ to estimate σ̂, the spatial distribution of standard deviations which is used for obtaining probability values (or frequency values) (i.e. probability parameter) for arithmetic coding (AE) (i.e. third processing), and uses it to compress and transmit the quantized image representation ŷ (or latent representation) (i.e. quantized residual latent sample). The decoder first recovers ẑ from the compressed signal. It then uses h.sub.s to obtain ŷ, which provides it with the correct probability estimates to successfully recover ŷ as well. It then feeds ŷ into g.sub.s to obtain the reconstructed image).

Regarding Claim 10, Ikonin discloses claim 9. Ikonin further discloses wherein the first processing is based on a first processing vector (Fig. 7A illustrates inverse transform processing unit 212 (i.e. first processing of vector T), where Par. [0113], continuous-valued data (such as vectors of image pixel intensities) must be quantized to a finite set of discrete values, and Par. [0115], most existing image compression methods operate by linearly transforming the data vector (i.e. based on a processing vector)), the third processing is based on a third processing vector (Par. [0113], continuous-valued data (such as vectors of image pixel intensities) must be quantized to a finite set of discrete values, and Par. [0115], most existing image compression methods operate by linearly transforming the data vector (i.e. based on a third processing vector)), and the first processing vector is derived from the third processing vector (Par. [0142], The encoder then uses the quantized vector ẑ to estimate σ̂, the spatial distribution of standard deviations which is used for obtaining probability values (or frequency values) for arithmetic coding (AE), and uses it to compress and transmit the quantized image representation ŷ (or latent representation) (i.e. first vector derived from third vector). The decoder first recovers ẑ from the compressed signal. It then uses h.sub.s to obtain ŷ, which provides it with the correct probability estimates to successfully recover ŷ as well. It then feeds ŷ into g.sub.s to obtain the reconstructed image).

Regarding Claim 11, Ikonin discloses claim 9. Ikonin further discloses wherein an indication is included in a bitstream to indicate which vector in a first set of vectors is used (Par. [0222], each lower resolution feature map or feature map part has either 0 or 1 assigned (i.e. indication), where a zero (0) is assigned because it is not selected and no motion vectors (feature elements) are signaled. Feature map elements (motion vectors) (i.e. set of vectors) of three of the four parts are signaled (i.e. used), and, correspondingly, the flags are set to 1) in the first processing and which vector in a second set of vectors is used in the third processing (Par. [0113], continuous-valued data (such as vectors of image pixel intensities) must be quantized to a finite set of discrete values (i.e. second set of vectors)); wherein the first processing is performed after obtaining (Fig. 7A illustrates inverse transform processing unit 212 (i.e. first processing), which is performed after the quantization unit) the quantized residual latent sample using the processed probability parameter (Par. [0142], The encoder then uses the quantized vector ẑ to estimate σ̂, the spatial distribution of standard deviations which is used for obtaining probability values (or frequency values) (i.e. probability parameter) for arithmetic coding (AE), and uses it to compress and transmit the quantized image representation ŷ (or latent representation)).

Regarding Claim 12, Ikonin discloses claim 1. Ikonin further discloses wherein acquiring the reconstructed latent sample based on the processed quantized residual latent sample (Fig. 7A illustrates a reconstruction unit 214 that receives the reconstructed residual block 213 (i.e. processed quantized residual latent sample) from the inverse transform processing unit 212 and outputs a reconstructed block 215 (i.e. reconstructed latent sample)) comprises: processing the processed quantized residual latent sample by an inverse residual (Fig. 7A illustrates a reconstruction unit 214 that receives the reconstructed residual block 213 (i.e. processed quantized residual latent sample) from the inverse transform processing unit 212, which follows the inverse quantization unit 210 (dequantized coefficients 211 (i.e. an inverse residual))) and variance scale module (Fig. 3A, Par. [0130], the second subnetwork (i.e. variance scale module) is to obtain statistical properties (e.g. mean value, variance and correlations between samples of bitstream 1) of the samples of "bitstream1", such that the compressing of bitstream 1 by the first subnetwork is more efficient, where the second subnetwork generates a second bitstream "bitstream2", which comprises the said information (e.g. mean value, variance and correlations between samples of bitstream1) and Par.
[0132], The statistical information provided by the second subnetwork might be used by AE (arithmetic encoder) 105 (i.e. variance scale module) and AD (arithmetic decoder) 106 components); adding an output of the inverse residual and variance scale module (Par. [0133], The decoded quantized side information ẑ' is then transformed 107 into decoded side information ŷ', where ŷ' represents the statistical properties of ŷ (e.g. mean value of samples of ŷ, or the variance of sample values or the like). The decoded latent representation ŷ' is then provided to the above-mentioned Arithmetic Encoder 105 and Arithmetic Decoder 106 to control the probability model of ŷ (i.e. output of variance scale module)) to a prediction sample to obtain the reconstructed latent sample (Fig. 7A, prediction block 265 and reconstructed residual block 213 (i.e. processed quantized residual latent sample) are added by reconstruction unit 214 to generate reconstructed block 215 (i.e. reconstructed latent sample)).

Regarding Claim 13, Ikonin discloses claim 1. Ikonin further discloses wherein the component is a luma component or a chroma component (Par. [0168], A (digital) picture can be regarded as a two-dimensional array or matrix of samples with intensity values. A sample in the array may also be referred to as a pixel (short form of picture element). The number of samples in the horizontal and vertical direction (or axis) of the array or picture defines the size and/or resolution of the picture. For representation of color, in video coding each pixel is typically represented in a luminance and chrominance format or color space, e.g. YCbCr, which comprises a luminance component indicated by Y (sometimes also L is used instead) and two chrominance components indicated by Cb and Cr), and wherein a reconstructed image is obtained by processing of the reconstructed latent sample with a transform process, wherein the transform process is an inverse transform (Par. [0124]-[0129], The first subnetwork is responsible for the transformation 101 of the input image x into its latent representation y (which is easier to compress than x), quantizing 102 the latent representation y into a quantized latent representation ŷ, compressing the quantized latent representation ŷ using the AE by the arithmetic encoding module 105 to obtain bitstream "bitstream 1", parsing the bitstream 1 via AD using the arithmetic decoding module 106, and reconstructing 104 the reconstructed image (x̂) (i.e. reconstructed image) using the parsed data, where the compressed image is reconstructed from these quantized values using an approximate parametric nonlinear inverse transform; see Par. [0140]) or a synthesis transform.

Regarding Claim 15, Ikonin discloses a method for visual data processing (Fig. 1, Par. [0090], improving the quality of encoded and decoded picture or video data and/or reducing the amount of data required to represent the encoded picture or video data) comprising: acquiring a residual latent sample (Par. [0152], The residual information (i.e. residual latent sample) between the original frame and the predicted frame is encoded by the residual encoder network. A highly non-linear neural network is used to transform the residuals to the corresponding latent representation. Fig. 7A, residual calculation unit 204 receives the image latent representation and subtracts the prediction block 265 to get a residual block 205 (i.e. residual latent sample)) for each component of visual data (Par. [0107], FIG. 1 shows the data flow in a typical convolutional neural network. First, the input image (i.e. visual data) is passed through convolutional layers. For the first layer which processes input data, e.g. an image, the number of input channels is normally equal to the number of channels of data representation, for instance 3 channels for RGB or YUV (i.e. each component) representation of images or video. Par. [0109], This image h is usually referred to as code, latent variables, or latent representation. Par. [0118], In FIG. 3A, the encoder 101 maps an input image x into a latent representation (denoted by y) via the function y = f(x). This latent representation may also be referred to as a part of or a point within a "latent space" in the following. Fig. 7A, details of encoder, shows a residual calculation unit 204 to acquire a residual latent sample); performing a second processing on the residual latent sample to scale the residual latent sample to obtain a processed residual latent sample (Fig. 7A illustrates transform processing unit 206 (i.e. second processing scales) on the residual block 205 (i.e. residual latent sample) to obtain transformed coefficients 207 (i.e. processed residual latent sample)); and processing the processed residual latent sample to acquire a quantized residual latent sample (Fig. 7A illustrates that the outputs, quantized coefficients 209 (i.e. quantized residual latent sample), are produced by the quantization unit 208 from the transform coefficients 207 (i.e. processed residual latent sample)).

Regarding Claim 18, Ikonin discloses claim 15. Ikonin further discloses wherein the method is used for encoding the visual data into a bitstream (Par. [0090], provide an efficient selection of information to be signaled from an encoder (i.e. used for encoding) to a decoder by improving the quality of encoded and decoded picture or video data and/or reducing the amount of data required to represent the encoded picture or video data).

Apparatus claim 19 is drawn to the apparatus corresponding to the method of using the same as claimed in claim 1. Therefore apparatus claim 19 corresponds to method claim 1, and is rejected for the same reasons of anticipation as used above. Claim 19 further recites a processor and a non-transitory memory with instructions thereon (see Ikonin Par. [0036], a computer program product is provided, stored on a non-transitory medium (i.e. memory), which when executed on one or more processors performs the method).

Apparatus claim 20 is drawn to the apparatus corresponding to the method of using the same as claimed in claim 15. Therefore apparatus claim 20 corresponds to method claim 15, and is rejected for the same reasons of anticipation as used above. Claim 20 further recites a processor and a non-transitory memory with instructions thereon (see Ikonin Par. [0036], a computer program product is provided, stored on a non-transitory medium (i.e. memory), which when executed on one or more processors performs the method).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 7, 16 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Ikonin (US 2023/0353764 A1) in view of TODERICI, G., et al., "Variable Rate Image Compression with Recurrent Neural Networks," arXiv preprint arXiv:1511.06085v5 [cs.CV], 1 Mar 2016, 12 pages (see IDS filed March 19, 2025, NPL #18), referred to as Toderici hereinafter.

Regarding Claim 6, Ikonin discloses claim 1. Ikonin further discloses wherein the first processing is implemented according to (Fig. 1, Par. [0096], a convolutional neural network consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of a series of convolutional layers that convolve with a multiplication or other dot product. The result of a layer is one or more feature maps (f.maps in FIG. 1), sometimes also referred to as channels. Par. [0105], high-level reasoning in the neural network is done via fully connected layers, where neurons in a fully connected layer have connections to all activations in the previous layer, which can be computed as an affine transformation): the processed quantized residual latent sample (reconstructed residual block 213 (i.e. processed quantized residual latent sample)), a first processing vector (Fig. 7A, inverse transform processing unit 212, where Par. [0113], continuous-valued data (such as vectors of image pixel intensities) must be quantized (i.e. first processing unit of vector) to a finite set of discrete values, and Par. [0115], most existing image compression methods operate by linearly transforming the data vector), the first processing vector corresponding to the quantized residual latent sample (dequantized coefficients 211 (i.e. quantized residual latent sample)), the quantized residual latent sample (dequantized coefficients 211 (i.e. quantized residual latent sample)), and i is an index corresponding to a channel dimension (Par. [0370], the feature map of some layer can be represented in a two-dimensional space. The segmentation information comprises an indicator (a binary flag) (i.e. index) for the positions of the 2D space indicating whether a feature map value corresponding to this position is presented in the bitstream). While Ikonin teaches a neural network-based compression framework, Ikonin fails to explicitly teach implementation of a formula.

However, Toderici teaches the first processing (Page 6, Section 3.5, Feed-forward convolutional/deconvolutional residual encoder) is implemented according to: ws[i] = T[K][i] × w[i] (Page 6, Section 3.5, Formula 12, W⊘k(x) = W⊘1(Tk(x))), where ws[i] indicates the processed quantized residual latent sample (Page 6, Section 3.5, W⊘k(x); the deconvolutional operator (i.e. processed quantized residual latent sample) is defined as the transpose of the convolutional operator), T[K] indicates a first processing vector, T[K][i] indicates an element in the first processing vector corresponding to the quantized residual latent sample, K indicates an index of the first processing vector (Page 6, Section 3.5, the "inflation" operator Tk), w[i] indicates the quantized residual latent sample, and i is an index corresponding to a channel dimension (Page 6, Section 3.5, the convolutional operator W⊛1, where 1 is the index i). References Ikonin and Toderici are considered to be analogous art because they relate to video compression using neural networks. Therefore, it would have been obvious to one possessing ordinary skill in the art before the effective filing date of the claimed invention to specify a formula for the first processing as taught by Toderici in the invention of Ikonin. This modification would allow networks to assist or even entirely take over many of the processes used as part of a traditional image compression pipeline in order to learn more efficient frequency transforms, more effective quantization techniques and improved predictive coding (see Toderici, Section 2, 1st paragraph).

Regarding Claim 7, Ikonin in view of Toderici teaches claim 6. Ikonin further teaches where x corresponds to a horizontal spatial dimension, and y corresponds to a vertical spatial dimension (Par. [0187], number of feature map elements in one or more dimensions (such as x, y; alternatively or in addition, number of channels may be considered). Par.
[0207], the feature map 1110 is a dense optical flow of motion vectors with a width W (i.e. horizontal spatial dimension) and a height H (i.e. vertical spatial dimension). Par. [0294], there may be a plurality of channels such as color or depth channels for the picture, so that the output may also have more dimensions. General feature maps may also come in more than two or three dimensions). While Ikonin teaches a neural network-based compression framework, Ikonin fails to explicitly teach implementation of a formula.

However, Toderici further teaches wherein ws[i] is ws[i, x, y] and w[i] is w[i, x, y], where x is an index, and y is an index (Page 6, Section 3.5, Wk(x)(i, j) = x(k × i, k × j) for a 2D multi-channel image x and pixel coordinate (i, j), where i and j are the indices for the spatial dimensions). References Ikonin and Toderici are considered to be analogous art because they relate to video compression using neural networks. Therefore, it would have been obvious to one possessing ordinary skill in the art before the effective filing date of the claimed invention to specify a formula for the first processing as taught by Toderici in the invention of Ikonin. This modification would allow networks to assist or even entirely take over many of the processes used as part of a traditional image compression pipeline in order to learn more efficient frequency transforms, more effective quantization techniques and improved predictive coding (see Toderici, Section 2, 1st paragraph).

Regarding Claim 16, Ikonin discloses claim 15. Ikonin further discloses wherein the second processing is implemented according to (Fig. 1, Par. [0096], a convolutional neural network consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of a series of convolutional layers that convolve with a multiplication or other dot product. The result of a layer is one or more feature maps (f.maps in FIG. 1), sometimes also referred to as channels. Par. [0105], high-level reasoning in the neural network is done via fully connected layers, where neurons in a fully connected layer have connections to all activations in the previous layer, which can be computed as an affine transformation): the processed residual latent sample (Fig. 7A, transform coefficients 207 (i.e. processed residual latent sample)), a second processing vector (Fig. 7A, transform processing unit 206, where Par. [0113], continuous-valued data (such as vectors of image pixel intensities) must be quantized (i.e. second processing unit of vector) to a finite set of discrete values, and Par. [0115], most existing image compression methods operate by linearly transforming the data vector), the second processing vector corresponding to the residual latent sample (residual block 205 (i.e. residual latent sample)), the residual latent sample (residual block 205 (i.e. residual latent sample)), and i is an index corresponding to a channel dimension (Par. [0370], the feature map of some layer can be represented in a two-dimensional space. The segmentation information comprises an indicator (a binary flag) (i.e. index) for the positions of the 2D space indicating whether a feature map value corresponding to this position is presented in the bitstream). While Ikonin teaches a neural network-based compression framework, Ikonin fails to explicitly teach implementation of a formula.

However, Toderici teaches the second processing (Page 6, Section 3.5, Feed-forward convolutional/deconvolutional residual encoder) is implemented according to: ws[i] = R[K][i] × w[i] (Page 6, Section 3.5, above Formula 11, W⊛k x = Sk(W⊛1 x), where S is interchangeable with the variable R), where ws[i] indicates the processed residual latent sample (Page 6, Section 3.5, W⊛k x, the convolutional operator (i.e. processed residual latent sample)), R[K] indicates a second processing vector, R[K][i] indicates an element in the second processing vector corresponding to the residual latent sample, K indicates an index of the second processing vector (Page 6, Section 3.5, Sk(W⊛1)), w[i] indicates the residual latent sample (Page 6, Section 3.5, (W⊛1 x)), and i is an index corresponding to a channel dimension (Page 6, Section 3.5, the convolutional operator W⊛1, where 1 is the index). References Ikonin and Toderici are considered to be analogous art because they relate to video compression using neural networks. Therefore, it would have been obvious to one possessing ordinary skill in the art before the effective filing date of the claimed invention to specify a formula for the second processing as taught by Toderici in the invention of Ikonin. This modification would allow networks to assist or even entirely take over many of the processes used as part of a traditional image compression pipeline in order to learn more efficient frequency transforms, more effective quantization techniques and improved predictive coding (see Toderici, Section 2, 1st paragraph).

Regarding Claim 17, Ikonin in view of Toderici teaches claim 16. Ikonin further teaches where x corresponds to a horizontal spatial dimension, and y corresponds to a vertical spatial dimension (Par. [0187], number of feature map elements in one or more dimensions (such as x, y; alternatively or in addition, number of channels may be considered). Par. [0207], the feature map 1110 is a dense optical flow of motion vectors with a width W (i.e. horizontal spatial dimension) and a height H (i.e. vertical spatial dimension). Par. [0294], there may be a plurality of channels such as color or depth channels for the picture, so that the output may also have more dimensions. General feature maps may also come in more than two or three dimensions).

While Ikonin teaches a neural network-based compression framework, Ikonin fails to explicitly teach implementation of a formula. However, Toderici further teaches wherein ws[i] is ws[i, x, y], and w[i] is w[i, x, y], where x is an index, and y is an index (Page 6, Section 3.5, Wk(x)(i, j) = x(k × i, k × j) for a 2D multi-channel image x and pixel coordinate (i, j), where i and j are the indices for the spatial dimensions). References Ikonin and Toderici are considered to be analogous art because they relate to video compression using neural networks. Therefore, it would have been obvious to one possessing ordinary skill in the art before the effective filing date of the claimed invention to specify a formula for the second processing as taught by Toderici in the invention of Ikonin. This modification would allow networks to assist or even entirely take over many of the processes used as part of a traditional image compression pipeline in order to learn more efficient frequency transforms, more effective quantization techniques and improved predictive coding (see Toderici, Section 2, 1st paragraph).

Allowable Subject Matter

Claim 14 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: Claim 14 specifically defines that different vectors in the first processing are used for a first sample and a second sample, respectively, based on different indications in a bitstream; that different vectors in the second processing are used for the first sample and the second sample, respectively, and are respectively based on different indications in the bitstream; wherein the first sample and the second sample belong to different tiles; wherein the first sample and the second sample belong to different components; and wherein reconstructed latent samples in the visual data are divided into at least two tiles, the at least two tiles being rectangular partitions of the reconstructed latent samples. This combination is not readily taught or suggested by the prior art uncovered during the search or made of record.

Conclusion

The prior art references made of record are not relied upon but are considered pertinent to applicant's disclosure. Li et al. (US 2009/0074076 A1) teaches a method and device for vector quantization. BESENBRUCH et al. (US 2022/0279183 A1) teaches an autoencoder with a hyperprior and a hyperhyperprior, where the hyperhyperlatents 'w' encode information regarding the latent entropy parameters ϕ.sub.z, which in turn allows for the encoding/decoding of the hyperlatents 'z'.

Any inquiry concerning this communication should be directed to SUSAN E. HODGES, whose telephone number is (571) 270-0498. The Examiner can normally be reached Monday through Friday from 8:00 am to 4:00 pm (EST). If attempts to reach the Examiner by telephone are unsuccessful, the Examiner's supervisor, Brian T. Pendleton, can be reached on (571) . The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.
Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Susan E. Hodges/
Primary Examiner, Art Unit 2425
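The operation at the center of the claim 16 and 17 rejections, ws[i, x, y] = R[K][i] × w[i, x, y], amounts to selecting one vector from a bank of per-channel scaling vectors by an index K (e.g. signaled in the bitstream to select a rate point) and broadcasting it over the spatial dimensions of the quantized residual latent tensor. As a rough illustrative sketch only (the function name, tensor shapes, and NumPy framing are assumptions made here, not taken from the claims or the cited references):

```python
import numpy as np

def scale_latents(w: np.ndarray, R: np.ndarray, K: int) -> np.ndarray:
    """Channel-wise scaling of a quantized residual latent tensor.

    Sketches ws[i, x, y] = R[K][i] * w[i, x, y]:
      w : quantized residual latents, shape (C, H, W)
          (channel, vertical spatial, horizontal spatial)
      R : bank of scaling vectors, shape (num_rates, C); row K selects
          one vector, e.g. according to an indication in the bitstream
      K : index of the selected scaling vector
    Returns ws with the same shape as w.
    """
    r = R[K]                     # selected scaling vector, shape (C,)
    return r[:, None, None] * w  # broadcast over both spatial dimensions

# Toy usage: 4 rate points, 3 channels, 2x2 spatial latents
rng = np.random.default_rng(0)
R = rng.uniform(0.5, 2.0, size=(4, 3))
w = rng.integers(-4, 5, size=(3, 2, 2)).astype(np.float64)
ws = scale_latents(w, R, K=2)
assert ws.shape == w.shape
assert np.allclose(ws[1], R[2, 1] * w[1])  # channel 1 scaled by R[2][1]
```

Note that claim 16 recites only the channel index i, while claim 17 adds the spatial indices x and y; in the broadcast above, every spatial position within a channel shares the same scale factor, which is why the same sketch covers both.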

Prosecution Timeline

Jan 21, 2025
Application Filed
Feb 21, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603982
STEREOSCOPIC HIGH DYNAMIC RANGE VIDEO
2y 5m to grant Granted Apr 14, 2026
Patent 12604008
ADAPTIVE CLIPPING IN MODELS PARAMETERS DERIVATIONS METHODS FOR VIDEO COMPRESSION
2y 5m to grant Granted Apr 14, 2026
Patent 12574558
Method and Apparatus for Sign Coding of Transform Coefficients in Video Coding System
2y 5m to grant Granted Mar 10, 2026
Patent 12568212
ADAPTIVE LOOP FILTERING ON OUTPUT(S) FROM OFFLINE FIXED FILTERING
2y 5m to grant Granted Mar 03, 2026
Patent 12556671
THREE DIMENSIONAL STROBO-STEREOSCOPIC IMAGING SYSTEMS AND ASSOCIATED METHODS
2y 5m to grant Granted Feb 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
67%
Grant Probability
81%
With Interview (+14.4%)
2y 4m
Median Time to Grant
Low
PTA Risk
Based on 375 resolved cases by this examiner. Grant probability derived from career allow rate.
