Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2. This Office Action is sent in response to Applicant's Communication received on November 19, 2024 for application number 18/867,234. This Office hereby acknowledges receipt of the following items, which have been placed of record in the file: Specification, Drawings, Oath/Declaration, and Claims.
3. Claims 1-6, 8-14, 22, 24 and 29-35 are presented for examination. Claims 7, 15-21, 23, 25-28 and 36-41 have been canceled.
Information Disclosure Statement
4. The information disclosure statement (IDS) submitted on November 19, 2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Specification
5. The abstract of the disclosure is objected to. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).
Claim Rejections - 35 USC § 103
6. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
7. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
8. Claims 1-6, 8-11, 14, 22, 24 and 29-35 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (US 2024/0414381 A1) (hereinafter Wu) in view of Finlay et al. (US 2024/0354553 A1) (hereinafter Finlay), and further in view of Vafin et al. (US 2011/0206131 A1) (hereinafter Vafin).
Regarding claims 1 and 24, Wu discloses a computer-implemented method [See Wu: at least Figs. 1-13 regarding image/video processing method] and a computing device [See Wu: at least Figs. 1-13 regarding image/video coding system and computing device 1300], comprising:
one or more processors [See Wu: at least Figs. 1-13 and par. 7-8, 340-351 regarding The processing unit 1310 may be a physical or virtual processor and can implement various processes based on programs stored in the memory 1320. In a multi-processor system, multiple processing units execute computer executable instructions in parallel so as to improve the parallel processing capability of the computing device 1300. The processing unit 1310 may also be referred to as a central processing unit (CPU), a microprocessor, a controller or a microcontroller…]; and
data storage, wherein the data storage has stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing device to carry out functions [See Wu: at least Figs. 1-13 and par. 7-8, 340-351 regarding The computing device 1300 typically includes various computer storage medium. Such medium can be any medium accessible by the computing device 1300, including, but not limited to, volatile and non-volatile medium, or detachable and non-detachable medium. The storage unit 1330 may be any detachable or non-detachable medium and may include a machine-readable medium such as a memory, flash memory drive, magnetic disk or another other media, which can be used for storing information and/or data and can be accessed in the computing device 1300…] comprising:
encoding, by an encoder of a transmitting computing device, a plurality of successive input video frames as a corresponding sequence of quantized representations [See Wu: at least Figs. 1-13, par. 34, 46, 64-70, 180-186 regarding the data may comprise one or more pictures of a video or one or more images. The data encoder 114 encodes the data from the data source 112 to generate a bitstream. The bitstream may include a sequence of bits that form a coded representation of the data. The bitstream may include coded pictures and associated data. The coded picture is a coded representation of a picture. The associated data may include sequence parameter sets, picture parameter sets, and other syntax structures. The I/O interface 116 may include a modulator/demodulator and/or a transmitter. The encoded data may be transmitted directly to destination device 120 via the I/O interface 116 through the network 130A…]; and
predicting, by a transformer of the transmitting computing device, a probability distribution of a given quantized representation in the sequence of quantized representations, wherein the distribution is based on at least one dependency between one or more quantized representations that occur prior to the given quantized representation in the sequence of quantized representations [See Wu: at least Figs. 1-13 and par. 98-101, 133, 170-194, 198-204 regarding for example in FIG. 9, the transformer is utilized as a replacement of the autoregressive model. In detail, for input image x, encoder network ga with parameter ϕga transform it into latent information y. Then it is quantized to reduce the information through quantization function Q(.). In the entropy coding part, hyperprior encoder ha with parameters ϕha is firstly utilized, which takes latent y as the input and output hyperprior information z. Similar to y, hyperprior information z is quantized by quantization function Q(.), then the probability distribution pẑ of the quantized information ẑ is estimated by factorized model F… Besides the hyperprior model, transformer context model t is also utilized to aid probability distribution modeling… Further, FIG. 10 illustrates a possible structure of the transformer context model…].
Wu does not explicitly disclose predicting, by a transformer of the transmitting computing device, a probability mass function (PMF) as a conditional distribution of a given quantized representation in the sequence of quantized representations, wherein the conditional distribution is based on at least one dependency between one or more quantized representations that occur prior to the given quantized representation in the sequence of quantized representations.
However, using a probability mass function (PMF) as a conditional distribution for entropy coding in artificial intelligence/machine learning/neural network image/video compression was well known in the art before the effective filing date of the claimed invention, as evident from the teaching of Finlay [See Finlay: at least Figs. 1-45, par. 50-60, 353-380, 384-420 regarding Encoding and decoding a stream of discrete symbols (such as the latent pixels in an AI-based compression pipeline) into a binary bitstream may require access to a discrete probability mass function (PMF)… The following description will outline the functionality, scope and future outlook of discrete probability mass functions and interpolation for usage in, but not limited to, AI-based image and video compression. The following provides a high-level description of discrete probability mass functions, a description of their use in inference and training AI-based compression algorithms, and methods of interpolating functions (such as discrete probability mass functions)… To perform the step of transforming the latent into a bitstream, the latent variable may be quantized into an integer-valued representation ŷ. This quantized latent ŷ is transformed into the bitstream via a lossless encoding/decoding scheme, such as an arithmetic encoder/decoder or range encoder/decoder. Lossless encoding/decoding schemes may require a model one-dimensional discrete probability mass function (PMF) for each element of the latent quantized variable. The optimal bitstream length (file-size) is achieved when this model PMF matches the true one-dimensional data-distribution of the latents…].
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Wu with Finlay teachings by including “predicting, by a transformer of the transmitting computing device, a probability mass function (PMF) as a conditional distribution of a given quantized representation in the sequence of quantized representations, wherein the conditional distribution is based on at least one dependency between one or more quantized representations that occur prior to the given quantized representation in the sequence of quantized representations” because this combination has the benefit of improving the performance of the video compression framework.
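By way of illustration only, and not as a characterization of any cited reference: the following minimal Python sketch (the class name, alphabet size, and dimensions are assumptions invented for the sketch) shows the general technique at issue, i.e., a transformer with a causal attention mask predicting a conditional PMF for each quantized symbol from the symbols that occur prior to it in the sequence.

```python
# Hypothetical sketch, not from any cited reference: a causal transformer that
# predicts, for each position, a PMF over the quantization alphabet conditioned
# only on quantized symbols occurring earlier in the sequence.
import torch
import torch.nn as nn

NUM_SYMBOLS = 256   # assumed quantization alphabet size
D_MODEL = 64        # assumed embedding width

class CausalPmfPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_SYMBOLS, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, NUM_SYMBOLS)

    def forward(self, symbols):
        # symbols: (batch, seq) integer-valued quantized representations.
        seq_len = symbols.size(1)
        # Additive causal mask: position i may attend only to positions <= i, so
        # the output at position i (read as the PMF for symbol i+1) depends only
        # on symbols that occur prior to symbol i+1 in the sequence.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.encoder(self.embed(symbols), mask=mask)
        return torch.softmax(self.head(h), dim=-1)

model = CausalPmfPredictor()
quantized = torch.randint(0, NUM_SYMBOLS, (1, 16))
pmf = model(quantized)                    # (1, 16, 256); each row is a conditional PMF
print(pmf.shape, float(pmf[0, 5].sum()))  # each predicted PMF sums to 1.0
```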
Further on, when combined, Wu and Finlay teach or suggest generating, by the transmitting computing device, a plurality of compressed video frames by applying, based on the predicted PMF, an entropy coding to each quantized representation [See Wu: at least Figs. 1-13 and par. 98-101, 133, 170-194, 198-204 regarding At 1204, the conversion is performed based on the probability distribution. In one example, the conversion may include encoding the data into the bitstream. Entropy encoding (such as arithmetic encoding or Huffman encoding) may be performed on quantized information based on the probability distribution, so as to generate the bitstream… See Finlay: at least Figs. 1-45, par. 50-60, 213-217, 353-380, 384-420 regarding In a third step, the quantized latent is entropy encoded in an entropy encoding process 150 to produce a bitstream 130. The entropy encoding process may be, for example, range or arithmetic encoding. In a fourth step, the bitstream 130 may be transmitted across a communication network… For example, in Fig. 24, the encoder outputs a latent y. This latent is then fed through a hyper-encoder, returning a hyper-latent z. The hyper-latent is quantized to ẑ and sent to the bitstream via an arithmetic encoder using a 1D PMF dependent on learned location μZ and scale σZ, and a lossless encoder. Optionally (though not depicted in FIG. 24) a learned L-context module can also be employed in the entropy model on ŷ…] and
transmitting, by the transmitting computing device, the plurality of compressed video frames [See Wu: at least Figs. 1-13, par. 34, 46, 64-70, 180-186 regarding the data may comprise one or more pictures of a video or one or more images. The data encoder 114 encodes the data from the data source 112 to generate a bitstream. The bitstream may include a sequence of bits that form a coded representation of the data. The bitstream may include coded pictures and associated data. The coded picture is a coded representation of a picture. The associated data may include sequence parameter sets, picture parameter sets, and other syntax structures. The I/O interface 116 may include a modulator/demodulator and/or a transmitter. The encoded data may be transmitted directly to destination device 120 via the I/O interface 116 through the network 130A… See Finlay: at least par. 98 regarding transmitting the entropy encoded latent representation to a second computer system].
Wu and Finlay do not explicitly disclose wherein the entropy coding comprises assigning a smaller number of bits to values that have a higher frequency of occurrence.
However, assigning a smaller number of bits to codewords or values with higher frequency of occurrence in entropy coding was well known in the art before the effective filing date of the claimed invention, as evident from the teaching of Vafin [See Vafin: par. 46 regarding As discussed, the present invention provides a high-degree of adaptability of an entropy-coding scheme to the actual frequency of occurrence of symbol values within a short interval (e.g., within one audio or video frame), thereby reducing the average number of bits per symbol required for encoding. In preferred embodiments, the PMF is estimated for each frame based on the symbol values in that same frame. This PMF is used to entropy-code the symbol values within the frame and is transmitted along with the entropy-coded symbol values to the decoder as side information to facilitate decoding…].
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Wu and Finlay with Vafin teachings by including “wherein the entropy coding comprises assigning a smaller number of bits to values that have a higher frequency of occurrence” because this combination has the benefit of improving coding efficiency.
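For illustration of the principle for which Vafin is relied upon (shorter codewords for more frequent values), the following self-contained Python sketch builds a Huffman code, one well-known entropy code; the sample data and function name are assumptions for the sketch only, not drawn from the record.

```python
# Hypothetical sketch: an entropy code (here Huffman) assigns shorter codewords
# to values that occur more frequently, reducing the average bits per symbol.
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code from observed symbol frequencies."""
    freq = Counter(symbols)
    # Heap entries: (frequency, unique tie-breaker, {symbol: codeword-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

data = "aaaaaaaabbbcd"          # 'a' occurs most often
code = huffman_code(data)
for s in sorted(code, key=lambda s: len(code[s])):
    print(s, code[s])           # 'a' receives the shortest codeword
```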
Regarding claim 29, Wu discloses a decoding device [See Wu: at least Figs. 1-13 and par. 32 and 338 regarding destination device 120 or data decoding device. The computing device 1300 may be implemented as or included in the source device 110 (or the data encoder 114) or the destination device 120 (or the data decoder 124).], comprising:
one or more processors [See Wu: at least Figs. 1-13 and par. 7-8, 340-351 regarding The processing unit 1310 may be a physical or virtual processor and can implement various processes based on programs stored in the memory 1320. In a multi-processor system, multiple processing units execute computer executable instructions in parallel so as to improve the parallel processing capability of the computing device 1300. The processing unit 1310 may also be referred to as a central processing unit (CPU), a microprocessor, a controller or a microcontroller…]; and
data storage, wherein the data storage has stored thereon computer-executable instructions that, when executed by the one or more processors, cause the decoding device to carry out functions [See Wu: at least Figs. 1-13 and par. 7-8, 340-351 regarding The computing device 1300 typically includes various computer storage medium. Such medium can be any medium accessible by the computing device 1300, including, but not limited to, volatile and non-volatile medium, or detachable and non-detachable medium. The storage unit 1330 may be any detachable or non-detachable medium and may include a machine-readable medium such as a memory, flash memory drive, magnetic disk or another other media, which can be used for storing information and/or data and can be accessed in the computing device 1300…] comprising:
receiving, by a decoder of the decoding device, a plurality of compressed video frames as a corresponding sequence of quantized representations [See Wu: at least Figs. 1-13 and par. 34-35, 46, 64-70, 180-186 regarding The encoded data may be transmitted directly to destination device 120 via the I/O interface 116 through the network 130A. The encoded data may also be stored onto a storage medium/server 130B for access by destination device 120. The data decoder 124 may decode the encoded data… In Fig. 9, on the decoder side, after obtaining quantized latent ŷ, decoder network gs with parameters ϕgs will take it as input and obtain final reconstruction x̂]; and
predicting, by a transformer of the decoding device, a probability distribution of a given quantized representation in the sequence of quantized representations, wherein the distribution is based on at least one dependency between one or more quantized representations that occur prior to the given quantized representation in the sequence of quantized representations [See Wu: at least Figs. 1-13 and par. 98-101, 133, 170-204 regarding for example in FIG. 9, the transformer is utilized as a replacement of the autoregressive model. In detail, for input image x, encoder network ga with parameter ϕga transform it into latent information y. Then it is quantized to reduce the information through quantization function Q(.). In the entropy coding part, hyperprior encoder ha with parameters ϕha is firstly utilized, which takes latent y as the input and output hyperprior information z. Similar to y, hyperprior information z is quantized by quantization function Q(.), then the probability distribution pẑ of the quantized information ẑ is estimated by factorized model F… Besides the hyperprior model, transformer context model t is also utilized to aid probability distribution modeling… Further, FIG. 10 illustrates a possible structure of the transformer context model… After the masked multi-head attention module, the latent information is further processed by residual connection, normalization, and linear layer. The final output is combined with the output of the hyperprior decoder to obtain the final probability distribution of pŷ. As discussed above, in order to build the relationship between adjacent spatial elements in the quantized latent representation ŷ of data, the existing coding framework includes an autoregressive model (such as a context model shown in FIGS. 6 and 7) and a hyperprior model (such as a hyper encoder and/or a hyper decoder shown in FIGS. 6 and 7)…].
Wu does not explicitly disclose predicting, by a transformer of the decoding device, a probability mass function (PMF) as a conditional distribution of a given quantized representation in the sequence of quantized representations, wherein the conditional distribution is based on at least one dependency between one or more quantized representations that occur prior to the given quantized representation in the sequence of quantized representations.
However, using a probability mass function (PMF) as a conditional distribution for entropy coding in artificial intelligence/machine learning/neural network image/video compression/decompression was well known in the art before the effective filing date of the claimed invention, as evident from the teaching of Finlay [See Finlay: at least Figs. 1-45, par. 50-60, 353-380, 384-420 regarding Encoding and decoding a stream of discrete symbols (such as the latent pixels in an AI-based compression pipeline) into a binary bitstream may require access to a discrete probability mass function (PMF)… The following description will outline the functionality, scope and future outlook of discrete probability mass functions and interpolation for usage in, but not limited to, AI-based image and video compression. The following provides a high-level description of discrete probability mass functions, a description of their use in inference and training AI-based compression algorithms, and methods of interpolating functions (such as discrete probability mass functions)… To perform the step of transforming the latent into a bitstream, the latent variable may be quantized into an integer-valued representation ŷ. This quantized latent ŷ is transformed into the bitstream via a lossless encoding/decoding scheme, such as an arithmetic encoder/decoder or range encoder/decoder. Lossless encoding/decoding schemes may require a model one-dimensional discrete probability mass function (PMF) for each element of the latent quantized variable. The optimal bitstream length (file-size) is achieved when this model PMF matches the true one-dimensional data-distribution of the latents…].
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Wu with Finlay teachings by including “predicting, by a transformer of the decoding device, a probability mass function (PMF) as a conditional distribution of a given quantized representation in the sequence of quantized representations, wherein the conditional distribution is based on at least one dependency between one or more quantized representations that occur prior to the given quantized representation in the sequence of quantized representations” because this combination has the benefit of improving the performance of the video compression framework.
Further on, when combined, Wu and Finlay teach or suggest generating, by the decoding device, a plurality of decompressed video frames by applying, based on the predicted PMF, an entropy decoding to each quantized representation, wherein the entropy decoding comprises reversing an entropy encoding [See Wu: at least Figs. 1-13 and par. 98-101, 133, 170-194, 198-204 regarding At 1204, the conversion is performed based on the probability distribution. In one example, the conversion may include encoding the data into the bitstream. Entropy decoding (such as arithmetic decoding or Huffman decoding) may be performed on the bitstream based on the probability distribution, so as to obtain the quantized information which may be further processed to reconstruct the data… See Finlay: at least Figs. 1-45, par. 50-60, 85, 87, 98, 121, 130, 137, 213-218, 353-380, 384-420 regarding In a fifth step, the bitstream is entropy decoded in an entropy decoding process 160. The quantized latent is provided to another trained neural network 120 characterized by a function go acting as a decoder, which decodes the quantized latent. The trained neural network 120 produces an output based on the quantized latent. The output may be the output image of the AI based compression process 100. The encoder-decoder system may be referred to as an autoencoder… Encoding and decoding a stream of discrete symbols (such as the latent pixels in an AI-based compression pipeline) into a binary bitstream may require access to a discrete probability mass function (PMF)…]; and
providing, by the decoding device, the plurality of decompressed video frames [See Wu: at least Figs. 1-13 and par. 83-85, 180-186 regarding After the probability distributions (e.g. the mean and variance parameters) are obtained by the entropy parameters subnetwork, the arithmetic decoding module decodes the samples of the quantized latent one by one from the bitstream bits1. Finally, the fully reconstructed quantized latent ŷ is input to the synthesis transform (denoted as decoder in FIG. 7) module to obtain the reconstructed image. In the above description, all of the elements in FIG. 7 are collectively called decoder. The synthesis transform that converts the quantized latent into reconstructed image is also called a decoder (or auto-decoder)… See Finlay: at least Figs. 1-45 and par. 214-229 regarding In a fifth step, the bitstream is entropy decoded in an entropy decoding process 160. The quantized latent is provided to another trained neural network 120 characterized by a function go acting as a decoder, which decodes the quantized latent. The trained neural network 120 produces an output based on the quantized latent. The output may be the output image of the AI based compression process 100. The encoder-decoder system may be referred to as an autoencoder… In the decode phase, a reverse transform is applied to the latent variable, in which the original data (or an approximation of the original data) is recovered.].
Wu and Finlay do not explicitly disclose the entropy encoding having assigned a smaller number of bits to values with a higher frequency of occurrence.
However, assigning a smaller number of bits to codewords or values with higher frequency of occurrence in entropy coding was well known in the art before the effective filing date of the claimed invention, as evident from the teaching of Vafin [See Vafin: par. 46 regarding As discussed, the present invention provides a high-degree of adaptability of an entropy-coding scheme to the actual frequency of occurrence of symbol values within a short interval (e.g., within one audio or video frame), thereby reducing the average number of bits per symbol required for encoding. In preferred embodiments, the PMF is estimated for each frame based on the symbol values in that same frame. This PMF is used to entropy-code the symbol values within the frame and is transmitted along with the entropy-coded symbol values to the decoder as side information to facilitate decoding…].
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Wu and Finlay with Vafin teachings by including “the entropy encoding having assigned a smaller number of bits to values with a higher frequency of occurrence” because this combination has the benefit of improving coding efficiency.
Regarding claim 2, Wu, Finlay and Vafin teach all of the limitations of claim 1, and are analyzed as previously discussed with respect to that claim. Further on, when combined, Wu and Finlay teach or suggest further comprising: receiving, by a receiving computing device, the plurality of compressed video frames [See Wu: at least Figs. 1-13 and par. 34-35, 46, 64-70, 180-186 regarding The encoded data may be transmitted directly to destination device 120 via the I/O interface 116 through the network 130A. The encoded data may also be stored onto a storage medium/server 130B for access by destination device 120. The data decoder 124 may decode the encoded data… In Fig. 9, on the decoder side, after obtaining quantized latent ŷ, decoder network gs with parameters ϕgs will take it as input and obtain final reconstruction x̂… See Finlay: at least Figs. 1-45 and par. 212-229 regarding In a fourth step, the bitstream 130 may be transmitted across a communication network. In a fifth step, the bitstream is entropy decoded in an entropy decoding process…]; and generating, by a decoder of the receiving computing device and based on the predicted PMF, a plurality of decompressed video frames [See Wu: at least Figs. 1-13 and par. 98-101, 133, 170-194, 198-204 regarding At 1204, the conversion is performed based on the probability distribution. In one example, the conversion may include encoding the data into the bitstream. Entropy decoding (such as arithmetic decoding or Huffman decoding) may be performed on the bitstream based on the probability distribution, so as to obtain the quantized information which may be further processed to reconstruct the data… See Finlay: at least Figs. 1-45, par. 50-60, 85, 87, 98, 121, 130, 137, 213-218, 353-380, 384-420 regarding In a fifth step, the bitstream is entropy decoded in an entropy decoding process 160. The quantized latent is provided to another trained neural network 120 characterized by a function go acting as a decoder, which decodes the quantized latent. The trained neural network 120 produces an output based on the quantized latent. The output may be the output image of the AI based compression process 100. The encoder-decoder system may be referred to as an autoencoder… Encoding and decoding a stream of discrete symbols (such as the latent pixels in an AI-based compression pipeline) into a binary bitstream may require access to a discrete probability mass function (PMF)…].
Regarding claims 3 and 30, Wu, Finlay and Vafin teach all of the limitations of claims 1 and 29, and are analyzed as previously discussed with respect to those claims. Further on, when combined, Finlay and Vafin teach or suggest wherein an average number of bits corresponds to a cross-entropy of the conditional distribution with respect to the predicted PMF [See Finlay: at least par. 239-251, 476-490 regarding Here R determines the cost of encoding the quantised latents according to the distribution p(ŷ), D measures the reconstructed image quality, and λ is a parameter that determines the tradeoff between low file size and reconstruction quality. A typical choice of R is the cross entropy… Note that D and R are averaged over the entire data distribution… See Vafin: par. 7-10, 21-23, 46, 59 regarding For efficient entropy coding, the available PMF should represent the expected frequencies as accurately as possible. To achieve this, the PMF is conventionally pre-trained at the design stage by using a large set of data that represent symbols to be encoded. This "globally" trained PMF is then pre-stored at the encoder and the decoder… While a (possibly adaptive) globally pre-trained PMF may typically provide a good representation of occurrence of symbol values on average, the present invention improves on this by providing a high-degree of adaptation to the current symbol values (e.g. within a frame). The result is a lower average number of bits per symbol in the output bit stream… So although the present invention does require some extra information about the PMF to be included in the output bitstream, the inventors have recognised that this is outweighed by the reduced number of average bits per symbol achieved by the local adaption based on actual observed symbol values. Therefore, the present invention can still result in a lower average number of bits for coding of symbol values compared to the global PMF techniques, with the sparse local PMF itself being efficiently coded with a low number of bits of side information.].
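The relationship recited in claims 3 and 30 can be checked numerically. In the sketch below (the distributions p and q are assumed values, not taken from the record), coding symbols drawn from p at a cost of -log2 q(s) bits each yields an average of H(p, q) bits per symbol, which is the cross-entropy and is bounded below by the entropy H(p):

```python
# Hypothetical numeric check: the average number of bits per symbol under a
# model PMF q, for data distributed per p, is the cross-entropy H(p, q); it is
# minimized (equals the entropy H(p)) exactly when the predicted PMF q matches p.
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # assumed true symbol distribution
q = np.array([0.6, 0.3, 0.1])   # assumed model (predicted) PMF

cross_entropy = -(p * np.log2(q)).sum()   # average bits/symbol under model q
entropy = -(p * np.log2(p)).sum()         # lower bound, achieved when q = p
print(f"H(p,q) = {cross_entropy:.4f} bits >= H(p) = {entropy:.4f} bits")
```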
Regarding claims 4 and 31, Wu, Finlay and Vafin teach all of the limitations of claims 3 and 29, and are analyzed as previously discussed with respect to those claims. Further on, when combined, Finlay and Vafin teach or suggest wherein the predicting of the PMF further comprises: maintaining a coding efficiency of the entropy coding by adjusting the cross-entropy / the functions for the predicting of the PMF further comprising: maintaining a decoding efficiency of the entropy decoding by adjusting the cross-entropy [See Finlay: at least par. 239-251, 476-490 regarding Here R determines the cost of encoding the quantised latents according to the distribution p(ŷ), D measures the reconstructed image quality, and λ is a parameter that determines the tradeoff between low file size and reconstruction quality. A typical choice of R is the cross entropy… Note that D and R are averaged over the entire data distribution… See Vafin: par. 7-10, 21-23, 46, 59 regarding For efficient entropy coding, the available PMF should represent the expected frequencies as accurately as possible. To achieve this, the PMF is conventionally pre-trained at the design stage by using a large set of data that represent symbols to be encoded. This "globally" trained PMF is then pre-stored at the encoder and the decoder… While a (possibly adaptive) globally pre-trained PMF may typically provide a good representation of occurrence of symbol values on average, the present invention improves on this by providing a high-degree of adaptation to the current symbol values (e.g. within a frame). The result is a lower average number of bits per symbol in the output bit stream… So although the present invention does require some extra information about the PMF to be included in the output bitstream, the inventors have recognised that this is outweighed by the reduced number of average bits per symbol achieved by the local adaption based on actual observed symbol values. Therefore, the present invention can still result in a lower average number of bits for coding of symbol values compared to the global PMF techniques, with the sparse local PMF…].
Regarding claim 5, Wu, Finlay and Vafin teach all of the limitations of claim 1, and are analyzed as previously discussed with respect to that claim. Further on, Wu and Finlay teach or suggest wherein the encoder performs a spatial downscaling and increases a channel dimension [See Wu: par. 59-61 regarding Auto-encoder originates from the well-known work proposed by Hinton and Salakhutdinov. The method is trained for dimensionality reduction and consists of two parts: encoding and decoding. The encoding part converts the high-dimension input signal to low-dimension representations, typically with reduced spatial size but a greater number of channels. The decoding part attempts to recover the high-dimension input from the low-dimension representation. Auto-encoder enables automated learning of representations and eliminates the need of hand-crafted features, which is also believed to be one of the most important advantages of neural networks… See Finlay: par. 201, 231 regarding The encoder transforms the input data x into a latent representation y, which is lower-dimensional and in an improved form for further compression…].
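For illustration of the claim 5 limitation, a strided convolution is one conventional way an encoder stage reduces spatial size while increasing channels; the channel counts and frame size below are assumptions for the sketch only.

```python
# Hypothetical sketch: a single strided convolution performs spatial downscaling
# (64x64 -> 32x32) while increasing the channel dimension (3 -> 64).
import torch
import torch.nn as nn

downscale = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=2, padding=1)
frame = torch.rand(1, 3, 64, 64)        # assumed input frame (N, C, H, W)
latent = downscale(frame)
print(tuple(frame.shape), "->", tuple(latent.shape))  # (1, 3, 64, 64) -> (1, 64, 32, 32)
```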
Regarding claims 6 and 32, Wu, Finlay and Vafin teach all of the limitations of claims 1 and 29, and are analyzed as previously discussed with respect to those claims. Further on, when combined, Wu and Finlay teach or suggest wherein the encoder is a convolutional neural network (CNN) based image encoder and wherein the decoder is a convolutional neural network (CNN) based image decoder [See Wu: at least Fig. 4, par. 63-64 regarding an autoencoder implementing the hyperprior model using convolutional neural networks layers… See Finlay: at least par. 314-316, 392, 420, 430-435 regarding An AI-based image and video compression pipeline usually follows an autoencoder structure, which is composed by convolutional neural networks (CNNs) that make up an encoding module and decoding module whose parameters can be optimised by training on a dataset of natural-looking images and video…].
Regarding claim 8, Wu, Finlay and Vafin teach all of the limitations of claim 1, and are analyzed as previously discussed with respect to that claim. Further on, Wu and Finlay teach or suggest wherein the encoding of each frame further comprises a quantization of the quantized representation to an integer grid [See Wu: at least par. 403, 540-545 regarding The use of tensor networks for probabilistic modeling in AI-based image and video compression will now be discussed in more detail. As discussed above, in an AI-based compression pipeline, an input image (or video) x is mapped to a latent variable y, via an encoding function (typically a neural network). The latent variable y is quantized to integer values ŷ, using a quantization function Q. These quantized latents are converted to a bitstream using a lossless encoding method such as entropy encoding as discussed above… See Finlay: at least par. 226, 232, 534-535 regarding Quantisation is a critical step in any AI-based compression pipeline. Typically, quantisation is achieved by rounding data to the nearest integer…].
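For illustration, quantization to an integer grid as recited in claim 8 amounts to elementwise rounding of a continuous latent; the array values below are assumptions for the sketch.

```python
# Hypothetical sketch: quantization of a continuous latent y to an integer grid
# by elementwise rounding, yielding the quantized representation ŷ.
import numpy as np

y = np.array([[0.4, -1.7, 2.2],
              [3.9, -0.1, 1.5]])     # assumed continuous latent values
y_hat = np.rint(y).astype(int)       # round to nearest integer (ties to even)
print(y_hat)                         # [[ 0 -2  2]
                                     #  [ 4  0  2]]
```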
Regarding claims 9 and 33, Wu, Finlay and Vafin teach all of the limitations of claims 1 and 29, and are analyzed as previously discussed with respect to those claims. Further on, when combined, Wu and Finlay teach or suggest further comprising: applying neural image compression to train one or more of the encoder or the decoder to be respective lossy transforms, wherein a target distortion variable is based on a range of each quantized representation and the functions further comprising: applying neural image decompression to train the decoder to be a lossy transform, wherein a target distortion variable is based on a range of each quantized representation [See Wu: at least par. 59-65, 180-186 regarding auto-encoder network to lossy image compression. Further, the method is trained for dimensionality reduction and consists of two parts: encoding and decoding. The encoding part converts the high-dimension input signal to low-dimension representations, typically with reduced spatial size but a greater number of channels. The decoding part attempts to recover the high-dimension input from the low-dimension representation. Auto-encoder enables automated learning of representations and eliminates the need of hand-crafted features, which is also believed to be one of the most important advantages of neural networks… The framework is trained with the rate-distortion loss function,… D is the distortion between x and x̂, R is the rate calculated or estimated from the quantized representation ŷ, and λ is the Lagrange multiplier… See Finlay: at least par. 203-225 regarding AI based compression processes may involve the use of neural networks. A neural network is an operation that can be performed on an input to produce an output. A neural network may be made up of a plurality of layers. The first layer of the network receives the input. One or more operations may be performed on the input by the layer to produce an output of the first layer. The output of the first layer is then passed to the next layer of the network which may perform one or more operations in a similar way. The output of the final layer is the output of the neural network. An example of an AI based compression process 100 is shown in FIG. 1… The output of the discriminator may then be used in the loss function of the compression process as a measure of the distortion of the compression process. Alternatively, the discriminator may receive both the input image 5 and the output image 6 and the difference in output indication may then be used in the loss function of the compression process as a measure of the distortion of the compression process. Training of the neural network acting as a discriminator and the other neutral networks in the compression process may be performed simultaneously. During use of the trained compression pipeline for the compression and transmission of images or video, the discriminator neural network is removed from the system and the output of the compression pipeline is the output image 6…].
Regarding claims 10 and 34, Wu, Finlay and Vafin teach all of the limitations of claims 9 and 33, and are analyzed as previously discussed with respect to those claims. Further on, when combined, Wu and Finlay teach or suggest wherein the training of the one or more of the encoder or the decoder is based on a rate-distortion trade-off loss and wherein the training of the decoder is based on a rate-distortion trade-off loss [See Wu: at least par. 59-65, 90, 180-186, 213 regarding The framework is trained with the rate-distortion loss function,… D is the distortion between x and x̂, R is the rate calculated or estimated from the quantized representation ŷ, and λ is the Lagrange multiplier… See Finlay: at least par. 239-257 regarding Here R determines the cost of encoding the quantised latents according to the distribution p(ŷ), D measures the reconstructed image quality, and λ is a parameter that determines the tradeoff between low file size and reconstruction quality. A typical choice of R is the cross entropy… Altogether, note that the loss function depends explicitly on the choice of quantisation scheme through the R term, and implicitly, because ŷ depends on the choice of quantisation scheme…].
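For illustration of the rate-distortion trade-off loss relied upon for claims 10 and 34, a minimal sketch follows. The function and variable names are assumptions, and the placement of the Lagrange multiplier λ varies across formulations; the cited references state only that λ trades off rate against distortion.

```python
# Hypothetical sketch: a rate-distortion trade-off loss of the general form
# L = R + lambda * D, where R is the estimated bit cost of the quantized latents
# under the model PMF and D is the reconstruction distortion (here MSE).
import torch

def rate_distortion_loss(x, x_hat, pmf_of_y_hat, lam=0.01):
    rate = -torch.log2(pmf_of_y_hat).sum()      # R: negative log2-likelihood, in bits
    distortion = torch.mean((x - x_hat) ** 2)   # D: mean squared error
    return rate + lam * distortion

x = torch.rand(1, 3, 8, 8)                      # assumed input frame
x_hat = x + 0.05 * torch.randn_like(x)          # assumed reconstruction
pmf_vals = torch.full((16,), 0.25)              # assumed model PMF values of each ŷ element
print(rate_distortion_loss(x, x_hat, pmf_vals).item())  # ~32 bits plus a small D term
```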
Regarding claims 11 and 35, Wu, Finlay and Vafin teach all of the limitations of claims 1 and 30, and are analyzed as previously discussed with respect to those claims. Further on, when combined, Wu and Finlay teach or suggest wherein the at least one dependency is a temporal dependency [See Wu: at least par. 87-94 regarding Studies on neural network-based video compression can be divided into two categories according to the targeted scenarios: random access and the low-latency… In low-latency case, it aims at reducing decoding time thereby usually merely temporally previous frames can be used as reference frames to decode subsequent frames… See Finlay: at least par. 423 regarding For image and video data, which exhibits large spatial and temporal redundancy, an autoregressive process termed context modelling can be very helpful to exploit this redundancy in the entropy modelling. In high level, the general idea is to condition the explanation of subsequent information with existing, available information. The process of conditioning on previous variables to realise the next variable implies an autoregressive information retrieval structure of a certain ordering. This concept has proven to be incredibly powerful in AI-based image and video compression and is commonly part of cutting-edge neural compression architectures…].
Regarding claim 14, Wu, Finlay and Vafin teach all of the limitations of claim 1, and are analyzed as previously discussed with respect to that claim. Further on, Wu teaches or suggests wherein the transmitting computing device comprises a camera, and the method further comprising: capturing the plurality of input video frames using the camera; and receiving, by the encoder, the plurality of input video frames from the camera [See Wu: at least Figs. 1-13 and par. 341-350 regarding the computing device 1300 may be implemented as any user terminal or server terminal having the computing capability. The server terminal may be a server, a large-scale computing device or the like that is provided by a service provider. The user terminal may for example be any type of mobile terminal, fixed terminal, or portable terminal, including a mobile phone, station, unit, device, multimedia computer, multimedia tablet, Internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system (PCS) device, personal navigation device, personal digital assistant (PDA), audio/video player, digital camera/video camera, positioning device, … In the example embodiments of performing data encoding, the input device 1350 may receive data as an input 1370 to be encoded. The data may be processed, for example, by the data coding module 1325, to generate an encoded bitstream. The encoded bitstream may be provided via the output device 1360 as an output 1380…].
Regarding claim 22, Wu, Finlay and Vafin teach all of the limitations of claim 2, and are analyzed as previously discussed with respect to that claim. Further on, Wu teaches or suggests wherein the transmitting computing device is the same as the receiving computing device [See Wu: at least Figs. 1-13 and par. 32 and 338 regarding destination device 120 or data decoding device. The computing device 1300 may be implemented as or included in the source device 110 (or the data encoder 114) or the destination device 120 (or the data decoder 124).].
9. Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (US 2024/0414381 A1) (hereinafter Wu) in view of Finlay et al. (US 2024/0354553 A1) (hereinafter Finlay), in further view of Vafin et al. (US 2011/0206131 A1) (hereinafter Vafin), and in further view of Zhu et al. (US 2023/0100413 A1) (hereinafter Zhu).
Regarding claim 12, Wu, Finlay and Vafin teach all of the limitations of claim 1, and are analyzed as previously discussed with respect to that claim.
Wu, Finlay and Vafin do not explicitly disclose further comprising: splitting the given quantized representation spatially into non-overlapping blocks of size N×N, and wherein the one or more quantized representations that occur prior to the given quantized representation are configured to be overlapping blocks of size M×M, with M>N, wherein each block is spatially flattened to generate one or more tokens for the transformer, and wherein the predicting of the PMF is based on a spatial context and a temporal context derived from the overlapping blocks.
However, splitting quantized representations into non-overlapping blocks to generate tokens for the transformer and predicting the PMF based on the non-overlapping blocks was well known in the art before the effective filing date of the claimed invention, as evident from the teaching of Zhu [See Zhu: at least Figs. 1-10, par. 71, 94-96, 130-144 regarding Once quantization is performed, the coded video bitstream includes quantized transform coefficients, prediction information (e.g., prediction modes, motion vectors, block vectors, or the like), partitioning information, and any other suitable data, such as other syntax data. The different elements of the coded video bitstream may then be entropy encoded by the encoding device. For example, the encoding device may use context adaptive variable length coding, context adaptive binary arithmetic coding, syntax-based context-adaptive binary arithmetic coding, probability interval partitioning entropy coding, or another suitable entropy encoding technique… As described above, many E2E-NNVC systems are designed as combination of an autoencoder sub-network (the encoder sub-network) and a second sub-network responsible for learning a probabilistic model over quantized latents used for entropy coding. Such an architecture can be viewed as a combination of a transform plus quantization module (encoder sub-network) and the entropy modelling sub-network module… The encoder and decoder sub-networks operate over a series of patches (also referred to herein as "patch tokens"). In some cases, the series of patches are initially formed at the encoder sub-network as a non-overlapping segmentation of an input image… FIG. 7B illustrates an example of two different window partitioning configurations, including window partitioning configuration 720 and window partitioning configuration 730. Window partitioning configuration 720 depicts a non-overlapping window partitioning applied over a set of patch tokens, and in some examples may be utilized by the first self-attention layer 622 of FIG. 6A. An example of a non-overlapping window partition is indicated at 722 and an example of one of its constituent patch tokens is indicated at 711. In some examples, the first shifted window transformer block 601 of FIG. 6A can apply the non-overlapping window partitioning configuration 720 using self-attention component 622. In some examples, the non-overlapping window partitioning configuration 720 divides the set of input patch tokens into equally sized windows, shown here as 4×4 windows containing 16 patch tokens, although other window geometries and/or sizes can also be utilized. The shifted window partitioning configuration 730 can utilize windows that are displaced relative to those of the non-overlapping partitioning configuration 720. For example, shifted windows 732 and 734 have been displaced such that they each contain a set of tokens that were previously contained in multiple different ones of the non-overlapping windows of partitioning configuration 720. Because a single shifted window contains patch tokens from multiple non-overlapping windows of the previous self-attention layer, the previously mentioned cross-window connections can thereby be introduced. As illustrated, the shifted window partitioning configuration 730 uses the same 4×4 window size as the non-overlapping window partitioning configuration, with clipping or truncation of the window size where it extends beyond the boundaries of the patch token set. However, in some examples the shifted window partitioning configuration 730 and the non-overlapping window partitioning configuration 720 can use different window sizes.].
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Wu, Finlay and Vafin with Zhu teachings by including “further comprising: splitting the given quantized representation spatially into non-overlapping blocks of size N×N, and wherein the one or more quantized representations that occur prior to the given quantized representation are configured to be overlapping blocks of size M×M, with M>N, wherein each block is spatially flattened to generate one or more tokens for the transformer, and wherein the predicting of the PMF is based on a spatial context and a temporal context derived from the overlapping blocks” because this combination has the benefit of improving coding efficiency.
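The block arrangement recited in claim 12 can be sketched as follows (all sizes and array contents are assumptions): the current quantized representation is split into non-overlapping N×N blocks, a prior representation contributes overlapping M×M context blocks with M>N aligned to the same grid, and each block is flattened into a token.

```python
# Hypothetical sketch of the claimed tokenization: the current quantized frame is
# split into non-overlapping NxN blocks, while a prior frame contributes
# overlapping MxM context blocks (M > N) centered on the same grid; each block is
# flattened into one token for a transformer.
import numpy as np

N, M = 2, 4                                   # assumed block sizes, M > N
H = W = 8
current = np.arange(H * W).reshape(H, W)      # stand-in quantized representation
prior = current - 1                           # stand-in prior representation
pad = (M - N) // 2
prior_padded = np.pad(prior, pad, mode="edge")

tokens_cur, tokens_ctx = [], []
for i in range(0, H, N):
    for j in range(0, W, N):
        tokens_cur.append(current[i:i+N, j:j+N].ravel())        # NxN, non-overlapping
        tokens_ctx.append(prior_padded[i:i+M, j:j+M].ravel())   # MxM, overlapping

print(np.stack(tokens_cur).shape, np.stack(tokens_ctx).shape)   # (16, 4) (16, 16)
```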
10. Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (US 2024/0414381 A1) (hereinafter Wu) in view of Finlay et al. (US 2024/0354553 A1) (hereinafter Finlay), in further view of Vafin et al. (US 2011/0206131 A1) (hereinafter Vafin), in further view of Zhu et al. (US 2023/0100413 A1) (hereinafter Zhu), and in further view of Yang et al. ("Insights from Generative Modeling for Neural Video Compression," arXiv:2107.13136v1, 28 July 2021, 14 pages; cited in IDS by Applicant) (hereinafter Yang).
Regarding claim 13, Wu, Finlay, Vafin and Zhu teach all of the limitations of claim 12, and are analyzed as previously discussed with respect to that claim.
Wu, Finlay, Vafin and Zhu do not explicitly disclose wherein the predicting of the PMF by the transformer comprises: extracting, by a first transformer, separately from each of the overlapping blocks, temporal information corresponding to the one or more quantized representations that occur prior to the given quantized representation; and mixing, by a second transformer, the extracted temporal information.
However, Yang teaches or suggests wherein the predicting of the PMF by the transformer comprises: extracting, by a first transformer, separately from each of the overlapping blocks, temporal information corresponding to the one or more quantized representations that occur prior to the given quantized representation; and mixing, by a second transformer, the extracted temporal information [See Yang: at least section 2.6 "Temporal Prior (TP) Extensions" regarding Conditioning on the previous frame p(vt|xt-1) (TP+)… We explore an alternative scheme in which vt is conditioned on the previous reconstruction, xt-1, which maintains a less noisy, more informative and more consistent feature representation throughout training and simplifies the learning procedure. In this scenario, the model also no longer requires the extra prior for p(v2)… (thus, a first transformer extracts temporal information and a second transformer mixes it)].
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Wu, Finlay, Vafin and Zhu with Yang teachings by including "wherein the predicting of the PMF by the transformer comprises: extracting, by a first transformer, separately from each of the overlapping blocks, temporal information corresponding to the one or more quantized representations that occur prior to the given quantized representation; and mixing, by a second transformer, the extracted temporal information" because this combination has the benefit of maintaining a less noisy, more informative and more consistent feature representation throughout the training and learning process [See Yang: at least section 2.6].
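For illustration of the two-stage arrangement recited in claim 13, the sketch below (dimensions and module names are assumptions, and the reading of "extract then mix" as two separate attention stages is the examiner's gloss, not drawn from any cited reference) applies a first transformer separately to each block's temporal history and a second transformer to mix the resulting per-block summaries across blocks.

```python
# Hypothetical sketch: stage 1 attends over each block's temporal history
# independently; stage 2 mixes the per-block temporal summaries across blocks.
import torch
import torch.nn as nn

D, T, B = 32, 4, 16   # assumed token width, temporal depth, number of blocks

temporal_tf = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True), num_layers=1)
mixing_tf = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True), num_layers=1)

history = torch.randn(B, T, D)        # per-block tokens from T prior representations
per_block = temporal_tf(history)      # stage 1: temporal info, each block separately
summaries = per_block[:, -1, :]       # one temporal summary token per block
mixed = mixing_tf(summaries.unsqueeze(0))   # stage 2: mix information across blocks
print(mixed.shape)                    # torch.Size([1, 16, 32])
```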
Conclusion
11. Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANA J PICON-FELICIANO whose telephone number is (571)272-5252. The examiner can normally be reached Monday-Friday 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Christopher Kelley, can be reached at 571-272-7331. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Ana Picon-Feliciano/Examiner, Art Unit 2482
/CHRISTOPHER S KELLEY/Supervisory Patent Examiner, Art Unit 2482