DETAILED ACTION
Status of Claims
Claim(s) 1-4, 7-13, and 16-24 are pending and are examined herein.
Claim(s) 1-3, 8-12, and 17-24 have been amended. Claims 5-6 and 14-15 were previously cancelled.
Claim(s) 1-4, 7-13, and 16-24 are rejected under 35 U.S.C. § 103.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed on February 02, 2026 has been entered. Claims 1-4, 7-13, and 16-24 are pending in the application. Applicant’s amendments to the claims have overcome the rejection under 35 U.S.C. § 101 set forth in the Non-Final Office Action mailed on October 31, 2025. Applicant’s amendments to the claims have been fully considered and are addressed in the rejections below.
Response to Arguments
Applicant's arguments with respect to the rejection under 35 U.S.C. § 101, filed on 02/02/2026, have been fully considered and are persuasive. (See Remarks pp. 11-14).
Specifically, the claims recite neural network processing operations in which different layers generate feature data while quantization is iteratively performed using index values of LUTs having non-uniform quantization levels. Applicant further argues that the claims recite a combination of features that integrates any putative abstract idea into a practical application directed to efficient storage of feature map representations, and that the claims are directed to an improvement in neural network technology. Accordingly, the rejection is withdrawn.
Applicant's arguments with respect to the rejection under 35 U.S.C. § 103, filed on 02/02/2026, have been fully considered but are not persuasive.
Applicant’s argument (Pp. 15-17 of the remarks):
Applicant argues that Covell does not disclose or suggest the claimed limitation reciting “wherein the second quantization level differs from the first quantization level based on differences between the first feature data associated with the first index values and the second feature data associated with the second index values.” Applicant contends that Covell’s statement that quantization levels for different layers are “not shared” does not relate to differences between the first and second feature data or their underlying feature-map distributions, where the quantization levels approximate the underlying distributions of the respective feature maps, as described in Applicant’s disclosure (e.g., Figures 4A-4B).
Examiner's response: The examiner respectfully disagrees for the following reasons:
Under the broadest reasonable interpretation (BRI), the amended limitation reciting “wherein the second quantization level differs from the first quantization level based on differences between the first feature data associated with the first index values and the second feature data associated with the second index values” merely requires that the quantization levels used to represent the feature data of one neural network layer differ from those used to represent the feature data of another neural network layer because the feature data differs. The recitation of “based on differences” does not require an explicit operation of determining, computing, or comparing differences between the first and second feature data. Rather, the limitation is descriptive, defining the result of processing data using multiple layers of a neural network and suggesting that different feature data produced by different layers leads to different quantization levels.
As described in the present application’s specification (e.g., paragraph [0051]), different neural network layers inherently generate feature maps with different statistical distributions, and therefore different non-uniform quantization levels are used. Thus, the claim language merely states the expected outcome that different feature maps of different layers are quantized using different quantization levels.
With respect to the applied prior art references, the combination of Covell, Löhdefink, Cardinaux, and Cai teaches all limitations of the currently amended claims. Specifically, the combination teaches a method/system of processing input data through consecutive neural network layers to generate feature maps; representing feature data using index values corresponding to records of per-layer non-uniform lookup tables (LUTs), where the quantization levels of each LUT are assigned to, and differ based on, the distribution of the respective feature map; storing the index values in memory; regenerating feature data by cross-referencing the index values with the LUTs; and processing the regenerated feature data through subsequent neural network layers. The combination further teaches that the non-uniform quantization levels of the LUTs are estimated through an iterative training process until convergence is achieved.
In particular, Covell discloses processing input data through consecutive neural network layers and using, for a given layer, a lookup table with layer-specific quantization levels. Covell explicitly states that when quantization levels are not shared across layers, different layers employ different lookup tables (see, e.g., [0051] and [0071]). Because different neural network layers inherently generate different output data (feature data), the use of different quantization levels reflects differences in the layer-generated data.
Additionally, the secondary references relied upon (i.e., Löhdefink and Cardinaux) emphasize the use of separate lookup tables with different non-uniform quantization levels for feature maps. Specifically, Löhdefink and Cardinaux explicitly describe the use of separate lookup tables implementing a non-uniform quantization scheme for the neural network data (e.g., feature maps and weights) of each layer of the neural network. This non-uniform quantization uses learned dictionaries and lookup tables to represent the neural network data of each given layer, where separate feature-dependent codebooks are adjusted to the feature maps, such that different feature maps yield different non-uniform quantization levels based on their data distributions.
Accordingly, the cited references describe per-layer lookup tables with non-uniform quantization levels for the feature map data of neural network layers, where the quantization levels differ due to differences in the underlying feature data of the different layers.
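For illustration only, and not as a characterization of any single cited reference, the following Python sketch shows why per-layer lookup tables whose non-uniform quantization levels are estimated from each layer’s own feature data necessarily differ when the feature distributions differ. The library calls (numpy, scikit-learn KMeans), level count, and synthetic feature distributions are hypothetical choices made solely for clarity:

# Illustrative sketch (hypothetical values): per-layer LUTs whose non-uniform
# quantization levels are estimated from each layer's own feature distribution.
import numpy as np
from sklearn.cluster import KMeans

def build_layer_lut(feature_map, num_levels=16):
    """Estimate non-uniform quantization levels (one LUT) from a layer's feature data."""
    samples = np.asarray(feature_map).reshape(-1, 1)       # flatten feature data to scalars
    km = KMeans(n_clusters=num_levels, n_init=5).fit(samples)
    return np.sort(km.cluster_centers_.ravel())            # LUT: index -> quantization level

rng = np.random.default_rng(0)
lut_layer1 = build_layer_lut(rng.normal(0.0, 1.0, (8, 32, 32)))     # hypothetical layer-1 features
lut_layer2 = build_layer_lut(rng.exponential(2.0, (16, 16, 16)))    # hypothetical layer-2 features
# Because the two layers' feature distributions differ, the two LUTs' levels differ.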
The same reasoning and rationale apply to corresponding independent claims 10 and 19. For at least the reasons set forth above, Applicant’s arguments are not persuasive, and the rejection under 35 U.S.C. § 103 based on the combination of Covell, Löhdefink, Cardinaux, and Cai is maintained.
The examiner refers to the updated rejection under 35 U.S.C. § 103 for more details.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION. —The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim(s) 8-9, 17-18, and 24 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding Currently Amended Dependent Claims 8 and 9, Claim 8 recites the limitation “processing, using the at least one processor, the first feature map using the second layer of the neural network to generate the second feature map; regenerating, using the at least one processor, the second feature data of the second feature map by cross-referencing the second index values with the second LUT; and processing, using the at least one processor, the regenerated second feature data using a fourth layer of the neural network.” and claim 9, which depends from claim 8, further recites “wherein the third layer and the fourth layer are the same layer.”
Claim 8 introduces a fourth neural network layer for processing regenerated feature data without defining its relationship to the third layer recited in claim 1, making it unclear whether the regenerated feature data is processed by the third layer, the fourth layer, or both. Claim 9 further states that the third layer and the fourth layer are the same layer, which introduces a lack of clarity as to whether the regenerated feature data is processed once or multiple times and fails to clearly define the sequence of operations. Accordingly, the scope of claims 8 and 9 is unclear and fails to particularly point out the metes and bounds of the claimed invention. For examination purposes, the Examiner has interpreted the limitations as reciting a method in which the regenerated feature data is processed by one or more additional neural network layers.
Regarding Claims 17-18 and 24, these claims recite limitations substantially similar to those of claims 8 and 9 and present similar issues. Thus, the same rationale applies to dependent claims 17-18 and 24.
In view of the above, Examiner respectfully requests that Applicant thoroughly review the claims for compliance with the requirements set forth under 35 U.S.C. § 112.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-4, 7-13, 16-20, and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Covell et al. (Pub. No.: US 20200234126 A1) in view of Löhdefink et al. (NPL: “Scalar and Vector Quantization for Learned Image Compression: A Study on the Effects of MSE and GAN Loss in Various Spaces,” September 2020), further in view of Cardinaux et al. (NPL: “Iteratively training look-up tables for network quantization,” May 2020), and further in view of Cai et al. (NPL: “Deep Image Compression with Iterative Non-Uniform Quantization,” 2018).
Regarding Currently Amended Claim 1,
Covell discloses the following:
A method comprising: (Covell, [Abstract] “Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing a network input using a neural network to generate a network output for the network input.”)
processing, using at least one processor of an electronic device, input data using a first layer of a neural network to generate a first feature map and a second layer of the neural network to generate a second feature map, wherein the first layer and the second layer are consecutive layers of the neural network; (Covell, [Abstract] “One of the methods includes maintaining, for each of the plurality of neural network layers, a respective look-up table that maps each possible combination of a quantized input index and a quantized weight index to a multiplication result; and generating a network output from a network input, comprising, for each of the neural network layers ….” [0003] “The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.” [0023] “The neural network system 100 is a system that receives data specifying a neural network 110 and then uses the neural network 110 to process network inputs 102 to generate network outputs 150.” [0024] “the system 100 can process network inputs 102 using the special-purpose hardware, e.g., the special-purpose computer chip 130, to generate network outputs 150.” [0029] “More specifically, the neural network includes multiple neural network layers. Each of the neural network layers generates an output by performing multiplications between weights for the neural network layer and layer input values, accumulating subsets of the results of those multiplications, and then, optionally, applying an activation function to the accumulated values.”) [Examiner’s Note: Covell teaches the neural network environment hardware architecture shown in FIG 1 would correspond to the electronic device which includes at least one processor to process the input data to generate feature data. Covell further teaches the quantization process performed for each layer of the neural network where the output of a given layer is input to the following layer (i.e., consecutive layers). The multiple layers of the neural network including first and second layers’ outputs correspond to the claimed first and second feature maps of each layer.]
representing, using the at least one processor, first feature data of the first feature map generated by the first layer of the neural network using first index values, (Covell, [0033] “To generate a network output 150 for the neural network 110, the system 100 identifies the outputs of the multiplications required by the layers of the neural network 110, i.e., the output required to generate accumulated values, using the look-up tables 120 ….” [0042] “The look-up table 210 maps each possible combination of a quantized input index and a quantized weight index to a multiplication result.” [0043] “Each quantized input index represents a quantized input value from a set of possible quantized input values.” [0044] “Similarly, each quantized weight index represents a quantized weight value from a set of possible quantized weight values.” [0063] “The system then uses the activation table 220 to map the accumulated value index to an activation value index, which serves as the activation value index when using the output of the current layer as the input to the next layer in the neural network.” Further described in [0055]-[0075].) the first index values corresponding to multiple records of a first look up table (LUT), (Covell, [0055] “For each of these multiplications, the system determines the quantized input index representing the quantized input value, determines the quantized weight index representing the quantized weight value, and identifies, as the result of the multiplication, the multiplication result mapped to in the table 210 by the determined quantized input index and the determined quantized weight index.” [0071] “The system maintains, for each of the plurality of neural network layers, a respective look-up table that maps each possible combination of a quantized input index and a quantized weight index to a multiplication result (step 302). …etc.”) each of the multiple records of the first LUT comprising a representation of a first quantization level ... of quantization levels of the first feature map; ( [0040] “FIG. 2 illustrates an example of determining an output for one of the neural network layers using look-up tables. In particular, as shown in FIG. 2, the system maintains two look-up tables for the neural network layer: a first look-up table 210 and an activation table 220.” [0051] “When quantization levels for the different layers in the neural network are not shared, i.e., the quantization levels are different ... the table 210 will be different for any two layers that do not share the same quantization levels. “) [Examiner’s Note: the quantized input values (e.g., accumulated values / activation outputs) represent the output data of the previous layer (i.e., feature data). In case the neural network layer (e.g., intermediate layer) applies an activation function, the activation values are quantized using the lookup table before being input to the next layer. Thus, the quantized input values to the next layer (i.e., intermediate data) would represent the feature map data of the previous layer, and the look-up table entries store quantized values (indices) representation used for neural network layer processing. Each lookup table has multiple records (entries), each representing possible quantized value of the layer output.]
representing, using the at least one processor, second feature data of the second feature map generated by the second layer of the neural network using second index values, the second index values corresponding to multiple records of a second LUT, (Covell, [0003] “The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.” [0033] “To generate a network output 150 for the neural network 110, the system 100 identifies the outputs of the multiplications required by the layers of the neural network 110, i.e., the output required to generate accumulated values, using the look-up tables 120 ….” [0037] “That is, when performing the required computation for a given neural network layer, the special-purpose computer chip 130 is only required to load the look-up tables 120 that are relevant to the given neural network layer into the on-chip memory 140 and to use the look-up tables 120 to identify the required multiplication outputs and non-linearity outputs necessary to perform the computation of the given layer for the current input.” [0042] “The look-up table 210 maps each possible combination of a quantized input index and a quantized weight index to a multiplication result.” [0043] “Each quantized input index represents a quantized input value from a set of possible quantized input values.” [0044] “Similarly, each quantized weight index represents a quantized weight value from a set of possible quantized weight values.” [0063] “The system then uses the activation table 220 to map the accumulated value index to an activation value index, which serves as the activation value index when using the output of the current layer as the input to the next layer in the neural network.” Further described in [0055]-[0075].) each of the multiple records of the second LUT comprising a representation of a second quantization level ... of quantization levels of the second feature map, wherein the second quantization level differs from the first quantization level based on differences between the first feature data associated with the first index values and the second feature data associated with the second index values; (Covell, [0051] “When quantization levels for the different layers in the neural network are not shared, i.e., the quantization levels are different for the weights of different layers in the neural network or for the activations generated by different layers of the neural network, the table 210 will be different for any two layers that do not share the same quantization levels. However, given that the neural network is executed layer by layer, only the table 210 for the layer currently being executed needs to be in the on-chip memory of the special-purpose hardware at any given time.”) [Examiner’s Note: Covell teaches the second consecutive layer also quantized using different index values of LUT of different quantization level. In particular, Covell teaches quantization of data at each layer of a neural network using look-up tables (LUTs) and corresponding indices for each layer. The quantized input data to the next layer would correspond to the “second input data” processed by the third layer of the neural network. Additionally, Covell describes that quantization levels differ between layers for neural network data generated by different layers. 
Since the feature data distribution differs between layers, the quantization levels are likely to be different.]
storing, using the at least one processor, the first index values and the second index values in a memory of the electronic device; (Covell, [0036] “More particularly, the tables 120 required to perform input-weight multiplications and to apply activation functions for any given layer can be stored in the on-chip memory 140 of the special-purpose hardware, e.g., on-chip memory of an FPGA or an ASIC on which the neural network computations are being performed …etc.” [0042] “The look-up table 210 maps each possible combination of a quantized input index and a quantized weight index to a multiplication result.” [0068] “the system can load, e.g., only the portion of the weight data corresponding to the weights of the current neural network layer being processed, onto the special purpose hardware when loading the first table 210 and the activation table 220 onto the special purpose hardware.” [0071] “The system maintains, for each of the plurality of neural network layers, a respective look-up table that maps each possible combination of a quantized input index and a quantized weight index to a multiplication result (step 302).”)
regenerating, using the at least one processor, the second feature data of the second feature map by cross-referencing the second index values with the second LUT; (Covell, [0033] “To generate a network output 150 for the neural network 110, the system 100 identifies the outputs of the multiplications required by the layers of the neural network 110, i.e., the output required to generate accumulated values, using the look-up tables 120 instead of performing the multiplications in software or using hardware multipliers.” [0060]-[0063] “The activation table 220 maps each of a plurality of accumulated value indices that each represent a possible accumulated value from the plurality of possible accumulated values to a respective quantized input index that represents the quantized input value that is generated by applying the activation function for the neural network layer ... The system then uses the activation table 220 to map the accumulated value index to an activation value index, which serves as the activation value index when using the output of the current layer as the input to the next layer in the neural network.” [0072]-[0075] “The system receives data specifying a quantized input to the neural network layer that includes a plurality of quantized input values (step 304). For each multiplication between a quantized weight and a corresponding quantized input value that is required to generate the layer output for the neural network layer, the system determines the quantized input index representing the quantized input value and determines the quantized weight index representing the quantized weight value (step 306). For each multiplication between a quantized weight and a corresponding quantized input value that is required to generate the layer output for the neural network layer, the system identifies, as the result of the multiplication, the multiplication result that is mapped to in the respective look-up table for the neural network layer by the determined quantized input index and the determined quantized weight index (step 308). The system then generates a plurality of accumulated values by, for each accumulated value, summing the identified results of the corresponding subset of the plurality of multiplications (step 310).”) [Examiner’s Note: The generated index values corresponding to the quantized inputs and weights are mapped using the look-up table (LUT). The generated accumulated values are then mapped using index of the activation look-up table to generate layer output, which is then passed to the next layer as input. Thus, the output data of the previous layer are retrieved using index values stored in the look-up table and then passed to the preceding layer, which correspond to the claimed limitation regenerating the feature data by cross-referencing the index values with the LUT.]
processing, using the at least one processor, the regenerated second feature data using a third layer of the neural network; (Covell, [0070]-[0075] “FIG. 3 is a flow diagram of an example process 300 for generating a network output from a network input. For convenience, the process 300 will be described as being performed by a system of one or more computers located in one or more locations. For example, a neural network system, e.g., the neural network system 100 of FIG. 1, appropriately programmed in accordance with this specification, can perform the process 300. The system maintains, for each of the plurality of neural network layers, a respective look-up table that maps each possible combination of a quantized input index and a quantized weight index to a multiplication result (step 302). As described above, when all of the neural network layers use the same quantization levels, this look-up table can be shared between all of the neural network layers. When the neural network layers use different quantization levels, this look-up table is only shared among neural network layers that use the same quantization levels.”) [Examiner’s Note: Covell teaches that the quantized input data from the current layer are then input into subsequent layer, which correspond to processing feature data using a third layer of the neural network.]
wherein the representation of the first quantization level of... the first feature map comprised in the multiple records of the first LUT and the representation of the second quantization level of... the second feature map comprised in the multiple records of the second LUT are estimated in an iterative training process ... (Covell, [0079]-[0082] “the system perform quantization-aware training of the neural network on a set of training data in order to ensure that the trained neural network can still perform well after being quantized. ... the system can periodically update the quantization scheme during the quantization-aware training. In particular, the system identifies new quantization centers once every S training steps, where S is a constant value, e.g., 1000, 500, or 250. ... Nonetheless, these periodic reinforcements of the selected levels ensures that the final quantization event is not detrimental to performance after training.”)
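For illustration only (not the reference’s code), the following Python sketch shows the kind of table-lookup multiplication and accumulation that the Covell passages cited above describe, where a per-layer table maps a quantized input index and a quantized weight index to a precomputed product. All level values, table sizes, and names are hypothetical:

# Illustrative sketch (hypothetical values): per-layer look-up table mapping
# (quantized input index, quantized weight index) -> precomputed multiplication result.
import numpy as np

input_levels = np.array([-1.0, -0.25, 0.0, 0.25, 1.0])   # hypothetical quantized input values
weight_levels = np.array([-0.5, 0.0, 0.5])                # hypothetical quantized weight values
mult_lut = np.outer(input_levels, weight_levels)          # table of all index combinations

def accumulate_by_lookup(input_idx, weight_idx):
    """Sum products retrieved by table lookup instead of performing multiplications."""
    return float(mult_lut[input_idx, weight_idx].sum())

accumulated = accumulate_by_lookup(np.array([0, 2, 4]), np.array([1, 2, 0]))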
As explained above, Covell teaches quantizing the input values of each layer (i.e., the output of the current layer) and using index values with a look-up table to represent the quantized data. The look-up tables, including the index values, are stored in the on-chip memory of the special-purpose hardware device. Covell also teaches applying different quantization levels and quantization schemes to each layer of the neural network; see [0047]-[0051]. Additionally, Covell discusses the repeated use of and access to the look-up tables and index values to quantize inputs and generate outputs for each neural network layer efficiently (i.e., an iterative training process), enabling faster computation with minimal computational resources. However, Covell is silent on whether the quantization scheme used to map index values of the lookup tables, with different quantization levels for different layers, is a non-uniform quantization. Additionally, Covell does not define the iterative training process as based on repeated extraction and re-estimation until a stable LUT is obtained. Accordingly, Covell does not appear to explicitly teach:
a non-uniform distribution of quantization levels of the feature map;
wherein the representation of the first quantization level of the non-uniform distribution of quantization levels of the first feature map comprised in the multiple records of the first LUT and the representation of the second quantization level of the non-uniform distribution of quantization levels of the second feature map comprised in the multiple records of the second LUT are estimated in an iterative training process based on repeated extraction and re-estimation until stable LUTs are obtained.
However, Covell in view of Löhdefink teaches the following:
each of the multiple records of the first LUT comprising a representation of a first quantization level of a non-uniform distribution of quantization levels of the first feature map; ... each of the multiple records of the second LUT comprising a representation of a second quantization level of the non-uniform distribution of quantization levels of the second feature map, wherein the second quantization level differs from the first quantization level based on differences between the first feature data associated with the first index values and the second feature data associated with the second index values; (Löhdefink, [P. 3, Section (A)] “Non-uniform scalar quantization, on the other hand, improves the accuracy because the decision boundaries can be arbitrarily chosen, reducing the quantization error due to a higher resolution of codebook entries in regions where most of the data lies. ... An even stronger fit to the training data results when using separate non-uniform quantizers for each of the feature maps to be quantized. This yields c feature-dependent codebooks, where c denotes the number of bottleneck feature maps.” [P. 4, Section (B)] “The quantizer takes on different modes of operation for the forward pass and the backward pass. In the forward pass, each vector v_n is assigned a symbol s_n ∈ S by searching for the nearest neighbor of the input in the codebook, which results in the vector s = (s_1, ..., s_N)^T, with N = h·w·c/d being the total number of quantizer symbols (or patches) representing the encoder output data v or r. Using the codebook as a lookup table for the symbols s yields the reconstructed (quantized) patches which are subsequently recomposed to have the initial dimensions. In the backward pass, instead of searching for the nearest neighbor in the codebook, the differentiable approximation of the quantization function is applied.” [P. 4, Section (C)] “The quantization used in the forward pass can be calculated by simply using the vectorial argmin function over the distance: v̂_n = argmin_{c∈CB} ‖v_n − c‖² = c_{s_n} (Eq. 5). The code s in the forward pass consists of the sequence of the codebook indices s_n, resulting from quantization of the input vectors v_n.” [P. 5, Section (D)] “After pretraining for 60 epochs, we extract the encoded training data (latent space representation) and use it in the codebook search, computing the optimal step size ∆ (uniform SQ) or performing the LBG algorithm [39] (non-uniform SQ and VQ), whereby we use {1, 2, 3} bit per latent space pixel as codebook sizes. Grouping d = 4 pixels to a vector, we end up with {4, 8, 12} bit for vector quantization and a bitrate of {0.03125, 0.06250, 0.09375} bit per input image pixel (bpp) consistently for all quantization approaches, see (8) and (9).”) [Examiner’s Note: the paper explicitly discloses the use of a non-uniform scalar quantizer for feature maps, where separate non-uniform quantization for a group of feature maps is used. Each codebook entry corresponds to one quantization level (a scalar reconstruction value). The quantization levels are non-uniformly distributed, indices correspond to the entries, and separate codebooks (LUTs) are created for the feature maps. This defines the differential quantization condition.]
Accordingly, at the effective filing date, it would have been prima facie obvious to one ordinarily skilled in the art of machine learning to modify the method of Covell to incorporate the quantization approaches as taught by Löhdefink in order to reduce the bitrate of the feature representation beyond the mere topological compression, and far below the typical 8, 16 or 32 bit number representations (Löhdefink [Section IV]).
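For illustration only (not Löhdefink’s code), the following Python sketch shows the forward-pass behavior described in the passages cited above: each feature value is assigned the index of its nearest codebook entry, and the codebook is then used as a lookup table to regenerate the quantized feature data. The codebook values and names are hypothetical:

# Illustrative sketch (hypothetical values): nearest-neighbor codebook quantization of
# feature data, with the codebook used as a lookup table for reconstruction.
import numpy as np

def quantize_with_codebook(features, codebook):
    """Assign each feature value the index of its nearest codebook entry, then reconstruct."""
    flat = np.asarray(features).reshape(-1)
    indices = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)   # symbols / index values
    reconstructed = codebook[indices].reshape(np.shape(features))        # LUT cross-reference
    return indices, reconstructed

codebook = np.array([-2.1, -0.6, 0.0, 0.4, 1.8])   # hypothetical non-uniform quantization levels
idx, recon = quantize_with_codebook(np.random.default_rng(1).normal(size=(4, 4)), codebook)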
Covell in view of Löhdefink teaches the repeated process of representing feature data (the output of each layer) using indices of a lookup table (LUT), and the use of a separate codebook (lookup table) with non-uniform quantization levels for feature maps, where the decision boundaries are adjusted to the data distribution of the feature being quantized. However, Covell in view of Löhdefink does not appear to explicitly teach:
wherein the representation of the first quantization level of the non-uniform distribution of quantization levels of the first feature map comprised in the multiple records of the first LUT and the representation of the second quantization level of the non-uniform distribution of quantization levels of the second feature map comprised in the multiple records of the second LUT are estimated in an iterative training process based on repeated extraction and re-estimation until stable LUTs are obtained.
However, Cardinaux, in combination with Covell in view of Löhdefink, teaches:
wherein the representation of the first quantization level of the non-uniform distribution of quantization levels of the first feature map comprised in the multiple records of the first LUT and the representation of the second quantization level of the non-uniform distribution of quantization levels of the second feature map comprised in the multiple records of the second LUT are estimated in an iterative training process based on repeated extraction and re-estimation ... (Cardinaux, [P. 2, Section I & A] “We introduce LUT-Q, a trainable non-uniform quantization method which reduces the size and computational complexity of a DNN. We propose an update rule to train DNNs which use LUT-Q. The update rule is a combination of a gradient descent and a k-means update, which can jointly learn the optimal weight dictionary d and assignment matrix A. Fig. 1 illustrates our LUT-Q training scheme.” [P. 2, Section II] “The LUT-Q approach takes the best of the latter two methods: for each layer, we jointly update both dictionary and weight assignments during training. This approach to compression is similar to Deep Compression [10] in the way that we learn a dictionary and assign each weight in a layer to one of the dictionary’s values. However, we run k-means iteratively during training and update both the assignments and the dictionary at each mini-batch iteration.” [P. 3, Section III-A] “we unroll the k-means updates over the training iterations, meaning that we just perform one k-means update for each forward pass. This considerably reduces the computational complexity of each forward pass, but is sufficiently accurate for training if we assume that the continuous weights W do not change much between iterations (which is always the case if we use a sufficiently small learning rate η). Algorithm 1 summarizes the LUT-Q update steps for a minibatch {X, T}, where X denotes the minibatch data and T the corresponding ground truth. We denote the layer index by l and the total number of layers by L. K(l) is the number of values in the dictionary d(l). In the forward/backward pass, we use the current quantized weights {Q(1), ..., Q(L)} in order to obtain the cost C and the gradients {G(1), ..., G(L)}. These gradients are used to update the full precision weights {W(1), ..., W(L)}. Finally, using M steps of k-means after each minibatch, we update the dictionaries {d(1), ..., d(L)} and the assignment matrices {A(1), ..., A(L)}. In all our experiments of Section VI, we use M = 1. k-means ensures that LUT-Q is a good approximation of the full precision weights. ... After this initialization, for each layer, we run k-means in order to obtain the initial dictionary and assignment matrix.”) [Examiner’s Note: Cardinaux teaches iteratively training look-up tables for neural network quantization. This involves a trainable non-uniform quantization method using look-up tables (LUT-Q), which uses learned dictionaries d ∈ R^K and lookup tables to represent the network weights. The dictionary d and assignment matrix A are iteratively updated during training via k-means. Thus, the dictionaries and assignment matrices are jointly updated at each training iteration through repeated k-means re-estimation. Cardinaux explicitly describes LUT-Q as a non-uniform quantization method which uses learned dictionaries to represent the weights of each neural network layer.
The Examiner notes that while LUT-Q training primarily discusses weight quantization, the disclosure also indicates that LUT-Q can be applied to quantize both the weights and activations of the network, which correspond to feature data; see Section E.]
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Covell, Löhdefink, and Cardinaux before them, to incorporate the iterative training non-uniform quantization algorithm as taught by Cardinaux in order to quantize the weights and activations of the network, reducing the size and computational complexity of a DNN, and allowing DNNs to operate efficiently on resource-constrained devices (Cardinaux [Abstract]).
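For illustration only (not Cardinaux’s code), the following Python sketch shows one unrolled k-means update of the kind the passages cited above describe for LUT-Q, in which the per-layer dictionary (LUT) entries and the assignments of values to entries are re-estimated after each minibatch. The names, sizes, and synthetic data are hypothetical:

# Illustrative sketch (hypothetical values): one k-means step per training iteration,
# re-estimating the per-layer dictionary (LUT) and the value-to-entry assignments.
import numpy as np

def kmeans_step(values, dictionary):
    """Reassign values to their nearest dictionary entry, then recompute each entry."""
    assign = np.abs(values[:, None] - dictionary[None, :]).argmin(axis=1)
    new_dict = dictionary.copy()
    for k in range(dictionary.size):
        members = values[assign == k]
        if members.size:                      # keep the old entry if nothing is assigned to it
            new_dict[k] = members.mean()
    return new_dict, assign

dictionary = np.linspace(-1.0, 1.0, 8)                 # hypothetical initial 8-entry dictionary
values = np.random.default_rng(2).normal(size=1024)    # stand-in for one layer's values
# Per iteration: update full-precision values by gradient descent (omitted), then one k-means step.
dictionary, assignments = kmeans_step(values, dictionary)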
While the combination of Covell, Löhdefink, and Cardinaux teaches iterative training of non-uniform quantization levels of neural network data (i.e., feature data) and describes an iterative training process that jointly and repeatedly re-estimates LUTs using k-means for each neural network layer, the combination does not appear to explicitly suggest that the iterative training involves repeated extraction and re-estimation until stable LUTs are obtained. However, this would have been obvious in view of Cai.
Hereinafter, Cai, in combination with Covell, Löhdefink, and Cardinaux, teaches the limitation:
wherein the representation of the first quantization level of the non-uniform distribution of quantization levels of the first feature map comprised in the multiple records of the first LUT and the representation of the second quantization level of the non-uniform distribution of quantization levels of the second feature map comprised in the multiple records of the second LUT are estimated in an iterative training process based on repeated extraction and re-estimation until stable LUTs are obtained. (Cai, [P. 451, Section 1] “we propose an iterative non-uniform quantization strategy to train a deep CNN compressor. The quantizer and encoder-decoder are trained in an alternative optimization manner. With fixed quantizer, an encoder-decoder network is trained to minimize the ℓ1 loss between the input and reconstructed images. While with the fixed encoder-decoder network, an optimal non-uniform quantizer is adaptively learned based on the distribution of encoding coefficients to minimize the quantization error. The quantizer and encoder-decoder are alternatively and iteratively updated till convergence.” [P. 452, Section 3.2] “we first train the encoder-decoder network without the quantizer. ..., When the parameters Ω and Θ are learned, the latent representation of an image z can be obtained by z = E(X, Ω). With a set of latent representations of training images, we can easily compute p(z), the probability density function (PDF) of z. The optimal quantizer can be solved as follows to minimize the quantization error: ... (see Equation 3). Given a number of decision intervals M, the optimal quantizer is expected to find the set of decision boundaries {b_q}_0^M and quantized values {ẑ_q}_1^M. ..., The optimal solutions of Eq. (5) can be easily solved by the Lloyd’s algorithm [17], outputting the optimal quantizer Q with decision boundaries {b_q}_0^M and quantized values {ẑ_q}_1^M. With the obtained quantizer fixed, we can use Eq. (2) to fine-tune the encoder-decoder network by minimizing the ℓ1-norm error between input and reconstructed images. The updated encoder-decoder network can then be used to update the non-uniform optimal quantizer by solving Eq. (5). Such an alternative optimization process continues till the loss function in Eq. (2) converges.”) [Examiner’s Note: Cai teaches extracting the latent representation z (feature representation) from the encoder at each iteration; the quantizer Q creates a discrete representation; the distribution p(z) of the extracted feature data is computed; and the non-uniform quantizer is re-estimated based on that extracted feature-data distribution using Lloyd’s algorithm, which iteratively finds the optimal quantized values and boundaries, refined to minimize the quantization error until the loss function converges.]
Therefore, it would have been prima facie obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Covell, Löhdefink, and Cardinaux, to incorporate the iterative non-uniform quantization training method as taught by Cai. One would have been motivated to make such a combination in order to iteratively compute the best quantization intervals and quantized values and to update a non-uniform quantizer to reduce quantization error. Doing so would improve compression performance (Cai [Pp. 450-451, Introduction]).
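For illustration only (not Cai’s code), the following Python sketch shows the iterate-until-stable behavior relied upon above: feature samples are extracted, and Lloyd-style updates re-estimate the non-uniform quantization levels until the LUT stops changing. The convergence threshold, level count, and synthetic data are hypothetical:

# Illustrative sketch (hypothetical values): Lloyd-style re-estimation of non-uniform
# quantization levels from extracted feature samples until a stable LUT is obtained.
import numpy as np

def reestimate_until_stable(samples, levels, tol=1e-4):
    """Alternate boundary placement and centroid updates until the levels stop changing."""
    while True:
        boundaries = (levels[:-1] + levels[1:]) / 2.0            # decision boundaries
        bins = np.digitize(samples, boundaries)
        new_levels = np.array([samples[bins == q].mean() if np.any(bins == q) else levels[q]
                               for q in range(levels.size)])     # centroid of each interval
        if np.max(np.abs(new_levels - levels)) < tol:            # stable LUT obtained
            return new_levels
        levels = new_levels

samples = np.random.default_rng(3).laplace(size=4096)            # stand-in for extracted feature data
stable_levels = reestimate_until_stable(samples, np.linspace(-3.0, 3.0, 16))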
Regarding Currently Amended Claim 2, the combination of Covell, Löhdefink, Cardinaux, and Cai teaches the elements of claim 1 as outlined above, and further teaches:
wherein each of the multiple records of the first LUT includes one index value of the first index values and a single quantization level identified by the one index value. (Covell, [0042] “The look-up table 210 maps each possible combination of a quantized input index and a quantized weight index to a multiplication result.” [0043]-[0044] “Each quantized input index represents a quantized input value from a set of possible quantized input values. Similarly, each quantized weight index represents a quantized weight value from a set of possible quantized weight values.” [0051] “When quantization levels for the different layers in the neural network are not shared, i.e., the quantization levels are different for the weights of different layers in the neural network or for the activations generated by different layers of the neural network, the table 210 will be different for any two layers that do not share the same quantization levels.”) [Examiner’s Note: each index corresponds to one quantized value (e.g., either a quantized input or a quantized weight).]
Regarding Currently Amended Claim 3, the combination of Covell, Löhdefink, Cardinaux, and Cai teaches the elements of claim 1 as outlined above, and further teaches:
wherein a size of the first LUT and a size of the second LUT each corresponds to a bit precision of the memory of the electronic device. (Covell, [0061] “Thus, the activation table 220 is generally of size Nx, where Nx is the number of quantization steps needed to fully span a quantized range of outputs of the activation function. Note that, in some cases, the number of entries in the activation table can be more than Na (the number of distinct quantized activation levels) if the activation function does not change level at a uniform rate (e.g., quantized tan h).” [0071].)
Regarding Original Claim 4, the combination of Covell, Löhdefink, Cardinaux, and Cai teaches the elements of claim 1 as outlined above, and further teaches:
wherein the memory of the electronic device comprises on-chip dynamic random access memory (DRAM) or static random access memory (SRAM). (Covell, [0024] “special-purpose computer chip 130, e.g., an ASIC or an FPGA, that has on-chip memory 140 and uses hardware acceleration to perform the operations required by the neural network.” [0091] Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both.” Further described in [0037].)
Regarding Original Claim 7,
the combination of Covell, Löhdefink, Cardinaux, and Cai teaches the elements of claim 1 as outlined above, and further teaches:
wherein the input data is associated with one or more images or videos. (Covell, [0014] “In some cases, the neural network is a convolutional neural network that is configured to receive an input image and to process the input image to generate a network output for the input image, i.e., to perform some kind of image processing task.”)
Regarding Currently Amended Claim 8,
the combination of Covell, Löhdefink, Cardinaux, and Cai teaches the elements of claim 1 as outlined above, and further teaches:
Claim 8 recites substantially similar limitations as claim 1; claim 8 further introduces processing data using a fourth layer of the neural network.
Covell teaches quantization of data at each layer of a neural network using look-up tables (LUTs) and corresponding indices for each layer. The quantized input data to the subsequent layer would correspond to the “second feature data” processed by the fourth layer of the neural network. Thus, the same prior art mapping and rationale applied in claim 1 applies to claim 8.
Regarding Currently Amended Claim 9,
the combination of Covell, Löhdefink, Cardinaux, and Cai teaches the elements of claim 1 as outlined above, and further teaches:
wherein the third layer and the fourth layer are the same layer. (Covell, [Abstract] “One of the methods includes maintaining, for each of the plurality of neural network layers, a respective look-up table that maps each possible combination of a quantized input index and a quantized weight index to a multiplication result; and generating a network output from a network input, comprising, for each of the neural network layers: receiving data specifying a quantized input to the neural network layer, the quantized input comprising a plurality of quantized input values; and generating a layer output for the neural network layer from the quantized input to the neural network layer using the respective look-up table for the neural network layer.”) [Examiner’s Note: the plurality of neural network layers would include the next layer being processed using the output data (feature data) from the previous layer.]
Regarding Currently Amended Claim 10,
The claim recites substantially similar limitations as corresponding claim 1 and is rejected for similar reasons as claim 1 using similar teachings and rationale. Claim 1 is directed to a method, while claim 10 is directed to “An electronic device comprising: at least one memory configured to store instructions; and at least one processing device configured when executing the instructions ...”.
Covell also discloses “FIG. 1 shows an example neural network system 100. The neural network system 100 is an example of a system implemented as computer programs on one or more computers in one or more location.”, see [Abstract] and [0022].
Regarding Currently Amended Claim 11,
The claim recites substantially similar limitations as corresponding claim 2 and is rejected for similar reasons as claim 2 using similar teachings and rationale.
Regarding Currently Amended Claim 12,
The claim recites substantially similar limitations as corresponding claim 3 and is rejected for similar reasons as claim 3 using similar teachings and rationale.
Regarding Original Claim 13,
The claim recites substantially similar limitations as corresponding claim 4 and is rejected for similar reasons as claim 4 using similar teachings and rationale.
Regarding Original Claim 16,
The claim recites substantially similar limitations as corresponding claim 7 and is rejected for similar reasons as claim 7 using similar teachings and rationale.
Regarding Currently Amended Claim 17,
The claim recites substantially similar limitations as corresponding claim 8 and is rejected for similar reasons as claim 8 using similar teachings and rationale.
Regarding Currently Amended Claim 18,
The claim recites substantially similar limitations as corresponding claim 9 and is rejected for similar reasons as claim 9 using similar teachings and rationale.
Regarding Currently Amended Claim 19,
The claim recites substantially similar limitations as corresponding claim 1 and is rejected for similar reasons as claim 1 using similar teachings and rationale. Claim 1 is directed to a method, while claim 19 is directed to “a non-transitory machine-readable medium containing instructions that when executed cause at least one processor of an electronic device ...”.
Covell also discloses “Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. …, Computer readable media suitable for storing computer program instructions and data include all forms of non volatile memory, media and memory devices,” See [0091]-[0092].
Regarding Currently Amended Claim 20,
The claim recites substantially similar limitations as corresponding claim 2 and is rejected for similar reasons as claim 2 using similar teachings and rationale.
Regarding Currently Amended Claim 23,
The claim recites substantially similar limitations as corresponding claim 3 and is rejected for similar reasons as claim 3 using similar teachings and rationale.
Regarding Currently Amended Claim 24,
The claim recites substantially similar limitations as corresponding claim 8 and is rejected for similar reasons as claim 8 using similar teachings and rationale.
Claim(s) 21 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Covell, Löhdefink, Cardinaux, and Cai as described above, and further in view of Cai(2) et al., (NPL: "Learning a single tucker decomposition network for lossy image compression with multiple bits-per-pixel rates." (2018)).
Regarding Currently Amended Claim 21, the combination of Covell, Löhdefink, Cardinaux, and Cai teaches the elements of claim 1 as outlined above, and further teaches:
As outlined above, the combination of Covell, Löhdefink, Cardinaux, and Cai teaches an iterative non-uniform quantization scheme to refine and find the optimal quantization levels for representing feature data using LUT of index values.
Cai, in combination with Covell, Löhdefink, and Cardinaux, further teaches:
extracting scalar samples from the first feature map for at least the first layer; (Cai, [Section 3.2] “we first train the encoder-decoder network without the quantizer. ..., When the parameters Ω and Θ are learned, the latent representation of an image z can be obtained by z = E(X, Ω).”)
estimating a distribution of feature map values for the first layer based on the scalar samples as an array of feature map values; (Cai, [Section 3] “With a set of latent representations of training images, we can easily compute p(z), the probability density function (PDF) of z. The optimal quantizer can be solved as follows to minimize the quantization error: (see Equation 3). Given a number of decision intervals M, the optimal quantizer is expected to find the set of decision boundaries {b_q}_0^M and quantized values {ẑ_q}_1^M.”) [Examiner’s Note: the probability density function of z is broadly interpreted as estimating a distribution of feature map values.]
The combination of Covell, Löhdefink, Cardinaux, and Cai does not appear to explicitly teach:
estimating a distribution of feature map values for the first layer based on the scalar samples as a flattened and detached array of feature map values; performing an estimation operation based on the array of feature map values; and adjusting quantization boundaries of the first LUT based on the estimation operation to obtain a revised LUT.
However, Cai(2), in combination with Covell, Löhdefink, Cardinaux, and Cai, teaches the limitations:
extracting scalar samples from the first feature map for at least the first layer; (Cai(2), [P. 3, Col. 2, Section B] “Given the latent image representation z_i, we use {Y, U(1), U(2), U(3)} = T(z_i) to decompose the features into 3 orthogonal matrices {U(n)}_{n=1}^3 and a core tensor Y, and then quantize the decomposed components to generate bitstream.” Further see Page 6, Section B.)
estimating a distribution of feature map values for the first layer based on the scalar samples as an array of feature map values; (Cai(2), [P. 2, Col. 2, 2nd Paragraph] “The key component of TDNet is a novel tucker decomposition layer (TDL), which decomposes the latent image representation into a set of projection matrices and a compact core tensor. By changing the rank of core tensor and its quantization levels, we can easily adjust the bpp rate of latent image representation, and thus a single CNN model can be trained to compress and reconstruct images under multiple bpp rates. Besides, we propose an iterative non-uniform quantization strategy to obtain the optimal quantization boundaries based on the distribution of encoding coefficients. A coarse-to-fine training strategy is introduced to train a stable TDNet and reconstruct the decompressed images.” [P. 6, Col. 2, Section B] “Quantization and de-quantization: Since the core tensor Y has both positive and negative values, we take one bit to represent the sign of the original value. Let |Y| denotes the absolute value of the core tensor. With a set of training images, we can easily compute p(|Y|), the probability density function (PDF) of the positive core tensor |Y|. The optimal quantizer can be solved as follows by minimizing the quantization error: ... See Equation (16).”) [Examiner’s Note: the Examiner interprets the scalar samples, as an array of feature map values, as the obtained and decomposed latent representation. The probability density function of the tensor values corresponds to estimating a distribution of feature map values.]
performing an estimation operation based on the array of feature map values; (Cai(2), [P. 6, Col. 2, Section B] “The optimal quantizer can be solved as follows by minimizing the quantization error: ... see Equation (16). Given a number M of decision intervals, the optimal quantizer is expected to find the set of decision boundaries {b_q}_0^M and quantized values {Ŷ_q}_1^M. Solving the partial derivative of Eq. (16), we could have: ... See Equation (17). The optimal solutions of Eq. (17) can be easily solved by the Lloyd’s algorithm [38], outputting the optimal quantizer Q(·) with decision boundaries {b_q}_0^M and quantized values {Ŷ_q}_1^M.”)
and adjusting quantization boundaries of the first LUT based on the estimation operation to obtain a revised LUT. (Cai(2), [P. 6, Col. 2, Section V] “With the initialized TDL, we can use Eq. (4) or Eq. (3) to jointly fine-tune the encoder-TDL-decoder network by minimizing the loss function. The latent image representations at multiple bpp rates will be taken into consideration during the training process. In each training epoch, we first decide which group of ranks and decision boundaries will be used by calculating ĝ = mod(epoch, G), then take this group of desired output ranks and decision boundaries {R_1, R_2, R_3, {b_q}_0^M}_(ĝ+1) to update the TDL and fine-tune the parameters {Ω, Θ, Π} of the whole network. To obtain G groups of optimal decision boundaries {{b_q}_0^M}_{g=1}^G and network parameters {Ω, Θ, Π}, an iterative training scheme can be used, i.e., fix the encoder-decoder network to update the TDL decision boundaries {{b_q}_0^M}_{g=1}^G by solving Eq. (17), and fix the TDL to update the network parameters {Ω, Θ, Π}. Such an alternative optimization process continues till the loss function in Eq. (4) or Eq. (3) converges. After the TDNet converges, we can use the optimal decision boundaries {{b_q}_0^M}_{g=1}^G and network parameters {Ω, Θ, Π} to compress and reconstruct images with different bpp rates. The overall all-in-one training scheme is summarized as Algorithm 2.”) [Examiner’s Note: the Examiner interprets adjusting quantization boundaries based on the estimation operation as the iterative rank selection, updating, and fine-tuning to obtain the optimal quantization boundaries until the loss function converges. The Examiner refers to the processes described in Algorithms 1 and 2 on Page 6.]
Accordingly, it would have been prima facie obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Covell, Löhdefink, Cardinaux, and Cai, to incorporate the deep Tucker Decomposition Network (TDNet) as taught by Cai(2). One would have been motivated to make such a combination in order to achieve multiple bpp rates with a single network. Doing so would provide highly competitive performance (Cai(2) [Section VII]).
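For illustration only (not Cai(2)’s code), the following Python sketch shows the flow discussed for claim 21: feature-map samples are flattened and detached, their distribution is estimated, and the LUT levels and decision boundaries are adjusted from that estimate to obtain a revised LUT. The PyTorch-style tensor handling, bin count, and names are hypothetical:

# Illustrative sketch (hypothetical values): flatten and detach feature-map samples,
# estimate their distribution, and adjust the LUT levels and boundaries accordingly.
import numpy as np
import torch

def revise_lut(feature_map, lut_levels, bins=256):
    """Return revised quantization levels and midpoint decision boundaries."""
    samples = feature_map.detach().flatten().cpu().numpy()        # flattened, detached array
    hist, edges = np.histogram(samples, bins=bins, density=True)  # estimated distribution
    centers = (edges[:-1] + edges[1:]) / 2.0
    interval_of = np.digitize(centers, (lut_levels[:-1] + lut_levels[1:]) / 2.0)
    revised = lut_levels.astype(float).copy()
    for q in range(lut_levels.size):
        mask = interval_of == q
        if mask.any() and hist[mask].sum() > 0:                   # weighted centroid per interval
            revised[q] = np.average(centers[mask], weights=hist[mask])
    boundaries = (revised[:-1] + revised[1:]) / 2.0               # adjusted decision boundaries
    return revised, boundaries

levels, bounds = revise_lut(torch.randn(1, 8, 16, 16), np.linspace(-2.0, 2.0, 16))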
Claim(s) 22 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Covell, Löhdefink, Cardinaux, and Cai as described above, and further in view of Hsu et al., (Pub. No.: US 11861452 B1).
Regarding Currently Amended Claim 22,
the combination of Covell, Löhdefink, Cardinaux, and Cai teaches the elements of claim 1 as outlined above, and further teaches:
As explained above, Covell teaches quantization of input values represented by index values of a lookup table (LUT). Thus, Covell teaches the first data (the feature/input values) and the second data (the index values and the LUT).
Covell, Löhdefink, Cardinaux, and Cai do not appear to explicitly teach:
wherein: the first feature data of the first feature map comprises first data; the first index values and the first LUT comprise second data; and the second data is smaller than the first data.
However, it would have been obvious in view of Hsu. Hereinafter, Hsu, in combination with Covell, Löhdefink, Cardinaux, and Cai, teaches the limitation:
wherein: the first feature data of the first feature map comprises first data; (Hsu, [Col. 5, Lines 45-50] “Method 400 begins with operation 402 receiving, at an input to a softmax layer of a neural network from an intermediate layer of the neural network, a non-normalized output comprising a plurality of intermediate network decision values.”) the first index values and the first LUT comprise second data; (Hsu, [Col. 5, Lines 5-10] “In the lookup table according to various embodiments, the index of the lookup table represents the distance between the current input and the maximum possible value of the input.” [Col. 5, Lines 63-70] “A corresponding lookup table value is then requested from a lookup table in operation 406 using the difference between the intermediate network decision value and the maximum network decision value for each intermediate network decision value of the plurality of intermediate network decision values.”) and the second data is smaller than the first data. (Hsu, [Col. 6, Lines 25-32] “In other embodiments, matching bits values for inputs and outputs to the softmax layer are used (e.g., eight bits, 24 bits, etc.). In other embodiments, with significant reduction in the number of table entry values, the number of output bits can be smaller than the number of input bits.” Further see [Col. 5, Lines 10-20].) [Examiner’s Note: the non-normalized output of intermediate network decision values reads on the feature data of the feature map (“first data”). The index of the lookup table corresponds to the “second data.” The quantized representation of the output values using the index of the lookup table is smaller than the input values.]
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Covell, Löhdefink, Cardinaux, and Cai before them, to incorporate the method of generating a single compact lookup table for a quantized softmax layer as taught by Hsu. One would have been motivated to make such a combination in order to enable improvements to a device by reducing memory resources for softmax operations and further reducing the associated processing resources for softmax operations when compared with similar operations using larger tables or deconstructed index entries (Hsu [Col. 2, Lines 10-15]).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SADIK ALSHAHARI whose telephone number is (703) 756-4749. The examiner can normally be reached Monday through Friday, 9 A.M. to 6 P.M. ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.A.A./Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121