Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2/2/2026 has been entered.
Response to Arguments
Applicant’s arguments have been considered but are moot in view of the new grounds of rejection.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 5-6, 8, 12, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 20220044114 A1 to Sriram et al. (hereinafter referred to as “Sri”), in view of US 20230139347 A1 to Bondarenko et al. (hereinafter referred to as “Bond”).
Regarding claim 1, Sri discloses an apparatus comprising at least one processor ([0002]); and at least one non-transitory memory comprising instructions that, when executed with the at least one processor ([0118]), cause the apparatus at least to perform:
determine one or more quantization parameters (quantizers) based at least on one or more of the following ([0112], wherein the weights can be quantized by finding the absolute maximum value):
a mean absolute value computed based on a set of parameters of a neural network comprising a parameter;
a maximum absolute value computed based on a set of activations of the neural network comprising an activation ([0113], wherein both weights and activations are uniformly quantized during the forward-pass of the training using the absolute maximum value of weights and a running average of the absolute maximum value of activations);
a number of parameters in the set of parameters of the neural network comprising the parameter ([0113]); or
a maximum absolute value computed based on an output value computed based on the parameter and the activation; and
quantize at least one of the parameters or the activation based at least on the one or more quantization parameters ([0112], wherein parameters may be quantized; [0119], quantizing the parameters); and
overfitting one or more multiplier parameters, wherein the one or more multiplier parameters are used for scaling one or more activations to a range ([0098]), and wherein values of the one or more multiplier parameters are determined based at least on a training process ([0247]). According to the instant applicant's publication at [0493], overfitting refers to training or fine-tuning a neural network. Therefore, consistent with applicant's specification, Sri discloses DNN training in [0060]. Sri further discloses in [0062] that, during training, a neural network first applies QAT to generate a first trained model and then applies PTQ on the first trained model to output a second trained model that has parameters (e.g., weights and activations) represented by low-bit integers. In addition, Sri discloses in [0070] that QAT may be applied by quantizing all weights and activations of the neural network except for layers that require finer granularity in representation than the 8-bit quantization can provide (e.g., regression layers). In some embodiments, QAT applies quantization to all weights and activations except for the last layer of the neural network, which may result in a mixed-precision DNN model. Weights refer to the parameters within a neural network that transform input data within the network's hidden layers. Within each node of the neural network, there is a set of inputs, weights, and a bias value; as the input enters each node, a weight is multiplied on the input, a bias is added, and an activation function is applied. The resulting output is either observed or passed to the next layer in the neural network. Each input node takes in information that can be numerically expressed (e.g., activation values), and this information is passed through the neural network during training.
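For clarity of the record, the absolute-maximum quantization scheme Sri describes at [0112]-[0113] (quantizing weights by the absolute maximum value, and activations by a running average of the absolute maximum) can be sketched as follows. This is an illustrative sketch only: the function names, the 8-bit width, and the momentum value are assumptions for illustration, not taken from either reference.

```python
def absmax_quantize(values, num_bits=8):
    # Symmetric quantization: the scale is derived from the absolute
    # maximum value, as in Sri [0112]; bit-width is an assumed parameter.
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale

def update_running_absmax(running, activations, momentum=0.9):
    # Running average of the absolute maximum of activations, per [0113];
    # the momentum value is an assumption.
    current = max(abs(a) for a in activations)
    return momentum * running + (1 - momentum) * current
```

For example, quantizing the weights [0.5, -1.0, 0.25, 0.75] with this sketch maps the largest-magnitude weight to the integer extreme and scales the rest proportionally.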
Sri fails to disclose wherein the one or more quantization parameters are determined further based at least on one or more of the following: a maximum absolute value of one or more kernel weights, a scaling factor of one or more input activations, a precision of an accumulator, or an approximate maximum value in accumulation.
However, in the same field of endeavor, Bond discloses wherein the one or more quantization parameters are determined further based at least on one or more of the following: a maximum absolute value of one or more kernel weights, a scaling factor of one or more input activations, a precision of an accumulator, or an approximate maximum value in accumulation ([0050], wherein, for example, distinct scaling factors and zero-points may be applied per embedding dimension of an activation tensor rather than having two scalars for an entire activation tensor; as such, the quantization parameters may be collectively denoted by vectors s, z ∈ ℝ^d, where s represents the scaling factors and z represents the zero-points).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the apparatus disclosed by Sri to disclose wherein the one or more quantization parameters are determined further based at least on one or more of the following: a maximum absolute value of one or more kernel weights, a scaling factor of one or more input activations, a precision of an accumulator, or an approximate maximum value in accumulation, as taught by Bond, to improve quantization for neural networks ([0026], Bond).
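The per-dimension scaling-factor and zero-point scheme Bond describes at [0050] can be sketched as follows, purely for illustration; the function names, the unsigned 8-bit range, and the example values are assumptions and are not drawn from Bond.

```python
def affine_quantize(column, scale, zero_point, num_bits=8):
    # Asymmetric (affine) quantization with a scaling factor s and
    # zero-point z, i.e. q = round(x / s) + z, clipped to the integer
    # range, in the manner Bond describes at [0050].
    qmin, qmax = 0, 2 ** num_bits - 1
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in column]

def per_dimension_quantize(tensor, scales, zero_points):
    # Distinct s and z per embedding dimension (column) of the activation
    # tensor, rather than two scalars for the entire tensor; scales and
    # zero_points play the role of the vectors s, z in R^d.
    columns = list(zip(*tensor))
    qcols = [affine_quantize(col, s, z)
             for col, s, z in zip(columns, scales, zero_points)]
    return [list(row) for row in zip(*qcols)]
```

Each column of the tensor is quantized with its own (s, z) pair, so dimensions with very different dynamic ranges do not share a single scalar scale.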
Regarding claim 5, analyses are analogous to those presented for claim 1 and are applicable for claim 5.
Regarding claim 6, Sri discloses the apparatus of claim 5, wherein the apparatus is further caused to: determine allocation bits to be used for representing the set of activations and the set of parameters of the neural network, based on one or more test data samples ([0519]); and signal the allocation bits to a decoder ([0397], wherein send instruction to decoder).
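Regarding the determination of allocation bits from test data samples, one possible realization, offered purely for illustration and not drawn from either reference, is to select the smallest bit-width whose quantization error on the test samples stays within a tolerance. The candidate widths and the error bound below are assumptions.

```python
def select_allocation_bits(samples, candidates=(4, 6, 8), max_err=0.05):
    # Illustrative sketch: pick the smallest candidate bit-width whose
    # worst-case symmetric-quantization error on the test samples stays
    # within an assumed tolerance.
    for bits in sorted(candidates):
        qmax = 2 ** (bits - 1) - 1
        scale = max(abs(v) for v in samples) / qmax
        err = max(abs(v - max(-qmax, min(qmax, round(v / scale))) * scale)
                  for v in samples)
        if err <= max_err:
            return bits
    return max(candidates)
```

The selected bit-width could then be signaled to a decoder so that both sides use the same integer representation.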
Regarding claim 8, analyses are analogous to those presented for claim 1 and are applicable for claim 8.
Regarding claim 12, analyses are analogous to those presented for claim 1 and are applicable for claim 12.
Regarding claim 15, analyses are analogous to those presented for claim 1 and are applicable for claim 15.
Regarding claim 16, analyses are analogous to those presented for claim 1 and are applicable for claim 16.
Regarding claim 17, Sri discloses the apparatus of claim 1, wherein values of the one or more multiplier parameters are determined based at least on a training process (see the discussion of Sri's training disclosures at [0060], [0062], and [0070], and of applicant's publication at [0493], presented for claim 1 above).
Regarding claim 18, analyses are analogous to those presented for claim 17 and are applicable for claim 18.
Regarding claim 19, analyses are analogous to those presented for claim 17 and are applicable for claim 19.
Regarding claim 20, analyses are analogous to those presented for claim 17 and are applicable for claim 20.
Claim(s) 2-4, 7, 9-11, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over US 20220044114 A1 to Sriram et al. (hereinafter referred to as “Sri”), in view of US 20230139347 A1 to Bondarenko et al. (hereinafter referred to as “Bond”), and further in view of US 20190191172 A1 to Ruanovskyy et al. (hereinafter referred to as “Rus”).
Regarding claim 2, Sri discloses the apparatus of claim 1 (see the rejection of claim 1).
Sri fails to disclose wherein the apparatus is caused to signal the one or more quantization parameters to a decoder.
However, in the same field of endeavor, Rus discloses wherein the apparatus is caused to signal the one or more quantization parameters to a decoder ([0139]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the apparatus disclosed by Sri to disclose wherein the apparatus is caused to signal the one or more quantization parameters to a decoder, as taught by Rus, to improve the compression efficiency ([0024], Rus).
Regarding claim 3, Rus discloses the apparatus of claim 2, wherein the one or more quantization parameters are signaled as part of a supplemental enhancement information message or an adaptation parameter set ([0139]).
Regarding claim 4, Rus discloses the apparatus of claim 2, wherein the apparatus is further caused to signal association between the signaled one or more quantization parameters and data to be quantized by using the one or more quantization parameters ([0139]).
Regarding claim 7, Sri discloses the apparatus of claim 5, wherein the apparatus is further caused to: determine the one or more quantization parameters for one or more activations based on one or more test data samples ([0112]-[0113]).
Sri fails to disclose: signal the one or more quantization parameters to a decoder, wherein a reconstructed quantization parameter is applied by the decoder to quantize the one or more activations at a decoding stage.
However, in the same field of endeavor, Rus discloses determining the one or more quantization parameters (abstract); and signaling the one or more quantization parameters to a decoder ([0139]), wherein a reconstructed quantization parameter is applied by the decoder to quantize the one or more activations at a decoding stage ([0060]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the apparatus disclosed by Sri to signal the one or more quantization parameters to a decoder, wherein a reconstructed quantization parameter is applied by the decoder to quantize the one or more activations at a decoding stage, as taught by Rus, to improve the compression efficiency ([0024], Rus).
Regarding claim 9, analyses are analogous to those presented for claim 2 and are applicable for claim 9.
Regarding claim 10, analyses are analogous to those presented for claim 3 and are applicable for claim 10.
Regarding claim 11, analyses are analogous to those presented for claim 4 and are applicable for claim 11.
Regarding claim 13, analyses are analogous to those presented for claim 7 and are applicable for claim 13.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LERON BECK whose telephone number is (571)270-1175. The examiner can normally be reached M-F 8 am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Czekaj can be reached at (571) 272-7327. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
LERON BECK
Examiner
Art Unit 2487
/LERON BECK/Primary Examiner, Art Unit 2487