Prosecution Insights
Last updated: April 19, 2026
Application No. 18/104,043

NEURON-BY-NEURON QUANTIZATION FOR EFFICIENT TRAINING OF LOW-BIT QUANTIZED NEURAL NETWORKS

Non-Final OA — §102, §103, §112
Filed: Jan 31, 2023
Examiner: MAUNI, HUMAIRA ZAHIN
Art Unit: 2141
Tech Center: 2100 — Computer Architecture & Software
Assignee: Smart Engines Service LLC
OA Round: 1 (Non-Final)
Grant Probability: 38% (At Risk)
OA Rounds: 1-2
To Grant: 4y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 38% — 6 granted of 16 resolved (-17.5% vs TC avg)
Interview Lift: +66.7% higher allowance among resolved cases with an interview
Avg Prosecution: 4y 6m typical timeline; 39 applications currently pending
Total Applications: 55 across all art units (career history)

Statute-Specific Performance

§101: 35.9% (-4.1% vs TC avg)
§103: 40.2% (+0.2% vs TC avg)
§102: 10.9% (-29.1% vs TC avg)
§112: 13.0% (-27.0% vs TC avg)
Comparisons are against an estimated Tech Center average • Based on career data from 16 resolved cases

Office Action

§102 §103 §112
DETAILED ACTION

Claims 1-20 are presented for examination. This office action is in response to submission of the application on 01/31/2023.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 01/31/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 7 and 13-16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 7 recites “each subset of components consists of a single neuron”, whereas claim 2 recites “each of the plurality of components is a neuron that comprises one or more filters and an activation”. The transitional phrase “consists of” excludes any element, step, or ingredient not specified in the claim (see MPEP 2111.03), which is thus inconsistent with the additional elements provided in claim 2. Thus, claim 7 is found to be indefinite for failing to particularly point out and distinctly claim the subject matter.
Claim 13 recites “wherein the subset of components consists of one or more filters”, whereas claim 1 recites “select a subset of the plurality of components, wherein each of the plurality of components comprises weights”. The transitional phrase “consists of” excludes any element, step, or ingredient not specified in the claim (see MPEP 2111.03), which is thus inconsistent with the additional elements provided in claim 1. Thus, claim 13 is found to be indefinite for failing to particularly point out and distinctly claim the subject matter. Dependent claims 14-16 inherit the deficiency and therefore are rejected on the same basis.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-4, 7-10, 13-14, 19 and 20 are rejected under AIA 35 U.S.C. 102(a)(1) as being anticipated by Gao et al. (Pub. No.: US 2020/0082269 A1), hereafter Gao.
Regarding claim 1, Gao discloses: A method of training a neural network comprising using at least one hardware processor to (Fig. 1A and Fig. 2), for each layer to be quantized in the neural network, for each of a plurality of iterations, until all of a plurality of components of the layer are quantized (Fig. 2 and ¶[0029] and ¶[0031] teaches iteratively quantizing parameters such as activations and weights, i.e. plurality of components, of layers of a neural network until all are quantized), select a subset of the plurality of components, wherein each of the plurality of components comprises weights (Fig. 2, Fig. 4, and ¶[0041] teaches selecting weights of individual layers during fine tuning as selecting a subset of the plurality of components), quantize the weights in the subset of components (Fig. 2, Fig. 4, ¶[0039] and ¶[0041] teaches quantizing the weights in the subset of components), retrain the neural network (Fig. 2, Fig. 4, ¶[0039] teaches retraining the neural network), freeze the subset of components, such that the subset of components is not subsequently modified during training (Fig. 2, Fig. 4, ¶[0039] teaches freezing the parameters of the layers of the neural network such that they are not modified during training).
Regarding claim 2, Gao discloses the method of claim 1 (and thus the rejection of claim 1 is incorporated). Gao further discloses: wherein each of the plurality of components is a neuron that comprises one or more filters and an activation that produces an input to a subsequent layer from an output of the one or more filters (Examiner notes: for prior art purposes, the examiner interprets the BRI of filters to include operations of convolutional layers, which filter inputs to a CNN) (¶[0032], ¶[0050], ¶[0080] teaches neurons of layers to comprise filters, i.e. operations of convolutional layers, and activations that produce an input to the subsequent layer from an output of the convolutional block).
Regarding claim 3, Gao discloses the method of claim 2 (and thus the rejection of claim 2 is incorporated). Gao further discloses: further comprising using the at least one hardware processor to, for each of the plurality of iterations, quantize a corresponding portion of the input to the subsequent layer (Fig. 4, ¶[0041] and ¶[0050] teaches quantizing a corresponding portion of the input to the subsequent layer for each iteration of training).
Regarding claim 4, Gao discloses the method of claim 3 (and thus the rejection of claim 3 is incorporated). Gao further discloses: wherein freezing the subset of components comprises freezing weights of the one or more filters (Fig. 4, ¶[0039], and ¶[0053] teaches freezing the weights of the convolutional blocks).
Regarding claim 7, Gao discloses the method of claim 2 (and thus the rejection of claim 2 is incorporated). Gao further discloses: wherein each subset of components consists of a single neuron (¶[0032] teaches the subset of components to consist of neurons making up each layer).
Regarding claim 8, Gao discloses the method of claim 2 (and thus the rejection of claim 2 is incorporated). Gao further discloses: wherein retraining the network comprises back-propagating gradients to all quantized layers (Fig. 2, ¶[0033] and ¶[0035]).
Regarding claim 9, Gao discloses the method of claim 2 (and thus the rejection of claim 2 is incorporated). Gao further discloses: wherein retraining the network comprises back-propagating a gradient from the layer to another layer that immediately precedes the layer in forward order of the neural network (Fig. 2, ¶[0033] and ¶[0035]).
Regarding claim 10, Gao discloses the method of claim 1 (and thus the rejection of claim 1 is incorporated). Gao further discloses: wherein each of the plurality of components is a filter (¶[0032], ¶[0039], ¶[0080] teaches each of the components to be operations of convolutional layers).
Regarding claim 13, Gao discloses the method of claim 10 (and thus the rejection of claim 10 is incorporated). Gao further discloses: wherein the subset of components consists of one or more filters (¶[0032] and ¶[0039] teaches the subset of components to consist of operations of convolutional layers, i.e. filters), wherein freezing the subset of components comprises freezing weights of the one or more filters in the subset of components (Fig. 4, ¶[0039], and ¶[0053] teaches freezing the weights of the convolutional blocks).
Regarding claim 14, Gao discloses the method of claim 13 (and thus the rejection of claim 13 is incorporated). Gao further discloses: further comprising using the at least one hardware processor to, for each of the plurality of iterations, prior to retraining the neural network, quantize an input to the one or more filters in the subset of components (¶[0053] teaches quantizing inputs, near the input layer, to one or more filters and doing so prior to retraining, i.e. tuning).
Claims 19 and 20 are substantially similar to claim 1 and are rejected on the same basis.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Gao et al. (Pub. No.: US 2020/0082269 A1), hereafter Gao, in view of Yang et al. (Pub. No.: US 2021/0019606 A1), hereafter Yang.
Regarding claim 11, Gao discloses the method of claim 10 (and thus the rejection of claim 10 is incorporated). Gao does not disclose: wherein a number of the plurality of iterations is predefined as N, and selecting the subset of components comprises selecting 1/N filters within the layer, such that all filters in the layer are selected over the plurality of iterations. Yang discloses: wherein a number of the plurality of iterations is predefined as N, and selecting the subset of components comprises selecting 1/N filters within the layer, such that all filters in the layer are selected over the plurality of iterations (¶[0100] teaches selecting a subset of filters iteratively in batches such that all the filters in the layer are selected over the plurality of iterations). Gao and Yang are analogous art because they are from the same field of endeavor, CNN quantization. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gao to include wherein a number of the plurality of iterations is predefined as N, and selecting the subset of components comprises selecting 1/N filters within the layer, such that all filters in the layer are selected over the plurality of iterations, based on the teachings of Yang. One of ordinary skill in the art would have been motivated to make this modification for performance improvement in CNN processing, as suggested by Yang (¶[0060]).
Regarding claim 12, Gao, in view of Yang, discloses the method of claim 11 (and thus the rejection of claim 11 is incorporated). Yang further discloses: wherein N > 4 (¶[0123] teaches iteration numbers to be 100, 200, 100, etc.).
Claims 5-6 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Gao et al. (Pub. No.: US 2020/0082269 A1), hereafter Gao, in view of Lee et al. ("Quantune: Post-training quantization of convolutional neural networks using extreme gradient boosting for fast deployment"), hereafter Lee, in further view of Nagel et al. ("A White Paper on Neural Network Quantization"), as cited in the IDS dated 01/31/2023, hereafter Nagel.
Regarding claim 5, Gao discloses the method of claim 3 (and thus the rejection of claim 3 is incorporated). Gao does not disclose: further comprising using the at least one hardware processor to: construct a histogram of a data distribution of inputs to the subsequent layer, solve a minimum mean-square error problem to obtain one or more parameters of quantization based on the constructed histogram, wherein the one or more parameters are used to quantize the input to the subsequent layer. Lee discloses: further comprising using the at least one hardware processor to: construct a histogram of a data distribution of inputs to the subsequent layer … to obtain one or more parameters of quantization based on the constructed histogram, wherein the one or more parameters are used to quantize the input to the subsequent layer (Fig. 1, and page 126, right column, paragraph 2, lines 2-4 “the histogram of possible numeric ranges in each layer of the neural network is captured for the activation of the quantization” and last 4 lines “the histogram of the tensor values is generated by observing the execution during the inference to capture the possible numeric ranges of activations in each layer of the neural network.”). It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gao to include construct a histogram of a data distribution of inputs to the subsequent layer … to obtain one or more parameters of quantization based on the constructed histogram, wherein the one or more parameters are used to quantize the input to the subsequent layer, based on the teachings of Lee. One of ordinary skill in the art would have been motivated to make this modification in order to generate optimal quantized models, considering accuracy, as suggested by Lee (page 126, left column, paragraph 2, last 4 lines). While Lee discloses obtaining one or more parameters of quantization, they do not disclose solving a minimum mean-square error problem to do so. Nagel discloses: solve a minimum mean-square error problem to obtain one or more parameters of quantization … (page 9, equation 16 and final paragraph, lines 1-3 “Mean squared error (MSE) … in this range setting method we find qmin and qmax that minimize the MSE between the original and the quantized tensor”). It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gao, in view of Lee, to include solve a minimum mean-square error problem to obtain one or more parameters of quantization, based on the teachings of Nagel. One of ordinary skill in the art would have been motivated to make this modification in order to alleviate the issue of large outliers, as suggested by Nagel (page 9, final paragraph, line 1).
Regarding claim 6, Gao, in view of Lee, in further view of Nagel, discloses the method of claim 5 (and thus the rejection of claim 5 is incorporated). Lee further discloses: wherein the one or more parameters comprise a scale S having a real value and an offset O having an integer value (equation (2) and page 128, left column, section 4.2, paragraph “Asymmetric” teaches scale as scale S and zero point as offset value O), wherein quantizing the input comprises performing a quantization operation on each real-valued element r in an input array as follows: [equation image], wherein q(r) is a quantized value for the real-valued element r (equation (2) and page 128, left column, section 4.2, paragraph “Asymmetric” teaches a quantization scheme that performs equation (2) on each real valued element xfp32). It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gao to include wherein the one or more parameters comprise a scale S having a real value and an offset O having an integer value, wherein quantizing the input comprises performing a quantization operation on each real-valued element r in an input array as follows: [equation image], wherein q(r) is a quantized value for the real-valued element r, based on the teachings of Lee. One of ordinary skill in the art would have been motivated to make this modification in order to generate optimal quantized models, considering accuracy, as suggested by Lee (page 126, left column, paragraph 2, last 4 lines).
Regarding claim 15, Gao discloses the method of claim 14 (and thus the rejection of claim 14 is incorporated). Gao does not disclose: further comprising using the at least one hardware processor to: construct a histogram of a data distribution of inputs to the subset of components; and solve a minimum mean-square error problem to obtain one or more parameters of quantization based on the constructed histogram, wherein the one or more parameters are used to quantize the input. Lee discloses: construct a histogram of a data distribution of inputs to the subset of components … to obtain one or more parameters of quantization based on the constructed histogram, wherein the one or more parameters are used to quantize the input (Fig. 1, and page 126, right column, paragraph 2, lines 2-4 “the histogram of possible numeric ranges in each layer of the neural network is captured for the activation of the quantization” and last 4 lines “the histogram of the tensor values is generated by observing the execution during the inference to capture the possible numeric ranges of activations in each layer of the neural network.”). It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gao to include construct a histogram of a data distribution of inputs to the subset of components … to obtain one or more parameters of quantization based on the constructed histogram, wherein the one or more parameters are used to quantize the input, based on the teachings of Lee. One of ordinary skill in the art would have been motivated to make this modification in order to generate optimal quantized models, considering accuracy, as suggested by Lee (page 126, left column, paragraph 2, last 4 lines). While Lee discloses obtaining one or more parameters of quantization, they do not disclose solving a minimum mean-square error problem to do so. Nagel discloses: solve a minimum mean-square error problem to obtain one or more parameters of quantization … (page 9, equation 16 and final paragraph, lines 1-3 “Mean squared error (MSE) … in this range setting method we find qmin and qmax that minimize the MSE between the original and the quantized tensor”). It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gao, in view of Lee, to include solve a minimum mean-square error problem to obtain one or more parameters of quantization, based on the teachings of Nagel. One of ordinary skill in the art would have been motivated to make this modification in order to alleviate the issue of large outliers, as suggested by Nagel (page 9, final paragraph, line 1).
Regarding claim 16, Gao, in view of Lee, in further view of Nagel, discloses the method of claim 15 (and thus the rejection of claim 15 is incorporated). Lee further discloses: wherein the one or more parameters comprise a scale S having a real value and an offset O having an integer value (equation (2) and page 128, left column, section 4.2, paragraph “Asymmetric” teaches scale as scale S and zero point as offset value), wherein quantizing the input comprises performing a quantization operation on each real-valued element r in an input array as follows: [equation image], wherein q(r) is a quantized value for the real-valued element r (equation (2) and page 128, left column, section 4.2, paragraph “Asymmetric” teaches a quantization scheme that performs equation (2) on each real valued element xfp32). It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gao to include wherein the one or more parameters comprise a scale S having a real value and an offset O having an integer value, wherein quantizing the input comprises performing a quantization operation on each real-valued element r in an input array as follows: [equation image], wherein q(r) is a quantized value for the real-valued element r, based on the teachings of Lee. One of ordinary skill in the art would have been motivated to make this modification in order to generate optimal quantized models, considering accuracy, as suggested by Lee (page 126, left column, paragraph 2, last 4 lines).
Regarding claim 17, Gao discloses the method of claim 1 (and thus the rejection of claim 1 is incorporated). Gao does not disclose: wherein quantizing the weights comprises solving: S, O = argmin (E(r - qS,O(r))²), wherein S is a scale having a real value, O is an offset having an integer value, and E(r - qS,O(r))² is a mean-square error function that calculates an error between real values r of the weights and quantized values qS,O(r), wherein [equation image]. Lee discloses: wherein quantizing the weights comprises solving: S, O … wherein S is a scale having a real value, O is an offset having an integer value (equation (2) and page 128, left column, section 4.2, paragraph “Asymmetric” teaches scale as scale S and zero point as offset value), … a mean-square error function that calculates an error between real values r of the weights and quantized values qS,O(r), wherein [equation image] (equation (2) and page 128, left column, section 4.2, paragraph “Asymmetric” and page 130, left column, paragraph 1, lines 8-9 “The differentiable convex functions are mean square error”). While Lee teaches wherein quantizing the weights comprises solving: S, O … wherein S is a scale having a real value, O is an offset having an integer value … a mean-square error function that calculates an error between real values r of the weights and quantized values qS,O(r), wherein [equation image], they do not explicitly teach solving argmin (E(r - qS,O(r))²) … where E(r - qS,O(r))² is a mean-square error function. Nagel discloses: solving argmin (E(r - qS,O(r))²) … where E(r - qS,O(r))² is a mean-square error function (page 9, equation 16 and final paragraph, lines 1-3 “Mean squared error (MSE) … in this range setting method we find qmin and qmax that minimize the MSE between the original and the quantized tensor” teaches solving the mean square error function in equation 16, where r and qS,O(r) are original and quantized tensors respectively). It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gao, in view of Lee, to include solving argmin (E(r - qS,O(r))²) … where E(r - qS,O(r))² is a mean-square error function, based on the teachings of Nagel. One of ordinary skill in the art would have been motivated to make this modification in order to alleviate the issue of large outliers, as suggested by Nagel (page 9, final paragraph, line 1).
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Gao et al. (Pub. No.: US 2020/0082269 A1), hereafter Gao, in view of Lee et al. ("Quantune: Post-training quantization of convolutional neural networks using extreme gradient boosting for fast deployment"), hereafter Lee, in further view of Nagel et al. ("A White Paper on Neural Network Quantization"), as cited in the IDS dated 01/31/2023, hereafter Nagel, in further view of Surti et al. (Pub. No.: US 2020/0311041 A1), hereafter Surti.
Regarding claim 18, Gao, in view of Lee, in further view of Nagel, discloses the method of claim 17 (and thus the rejection of claim 17 is incorporated). Gao, in view of Lee, in further view of Nagel, does not disclose: wherein the solving comprises a ternary search. Surti discloses: wherein the solving comprises a ternary search (¶[0218] and ¶[0226] teaches solving using a ternary search). Gao, Lee, Nagel, and Surti are analogous art because they are from the same field of endeavor, data compression and neural networks. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gao, in view of Lee, in further view of Nagel, to include solving using a ternary search, based on the teachings of Surti. One of ordinary skill in the art would have been motivated to make this modification in order to improve memory bandwidth of workloads including machine learning, allowing for increased power efficiency and performance, as suggested by Surti (¶[0203]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUMAIRA ZAHIN MAUNI whose telephone number is (703) 756-5654. The examiner can normally be reached Monday - Friday, 9 am - 5 pm (ET). Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MATT ELL, can be reached at (571) 270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/H.Z.M./ Examiner, Art Unit 2141
/MATTHEW ELL/ Supervisory Patent Examiner, Art Unit 2141
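Claims 5, 15, and 17 recite solving a minimum mean-square error problem over a histogram of the input distribution to obtain the scale S and offset O, and claim 18 recites a ternary search for the solving step. Below is a hedged, self-contained sketch of one way such a search could look; the histogram-based MSE estimate, the simple zero-point rule, and the unimodality assumption that justifies a ternary search are illustrative assumptions, not the application's disclosed method.

```python
# Illustrative sketch only: MSE-based selection of quantization parameters from a
# histogram (cf. claims 5/15/17) using a ternary search over the scale (cf. claim 18).
# The offset rule and the assumed unimodal MSE(S) curve are simplifications.
import numpy as np

def fake_quant(r, scale, offset, n_bits=8):
    """q_{S,O}(r): round onto the grid, clip to the bit range, map back to reals."""
    q = np.clip(np.round(r / scale) + offset, 0, 2 ** n_bits - 1)
    return (q - offset) * scale

def mse_from_histogram(centers, counts, scale, offset, n_bits=8):
    """Estimate E[(r - q_{S,O}(r))^2] from a histogram of the data distribution."""
    err = (centers - fake_quant(centers, scale, offset, n_bits)) ** 2
    return float(np.sum(err * counts) / np.sum(counts))

def mse_optimal_params(samples, n_bits=8, bins=2048, iters=60):
    """Histogram the inputs, then ternary-search the scale that minimizes the MSE."""
    counts, edges = np.histogram(samples, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    hi = max((samples.max() - samples.min()) / (2 ** n_bits - 1), 1e-8)
    lo = hi * 1e-3
    offset_for = lambda s: int(round(-samples.min() / s))  # simple zero-point rule
    for _ in range(iters):  # ternary search assumes MSE(S) is unimodal on [lo, hi]
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        e1 = mse_from_histogram(centers, counts, m1, offset_for(m1), n_bits)
        e2 = mse_from_histogram(centers, counts, m2, offset_for(m2), n_bits)
        lo, hi = (lo, m2) if e1 < e2 else (m1, hi)
    scale = 0.5 * (lo + hi)
    return scale, offset_for(scale)

# Example: pick parameters for one layer's activations, then quantize them.
acts = np.random.randn(10000).astype(np.float32) * 0.2 + 0.1
S, O = mse_optimal_params(acts)
quantized = fake_quant(acts, S, O)
```

Here `fake_quant` merely stands in for the q(r) operation that appears only as an image in the office action; the actual claimed expression may differ.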

Prosecution Timeline

Jan 31, 2023 — Application Filed
Nov 19, 2025 — Non-Final Rejection (§102, §103, §112)
Mar 25, 2026 — Response Filed

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585969
GENERATING CONFIDENCE SCORES FOR MACHINE LEARNING MODEL PREDICTIONS
Granted Mar 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on the 1 most recent grant.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 38%
With Interview: 99% (+66.7%)
Median Time to Grant: 4y 6m
PTA Risk: Low
Based on 16 resolved cases by this examiner. Grant probability derived from career allow rate.
