Prosecution Insights
Last updated: April 19, 2026
Application No. 17/560,010

QUANTIZATION METHOD FOR NEURAL NETWORK MODEL AND DEEP LEARNING ACCELERATOR

Final Rejection — §101, §103
Filed
Dec 22, 2021
Examiner
TRAN, DANIEL DUC
Art Unit
2147
Tech Center
2100 — Computer Architecture & Software
Assignee
Industrial Technology Research Institute
OA Round
2 (Final)
Grant Probability: 0% (At Risk)
OA Rounds: 3-4
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (grants only 0% of cases; 0 granted / 1 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal lift; based on resolved cases with interview)
Typical Timeline: 3y 3m average prosecution; 35 applications currently pending
Career History: 36 total applications across all art units

Statute-Specific Performance

§101: 33.3% (-6.7% vs TC avg)
§103: 39.0% (-1.0% vs TC avg)
§102: 10.0% (-30.0% vs TC avg)
§112: 16.9% (-23.1% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 1 resolved case

Office Action

§101 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application is being examined under the pre-AIA first to invent provisions.

Response to Arguments

102 Rejection Arguments

Applicant asserts: Applicant argues, on page 5, that the amended claim 1 is not taught by Hubara.

Examiner response: Examiner respectfully disagrees. Applicant's arguments with respect to claim(s) 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

103 Rejection Arguments

Applicant asserts: Applicant argues, on page 5, that the amended claim 1 is not taught by Hubara and therefore the dependent claims are not unpatentable.

Examiner response: Examiner respectfully disagrees. Applicant's arguments with respect to claim(s) 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

101 Rejection Arguments

Applicant asserts: Applicant argues, on page 7, that the claim recites steps that are performed in tangible computing hardware and therefore are not able to be performed within the human mind, and that the recited claims are integrated into a practical application and amount to significantly more.

Examiner response: Examiner respectfully disagrees. The argument regarding steps being performed on tangible computing hardware is not reflected within the claims; however, if it were, the limitation would be considered an additional limitation that merely recites words to apply the judicial exception on a generic computer (MPEP 2106.05(f)). This would not integrate the judicial exception into a practical application or amount to significantly more. Regarding the argument that the claimed invention integrates features for changing how the training system operates at the hardware and control level: this is not reflected in the claim language and is not considered in the analysis. Regarding the argument that the claim amounts to significantly more by way of a technical improvement, the arguments do not point out how the claimed invention is an improvement over prior inventions.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 03/07/2022 and 01/30/2023 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-5 and 7 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

In reference to claim 1:

Step 1 - Is the claim to a process, machine, manufacture or composition of matter? Yes, the claim is directed to a method.

Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?

"A quantized method for a neural network model comprising: initializing a weight array of the neural network model, wherein the weight array comprises a plurality of initial weights" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could initialize a weight array with initial weights for the neural network.

"performing a quantization procedure to generate a quantized weight array according to the weight array, wherein the quantized weight array comprises a plurality of quantized weights, and the plurality of quantized weights is within a fixed range" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could perform a quantization procedure to generate a quantized weight array so that the quantized weight array has quantized weights that fit within a fixed range.

"performing a training procedure of the neural network model according to the quantized weight array" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could perform a training procedure of the neural network using the quantized weight array.

"determining whether a loss function is convergent in the training procedure, and outputting a trained quantized weight array when the loss function is convergent" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could evaluate whether a loss function is convergent and output the trained quantized weight array if so.

"wherein the loss function comprises a basic term, a regularization term and a weight value associated with the regularization term," is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, recites mathematical relationships, mathematical formulas or equations, or mathematical calculations (MPEP 2106.04(a)(2)(I)).

"and determining whether the loss function is convergent in the training procedure comprises adjusting the weight value according to a convergent degree of the basic term and the regularization term;" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could evaluate whether a loss function is convergent in the training procedure and adjust the weight value according to a convergent degree of the basic and regularization terms.

"wherein adjusting the weight value comprises decreasing the weight value when the convergent degree of the regularization term exceeds a predetermined threshold, and increasing the weight value when the convergent degree of the regularization term is below the predetermined threshold." is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could adjust the weight value by decreasing or increasing it based on whether the convergent degree of the regularization term exceeds or is below the predetermined threshold.

Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application? No.

Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception? No.

In reference to claim 2:

Step 1 - Is the claim to a process, machine, manufacture or composition of matter? Yes, the claim is directed to a process.

Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?

"The method of claim 1, performing the quantization procedure to generate the quantized weight array according to the weight array comprising: inputting the plurality of initial weights to a conversion function so as to convert an initial range of the plurality of initial weights into the fixed range" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could input the initial weights into a conversion function to convert an initial range of the initial weights into a fixed range.

"inputting a result outputted by the conversion function to a quantization function to generate the plurality of quantized weights." is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could input the output from the conversion function to the quantization function to generate the quantized weights.

Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application? No.

Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception? No.

In reference to claim 3:

Step 1 - Is the claim to a process, machine, manufacture or composition of matter? Yes, the claim is directed to a process.

Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?

"The method of claim 2, wherein the conversion function comprises a nonlinear conversion formula, and the fixed range is [ -1, +1]" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, recites mathematical relationships, mathematical formulas or equations, or mathematical calculations (MPEP 2106.04(a)(2)(I)).

Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application? No.

Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception? No.

In reference to claim 4:

Step 1 - Is the claim to a process, machine, manufacture or composition of matter? Yes, the claim is directed to a process.

Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?

"The method of claim 3, wherein the nonlinear conversion formula is a hyperbolic tangent function" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, recites mathematical relationships, mathematical formulas or equations, or mathematical calculations (MPEP 2106.04(a)(2)(I)).

Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application? No.

Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception? No.

In reference to claim 5:

Step 1 - Is the claim to a process, machine, manufacture or composition of matter? Yes, the claim is directed to a process.

Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?

"The method of claim 3, further comprising determining an architecture of the neural network model, wherein" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could determine an architecture of the neural network model.

"the basic term is associated with the quantized weight array" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, recites mathematical relationships, mathematical formulas or equations, or mathematical calculations (MPEP 2106.04(a)(2)(I)).

"the regularization term is associated with a plurality of parameters of the architecture and a hardware architecture configured to perform the training procedure" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, recites mathematical relationships, mathematical formulas or equations, or mathematical calculations (MPEP 2106.04(a)(2)(I)).

"the regularization term is configured to increase sparsity of the quantized weight array after the training procedure." is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, recites mathematical relationships, mathematical formulas or equations, or mathematical calculations (MPEP 2106.04(a)(2)(I)).

Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application? No.

Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception? No.

In reference to claim 7:

Step 1 - Is the claim to a process, machine, manufacture or composition of matter? Yes, the claim is directed to a process.

Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?

"The method of claim 1, wherein performing the training procedure of the neural network model according to the quantized weight array comprises: performing a multiply-accumulate operation by a processing element matrix according to the quantized weight array and an input vector to generate an output vector having a plurality of output values" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could perform a multiply-accumulate operation according to the quantized weight array and input vector to generate an output vector of output values.

"reading the plurality of output values respectively by a plurality of output readout circuits" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could observe the plurality of output values.

"detecting whether each of the plurality of output values is zero by a respective one of a plurality of output detectors" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could observe whether each of the plurality of output values is zero.

"and disabling an output readout circuit whose output value is zero from the plurality of output readout circuits, wherein the plurality of output detectors electrically connects to the plurality of output readout circuits respectively" is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could observe that an output value is zero and disable the respective output readout circuit.

Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application? No.

Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception? No.
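Read together, the claim 1 limitations analyzed above amount to a quantization-aware training loop whose regularization weight is adapted during training. The following is a minimal NumPy sketch of that flow, not an implementation of the application or of any cited reference: the squared-error basic term, the weight-to-quantized-weight regularizer, the conv_degree proxy for the claimed "convergent degree," and all constants are invented for illustration.

```python
# Minimal, illustrative sketch of the claim 1 training loop; every concrete choice below
# (loss terms, the conv_degree proxy, constants) is an assumption, not from the claims.
import numpy as np

rng = np.random.default_rng(0)

def quantize(w):
    # Map real-valued weights into the fixed range [-1, +1] and snap to discrete levels
    # ({-1, 0, +1} here); pairing tanh with rounding is an illustrative choice.
    return np.round(np.tanh(w))

def basic_term(w_q, x, y):
    # Assumed task loss: mean squared error of a one-layer linear model.
    return np.mean((x @ w_q - y) ** 2)

def regularization_term(w, w_q):
    # Assumed regularizer: distance between real-valued weights and their quantized values.
    # (Claim 5 instead ties the regularization term to sparsity; an L1 penalty such as
    # np.abs(w_q).mean() is one common way to push quantized weights toward zero.)
    return np.mean((w - w_q) ** 2)

# "initializing a weight array ... a plurality of initial weights"
x = rng.normal(size=(64, 8))
y = x @ rng.normal(size=8)
w = rng.normal(size=8)

lam, threshold, lr = 0.1, 1e-3, 0.05   # weight value, predetermined threshold, step size
prev_loss, prev_reg = np.inf, None
for step in range(200):
    w_q = quantize(w)                                # quantization procedure
    reg = regularization_term(w, w_q)
    loss = basic_term(w_q, x, y) + lam * reg         # basic term + weighted regularization term

    # "determining whether a loss function is convergent in the training procedure"
    if abs(prev_loss - loss) < 1e-9:
        break
    prev_loss = loss

    # "convergent degree" proxy: relative drop of the regularization term since the last
    # step (the claim does not define the measure; this is one possible reading).
    if prev_reg is not None:
        conv_degree = (prev_reg - reg) / max(prev_reg, 1e-12)
        # Decrease the weight value when the convergent degree exceeds the threshold,
        # increase it when the convergent degree is below the threshold.
        lam = float(np.clip(lam * (0.9 if conv_degree > threshold else 1.1), 1e-3, 10.0))
    prev_reg = reg

    # Straight-through-style update of the real-valued weights (assumed).
    grad = 2 * x.T @ (x @ w_q - y) / len(y) + 2 * lam * (w - w_q) / w.size
    w -= lr * grad

print("trained quantized weight array:", quantize(w))
```

The point of the sketch is only the control flow the claim recites: quantize, train, test convergence, and raise or lower the regularization weight against a threshold.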
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-3 are rejected under 35 U.S.C. 103 as being unpatentable over "Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations"; Itay Hubara et al; 2018 (hereinafter "Hubara") in view of "Weighted Quantization-Regularization in DNNs for Weight Memory Minimization Toward HW Implementation"; Matthias Wess et al; 18 July 2018 (hereinafter "Wess") in further view of Yi Wang; "Convolutional Neural Networks with Dynamic Regularization"; published on Dec 31 2020 (hereinafter "Wang").

Regarding Claim 1, Hubara discloses A quantized method for a neural network model comprising: initializing a weight array of the neural network model, wherein the weight array comprises a plurality of initial weights; (Hubara Page 3 Paragraph 2; "We introduce a method to train Quantized-Neural-Networks (QNNs), neural networks with low precision weights and activations." Hubara Page 4 Paragraph 3; "Although our BNN training method utilizes binary weights and activations to compute the parameter gradients, the real-valued gradients of the weights are accumulated in real valued variables." Hubara Page 15 Paragraph 1; "Similar to (Ott et al., 2016), our preliminary results indicate that binarization of weight matrices lead to large accuracy degradation." Examiner notes that the weights belong to the neural networks; the weights are stored/initialized in weight matrices/2d arrays, and real-valued weights are stored in a weight matrix/array to be quantized);

performing a quantization procedure to generate a quantized weight array according to the weight array, wherein the quantized weight array comprises a plurality of quantized weights, and the plurality of quantized weights is within a fixed range; (Hubara Page 6 Paragraph 1; "A similar binarization process was applied for weights in which we combine two ingredients: • Project each real-valued weight to [-1,1], i.e., clip the weights during training, as per Algorithm 1. The real-valued weights would otherwise grow very large without any impact on the binary weights. • When using a real-valued weight wr, quantize it using wb = Sign(wr)." Examiner notes that the quantization procedure as a binarization process has the weights projected into a fixed range. Then, the projected weights are quantized in the wb = Sign(wr) function to produce quantized weights.);

performing a training procedure of the neural network model according to the quantized weight array; (Hubara Page 4 Paragraph 3; "Our method of training BNNs can be seen as a variant of Dropout, in which instead of randomly setting half of the activations to zero when computing the parameter gradients, we binarize both the activations and the weights." Examiner notes that binarizing weights is a process of quantizing the weights; the training of the BNN is performed with quantized weights);

and determining whether a loss function is convergent in the training procedure, and outputting a trained quantized weight array when the loss function is convergent. (Hubara Page 5 Paragraph 2; "During training, we update each weight Wij,l (connecting neuron j in layer l − 1 to neuron i in layer l), using the gradient from the previous layer (here ∂C/∂ri,l) and the layer input."; Hubara Page 12 Figure 1; "Figure 1: Training curves for different methods on the CIFAR-10 dataset. The dotted lines represent the training costs (square hinge losses) and the continuous lines the corresponding validation error rates. Although BNNs are slower to train, they are nearly as accurate as 32-bit float DNNs." Examiner notes that square hinge loss is a loss function; Figure 1 shows that the model converges/no longer improves after a certain number of epochs; the training will produce trained/updated weights.);

Hubara does not teach wherein the loss function comprises a basic term, a regularization term and a weight value associated with the regularization term, and determining whether the loss function is convergent in the training procedure comprises adjusting the weight value according to a convergent degree of the basic term and the regularization term.

However, Wess does teach wherein the loss function comprises a basic term, a regularization term and a weight value associated with the regularization term, (Wess Section 3B Paragraph 2; "Modified Loss = Loss + λ1∗QR + λ2∗WQR." Wess Section 3B Paragraph 5; "and similarly to (10) we can weight the tradeoff between accuracy and weight regularization with λ1 and λ2." Examiner notes that the modified loss function comprises the original loss term as the basic term; WQR is the regularization term; λ2 is a weight value that is associated with the regularization term); and determining whether the loss function is convergent in the training procedure comprises adjusting the weight value according to a convergent degree of the basic term and the regularization term; (Wess Section 3B Paragraph 3; "For the experiments we applied fixed λ1 and linearly increasing λ1 (e.g., λ1=10∗epoch). Fig. 7 depicts the trained quantization process. With each epoch the weights are pulled closer to the quantization levels, thus decreasing QR and Δacc." Wess Section 3B Paragraph 5; "and similarly to (10) we can weight the tradeoff between accuracy and weight regularization with λ1 and λ2. Again during training the parameters λ1 and λ2 have to be adjusted carefully to reach the desired improvement of AccMq, without at the same time decreasing AccM. In our experiments we found a linear increasing λ2 to work best (e.g., λ2=10∗epoch)." Examiner notes that λ2 is adjusted so that the desired AccMq (classification accuracy of the quantized network) is met, without at the same time decreasing AccM (classification accuracy of the original network); the loss function is convergent when AccMq and AccM meet the desired improvement; the loss function contains the basic and regularization terms, and therefore it converges with the basic and regularization terms in consideration.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Hubara and Wess. Hubara teaches a hardware architecture to perform the training procedure of weights. Wess teaches a loss function that has a basic and regularization term that are associated with a quantized weight array and parameters of the architecture respectively, and the loss function is used to obtain trained quantized weights when the function has converged. One of ordinary skill would have motivation to combine Hubara and Wess to help determine the architecture of the neural network model, "QR and WQR enable trained quantization to improve performance in comparison to direct quantization. Regularization-based quantization is very simple to implement and applicable for any quantization scheme. As a consequence QR and WQR could be employed alongside other effective quantization techniques such as stochastic quantization. Besides the simplicity of the approach, it also allows a deeper analysis of quantization schemes by recording the weight distribution during training." (Wess Section 3C Paragraph 6).

Hubara in view of Wess does not teach wherein adjusting the weight value comprises decreasing the weight value when the convergent degree of the regularization term exceeds a predetermined threshold, and increasing the weight value when the convergent degree of the regularization term is below the predetermined threshold.

However, Wang does teach wherein adjusting the weight value comprises decreasing the weight value when the convergent degree of the regularization term exceeds a predetermined threshold, and increasing the weight value when the convergent degree of the regularization term is below the predetermined threshold. (Wang Equation 15 and Page 4 Paragraph 4; "From Eq. (15), it can be observed that if the training loss decreases (∇f(lossi) ≤ 0), the regularization amplitude increases to avoid overfitting; otherwise, it decreases to prevent underfitting. The dynamic factor keeps updating to reflect the dynamics of the training loss." Examiner notes that adjusting the weight value comprises decreasing the weight value (regularization amplitude) when the convergent degree of the regularization term (training loss) exceeds a predetermined threshold (∇f(lossi) > 0), and increasing the weight value when the convergent degree of the regularization term is below the predetermined threshold (∇f(lossi) ≤ 0).)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Hubara, Wess, and Wang. Hubara teaches a hardware architecture to perform the training procedure of weights.
Wess teaches a loss function that has a basic and regularization term that are associated to a quantized weight array and parameters of the architecture respectively, and the loss function is used to obtain trained quantized weights when the function has converged. Wang teaches a dynamic regularization method as a function of the training loss. One of ordinary skill would have motivation to combine Hubara, Wess, and Wang to utilize dynamic regularization to improve the classification accuracy, “PyramidNet, ResNeXt, and DenseNet equipped with our dynamic regularization improve the classification accuracy in various model settings, when compared with the same networks with ShakeDrop [12], Shake-Shake [11], and DropBlock [10], respectively.” (Wang Page 2 Paragraph 1). Regarding Claim 2, Hubara teaches The method of claim 1, performing the quantization procedure to generate the quantized weight array according to the weight array comprising: (Hubara Page 6 Paragraph 1; “A similar binarization process was applied for weights in which we combine two ingredients: • Project each real-valued weight to [-1,1], i.e., clip the weights during training, as per Algorithm 1. The real-valued weights would otherwise grow very large without any impact on the binary weights. • When using a real-valued weight wr, quantize it using wb = Sign(wr).” Hubara Page 15 Paragraph 1; “Similar to (Ott et al., 2016), our preliminary results indicate that binarization of weight matrices lead to large accuracy degradation.” Examiner notes that binarization process is quantization procedure to produce a quantized weight array from weight array; the weights are stored/represented in weight matrices/2d arrays and the quantization procedure is fed real-valued-weights as a weight array to be quantized); inputting the plurality of initial weights to a conversion function so as to convert an initial range of the plurality of initial weights into the fixed range; (Hubara Page 4 Paragraph 1; “When training a BNN, we constrain both the weights and the activations to either +1 or −1. Those two values are very advantageous from a hardware perspective, as we explain in Section 6. In order to transform the real-valued variables into those two values, we use two different binarization functions.” Examiner notes that real-valued variables include the initial weights; weights as initial weights are passed through a binarization function as a conversion function to have a fixed range.); Regarding Claim 3, Hubara teaches The method of claim 2, wherein the conversion function comprises a nonlinear conversion formula, and the fixed range is [ -1, +1]. (Hubara Page 4 Paragraph 1; “When training a BNN, we constrain both the weights and the activations to either +1 or −1. Those two values are very advantageous from a hardware perspective, as we explain in Section 6. In order to transform the real-valued variables into those two values, we use two different binarization functions…The second binarization function is stochastic where z ~ U[-1, 1], a uniform random variable, and σ is the "hard sigmoid" function.” Examiner notes that a stochastic function is a nonlinear conversion formula and is constrained to fit inside range [-1, +1].) Claim(s) 4 are rejected under 35 U.S.C. 
103 as being unpatentable over “Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations”; Itay Hubara et al; 2018 (hereinafter “Hubara”) in view of “Weighted Quantization-Regularization in DNNs for Weight Memory Minimization Toward HW Implementation”; Matthias Wess et al; 18 July 2018 (hereinafter “Wess”) in further view of Yi Wang; “Convolutional Neural Networks with Dynamic Regularization” published on Dec 31 2020 (hereinafter “Wang”) in view of KIM; Lok Won; US 20220138586 A1 (hereinafter “Kim”). Regarding Claim 4, Hubara teaches The method of claim 3, wherein the nonlinear conversion formula is a [hyperbolic tangent function]. (Hubara Page 4 Paragraph 1; “The second binarization function is stochastic where z ~ U[-1, 1], a uniform random variable, and σ is the "hard sigmoid" function.” Examiner notes that hard sigmoid function is a nonlinear conversion formula) Hubara does not teach the limitation of “hyperbolic tangent function.” However, Kim does teach hyperbolic tangent function. (Kim Paragraph 0196; “In other words, various activation functions to impart non-linearity to the accumulated value may be additionally provided. The activation function may be, for example, a sigmoid function, a hyperbolic tangent function, an ELU function, a Hard-Sigmoid function, a Swish function, a Hard-Swish function, a SELU function, a CELU function, a GELU function, a TANHSHRINK function, a SOFTPLUS function, a MISH function, a Piecewise Interpolation Approximation for Non-linear function, or an ReLU function, but the present disclosure is not limited thereto.”) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to substitute Hubara, Wess, Wang, and Kim. Hubara teaches a hardware architecture to perform the training procedure of weights. Wess teaches a loss function that has a basic and regularization term that are associated to a quantized weight array and parameters of the architecture respectively, and the loss function is used to obtain trained quantized weights when the function has converged. Wang teaches a dynamic regularization method as a function of the training loss. Kim teaches a plurality of nonlinear functions that include hyperbolic tangent function. One of ordinary skill would have motivation to substitute Hubara, Wess, Wang, and Kim so that the nonlinear conversion function is a hyperbolic tangent function, “to impart non-linearity to the accumulated value” (Kim, Paragraph 0196). Claim(s) 5 is rejected under 35 U.S.C. 103 as being unpatentable over “Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations”; Itay Hubara et al; 2018 (hereinafter “Hubara”) in view of “Weighted Quantization-Regularization in DNNs for Weight Memory Minimization Toward HW Implementation”; Matthias Wess et al; 18 July 2018 (hereinafter “Wess”) in further view of Yi Wang; “Convolutional Neural Networks with Dynamic Regularization” published on Dec 31 2020 (hereinafter “Wang”) in further view of SHOMRON; Gil et al; US 20220075669 A1 (hereinafter “Shomron”). Regarding Claim 5, Hubara teaches “and a hardware architecture configured to perform the training procedure” (Hubara Page 20 Paragraph 2; “Moreover, our training method has almost no multiplications, and therefore might be implemented efficiently in dedicated hardware.” Examiner notes that the training is performed on a dedicated hardware architecture.) 
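Looping back to the claim 2-4 mapping above, those claims split quantization into a conversion function that maps the initial weight range into the fixed range [-1, +1] (a hyperbolic tangent in claim 4) followed by a quantization function. A toy sketch of that split is below; the deterministic wb = Sign(wr) rule and the "hard sigmoid" stochastic variant are the ones quoted from Hubara, but the pairing with tanh and everything else here is illustrative rather than taken from any reference.

```python
# Illustrative split of claims 2-4: a conversion function into the fixed range [-1, +1],
# followed by a quantization function (the pairing and the constants are assumptions).
import numpy as np

def conversion_function(w):
    # Claim 4's nonlinear conversion formula: the hyperbolic tangent maps any real weight
    # into the fixed range [-1, +1].
    return np.tanh(w)

def quantization_function(z, rng=None):
    if rng is None:
        return np.sign(z)                    # deterministic quantization, wb = Sign(wr)
    p = np.clip((z + 1.0) / 2.0, 0.0, 1.0)   # "hard sigmoid" probability for the stochastic variant
    return np.where(rng.random(z.shape) < p, 1.0, -1.0)

w = np.array([-2.3, -0.4, 0.1, 0.7, 3.1])    # initial weights in an arbitrary initial range
z = conversion_function(w)                   # now within [-1, +1]
print(quantization_function(z))                            # [-1. -1.  1.  1.  1.]
print(quantization_function(z, np.random.default_rng(0)))  # stochastic rounding of the same weights
```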
Hubara does not teach The method of claim 3, further comprising determining an architecture of the neural network model, wherein: the basic term is associated with the quantized weight array; the regularization term is associated with a plurality of parameters of the architecture; However, Wess does teach The method of claim 3, further comprising determining an architecture of the neural network model, wherein: (Wess Section 3 Paragraph 2; “While the first two steps serve for finding the best quantization method and bit-width for each layer when applying quantization without retraining, in the third step we perform retraining aiming to reduce the accuracy loss caused by quantization. In our flow, the weights are not first quantized and then retrained, but we always start from the high accuracy model, fine-tune weights with modified loss-functions and then perform quantization.” Examiner notes that the proposed method will determine the weights/architecture of the neural network.); the basic term is associated with the quantized weight array; (Wess Section 3A Paragraph 3; “By adding “0” as a quantization level, we enable Po2 quantization to also serve as a pruning mechanism when applied to weight matrices, as small weights are rounded to zero.” Wess Section 3A Figure 4; “MSE for an example layer when applying Po2 and DFP quantization with different bit-widths. While DFP quantization decreases the quantization error exponentially with increasing bit-width, with Po2 quantization the quantization error reaches the minimum already at bit-width 4.” Examiner notes that mean square error is the basic term/un modified loss term; Po2 quantization produces a quantized weight matrices/2d arrays and applied to the MSE.); the regularization term is associated with a plurality of parameters of the architecture; (Wess Section 3B Paragraph 5; “While in normal QR each weight within one layer is considered equally important for reaching high classification accuracy, the efficiency of pruning [15]–​[17] shows that especially weights with small magnitudes can be changed without reducing the accuracy of the network. Similarly to [8], the weights can be divided into two disjoint subsets, where QR is applied on one of the subsets while the other weights are being retrained without QR. Going one step further we can multiply the QR value of each weight with the absolute magnitude of the weight.2 This strategy forces quantization stronger on weights with higher magnitudes which can be especially useful for Po2 quantization, where density of quantization levels decreases with increasing weight values. Therefore, we define the WQR term as WQR=∑nN∑icard(Wn)(∣Wni−Wqni∣∣Wni∣max(Qn)2∗card(Wn))(11).” Examiner notes that WQR/regularization term equation takes into account number of layer (N), original weights (Wn), quantization schemes for layers (Qn), and quantized weights (Wqn) as parameters of the architecture) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Hubara and Wess. Hubara teaches a hardware architecture to perform the training procedure of weights. Wess teaches a loss function that has a basic and regularization term that are associated to a quantized weight array and parameters of the architecture respectively, and the loss function is used to obtain trained quantized weights when the function has converged. 
One of ordinary skill would have motivation to combine Hubara and Wess to help determine the architecture of the neural network model, “QR and WQR enable trained quantization to improve performance in comparison to direct quantization. Regularization-based quantization is very simple to implement and applicable for any quantization scheme. As a consequence QR and WQR could be employed alongside other effective quantization techniques such as stochastic quantization. Besides the simplicity of the approach, it also allows a deeper analysis of quantization schemes by recording the weight distribution during training.” (Wess Section 3C Paragraph 6). Hubara in view of Wess does not teach “And the regularization term is configured to increase sparsity of the quantized weight array after the training procedure.” However, Shomron does teach “And the regularization term is configured to increase sparsity of the quantized weight array after the training procedure.” (Shomron Paragraph 0108; “Yet, conventional DNN training with only a loss function (i.e., no regularization terms) produces weights that do not comprise many zeros. Different techniques have, therefore, emerged to increase weight sparsity either through regularization, e.g. L1.” Examiner notes training uses regularization L1 to increase the sparsity of the weights) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Hubara, Wess, Wang, and Shomron. Hubara teaches a hardware architecture to perform the training procedure of weights. Wess teaches a loss function that has a basic and regularization term that are associated to a quantized weight array and parameters of the architecture respectively, and the loss function is used to obtain trained quantized weights when the function has converged. Wang teaches a dynamic regularization method as a function of the training loss. Shomron teaches a technique of using regularization L1 to increase sparsity. One of ordinary skill would have motivation to combine Hubara, Wess, Wang, and Shomron to increase the accuracy of the model “(e.g., 20% means that 20% of weights are equal to zero). As before, we trade speedup for accuracy by slowing down layers to run with two threads. We observe that with a speedup of 4×, the 60%-pruned model achieves highest accuracy.” (Shomron Paragraph 0109). Claim(s) 7 is rejected under 35 U.S.C. 103 as being unpatentable over “Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations”; Itay Hubara et al; 2018 (hereinafter “Hubara”) in view of Hoang; Tung Thanh et al; US 20210110244 A1 (hereinafter “Hoang”) in further view of LI; Ren; US 20230049323 A1 (hereinafter “Li”). 
Regarding Claim 7, Hubara does not teach The method of claim 1, wherein performing the training procedure of the neural network model according to the quantized weight array comprises: performing a multiply-accumulate operation by a processing element matrix according to the quantized weight array and an input vector to generate an output vector having a plurality of output values; However, Hoang does teach The method of claim 1, wherein performing the training procedure of the neural network model according to the quantized weight array comprises: (Hoang Paragraph 0072; “A technique that can be used to reduce the computational complexity of the inference process is by use of a Binarized Neural Network (BNN), in which a neural network works with binary weights and activations.” Hoang Paragraph 0082; “The use of NAND flash memory to store weight and compute the dot products of inputs and weights in-array can be used in both the training and inference phases.” Examiner notes that MAC operation is performed during training procedure; operation can be performed with quantized/binarized weights); performing a multiply-accumulate operation by a processing element matrix according to the quantized weight array and an input vector to generate an output vector having a plurality of output values; (Hoang Paragraph 0050; “Memory structure 126 is addressable by word lines via a row decoder 324 and by bit lines via a column decoder 332. The read/write circuits 328 include multiple sense blocks 350 including SB1, SB2, SBp (sensing circuitry) and allow a page of memory cells to be read or programmed in parallel.” Hoang Paragraph 0056; “Examples of suitable technologies for memory cell architectures of the memory structure 126 include two dimensional arrays, three dimensional arrays, cross-point arrays, stacked two dimensional arrays, vertical bit line arrays, and the like.” Hoang Paragraph 0068; “The input data is represented as a vector of a length corresponding to the number of input nodes. The weights are represented in a weight matrix, where the number of columns corresponds to the number of the number of intermediate nodes in the hidden layer and the number of rows corresponds to the number of input nodes. The output is determined by a matrix multiplication of the input vector and the weight matrix, where each element of the output vector is a dot product of the vector of the input data with a column of the weight matrix. A common technique for executing the matrix multiplications is by use of a multiplier-accumulator.” Hoang Paragraph 0070; “the matrix multiplication can be computed within a memory array by leveraging the characteristics of Storage Class Memory (SCM), such as those based on ReRAM, PCM, or MRAM based memory cells.” Examiner notes that memory structure is can be arranged as a 2d array that are connected via word lines and bit lines which is a processing element matrix; memory structure takes in an input vector and weight matrix/array to generate an output vector with computed values; Memory structure of memory cells are equipped to perform multiply-accumulate operations) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Hubara and Hoang. Hubara teaches a process to quantize a weight array and train the neural network with the quantized weights. Hoang teaches a processing element matrix that performs multiply-accumulate operation on the quantized weight array. 
One of ordinary skill would have motivation to combine Hubara and Hoang to help determine the architecture of the neural network model during training while performing a multiply accumulate operation; ”These techniques allow for in-array implementations of matrix multiplication with improved inference accuracy when applying TBN and TTN for large datasets and complicated deep neural network (DNN) structures.” (Hoang Paragraph 0163). Hubara in view of Hoang does not teach reading the plurality of output values respectively by a plurality of output readout circuits; detecting whether each of the plurality of output values is zero by a respective one of a plurality of output detectors, and disabling an output readout circuit whose output value is zero from the plurality of output readout circuits, wherein the plurality of output detectors electrically connects to the plurality of output readout circuits respectively. However, Li does teach reading the plurality of output values respectively by a plurality of output readout circuits; (Li Paragraph 0058; “In some aspects, there is one sequential accumulator connected to each column-wise output of the CIM array (e.g., each read bit line of CIM array 206), and the sequential accumulators can themselves be connected together in order to accumulate across multiple outputs” Examiner notes that the output values are read onto read bit lines into the sequential accumulators/output readout circuits.); detecting whether each of the plurality of output values is zero by a respective one of a plurality of output detectors, (Li Paragraph 0040; “if input data sparsity detection component 112 detects zero-valued input data in input data nodes INP1 and INP2, it may disable the rows associated with those nodes prior to processing data in CIM array 106 because zero multiplied by any weight stored in the bit cells along those rows is still zero.” Li Paragraph 0051; “Like input data 102, weight data 108 may also be sparse.” Li Paragraph 0144; Processing system 900 further comprises sparsity detection circuit 930, such as described above, for example, with respect to FIGS. 1, 2, 3, and 6. For example, sparsity detection circuit may be configured to detect sparsity in input data and/or weight data, as described above. In some cases, a separate circuit may be used for input data sparsity detection and for weight data sparsity detection.” Examiner notes that a zero input will produce a zero output; input data sparsity detection component is output detector and detects a zero output in advance; weight data can also be zero and a similar implementation can be performed using a weight data sparsity detection component of disabling the associated column is zero weight is detected.); and disabling an output readout circuit whose output value is zero from the plurality of output readout circuits, wherein the plurality of output detectors electrically connects to the plurality of output readout circuits respectively. (Li Paragraph 0058; “In some aspects, there is one sequential accumulator connected to each column-wise output of the CIM array (e.g., each read bit line of CIM array 206), and the sequential accumulators can themselves be connected together in order to accumulate across multiple outputs” Li Paragraph 0144 – 0145; “Processing system 900 further comprises sparsity detection circuit 930, such as described above, for example, with respect to FIGS. 1, 2, 3, and 6. 
For example, sparsity detection circuit may be configured to detect sparsity in input data and/or weight data, as described above. In some cases, a separate circuit may be used for input data sparsity detection and for weight data sparsity detection. Processing system 900 further comprises bit cell disabling circuit 932, such as described above, for example, with respect to FIGS. 1, 2 and 3. For example, bit cell disabling circuit may be configured to disable rows (word lines) of a CIM array, columns (bit lines) of a CIM array, and/or tiles of a CIM array comprising multiple bit cells.” Examiner notes that when a zero output is detected then output readout circuit/sequential accumulator output is disabled; output detectors/sparsity detection are electrically connected to output readout circuits/sequential accumulator output through bit lines of the bit cell disabling circuit) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Hubara, Hoang, and Li. Hubara teaches a process to quantize a weight array and train the neural network with the quantized weights. Hoang teaches a processing element matrix that performs multiply-accumulate operation on the quantized weight array. Li teaches an accelerator that disables portions of the processing element matrix array if the output is detected as zero. One of ordinary skill would have motivation to combine Hubara, Hoang, and Li to help determine the architecture of the neural network model during training while performing a multiply accumulate operation and disabling bit lines based on the output of the operation; “Sparsity detection (e.g., input data sparsity detection component 112 of FIG. 1) and control logic (e.g., state control component 114 of FIG. 1) can help reduce power use by bit cell 400 when operating in “XNOR” mode by disabling (e.g., tristating) bit cell 400, for example, by tying PCWL1P and PCWL2P high when the input data is 0. Similarly, when in the “AND” mode, the control logic may generate a sparse mask signal (e.g., as in FIG. 3), which likewise bypasses bit cell 400 by tying PCWL1P and PCWL2P high for the entire multibit input.” (Li Paragraph 0073). Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL DUC TRAN whose telephone number is (571)272-6870. The examiner can normally be reached Mon-Fri 8:00-5:00 EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. 
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /D.D.T./Examiner, Art Unit 2147 /VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147
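As a closing illustration of the claim 7 mapping discussed above (Hoang's in-array multiply-accumulate and Li's sparsity gating), the sketch below is a software analogue of the zero-detection idea only. The claim itself recites hardware elements (a processing element matrix, output readout circuits, and output detectors), so nothing here should be read as the claimed accelerator or as code from the cited references.

```python
# Software analogue of the claim 7 data path; the claim recites hardware (a processing
# element matrix, output detectors, and output readout circuits), so this is only a sketch.
import numpy as np

def forward_with_zero_gating(quantized_weights, input_vector):
    # Multiply-accumulate across the processing-element-matrix analogue.
    output_vector = quantized_weights @ input_vector

    # "Output detectors": flag output values that are exactly zero.
    is_zero = (output_vector == 0)

    # "Disabling an output readout circuit whose output value is zero": here we simply
    # skip reading those entries, which is where a hardware accelerator would save energy.
    readout = {i: float(v) for i, (v, z) in enumerate(zip(output_vector, is_zero)) if not z}
    return readout, is_zero

w_q = np.array([[1, -1, 0, 1],
                [0,  0, 0, 0],    # sparse (all-zero) weights guarantee a zero output value
                [-1, 1, 1, -1]])
x = np.array([1, 1, -1, 1])
readout, disabled = forward_with_zero_gating(w_q, x)
print(readout)    # {0: 1.0, 2: -2.0}: only nonzero outputs are read out
print(disabled)   # [False  True False]: the middle readout circuit would be disabled
```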

Prosecution Timeline

Dec 22, 2021
Application Filed
Sep 08, 2025
Non-Final Rejection — §101, §103
Dec 05, 2025
Response Filed
Feb 11, 2026
Final Rejection — §101, §103 (current)

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 0%
Grant Probability With Interview: 0% (+0.0%)
Median Time to Grant: 3y 3m
PTA Risk: Moderate
Based on 1 resolved case by this examiner. Grant probability derived from career allow rate.
