Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is made final.
Claims 1-3 and 5-21 are pending. Claims 1, 11, and 16 are independent.
Response to Arguments
Applicant’s arguments, dated 11/12/2025, regarding the 35 U.S.C. 101 rejections of the now-amended claims have been fully considered and are persuasive. The 101 rejections of claims 1-3 and 5-20 have been withdrawn.
Applicant’s arguments, dated 11/12/2025, regarding the 35 U.S.C. 103 rejections of the previous Office action have been fully considered. Due to the claim amendments, the scope of the claims has changed and new grounds of rejection are applied – see the updated rejections below.
Applicant’s arguments, dated 11/12/2025, regarding the 35 U.S.C. 112(f) interpretations of the previous Office action have been fully considered but are not persuasive. The limitations reciting generic placeholders followed by functional language were not given sufficient structure in the amended claims to avoid interpretation under 112(f). Applicant argues that the storage subsystem, holding instructions executable to implement the deep neural network (DNN), supplies sufficient structure; however, that subsystem provides structure only for the computing system as a whole, not for the claimed DNN layers or controllers.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “an input layer for receiving inputs applied to the deep neural network”, “an output layer for outputting inferences based on the received inputs”, “a sparsity controller configured to selectively apply a plurality of different sparsity states” and “a quantization controller configured to selectively quantize the parameters of the deep neural network” in claim 1 and dependent claims 2-10 and 21.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 2, 3, 6 and 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez et al. (“Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks”, 2021), herein Vasquez, in view of Yang et al. (“DASNet: Dynamic Activation Sparsity for Neural Network Efficiency Improvement”, 2019), herein Yang.
Regarding claim 1, Vasquez teaches: A computing system, comprising: a logic subsystem of one or more logic devices; a storage subsystem of one or more storage devices having instructions stored thereon executable by the one or more logic devices of the logic subsystem to implement a deep neural network that includes (pg. 1-2, I. Introduction, ¶5, To demonstrate the energy and compute efficiency improvement of our proposed method, we design a precision-scalable Process In-Memory (PIM) hardware platform that can support variable data-precision and pruning for different layers): an input layer for receiving inputs applied to the deep neural network; an output layer for outputting inferences based on the received inputs; a plurality of hidden layers interposed between the input layer and the output layer; a plurality of nodes disposed within and interconnecting the input layer, output layer, and hidden layers, wherein the nodes selectively operate on the inputs to generate and cause outputting of the inferences, and wherein operation of the nodes is controlled based on a set of parameters of the deep neural network (pg. 5, Tables II and III describe information for the various layers of the network selected);… and a quantization controller configured to selectively quantize the set of parameters of the deep neural network in a manner that is sparsity-dependent, such that a degree of quantization applied by the quantization controller to at least one or more non-sparsified parameters of the set of parameters within a parameter tensor of the deep neural network is based on which of the plurality of different sparsity states is applied by the sparsity controller to that parameter tensor (pg. 2, Section C. Activation Density, ¶2, We take advantage of this fact to quantize each layer of a network to lower bit-precisions based on AD – ¶1, AD is defined as the proportion of non-zero activations in a layer – see eq. 2.); wherein for each of the parameter tensors to which a sparsity state of the plurality of different sparsity states is applied by the sparsity controller, the degree of quantization applied by the quantization controller to the one or more non-sparsified parameters of that parameter tensor is dependent upon the proportion of the set of parameters that are sparsified within that parameter tensor (pg. 3, III. Methodology, ¶1, Once AD_l stabilizes across all layers, we break the training process and then, perform quantization. The quantization bit-width k_l for each layer l is calculated as k_l = round(k_l_initial × AD_l) – the quantization bit-width is determined based on the ratio of nonzero activations of a layer); and wherein for each non-sparsified parameter that is quantized by the quantization controller, the degree of quantization defines a magnitude of a reduction in a quantity of bits that encode that non-sparsified parameter within memory of the computing system (pg. 3, III. Methodology, ¶1, The quantization bit-width k_l for each layer l is calculated as k_l = round(k_l_initial × AD_l) – the AD defines a magnitude of a reduction in the quantity of bits).
Vasquez fails to teach: a sparsity controller configured to selectively apply a plurality of different sparsity states to parameter tensors of the deep neural network to control parameter density of the set of parameters within the parameter tensors, wherein each sparsity state differs from one or more other sparsity states of the plurality of different sparsity states with respect to a proportion of the set of parameters that are sparsified within each parameter tensor of the deep neural network to which that sparsity state is applied by the sparsity controller.
However, in the same field of endeavor, Yang teaches: a sparsity controller configured to selectively apply a plurality of different sparsity states to parameter tensors of the deep neural network to control parameter density of the set of parameters within the parameter tensors, wherein each sparsity state differs from one or more other sparsity states of the plurality of different sparsity states with respect to a proportion of the set of parameters that are sparsified within each parameter tensor of the deep neural network to which that sparsity state is applied by the sparsity controller (pg. 7, Table III, different layers have different levels of activation sparsity).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply different levels of activation sparsity to different layers of a DNN as disclosed by Yang in the system disclosed by Vasquez to reduce unnecessary calculation (pg. 3, Part A., ¶1, selection of the winner rate p for each layer is a tradeoff process between the activation sparsification level and the model inference accuracy – and – Abstract, removing unimportant activations can reduce the amount of data communication and the computation cost).
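For clarity of the record, the following is a minimal illustrative sketch of the combination as mapped above: per-layer activation sparsification at different levels (Yang's winner-rate selection) followed by activation-density-based bit-width selection (Vasquez, k_l = round(k_l_initial × AD_l)). The function names, layer shapes, winner rates, and top-k selection rule are illustrative assumptions for this sketch, not either reference's actual implementation.

```python
import numpy as np

def sparsify(activations: np.ndarray, winner_rate: float) -> np.ndarray:
    """Keep only the largest-magnitude fraction of activations; zero the rest."""
    k = max(1, int(round(winner_rate * activations.size)))
    threshold = np.sort(np.abs(activations), axis=None)[-k]
    return np.where(np.abs(activations) >= threshold, activations, 0.0)

def activation_density(activations: np.ndarray) -> float:
    """AD: proportion of non-zero activations in a layer (Vasquez, eq. 2)."""
    return np.count_nonzero(activations) / activations.size

def quantized_bitwidth(k_initial: int, ad: float) -> int:
    """Vasquez: k_l = round(k_l_initial * AD_l); denser layers keep more bits."""
    return max(1, round(k_initial * ad))

# Different per-layer sparsity states yield different degrees of quantization.
rng = np.random.default_rng(0)
layers = {"conv1": rng.standard_normal((64, 32)), "fc1": rng.standard_normal(256)}
winner_rates = {"conv1": 0.5, "fc1": 0.1}  # hypothetical per-layer sparsity states
for name, acts in layers.items():
    kept = sparsify(acts, winner_rates[name])
    print(name, quantized_bitwidth(8, activation_density(kept)))  # k_initial = 8
```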
Regarding claim 2, Vasquez further teaches: The computing system of claim 1… and wherein for each non-sparsified parameter that is quantized by the quantization controller, the degree of quantization defines a magnitude of a reduction in a quantity of bits that encode that non-sparsified parameter within memory of the computing system (pg. 3, III. Methodology, ¶1, Once AD_l stabilizes across all layers, we break the training process and then, perform quantization. The quantization bit-width k_l for each layer l is calculated as k_l = round(k_l_initial × AD_l) – the bit-width is smaller for the layers with a lower activation density, i.e., more quantization for layers with high sparsity).
Vasquez fails to teach: wherein for a parameter tensor of the deep neural network, a first sparsity state and a second sparsity state of the plurality of different sparsity states; each parameter tensor of the deep neural network to which the first sparsity state is applied has a greater percentage of parameters of the set of parameters that are sparsified within that parameter tensor, and each parameter tensor of the deep neural network to which the second sparsity state is applied has a lesser percentage of parameters of the set of parameters that are sparsified within that parameter tensor.
However, in the same field of endeavor, Yang teaches: wherein for a parameter tensor of the deep neural network, a first sparsity state and a second sparsity state of the plurality of different sparsity states; each parameter tensor of the deep neural network to which the first sparsity state is applied has a greater percentage of parameters of the set of parameters that are sparsified within that parameter tensor, and each parameter tensor of the deep neural network to which the second sparsity state is applied has a lesser percentage of parameters of the set of parameters that are sparsified within that parameter tensor (pg. 7, Table III, different layers have different levels of activation sparsity, some with higher sparsity and some with lower sparsity).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply different levels of activation sparsity to different layers of a DNN as disclosed by Yang in the system disclosed by Vasquez to reduce unnecessary calculation (pg. 3, Part A., ¶1, selection of the winner rate p for each layer is a tradeoff process between the activation sparsification level and the model inference accuracy – and – Abstract, removing unimportant activations can reduce the amount of data communication and the computation cost).
Regarding claim 3, Vasquez fails to teach: The computing system of claim 1, wherein for a parameter of the set of parameters of the deep neural network, a first sparsity state of the plurality of different sparsity states causes sparsification of the parameter within one or more of the parameter tensors of the deep neural network, and a second sparsity state of the plurality of different sparsity states does not cause sparsification of the parameter within the one or more of the parameter tensors of the deep neural network.
However, in the same field of endeavor, Yang teaches: wherein for a parameter of the set of parameters of the deep neural network, a first sparsity state of the plurality of different sparsity states causes sparsification of the parameter within one or more of the parameter tensors of the deep neural network, and a second sparsity state of the plurality of different sparsity states does not cause sparsification of the parameter within the one or more of the parameter tensors of the deep neural network (pg. 4, Fig. 3, the graph measures various pruning levels applied to fully connected layers of two different models – some parameters must be sparse in the more pruned versions of models that are not sparse in the less pruned ones).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply different levels of activation sparsity to different layers of a DNN as disclosed by Yang in the system disclosed by Vasquez to reduce unnecessary calculation (pg. 3, Part A., ¶1, selection of the winner rate p for each layer is a tradeoff process between the activation sparsification level and the model inference accuracy – and – Abstract, removing unimportant activations can reduce the amount of data communication and the computation cost).
Regarding claim 6, Vasquez further teaches: The computing system of claim 1, wherein at least some of the set of parameters of the deep neural network are stored in each parameter tensor of the deep neural network, and wherein at least some of the parameter tensors of the deep neural network include separate exponent values for each of the parameters of the parameter tensor (pg. 2, Section II, Part A, ¶1, A 32-bit floating-point arithmetic (FP32) is the default in most of the modern deep learning implementations – FP32 uses a separate exponent value for parameters, i.e., each number has an exponent and a mantissa).
Regarding claim 11, it recites a method similar to the system of claim 1 and is rejected on the same grounds – see above.
Claim(s) 5 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez in view of Yang as applied to claims 1 and 11 above, and further in view of the "IEEE Standard for Floating-Point Arithmetic" (IEEE Std 754-2019), herein IEEE 754.
Regarding claim 5, Vasquez in view of Yang fails to explicitly teach: The computing system of claim 1, wherein selectively quantizing the set of parameters of the deep neural network includes a mantissa bit determination that differs between a first sparsity state and a second sparsity state of the plurality of different sparsity states.
However, in the same field of endeavor, IEEE 754 teaches: a mantissa bit determination that differs between a first sparsity state and a second sparsity state of the plurality of different sparsity states (pg. 19, Section 3.4, the leading bit of the significand, d0, is implicitly encoded in the biased exponent E. If the encoded value is zero, as in a sparsified parameter, then the leading bit is assumed to be 0 – If E = 0 and T = 0… v = (−1)^S × (+0). Otherwise, its value is assumed to be 1 – normal numbers have an implicit leading significand bit of 1).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine a mantissa bit differently for sparse parameters as disclosed by IEEE 754 in the system disclosed by Vasquez in view of Yang to utilize limited bits efficiently (pg. 23, Table 3.5, for 16-bit floats, for example, 11 bits of precision are achieved with 10 bits in the significand).
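For clarity of the record, the following is a minimal illustrative sketch of the implicit-leading-bit convention relied upon above, assuming binary16 encodings and a NumPy bit-level view; the helper name is an assumption for this sketch and is not drawn from IEEE 754 itself.

```python
import numpy as np

def leading_significand_bit(x: float) -> int:
    """Return the implicit leading significand bit of a binary16 encoding."""
    bits = np.float16(x).view(np.uint16)       # raw 16-bit encoding
    exponent_field = (int(bits) >> 10) & 0x1F  # 5-bit biased exponent E
    return 0 if exponent_field == 0 else 1     # E = 0: zero/subnormal; else normal

print(leading_significand_bit(0.0))  # 0 - e.g., a sparsified parameter
print(leading_significand_bit(1.5))  # 1 - a normal (non-sparsified) value
```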
Regarding claim 12, it recites similar limitations to claim 5 and is rejected on the same grounds – see above.
Claim(s) 7 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez in view of Yang as applied to claims 1 and 11 above, and further in view of Nair et al. (US 20190042944 A1), herein Nair.
Regarding claim 7, Vasquez in view of Yang fails to explicitly teach: The computing system of claim 1, wherein at least some of the set of parameters of the deep neural network are stored in a parameter tensor of the deep neural network as: (1) a mantissa portion for each parameter of the parameter tensor, (2) a private exponent portion for each parameter of the parameter tensor, and (3) a shared exponent portion, wherein the shared exponent portion is common to each of the parameters of the parameter tensor and is not replicated in storage for each of the parameters, and wherein the private exponent portion and shared exponent portion collectively specify an exponent value for a respective parameter of the parameter tensor.
However, in the same field of endeavor, Nair teaches: wherein at least some of the set of parameters of the deep neural network are stored in a parameter tensor of the deep neural network as: (1) a mantissa portion for each parameter of the parameter tensor, (2) a private exponent portion for each parameter of the parameter tensor, and (3) a shared exponent portion (Fig. 2, mantissa portions 208, private exponent portions 206, and shared exponent 154 in tensor 200), wherein the shared exponent portion is common to each of the parameters of the parameter tensor and is not replicated in storage for each of the parameters, and wherein the private exponent portion and shared exponent portion collectively specify an exponent value for a respective parameter of the parameter tensor (¶16, a neural network training tensor in which each floating point number includes a single ON/OFF bit used to either (a) combine the exponent of the respective floating-point number with the shared exponent; or (b) do not combine the respective floating point exponent with the shared exponent (i.e., use only the floating point exponent)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add the shared exponent disclosed by Nair in the system disclosed by Vasquez in view of Yang in order to represent more values with the same storage (¶16, while beneficially increasing the range of possible test values).
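For clarity of the record, the following is a minimal illustrative sketch of a tensor format in the spirit of the shared/private exponent arrangement mapped above: one shared exponent stored once per tensor, plus a small private exponent and a mantissa per parameter. The field widths, rounding rule, and all names are assumptions for this sketch, not Nair's disclosed format.

```python
from dataclasses import dataclass
import numpy as np

MANT_BITS = 4     # assumed mantissa width
PRIVATE_BITS = 2  # assumed private exponent width

@dataclass
class SharedExponentTensor:
    shared_exponent: int           # stored once; not replicated per parameter
    private_exponents: np.ndarray  # small per-parameter exponent offsets
    mantissas: np.ndarray          # per-parameter signed integer mantissas

def encode(values: np.ndarray) -> SharedExponentTensor:
    exps = np.frexp(values)[1]                             # per-value exponents
    shared = int(exps.max())                               # common (shared) exponent
    private = np.clip(shared - exps, 0, 2**PRIVATE_BITS - 1)
    scale = np.exp2(-(shared - private).astype(float))
    mant = np.round(values * scale * 2**MANT_BITS).astype(np.int32)
    return SharedExponentTensor(shared, private.astype(np.int32), mant)

def decode(t: SharedExponentTensor) -> np.ndarray:
    # Private and shared exponent portions collectively specify each exponent.
    exponent = (t.shared_exponent - t.private_exponents).astype(float)
    return t.mantissas / 2**MANT_BITS * np.exp2(exponent)

vals = np.array([0.75, -0.04, 0.3])
print(decode(encode(vals)))  # approximate reconstruction of vals
```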
Regarding claim 13, it recites similar limitations to claim 7 and is rejected on the same grounds – see above.
Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez in view of Yang and Nair as applied to claim 7 above, and further in view of Mani et al. (US 20170054449 A1), herein Mani.
Regarding claim 8, Vasquez in view of Yang and Nair fails to teach: The computing system of claim 7, wherein a granularity of the shared exponent portion of the parameter tensor is dynamically reconfigurable.
However, in the same field of endeavor, Mani teaches: wherein a granularity of the shared exponent portion of the parameter tensor is dynamically reconfigurable (¶32, In a typical block floating point representation, a block of samples is represented as an exponent common to each sample and a mantissa for each sample. The common exponent is determined for the block of samples based on the largest magnitude sample in the block).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to dynamically reconfigure the shared exponent as disclosed by Mani in the system disclosed by Vasquez in view of Yang and Nair to allow for a wider range of parameters within the neural network (¶32, increase the dynamic range that can be represented by a limited number of bits).
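For clarity of the record, the following is a minimal illustrative sketch of block floating point with a reconfigurable granularity, consistent with Mani's description (¶32) that the common exponent for a block is determined from the largest-magnitude sample in that block; the block_size parameter and helper name are assumptions for this sketch.

```python
import numpy as np

def block_common_exponents(samples: np.ndarray, block_size: int) -> np.ndarray:
    """One shared exponent per block, set by the largest-magnitude sample."""
    n_blocks = -(-samples.size // block_size)           # ceiling division
    padded = np.resize(samples, n_blocks * block_size)  # pad by repetition
    max_mag = np.abs(padded.reshape(n_blocks, block_size)).max(axis=1)
    return np.frexp(max_mag)[1]                         # exponent per block

x = np.array([0.75, 0.04, 0.3, 6.0, 0.01, 0.02])
print(block_common_exponents(x, block_size=2))  # finer granularity:   [0 3 -5]
print(block_common_exponents(x, block_size=3))  # coarser granularity: [0 3]
```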
Claim(s) 9, 10, 14 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez in view of Yang and Nair as applied to claims 7 and 13 above, and further in view of IEEE 754.
Regarding claim 9, Vasquez in view of Yang and Nair fails to teach: The computing system of claim 7, wherein the quantization controller is configured to selectively infer at least some of the mantissa portion for a parameter of the set of parameters within a parameter tensor of the deep neural network based on whether a first sparsity state or a second sparsity state of the plurality of different sparsity states applies to the parameter.
However, in the same field of endeavor, IEEE 754 teaches: selectively infer at least some of the mantissa portion for a parameter of the set of parameters within a parameter tensor of the deep neural network based on whether a first sparsity state or a second sparsity state of the plurality of different sparsity states applies to the parameter (pg. 19, Section 3.4, the leading bit of the significand, d0, is implicitly encoded in the biased exponent E. If the encoded value is zero, as in a sparsified parameter, then the leading bit is assumed to be 0 – If E = 0 and T = 0… v = (−1)^S × (+0). Otherwise, its value is assumed to be 1 – normal numbers have an implicit leading significand bit of 1).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to infer some of the mantissa portion for a parameter as disclosed by IEEE 754 in the system disclosed by Vasquez in view of Yang and Nair to utilize limited bits efficiently (pg. 23, Table 3.5, for 16-bit floats, for example, 11 bits of precision are achieved with 10 bits in the significand).
Regarding claim 10, Vasquez in view of Yang and Nair fails to teach: The computing system of claim 9, wherein inferring at least some of the mantissa portion includes inferring a leading bit previously discarded by quantization based on whether the first sparsity state or the second sparsity state applies to the parameter.
However, in the same field of endeavor, IEEE 754 teaches: wherein inferring at least some of the mantissa portion includes inferring a leading bit previously discarded by quantization based on whether the first sparsity state or the second sparsity state applies to the parameter (pg. 19, Section 3.4, the leading bit of the significand, d0, is implicitly encoded in the biased exponent E. If the encoded value is zero, as in a sparsified parameter, then the leading bit is assumed to be 0 – If E = 0 and T = 0… v = (−1)^S × (+0)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to infer some of the mantissa portion for a parameter as disclosed by IEEE 754 in the system disclosed by Vasquez in view of Yang and Nair to utilize limited bits efficiently (pg. 23, Table 3.5, for 16-bit floats, for example, 11 bits of precision are achieved with 10 bits in the significand).
Regarding claims 14 and 15, they recite similar limitations to claims 9 and 10 respectively and are rejected on the same grounds – see above.
Claim(s) 16, 17 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez in view of Yang and IEEE 754.
Regarding claim 16, Vasquez teaches: A method of operating a deep neural network having an input layer, an output layer, a plurality of interposed hidden layers, and a plurality of nodes disposed within and interconnecting said input, output, and hidden layers, the method comprising, on a computing system: receiving inputs at the input layer; via operation of nodes within the input, hidden, and output layers, processing the inputs and outputting inferences from the output layer over a plurality of inference passes (pg. 5, Tables II and III describe information for the various layers of the network selected)… during one or more of the inference passes, selectively quantizing the set of parameters of the deep neural network in a manner that is sparsity-dependent, such that a degree of quantization applied to at least one or more non-sparsified parameters of the set of parameters within a parameter tensor of the deep neural network is based on which of the plurality of different sparsity states is applied to that parameter tensor (pg. 2, Section C. Activation Density, ¶2, We take advantage of this fact to quantize each layer of a network to lower bit-precisions based on AD – ¶1, AD is defined as the proportion of non-zero activations in a layer – see eq. 2); wherein for each of the parameter tensors to which a sparsity state of the plurality of different sparsity states is applied, the degree of quantization applied to the one or more non-sparsified parameters of that parameter tensor is dependent upon the proportion of the set of parameters that are sparsified within that parameter tensor; and wherein for each non-sparsified parameter that is quantized, the degree of quantization defines a magnitude of a reduction in a quantity of bits that encode that non-sparsified parameter within memory of the computing system (pg. 3, III. Methodology, ¶1, Once AD_l stabilizes across all layers, we break the training process and then, perform quantization. The quantization bit-width k_l for each layer l is calculated as k_l = round(k_l_initial × AD_l) – the quantization bit-width is determined based on the ratio of nonzero activations of a layer).
Vasquez fails to teach: during the plurality of inference passes, selectively applying a plurality of different sparsity states to parameter tensors of the deep neural network to selectively control parameter density of a set of parameters within the deep neural network, wherein each sparsity state differs from one or more other sparsity states of the plurality of different sparsity states with respect to a proportion of the set of parameters that are sparsified within each parameter tensor of the deep neural network to which that sparsity state is applied…
However, in the same field of endeavor, Yang teaches: during the plurality of inference passes, selectively applying a plurality of different sparsity states to parameter tensors of the deep neural network to selectively control parameter density of a set of parameters within the deep neural network, wherein each sparsity state differs from one or more other sparsity states of the plurality of different sparsity states with respect to a proportion of the set of parameters that are sparsified within each parameter tensor of the deep neural network to which that sparsity state is applied (pg. 7, Table III, different layers have different levels of activation sparsity).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply different levels of activation sparsity to different layers of a DNN as disclosed by Yang in the method disclosed by Vasquez to reduce unnecessary calculation (pg. 3, Part A., ¶1, selection of the winner rate p for each layer is a tradeoff process between the activation sparsification level and the model inference accuracy – and – Abstract, removing unimportant activations can reduce the amount of data communication and the computation cost).
Vasquez in view of Yang fails to teach: and wherein each parameter tensor holds an exponent portion and a mantissa portion for at least some of the set of parameters of the parameter tensor; during the plurality of inference passes, inferring a value for the mantissa portion of a parameter of the set of parameters within a parameter tensor of the deep neural network based on whether a first sparsity state or a second sparsity state of the plurality of different sparsity states applies to the parameter tensor.
However, in the same field of endeavor, IEEE 754 teaches: and wherein each parameter tensor holds an exponent portion and a mantissa portion for at least some of the set of parameters of the parameter tensor; during the plurality of inference passes, inferring a value for the mantissa portion of a parameter of the set of parameters within a parameter tensor of the deep neural network based on whether a first sparsity state or a second sparsity state of the plurality of different sparsity states applies to the parameter tensor (pg. 19, Section 3.4, the leading bit of the significand, d0, is implicitly encoded in the biased exponent E. If the encoded value is zero, as in a sparsified parameter, then the leading bit is assumed to be 0 – If E = 0 and T = 0… v = (−1)^S × (+0). Otherwise, its value is assumed to be 1 – normal numbers have an implicit leading significand bit of 1).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to infer some of the mantissa portion for a parameter as disclosed by IEEE 754 in the method disclosed by Vasquez in view of Yang to utilize limited bits efficiently (pg. 23, Table 3.5, for 16-bit floats, for example, 11 bits of precision are achieved with 10 bits in the significand).
Regarding claim 17, Vasquez in view of Yang fails to teach: The method of claim 16, wherein inferring the value for the mantissa portion of the parameter includes inferring a leading bit of the mantissa portion based on whether the parameter is sparsified.
However, in the same field of endeavor, IEEE 754 teaches: wherein inferring the value for the mantissa portion of the parameter includes inferring a leading bit of the mantissa portion based on whether the parameter is sparsified (pg. 19, Section 3.4, the leading bit of the significand, d0, is implicitly encoded in the biased exponent E. If the encoded value is zero, as in a sparsified parameter, then the leading bit is assumed to be 0 – If E = 0 and T = 0… v = (−1)^S × (+0). Otherwise, its value is assumed to be 1 – normal numbers have an implicit leading significand bit of 1).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to infer a leading bit of the mantissa portion of a parameter as disclosed by IEEE 754 in the method disclosed by Vasquez in view of Yang to utilize limited bits efficiently (pg. 23, Table 3.5, for 16-bit floats, for example, 11 bits of precision are achieved with 10 bits in the significand).
Regarding claim 18, Vasquez in view of Yang fails to teach: The method of claim 17, wherein inferring the leading bit of the mantissa portion includes inferring the leading bit to be a zero value if the parameter is sparsified.
However, in the same field of endeavor, IEEE 754 teaches: wherein inferring the leading bit of the mantissa portion includes inferring the leading bit to be a zero value if the parameter is sparsified (pg. 19, Section 3.4, the leading bit of the significand, d0, is implicitly encoded in the biased exponent E. If the encoded value is zero, as in a sparsified parameter, then the leading bit is assumed to be 0 – If E = 0 and T = 0… v = (−1)^S × (+0). Otherwise, its value is assumed to be 1 – normal numbers have an implicit leading significand bit of 1).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to infer the mantissa leading bit when a parameter is sparse as disclosed by IEEE 754 in the method disclosed by Vasquez in view of Yang to utilize limited bits efficiently (pg. 23, Table 3.5, for 16-bit floats, for example, 11 bits of precision are achieved with 10 bits in the significand).
Claim(s) 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez in view of Yang and IEEE 754 as applied to claim 16 above, and further in view of Nair.
Regarding claim 19, Vasquez in view of Yang and IEEE 754 fails to teach: The method of claim 16, wherein at least some of the parameter tensors of the deep neural network include a shared exponent portion common to each of the parameters in the parameter tensor, and wherein the exponent portion for each of the parameters in the parameter tensor is a non-shared exponent portion that is useable together with the shared exponent portion to collectively specify an exponent value for the parameter.
However, in the same field of endeavor, Nair teaches: wherein at least some of the parameter tensors of the deep neural network include a shared exponent portion common to each of the parameters in the parameter tensor, and wherein the exponent portion for each of the parameters in the parameter tensor is a non-shared exponent portion that is useable together with the shared exponent portion to collectively specify an exponent value for the parameter (Fig. 2, mantissa portions 208, private exponent portions 206, and shared exponent 154 in tensor 200).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add the shared exponent disclosed by Nair in the method disclosed by Vasquez in view of Yang and IEEE 754 in order to represent more values with the same storage (¶16, while beneficially increasing the range of possible test values).
Claim(s) 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez in view of Yang, IEEE 754, and Nair as applied to claim 19 above, and further in view of Mani.
Regarding claim 20, Vasquez in view of Yang, IEEE 754, and Nair fails to teach: The method of claim 19, wherein a granularity of the shared exponent portion is dynamically reconfigurable.
However, in the same field of endeavor, Mani teaches: wherein a granularity of the shared exponent portion is dynamically reconfigurable (¶32, In a typical block floating point representation, a block of samples is represented as an exponent common to each sample and a mantissa for each sample. The common exponent is determined for the block of samples based on the largest magnitude sample in the block).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to dynamically reconfigure the shared exponent as disclosed by Mani in the method disclosed by Vasquez in view of Yang, IEEE 754, and Nair to allow for a wider range of parameters within the neural network (¶32, increase the dynamic range that can be represented by a limited number of bits).
Claim(s) 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vasquez in view of Yang as applied to claim 1 above, and further in view of IEEE 754.
Regarding claim 21, Vasquez in view of Yang fails to explicitly teach: The computing system of claim 1, wherein for each non-sparsified parameter that is quantized, at least a leading bit of a mantissa portion of that non-sparsified parameter is discarded.
However, in the same field of endeavor, IEEE 754 teaches: wherein for each non-sparsified parameter that is quantized, at least a leading bit of a mantissa portion of that non-sparsified parameter is discarded (pg. 19, Section 3.4, the leading bit of the significand, d0, is implicitly encoded in the biased exponent E. If the encoded value is zero, as in a sparsified parameter, then the leading bit is assumed to be 0 – If E = 0 and T = 0… v = (−1)^S × (+0). Otherwise, its value is assumed to be 1 – normal numbers have an implicit leading significand bit of 1).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to discard the leading bit of the mantissa portion of a non-sparsified parameter as disclosed by IEEE 754 in the system disclosed by Vasquez in view of Yang to utilize limited bits efficiently (pg. 23, Table 3.5, for 16-bit floats, for example, 11 bits of precision are achieved with 10 bits in the significand).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HARRISON CHAN YOUNG KIM whose telephone number is (571)272-0713. The examiner can normally be reached Monday - Thursday 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cesar Paula, can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HARRISON C KIM/ Examiner, Art Unit 2145
/CESAR B PAULA/ Supervisory Patent Examiner, Art Unit 2145