Prosecution Insights
Last updated: April 19, 2026
Application No. 17/547,158

METHOD AND SYSTEM FOR BIT QUANTIZATION OF ARTIFICIAL NEURAL NETWORK

Non-Final OA: §101, §103
Filed: Dec 09, 2021
Examiner: KAPOOR, DEVAN
Art Unit: 2126
Tech Center: 2100 — Computer Architecture & Software
Assignee: Deepx Co. Ltd.
OA Round: 3 (Non-Final)
Grant Probability: 11% (At Risk)
Expected OA Rounds: 3-4
Expected Time to Grant: 3y 3m
Grant Probability with Interview: 28%

Examiner Intelligence

Career Allow Rate: 11% (1 granted / 9 resolved; -43.9% vs TC avg). Grants only 11% of cases.
Interview Lift: +16.7% across resolved cases with an interview.
Typical Timeline: 3y 3m average prosecution.
Career History: 42 total applications across all art units; 33 currently pending.

Statute-Specific Performance

§101: 38.1% (-1.9% vs TC avg)
§103: 43.9% (+3.9% vs TC avg)
§102: 10.8% (-29.2% vs TC avg)
§112: 5.8% (-34.2% vs TC avg)

Comparison baseline is the Tech Center average estimate. Based on career data from 9 resolved cases.

Office Action

Grounds of rejection: §101, §103
DETAILED ACTION

This action is responsive to the application filed on 12/23/2025. Claims 1-9 and 11-20 are pending and have been examined. This action is non-final.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Applicant's claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114 and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/23/2025 has been entered.

Response to Arguments

Applicant's arguments filed 06/13/2025 have been fully considered, but they are not persuasive except as otherwise indicated. Please see the responses below.

Argument 1: The applicant argues that the pending claims are not directed to an abstract idea under 35 U.S.C. 101 because, when viewed as a whole, they are directed to a concrete hardware-based implementation for executing a quantized artificial neural network rather than a mere mathematical algorithm. In particular, the applicant asserts that independent claims 1, 14, and 20 recite specific circuits that store quantized parameters, repeatedly reduce bit width subject to an accuracy constraint, and then execute convolution and accumulation using dedicated hardware, including a feedback-loop accumulation structure. The applicant emphasizes that the claims tie quantization outcomes to how the network is physically stored and executed in an accelerator datapath, rather than merely manipulating numbers or storing data.
The applicant further argues that dependent claims 9 and 19 strengthen eligibility by reciting a specific multiplier-array and adder-tree architecture in which the arithmetic bit width corresponds to the quantization bits of the neural network, such that the computation itself is performed at the reduced bit width. According to the applicant, these limitations describe a specific technological implementation that improves computer hardware operation and therefore integrates any alleged mathematical concepts into a practical application, making the claims eligible under Section 101.

Examiner Response to Argument 1: The examiner has considered the arguments set forth above but responds that, even in light of the amendments, the applicant has not shown that the claims are patent eligible under 35 U.S.C. 101. The claims remain directed to abstract ideas because they focus on performing bit quantization, comparing accuracy to a target value, and repeating quantization based on that comparison, which are mathematical concepts and mental processes. The added claim language describing hardware, memory, processing units, multipliers, adder trees, accumulators, and feedback loops does not integrate these abstract ideas into a practical application, because these elements are recited at a high level and perform their ordinary, well-known functions of storing data and carrying out arithmetic operations. Using memory to store quantized parameters, using a processor to perform convolution, and using multipliers and adders to multiply and sum values merely apply the abstract quantization logic on generic computing hardware. Tying arithmetic bit width to quantization bits likewise describes a mathematical relationship rather than a technological improvement.
The dependent claims similarly add mental steps, such as selecting parameters or layers and determining final bit sizes, or add routine computer features, such as different memory types, caches, additional processing units, or storing results, which are field-of-use limitations, extra-solution activity, or well-understood, routine, and conventional operations. Accordingly, the claims do not integrate the abstract ideas into a practical application, do not provide significantly more than the judicial exception, and the rejection under 35 U.S.C. 101 is maintained.

Argument 2: The applicant argues that the Section 103 rejections are improper because the cited combinations of Guo, Zhou, and the additional references fail to teach or suggest the specific hardware structure recited in the amended claims. For independent claims 1, 14, and 20, the applicant contends that Guo at most teaches iterative quantization and Zhou at most teaches low-bit-width convolution, but that neither reference, alone or in combination, discloses or suggests the claimed plurality of multipliers operably connected to an adder tree and an adder-accumulator feedback loop configured for accumulation. The applicant further argues that the rejections improperly rely on Ha and similar references to supply the missing structure by citing generic lists of hardware components without disclosing the claimed interconnections or datapath architecture. For claims 9 and 19, the applicant specifically asserts that the prior art does not teach an array of multipliers feeding an adder tree with a feedback-loop accumulator, nor arithmetic that is explicitly sized to correspond to the quantization bit width of the neural network.
The applicant also argues that the Office Action fails to provide a reasoned motivation that would lead a person of ordinary skill in the art from the cited quantization and convolution techniques to the specific claimed hardware configuration, asserting that a general motivation to use hardware for arithmetic is insufficient. Based on these points, the applicant concludes that the amended independent claims and their dependent claims are patentable over the cited prior art.

Examiner Response to Argument 2: The examiner has considered the arguments set forth above but responds that, in light of the amendments, the cited combination teaches the specific structural datapath that the applicant alleges is missing, including the multipliers-to-adder-tree interconnection and the feedback-loop accumulation structure. In particular, Appuswamy teaches a plurality of multipliers whose outputs are added into a single sum by an adder tree, which corresponds to the claimed third circuit having a plurality of multipliers operably connected to at least one adder tree. Appuswamy further teaches accumulation using a partial-sum register that holds a previously computed partial sum, with the partial sum being fed back to an array of adders to add a new partial sum to the previously computed one, together with feedback paths and registers, which corresponds to the claimed fourth circuit including an adder and an accumulator configured as a feedback loop for accumulation and being operably connected to the processing-circuit output. Therefore, the rejection does not rely on a generic list of components to supply the architecture, but instead relies on Appuswamy for the actual datapath structure and interconnections recited in amended claims 1, 9, 14, 19, and 20.
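For readers unfamiliar with the datapath at issue, the structure the examiner attributes to Appuswamy (a multiplier array feeding an adder tree, whose output is accumulated through an adder/accumulator feedback loop) can be modeled as follows. This is an illustrative sketch only, not code from any cited reference; all function and variable names are hypothetical.

```python
# Illustrative model of the claimed datapath: a multiplier array feeding an
# adder tree, whose output is accumulated via an adder + accumulator feedback
# loop. Hypothetical sketch for explanation only; not from any cited reference.

def adder_tree(values):
    """Pairwise-sum a list of products, as an adder tree would in hardware."""
    while len(values) > 1:
        values = [sum(values[i:i + 2]) for i in range(0, len(values), 2)]
    return values[0]

def mac_datapath(weight_tiles, activation_tiles):
    """Multiply-accumulate over successive tiles of weights and activations.

    Each cycle: the multiplier array forms elementwise products, the adder
    tree reduces them to one partial sum, and the accumulator register adds
    that partial sum to its previous value (the feedback loop).
    """
    accumulator = 0  # partial-sum register
    for weights, activations in zip(weight_tiles, activation_tiles):
        products = [w * a for w, a in zip(weights, activations)]  # multiplier array
        partial_sum = adder_tree(products)                        # adder tree
        accumulator = accumulator + partial_sum                   # feedback loop
    return accumulator

# Example: dot product of an 8-element vector computed in two 4-wide tiles.
result = mac_datapath([[1, 2, 3, 4], [5, 6, 7, 8]],
                      [[1, 1, 1, 1], [1, 1, 1, 1]])  # 36
```

The dispute above is whether the prior art discloses exactly this interconnection: multipliers into an adder tree, adder-tree output into a fed-back accumulator.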
With respect to the applicant's argument that the memory and processing unit connection is not taught, Jouppi teaches weights staged through an on-chip weight FIFO that reads from off-chip weight memory, and intermediate results held in an on-chip unified buffer that serve as inputs to the compute unit, which corresponds to a first-circuit memory storing weight kernel values and feature map values and being operably connected to the second-circuit processing unit that performs convolution using stored parameters.

With respect to the applicant's argument that the quantization procedure with repeated quantization based on an accuracy target is not taught in the combination, Guo teaches an iterative quantization procedure that converts full-precision weights to low-precision representations without loss of model accuracy, using repeated cycles of quantizing and retraining until the weights are converted, which corresponds to selecting parameters having a first bit size, executing bit quantization to reduce the data representation to a smaller bit size, determining whether accuracy meets a target, and, responsive to meeting the target, repeatedly executing quantization to further reduce bit size to a third bit size. Accordingly, the combination of Appuswamy, Jouppi, and Guo collectively teaches the amended claim 1 structure, including the specific multiplier-array and adder-tree arrangement, the feedback-loop accumulation configuration, the memory structures storing weights and activations and supplying them to the compute unit, and the iterative quantization steps tied to an accuracy constraint.
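The accuracy-constrained quantization loop described in the claims and mapped to Guo can be sketched as follows. This is an illustrative sketch only, under the assumption of simple uniform symmetric quantization; `evaluate_accuracy` is a hypothetical stand-in for a real evaluation over a validation set, and the loop stops at 2 bits for simplicity.

```python
# Illustrative sketch of the claimed iterative bit-quantization procedure:
# step a parameter's bit width down one bit at a time, keeping the smaller
# representation only while network accuracy stays at or above a target.
# All names are hypothetical; not code from Guo or any cited reference.

def quantize(weights, bits):
    """Uniform symmetric quantization of a weight list to the given bit width."""
    levels = 2 ** (bits - 1) - 1                  # e.g. 127 levels for 8 bits
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]

def iterative_bit_quantization(weights, evaluate_accuracy, target, start_bits=16):
    """Reduce bit width while accuracy >= target; return (weights, final bits).

    Mirrors the claimed loop: quantize to a second (smaller) bit size, check
    accuracy against the target value, and if the target is met, quantize
    again toward a third, still smaller bit size.
    """
    best_weights, best_bits = list(weights), start_bits
    bits = start_bits
    while bits > 2:
        bits -= 1                                 # try a smaller representation
        candidate = quantize(weights, bits)
        if evaluate_accuracy(candidate) >= target:
            best_weights, best_bits = candidate, bits   # accept; keep reducing
        else:
            break                                 # last acceptable size found
    return best_weights, best_bits
```

As a toy example, with weights `[1.0, -0.5, 0.25, -0.125]`, an accuracy proxy of one minus the maximum quantization error, and a 0.99 target, this sketch settles at 7 bits: the 7-bit representation still meets the target, while the 6-bit one does not.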
For claims 9 and 19, the applicant argues that the prior art does not teach an array of multipliers feeding an adder tree, a feedback loop accumulating summed results, or arithmetic tied to quantization bits. However, Appuswamy teaches the multiplier array feeding the adder tree and the feedback-based partial-sum accumulation, and Jouppi teaches quantized integer multiply and add operations at defined bit widths for inference, which corresponds to performing elementwise multiplication using quantized operands and using arithmetic having bit widths that correspond to the quantization precision used in the network. Finally, the proposed combination is supported by a clear rationale: Jouppi explains the advantage of quantization for efficient inference by transforming floating-point numbers into narrow integers, and Guo teaches iterative quantization while maintaining model accuracy, so it would have been obvious to apply Guo's bit-conversion technique within the neural network inference hardware of Appuswamy, with the weight and activation storage and staging taught by Jouppi, to reduce memory and compute cost while maintaining accuracy during accelerated inference. Therefore, the applicant's arguments do not overcome the rejection of amended claims 1, 9, 14, 19, and 20 under 35 U.S.C. 103.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-9 and 11-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, Step 1: The claim is directed to hardware, which falls under the statutory category of machine. The claim satisfies Step 1.
Step 2A Prong 1: “executing bit quantization on the at least one parameter to reduce a size of a data representation for the selected at least one parameter, the at least one parameter having a second bit size that is less than the first bit size after the bit quantization” -- The limitation is directed to performing bit quantization on parameter groups to reduce the size of the data representation. Bit quantization is a known mathematical concept being used to perform the limitation, and thus the limitation is directed to math.

“selecting at least one parameter from the parameters the at least one parameter having first bit size…determining whether accuracy of the artificial neural network according to the bit quantization applied to the selected at least one parameter is greater than or equal to a target value with the at least one parameter having the second bit size” -- This limitation is directed to determining an accuracy of a neural network based on the application of bit quantization to selected parameters, observing/evaluating whether the accuracy is greater than or equal to a “target” value, and continuing execution of bit quantization once the target value has been considered. The limitation can be performed in the human mind, or with pen and paper, and thus it is considered a mental process.
Step 2A Prong 2 and Step 2B: “A hardware for an artificial neural network configured to process a particularly quantized artificial neural network having particular elements with a particular bit size for acceleration purpose, the hardware comprising: a first circuit provided for a memory configured to store element values of a weight kernel or element values of a feature map of the artificial neural network in which all parameters are quantized, wherein the artificial neural network is quantized by:” -- The limitation recites hardware, configured memory, and additional elements (the artificial neural network, the memory, the processing, the circuit provided for the memory) that merely apply the judicial exception on a computer; this neither integrates the exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(f)).

“storing the at least one parameter having the third bit size in the first circuit” -- The limitation recites storing the quantized parameter. Storing data to be manipulated or used by a computer is an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, storing data in memory/circuit storage is a well-understood, routine, and conventional (WURC) activity and cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).

“a second circuit provided for a processing unit configured to perform convolution using the parameters stored in the first circuit including the at least one parameter having the third bit size” -- The limitation recites performing convolution using stored parameters, including the quantized parameter. This is merely using a processing unit to carry out computations based on the results of the quantization/accuracy logic and does not integrate the judicial exception into a practical application.
The limitation amounts to applying the exception with a generic processor performing its ordinary function (see MPEP 2106.05(f)).

“the second circuit including a third circuit provided for a plurality of multipliers operably connected to at least one adder tree” -- The limitation recites multipliers and an adder tree as arithmetic components to support convolution. Multipliers and adders/adder trees are generic computing components used for routine arithmetic operations. Reciting these generic components in functional terms neither integrates the judicial exception into a practical application nor adds significantly more than the judicial exception (see MPEP 2106.05(f)).

“and a fourth circuit provided for an accumulation operation of an output of the second circuit, the fourth circuit including an adder and an accumulator to configure a feedback-loop for the accumulation operation, the fourth circuit being operably connected to the second circuit, the second circuit being operably connected to the first circuit.” -- The limitation recites accumulation using an adder and accumulator arranged as a feedback loop, together with generic “operably connected” interconnections. This describes a conventional accumulation arrangement for summing results, recited at a high level as generic circuit elements performing their ordinary functions. As such, it does not integrate the judicial exception into a practical application and does not provide significantly more than the exception (see MPEP 2106.05(f)). Thus, claim 1 is not patent eligible.

Claims 14 and 20 are analogous to claim 1, aside from claim type, and thus claims 14 and 20 face the same rejection set forth above.

Regarding claim 2, Step 1: The claim is directed to a machine, which is a statutory category. The claim satisfies Step 1.
Step 2A Prong 1: “The hardware of claim 1, wherein in the quantized artificial neural network, the at least one parameter is sequentially quantized based on an amount of computation or an amount of memory.” -- The limitation is directed to sequentially quantizing at least one parameter or a group of parameters once the computation amount and memory amount are observed and evaluated, which is a mental process. There are no elements to be evaluated under Step 2A Prong 2 and Step 2B. Thus, claim 2 is not patent eligible.

Regarding claim 3, Step 1: The claim is directed to a machine, which is a statutory category. The claim satisfies Step 1. There are no elements to be evaluated under Step 2A Prong 1.

Step 2A Prong 2 and Step 2B: “The hardware of claim 1, wherein the processing unit is further configured to process the quantized artificial neural network by at least one of a computational cost bit quantization method, a forward bit quantization method, or a backward bit quantization method.” -- The limitation recites that the processing unit will be further configured with limitations on the different types of quantization methods, which neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)). Thus, claim 3 is not patent eligible.

Regarding claim 4, Step 1: The claim is directed to a machine, which is a statutory category. The claim satisfies Step 1. There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B: “The hardware of claim 2, wherein the amount of computation and the amount of memory of the quantized artificial neural network are relatively reduced compared to those before quantization,” -- The limitation is directed to the computation amount and memory being reduced compared to their values before quantization, which is an insignificant extra-solution activity that cannot integrate the judicial exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of reducing computation amount and memory is a well-understood, routine, and conventional (WURC) activity that cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).

“and a number of bits of each data of the at least one parameter stored in the memory is reduced.” -- The limitation recites that the number of bits of each parameter value stored in memory is reduced. This limitation recites an insignificant extra-solution activity, as reducing/manipulating stored values cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of storing and/or retrieving data in memory is a well-understood, routine, and conventional (WURC) activity and thus cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)). Thus, claim 4 is not patent eligible.

Regarding claim 5, Step 1: The claim is directed to a machine, which is a statutory category. The claim satisfies Step 1. There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B: “The hardware of claim 1, wherein the memory includes at least one of a buffer memory, a register memory, or a cache memory.” -- The limitation merely applies different types of memory (buffer, register, or cache), which neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(f)). Thus, claim 5 is not patent eligible.

Regarding claim 6, Step 1: The claim is directed to a machine, which is a statutory category. The claim satisfies Step 1. There are no elements to be evaluated under Step 2A Prong 1.

Step 2A Prong 2 and Step 2B: “The hardware of claim 1, wherein the quantized artificial neural network includes a plurality of layers” -- The limitation recites that the quantized artificial neural network will further include a group of layers, which merely recites further limitations as a means to limit the field of use/environment, and thus neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)).

“wherein a size of a data bit of a data path through which data of a specific layer among the plurality of layers is transmitted is reduced in a unit of bits.” -- The limitation recites transmitting data of reduced bit size through a data path among layers of the neural network, which is an insignificant extra-solution activity that cannot integrate the judicial exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, transmitting/receiving data over a network is a well-understood, routine, and conventional activity and cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)). Thus, claim 6 is not patent eligible.

Regarding claim 7, Step 1: The claim is directed to a machine, which is a statutory category. The claim satisfies Step 1.
There are no elements to be evaluated under Step 2A Prong 1.

Step 2A Prong 2 and Step 2B: “The hardware of claim 1, wherein in the quantized artificial neural network, bit quantization is executed to reduce a storage size in memory” -- The limitation recites that bit quantization will be executed on the quantized neural network for the purpose of reducing the size of storage in memory, which amounts to mere instructions to apply the exception on a computer in a generic manner; this neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(f)).

“configured to store the at least one parameter.” -- The limitation recites memory configured to store a parameter or a group of parameters, which is an insignificant extra-solution activity that cannot integrate the judicial exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of storing and/or retrieving data in memory is a well-understood, routine, and conventional (WURC) activity and thus cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)). Thus, claim 7 is not patent eligible.

Regarding claim 8, Step 1: The claim is directed to a machine, which is a statutory category. The claim satisfies Step 1. There are no elements to be evaluated under Step 2A Prong 1.

Step 2A Prong 2 and Step 2B: “The hardware of claim 1, wherein the memory further includes at least one of a weight kernel cache or an input feature map cache.” -- The limitation recites memory that will further include a weight kernel cache or an input feature map cache, which merely limits the type of memory included in the hardware; this neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)). Thus, claim 8 is not patent eligible.

Regarding claim 9, Step 1: The claim is directed to a machine, which is a statutory category. The claim satisfies Step 1.
Step 2A Prong 1: (a) “configured to perform elementwise multiplication using quantized operands…the adder tree being configured to sum result values of the elementwise multiplication using arithmetic having a bit width corresponding to a number of quantization bits of the quantized artificial neural network,” -- The limitation is directed to performing elementwise multiplication and summing the result values, with an adder tree computing the sum of the multiplication results. The limitation is directed to mathematical operations/calculations/relationships, and thus it is directed to math. (b) “using arithmetic having a bit width corresponding to a number of quantization bits of the quantized artificial neural network” -- The limitation is directed to applying a numerical relationship/constraint (bit width corresponding to quantization bits) to arithmetic operations, which is a mathematical concept, and thus it is directed to math.

Step 2A Prong 2 and Step 2B: (a) “The hardware of claim 1, wherein the plurality of multipliers forms an array of multipliers and the at least one adder tree has an input operably connected to an output of the array of multipliers” -- This limitation recites generic hardware components (multipliers and an adder tree) and a generic interconnection (“operably connected”) to perform the recited mathematical operations, which merely limits the execution of the judicial exception to a particular machine environment and/or field of use (ANN hardware execution) and therefore neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)). (b) “wherein the feedback loop is configured to accumulate the summed result values output by the adder tree” -- This limitation recites that the feedback loop will be configured to accumulate the summed result values output by the adder tree.
The limitation amounts to no more than further limiting a field of use/environment, and neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)). Thus, claim 9 is not patent eligible.

Claim 19 is analogous to claim 9, aside from claim type (machine vs. process), and thus the same rejection set forth above applies.

Regarding claim 11, Step 1: The claim is directed to a machine, which is a statutory category. The claim satisfies Step 1. There are no elements to be evaluated under Step 2A Prong 1.

Step 2A Prong 2 and Step 2B: “The hardware of claim 1, further comprising: an output activation map cache configured to store a result value of convolution of the processing unit.” -- The limitation recites an output activation map cache configured to store data (the result value of the convolution operation). This limitation is directed to mere data gathering and is an insignificant extra-solution activity that cannot integrate the judicial exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of storing/retrieving data in memory or a device is a well-understood, routine, and conventional (WURC) activity and cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)). Thus, claim 11 is not patent eligible.

Regarding claim 12, Step 1: The claim is directed to a machine, which is a statutory category. The claim satisfies Step 1. There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B: “The hardware of claim 1, wherein the processing unit further includes a plurality of convolution processing units.” -- The limitation recites a processing unit that will be further limited to include a plurality of convolution processing units, which merely limits the processing unit to a particular environment/field of use, and thus neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)). Thus, claim 12 is not patent eligible.

Regarding claim 13, Step 1: The claim is directed to a machine, which is a statutory category. The claim satisfies Step 1.

Step 2A Prong 1: “sum result values of convolution of each of the plurality of convolution processing units.” -- The limitation recites summing the result values of convolution for each of the convolution processing units. The limitation is directed to the use of a mathematical concept, and thus the limitation is directed to math.

Step 2A Prong 2 and Step 2B: “The hardware of claim 12, wherein the processing unit further includes a tree adder configured to” -- The limitation recites that the processing unit will further include a tree adder. The limitation merely limits the processing unit to a particular field of use/environment by including a tree adder to sum the convolution results of the processing units, and thus it neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)). Thus, claim 13 is not patent eligible.

Regarding claim 15, Step 1: The claim is directed to a method, which falls under the statutory category of process. The claim satisfies Step 1.
Step 2A Prong 1: “The method of claim 14, further comprising: determining the third bit size of the data representation for the at least one parameter of the selected layer satisfies the accuracy greater than or equal to the target value as a final number of bits for the at least one parameter of the selected layer, responsive to the accuracy of the artificial neural network being less than the target value using a fourth bit size for the at least one parameter that is less than the third bit size.” -- The limitation is directed to determining a size of data representations for parameters of a selected layer that will satisfy an accuracy threshold compared to the target value. The act of determining the size of representations and comparing a value to a threshold can be performed using evaluation, observation, and judgment in the human mind, and thus the limitation is directed to a mental process. There are no further elements to be evaluated under Step 2A Prong 2 and Step 2B. Thus, claim 15 is not patent eligible.

Regarding claim 16, Step 1: The claim is directed to a method, which falls under the statutory category of process. The claim satisfies Step 1.

Step 2A Prong 1: “The method of claim 15, further comprising: selecting at least one layer in which the final number of bits for the parameter is not determined among the plurality of layers” -- The limitation is directed to selecting, from among a group of layers, a layer for which the final number of bits has not been determined. The act of selecting and comparing among a group of layers can be performed in the human mind, with pen and paper if needed, and thus it is a mental process.
Step 2A Prong 2 and Step 2B: “and repeatedly executing the bit quantizing to determine the final number of bits of the selected at least one layer in which the final number of bits is not determined.” -- The limitation merely recites instructions to apply bit quantization (a calculation) to determine a final number of bits for a selected layer. The limitation neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(f)). Thus, claim 16 is not patent eligible.

Regarding claim 17, Step 1: The claim is directed to a method, which falls under the statutory category of process. The claim satisfies Step 1. There are no elements to be evaluated under Step 2A Prong 1.

Step 2A Prong 2 and Step 2B: “The method of claim 14, wherein the third bit size is 1 bit.” -- The limitation recites that the third bit size is 1 bit. This limitation merely limits the bit quantization performed in earlier claims to a particular field of use/environment, and it neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)). Thus, claim 17 is not patent eligible.

Regarding claim 18, Step 1: The claim is directed to a method, which falls under the statutory category of process. The claim satisfies Step 1. There are no elements to be evaluated under Step 2A Prong 1.

Step 2A Prong 2 and Step 2B: “The method of claim 14, wherein the at least one parameter of the selected at least one layer includes at least one of weight data, feature map data, or activation map data.” -- The limitation recites that the parameter of the selected layer will further include different types of data, which merely limits the parameter to a particular field of use/environment, and it neither integrates the judicial exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)). Thus, claim 18 is not patent eligible.
Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. Claims 1, 9, 14, and 19-20 are rejected under 35 U.S.C. § 103 as being unpatentable over Appuswamy et al. US10621489B2 (referred to herein as Appuswamy) in view of the NPL reference “In-Datacenter Performance Analysis of a Tensor Processing Unit” by Jouppi et al. (referred to herein as Jouppi), further in view of Guo et al. “Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights” (referred to herein as Guo). Regarding claim 1, Appuswamy teaches: A hardware for an artificial neural network configured to process a particularly quantized artificial neural network having particular elements with a particular bit size for acceleration purpose, ([Appuswamy, col.
7, lines 30-35] “A typical neuron activation takes an n-element vector X, nxm weight matrix W, calculates … and produces an m-element output vector Y… [computed by] a vector matrix multiplier”, and [Appuswamy, page 1] “Massively parallel neural inference computing elements are provided. A plurality of multipliers is arranged in a plurality of equal-sized groups. Each of the plurality of multipliers is adapted to, in parallel, apply a weight to an input activation to generate an output.”, wherein the examiner interprets “neural inference computing elements” with “multipliers” that “apply a weight to an input activation” to be the same as hardware for an artificial neural network configured to process a particularly quantized artificial neural network because both are directed to specialized hardware circuits that perform neural network computations using weight parameters applied to input data in a massively parallel architecture). the second circuit including a third circuit provided for a plurality of multipliers operably connected to at least one adder tree; and ([Appuswamy, col. 9, lines 1-8] “inputs … are distributed to n multipliers… Each multiplier computes a product, with the products being added into a single sum by the adder tree.”, and [Appuswamy, col. 5, lines 25-26] “n parallel multipliers followed by an adder tree…”, wherein the examiner interprets “n multipliers” with “products being added into a single sum by the adder tree” to be the same as a third circuit provided for a plurality of multipliers operably connected to at least one adder tree because both are directed to a structural arrangement where multiple multiplier units feed their outputs into an adder tree that sums the products, representing the exact interconnection of multipliers and adder tree recited in the claim).
a fourth circuit provided for an accumulation operation of an output of the second circuit, the fourth circuit including an adder and an accumulator to configure a feedback-loop for the accumulation operation, ([Appuswamy, col. 10, lines 1-5] “The vector register holds the previously computed partial sum… This partial sum is fed back to the array of adders… [which] adds the new partial sum…an array of adders, registers, feedback paths…”, and [Appuswamy, col. 5, lines 60-67] “a partial sum register can store partial sum vectors… and m parallel adders to add the new … and previously computed ones…”, wherein the examiner interprets “vector register” that “holds the previously computed partial sum” which is “fed back to the array of adders” that “adds the new partial sum” along with “registers, feedback paths” to be the same as a fourth circuit provided for an accumulation operation of an output of the second circuit, the fourth circuit including an adder and an accumulator to configure a feedback-loop for the accumulation operation because both are directed to a circuit structure with storage elements (registers/accumulators) and adders arranged such that previously computed partial sums are fed back and combined with new partial sums, implementing the claimed feedback-loop accumulation architecture). the fourth circuit being operably connected to the second circuit, ([Appuswamy, col. 10, lines 1-10] “The vector register holds the previously computed partial sum… This partial sum is fed back to the array of adders”, wherein the examiner interprets the explicit datapath from the multiplier/adder tree output to the partial sum registers with feedback to adders to be the same as the fourth circuit being operably connected to the second circuit because both are directed to the connectivity where the processing circuit's output connects to the accumulation circuit).
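For illustration only (not part of the prosecution record): the datapath mapped above — parallel multipliers feeding an adder tree, whose single sum enters a feedback-loop accumulator built from a register and an adder — can be sketched in a few lines of Python. The function and class names, and the list-based representation, are assumptions made for this sketch, not structures taken from Appuswamy.

```python
# Hypothetical sketch of the mapped datapath: n parallel multipliers feed an
# adder tree, and the tree's single sum enters a feedback-loop accumulator.
def multiplier_array(inputs, weights):
    """Each multiplier computes one product in parallel."""
    return [x * w for x, w in zip(inputs, weights)]

def adder_tree(products):
    """Pairwise reduction of the products to a single sum, as an adder tree would."""
    while len(products) > 1:
        if len(products) % 2:          # odd count: pad so the pairs line up
            products = products + [0]
        products = [products[i] + products[i + 1]
                    for i in range(0, len(products), 2)]
    return products[0]

class FeedbackAccumulator:
    """Register holding the previous partial sum, fed back to an adder."""
    def __init__(self):
        self.register = 0              # previously computed partial sum

    def accumulate(self, new_partial_sum):
        self.register += new_partial_sum   # feedback path: old + new -> register
        return self.register

# One accumulation step per cycle: multiply, sum via the tree, accumulate.
acc = FeedbackAccumulator()
for inputs, weights in [([1, 2], [3, 4]), ([5, 6], [7, 8])]:
    total = acc.accumulate(adder_tree(multiplier_array(inputs, weights)))
```

The sketch mirrors the claim's structural reading: the accumulator's stored value is combined with each new tree output, so partial sums accumulate across cycles rather than being recomputed.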
Appuswamy does not teach the hardware comprising: a first circuit provided for a memory configured to store element values of a weight kernel or element values of a feature map of the artificial neural network in which all parameters are quantized, wherein the artificial neural network is quantized by: selecting at least one parameter from the parameters, the at least one parameter having a first bit size, executing bit quantization on the at least one parameter to reduce a size of a data representation for the selected at least one parameter, the at least one parameter having a second bit size that is less than the first bit size after the bit quantization, determining whether accuracy of the artificial neural network according to the bit quantization applied to the selected at least one parameter is greater than or equal to a target value with the at least one parameter having the second bit size, responsive to the accuracy of the artificial neural network being greater than or equal to the target value, repeatedly executing the bit quantization on the at least one parameter one or more additional times such that the at least one parameter has a third bit size that is less than the second bit size, and storing the at least one parameter having the third bit size in the first circuit; and a second circuit provided for a processing unit configured to perform convolution using the parameters stored in the first circuit including the at least one parameter having the third bit size, … the second circuit being operably connected to the first circuit.
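For illustration only: the quantization sequence recited in the limitation above — quantize to a smaller bit size, test network accuracy against the target value, and repeat while the target remains satisfied — can be sketched as follows. The `quantize` scheme and the `evaluate_accuracy` callback are hypothetical stand-ins for this sketch; the application does not prescribe these specific helpers.

```python
# Hypothetical sketch of the claimed loop: reduce a parameter's bit size one
# step at a time, keeping each reduction only while network accuracy stays at
# or above the target value.
def quantize(value, bits):
    """Round value onto a signed fixed-point grid of the given bit width
    (an assumed scheme, for illustration only)."""
    levels = 2 ** (bits - 1)
    return max(-levels, min(levels - 1, round(value * levels))) / levels

def reduce_bit_width(params, first_bits, target, evaluate_accuracy):
    bits = first_bits
    current = [quantize(p, bits) for p in params]      # first bit size
    while bits > 1:
        candidate = [quantize(p, bits - 1) for p in params]
        if evaluate_accuracy(candidate) >= target:     # accuracy still meets target
            bits -= 1                                  # keep the smaller bit size
            current = candidate
        else:
            break                                      # stop before accuracy drops
    return current, bits                               # final (e.g., "third") bit size
```

The loop makes the claimed gating explicit: each further reduction (second bit size, third bit size, and so on) is executed only in response to the accuracy check succeeding at the previous, larger bit size.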
Jouppi teaches the hardware comprising: a first circuit provided for a memory configured to store element values of a weight kernel or element values of a feature map of the artificial neural network in which all parameters are quantized, ([Jouppi, page 3] “The weights for the matrix unit are staged through an on-chip Weight FIFO that reads from an off-chip 8 GiB DRAM called Weight Memory (for inference, weights are read-only; 8 GiB supports many simultaneously active models). The weight FIFO is four tiles deep. The intermediate results are held in the 24 MiB on-chip Unified Buffer, which can serve as inputs to the Matrix Unit.” and [Jouppi, page 1] “A step called quantization transforms floating-point numbers into narrow integers - often just 8 bits - which are usually good enough for inference. Eight-bit integer multiplies can be 6X less energy and 6X less area than IEEE 754 16-bit floating-point multiplies”, wherein the examiner interprets “Weight Memory” that stores “weights for the matrix unit” and “Unified Buffer” that holds “intermediate results” serving as “inputs to the Matrix Unit” to be the same as a first circuit provided for a memory configured to store element values of a weight kernel or element values of a feature map because both are directed to on-chip or off-chip memory structures that hold the parameters and activation data used in neural network layer computations. The examiner further interprets “quantization transforms floating-point numbers into narrow integers, often just 8 bits” to be the same as the artificial neural network in which all parameters are quantized because both are directed to reducing the precision of neural network parameters from higher-precision representations to lower-precision integer representations for efficient inference).
and storing the at least one parameter having the third bit size in the first circuit; ([Jouppi, page 3] “The weights for the matrix unit are staged through an on-chip Weight FIFO that reads from an off-chip 8 GiB DRAM called Weight Memory”, wherein the examiner interprets “weights” being “staged” and stored in “Weight Memory” to be the same as storing the at least one parameter having the third bit size in the first circuit because both are directed to placing the quantized neural network parameters into memory storage that will be accessed during inference computations). and a second circuit provided for a processing unit configured to perform convolution using the parameters stored in the first circuit including the at least one parameter having the third bit size, ([Jouppi, page 3] “It reads and writes 256 values per clock cycle and can perform either a matrix multiply or a convolution.”, wherein the examiner interprets the hardware that “can perform either a matrix multiply or a convolution” to be the same as a second circuit provided for a processing unit configured to perform convolution because both are directed to a dedicated compute unit that executes convolution operations). the second circuit being operably connected to the first circuit. ([Jouppi, page 3] “The weights for the matrix unit are staged through an on-chip Weight FIFO that reads from an off-chip 8 GiB DRAM called Weight Memory … The intermediate results are held in the 24 MiB on-chip Unified Buffer, which can serve as inputs to the Matrix Unit”, wherein the examiner interprets the staging of “weights” from “Weight Memory” through “Weight FIFO” to “the matrix unit” and the “Unified Buffer” serving as “inputs to the Matrix Unit” to be the same as the second circuit being operably connected to the first circuit because both are directed to the connectivity where the memory circuit supplies parameters and input data to the processing/compute circuit). 
Appuswamy and Jouppi do not teach wherein the artificial neural network is quantized by: selecting at least one parameter from the parameters, the at least one parameter having a first bit size, executing bit quantization on the at least one parameter to reduce a size of a data representation for the selected at least one parameter, the at least one parameter having a second bit size that is less than the first bit size after the bit quantization, determining whether accuracy of the artificial neural network according to the bit quantization applied to the selected at least one parameter is greater than or equal to a target value with the at least one parameter having the second bit size, responsive to the accuracy of the artificial neural network being greater than or equal to the target value, repeatedly executing the bit quantization on the at least one parameter one or more additional times such that the at least one parameter has a third bit size that is less than the second bit size; Guo teaches wherein the artificial neural network is quantized by: selecting at least one parameter from the parameters, the at least one parameter having a first bit size, executing bit quantization on the at least one parameter to reduce a size of a data representation for the selected at least one parameter, the at least one parameter having a second bit size that is less than the first bit size after the bit quantization, determining whether accuracy of the artificial neural network according to the bit quantization applied to the selected at least one parameter is greater than or equal to a target value with the at least one parameter having the second bit size; ([Guo, page 3] “Suppose a pre-trained full-precision (i.e., 32-bit floating-point) CNN model can be represented by {W_i: 1 ≤ l ≤ L}, … Given a pre-trained full-precision CNN model, the main goal of our INQ is to convert all 32-bit floating-point weights to be either powers of two or zero without loss of model 
accuracy”; [Guo, page 2] “Weight partition … divides the weights … into two disjoint groups … The weights in the first group are quantized … The weights in the other group are re-trained … these three operations are repeated … until all the weights are converted … lossless network quantization,”, wherein the examiner interprets “Given a pre-trained full-precision CNN model” with “32-bit floating-point” to be the same as at least one parameter having a first bit size because both are directed to the network's original full precision word length before any quantization begins. The examiner further interprets “Weight partition … divides the weights … into two disjoint groups … The weights in the first group are quantized” to be the same as selecting at least one parameter from the parameters and executing bit quantization on the at least one parameter to reduce a size of a data representation for the selected at least one parameter because both are directed to partitioning parameters and applying quantization to a selected subset to convert them from full precision to a reduced bit representation. The examiner further interprets “convert all 32-bit floating-point weights to be either powers of two or zero” to be the same as the at least one parameter having a second bit size that is less than the first bit size after the bit quantization because both are directed to transforming parameters from 32-bit floating-point to a lower-precision representation. The examiner further interprets “quantized …re-trained …. 
repeated until all the weights are converted … lossless network quantization” to be the same as determining whether accuracy of the artificial neural network according to the bit quantization applied to the selected at least one parameter is greater than or equal to a target value because both are directed to an iterative process where quantization and retraining are repeated until the network maintains acceptable accuracy, with the lossless criterion serving as the target accuracy threshold that gates further quantization). Guo teaches responsive to the accuracy of the artificial neural network being greater than or equal to the target value, repeatedly executing the bit quantization on the at least one parameter one or more additional times such that the at least one parameter has a third bit size that is less than the second bit size; ([Guo, page 2] “these three operations are repeated … until all the weights are converted … lossless network quantization,”, wherein the examiner interprets the repeated cycle of “quantized…re-trained….repeated until all the weights are converted” to be the same as responsive to the accuracy of the artificial neural network being greater than or equal to the target value, repeatedly executing the bit quantization on the at least one parameter one or more additional times such that the at least one parameter has a third bit size that is less than the second bit size because both are directed to an iterative multi-stage quantization process where, upon meeting the accuracy criterion in one quantization stage, additional quantization operations are applied in subsequent stages to further reduce parameter precision while maintaining the lossless accuracy requirement). Appuswamy, Jouppi, Guo, and the instant application are analogous art because they are all directed to specialized neural network inference hardware architectures that accelerate neural network computations. 
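For illustration only: the INQ cycle the examiner summarizes from Guo — partition the weights, quantize one group to powers of two or zero, retrain the remainder, and repeat until all weights are converted — might be sketched as below. The grouping heuristic and the `min_exp` cutoff are assumptions for this sketch, and the retraining step is stubbed out (in Guo it is full retraining of the unquantized group).

```python
import math

# Hypothetical sketch of the incremental quantization cycle summarized above:
# partition -> quantize one group to powers of two or zero -> retrain the rest
# (stubbed) -> repeat until every weight is converted.
def to_power_of_two_or_zero(w, min_exp=-4):
    """Snap a weight to the nearest signed power of two, or to zero if its
    magnitude falls below the assumed grid floor 2**min_exp."""
    if w == 0:
        return 0.0
    exp = round(math.log2(abs(w)))
    if exp < min_exp:
        return 0.0
    return math.copysign(2.0 ** exp, w)

def incremental_quantize(weights, group_fraction=0.5):
    weights = list(weights)
    done = [False] * len(weights)
    while not all(done):
        # partition: take the largest-magnitude weights not yet quantized
        pending = sorted((i for i in range(len(weights)) if not done[i]),
                         key=lambda i: -abs(weights[i]))
        group = pending[:max(1, int(len(pending) * group_fraction))]
        for i in group:                 # quantize this group
            weights[i] = to_power_of_two_or_zero(weights[i])
            done[i] = True
        # (retraining of the remaining full-precision weights would occur here)
    return weights
```

Quantizing the larger-magnitude group first while retraining the rest is what lets the network compensate for each round of precision loss, which is the mechanism behind the “lossless” criterion the rejection relies on.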
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the multiplier/adder system disclosed by Appuswamy to include the 8 GiB DRAM weight memory disclosed by Jouppi. One would be motivated to do so to efficiently stage and supply neural network weight parameters from memory to the compute unit for high-throughput inference, as suggested by Jouppi ([Jouppi, page 3] “The weights for the matrix unit are staged through an on-chip Weight FIFO that reads from an off-chip 8 GiB DRAM called Weight Memory”). It would have also been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the multiplier/adder system disclosed by Appuswamy to include the bit conversion technique disclosed by Guo. One would be motivated to do so to effectively reduce parameter precision while maintaining acceptable model accuracy so that the neural network computations performed by the multiplier/adder-tree datapath consume less memory and compute resources, as suggested by Guo ([Guo, page 3] “convert all 32-bit floating-point weights to be either powers of two or zero without loss of model accuracy”). Claims 14 and 20 are analogous to claim 1, and thus face the same rejection set forth above. Regarding claim 9, Appuswamy, Jouppi, and Guo teach The hardware of claim 1, (see rejection of claim 1). Appuswamy further teaches: wherein the plurality of multipliers forms an array of multipliers ([Appuswamy, page 1, col. 2, lines 10-18] “inputs … are distributed to n multipliers… Each multiplier computes a product, with the products being added into a single sum by the adder tree.”, [Appuswamy, page 1, col.
2, lines 20-22] “n parallel multipliers followed by an adder tree”, wherein the examiner interprets “n multipliers” and “n parallel multipliers” to be the same as the plurality of multipliers forms an array of multipliers because both are directed to multiple multiplier units arranged together in a parallel configuration that collectively form an array structure for simultaneous computation). and the at least one adder tree has an input operably connected to an output of the array of multipliers, ([Appuswamy, page 1, col. 2, lines 10-18] “inputs … are distributed to n multipliers… Each multiplier computes a product, with the products being added into a single sum by the adder tree.”, wherein the examiner interprets “Each multiplier computes a product, with the products being added into a single sum by the adder tree” to be the same as the at least one adder tree has an input operably connected to an output of the array of multipliers because both are directed to the structural interconnection where the output products from the multiplier array serve as the inputs that feed into the adder tree for summation). the adder tree being configured to sum result values of the elementwise multiplication ([Appuswamy, page 1, col. 2, lines 10-18] “inputs … are distributed to n multipliers… Each multiplier computes a product, with the products being added into a single sum by the adder tree.”, wherein the examiner interprets “the products being added into a single sum by the adder tree” to be the same as the adder tree being configured to sum result values of the elementwise multiplication because both are directed to the adder tree's functional operation of combining and summing the product results that are output from the elementwise multiplication operations performed by the multiplier array). and wherein the feedback loop is configured to accumulate the summed result values output by the adder tree. ([Appuswamy, page 1, col.
2, lines 35-42] “The vector register holds the previously computed partial sum… This partial sum is fed back to the array of adders… [which] adds the new partial sum…”, [Appuswamy, page 1, col. 2, lines 50-53] “a partial sum register can store partial sum vectors… and m parallel adders to add the new … and previously computed ones…”, wherein the examiner interprets “The vector register holds the previously computed partial sum… This partial sum is fed back to the array of adders… [which] adds the new partial sum” and “a partial sum register can store partial sum vectors… and m parallel adders to add the new … and previously computed ones” to be the same as the feedback loop is configured to accumulate the summed result values output by the adder tree because both are directed to a feedback-loop architecture where the adder tree's summed output (the partial sum) is stored in a register and fed back through the feedback path to adders that combine the previously computed partial sum with new partial sums, thereby performing accumulation of the summed result values through the iterative feedback mechanism). Jouppi further teaches: the array of multipliers being configured to perform elementwise multiplication using quantized operands including the at least one parameter having the third bit size, ([Jouppi, page 2, lines 5-10] “It contains 256x256 MACs that can perform 8-bit multiply-and-adds on signed or unsigned integers.”, [Jouppi, page 1, lines 62-67] “A step called quantization transforms floating-point numbers into narrow integers often just 8 bits which are usually good enough for inference. 
Eight-bit integer multiplies can be 6X less energy and 6X less area than IEEE 754 16-bit floating-point multiplies”, wherein the examiner interprets “256x256 MACs that can perform 8-bit multiply-and-adds on signed or unsigned integers” combined with “quantization transforms floating-point numbers into narrow integers often just 8 bits” to be the same as the array of multipliers being configured to perform elementwise multiplication using quantized operands including the at least one parameter having the third bit size because both are directed to multiplier arrays that operate on low-precision integer operands that have been quantized from higher precision representations, with the 8-bit quantized integers representing the reduced bit size parameters used in the multiplication operations). using arithmetic having a bit width corresponding to a number of quantization bits of the quantized artificial neural network, ([Jouppi, page 2, lines 10-15] “The 16-bit products are collected in the 4 MiB of 32-bit Accumulators below the matrix unit.”, [Jouppi, page 2, lines 5-10] “It contains 256x256 MACs that can perform 8-bit multiply-and-adds on signed or unsigned integers.”, wherein the examiner interprets “8-bit multiply-and-adds” that produce “16-bit products” which are “collected in the 4 MiB of 32-bit Accumulators” to be the same as using arithmetic having a bit width corresponding to a number of quantization bits of the quantized artificial neural network because both are directed to arithmetic operations whose bit widths are scaled and determined based on the quantization precision of the network, where 8-bit quantized operands produce 16-bit intermediate products and 32-bit accumulated results, with each arithmetic stage having a bit width that corresponds to and is derived from the number of quantization bits used in the network). 
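For illustration only: the width progression in the quoted Jouppi passages (8-bit operands, 16-bit products, 32-bit accumulators) follows from ordinary two's-complement arithmetic, as the small check below shows. The signed ranges used here are assumptions about the encoding, made only for this sketch.

```python
# Width growth in quantized MAC arithmetic: a signed 8-bit x 8-bit multiply
# needs at most 16 bits, and summing a 256-wide column of such products per
# cycle still fits comfortably inside a 32-bit accumulator.
def bits_needed_signed(v):
    """Minimum two's-complement width for a non-negative magnitude v
    (magnitude bits plus one sign bit)."""
    return v.bit_length() + 1

max_product = (-128) * (-128)        # largest-magnitude signed 8-bit product
max_column_sum = 256 * max_product   # one 256-element dot product per cycle
```

The 32-bit accumulator width then leaves headroom to fold many such per-cycle sums together before any risk of overflow.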
Appuswamy, Jouppi, Guo, and the instant application are analogous art because they are all directed to hardware architectures for accelerating neural network computations. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 1 disclosed by Appuswamy, Jouppi, and Guo to include the multiply and add process disclosed by Jouppi. One would be motivated to do so to efficiently reduce energy consumption and hardware area while maintaining inference capability, as suggested by Jouppi ([Jouppi, page 1] “Eight-bit integer multiplies can be 6X less energy and 6X less area than IEEE 754 16-bit floating-point multiplies.”). Claim 19 is analogous to claim 9, aside from claim type, and thus it faces the same rejection. Claims 2-3, 11-12, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Appuswamy in view of Jouppi, further in view of Guo, and further in view of US-11593625-B2 by Kang et al. (referred to herein as Kang). Regarding claim 2, Appuswamy, Jouppi, and Guo teach The hardware of claim 1, (see rejection of claim 1). Appuswamy, Jouppi, and Guo do not teach wherein in the quantized artificial neural network, the at least one parameter is sequentially quantized based on an amount of computation or an amount of memory. Kang teaches wherein in the quantized artificial neural network, the at least one parameter is sequentially quantized based on an amount of computation or an amount of memory.
([Kang, col 8, lines 26-36] “The neural network quantization apparatus may perform quantization of floating-point parameters of the trained neural network 11 to fixed-point parameters of a predetermined number of bits, e.g., the quantization may also be selectively implemented based on a determined status and processing of resources based on the processing performance of the example device of the neural network inference apparatus 20 that is to implement the quantized neural network 21, and the neural network quantization apparatus may transmit the quantized neural network 21 to the neural network inference apparatus 20 that is to implement the quantized neural network 21.” wherein the examiner interprets “selectively implemented based on a determined status and processing of resources” to be the same as “sequentially quantized based on an amount of computation or an amount of memory,” as “resources” refers to memory and processing, and both are directed to quantizing parameters. The examiner further interprets “selectively implemented” to encompass “sequentially quantized,” because quantizing sequentially is one form of selective implementation.) Appuswamy, Jouppi, Guo, Kang, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 1 disclosed by Appuswamy, Jouppi, and Guo to include the technique of the “apparatus [that] may perform quantization of floating-point parameters … to fixed-point parameters” in a neural network based on available resources (memory, computation power) taught by Kang.
One would be motivated to do so to effectively quantize while staying within the amount of memory and processing power available, as suggested by Kang ([Kang, col 8, lines 26-36] “predetermined number of bits, e.g., the quantization may also be selectively implemented based on a determined status and processing of resources based on the processing performance”). Regarding claim 3, Appuswamy, Jouppi, and Guo teach The hardware of claim 1, (see rejection of claim 1). Appuswamy, Jouppi, and Guo do not teach wherein the processing unit is further configured to process the quantized artificial neural network by at least one of a computational cost bit quantization method, a forward bit quantization method, or a backward bit quantization method. Kang teaches wherein the processing unit is further configured to process the quantized artificial neural network by at least one of a computational cost bit quantization method, a forward bit quantization method, or a backward bit quantization method.
([Kang, col 1, lines 49-60] “In a general aspect, a processor implemented method includes performing training or an inference operation with a neural network, by obtaining a parameter for the neural network in a floating-point format, applying a fractional length of a fixed-point format to the parameter in the floating-point format, performing an operation with an integer arithmetic logic unit (ALU) to determine whether to round off a fixed point based on a most significant bit among bit values to be discarded after a quantization process, and performing an operation of quantizing the parameter in the floating-point format to a parameter in the fixed-point format, based on a result of the operation with the ALU.”, and [Kang, col 8, lines 5-12] “As a result, in examples, in order to implement a neural network within an allowable accuracy loss while sufficiently reducing the number of operations in the above devices, the floating-point format parameters processed in the neural network may be quantized. The parameter quantization may signify a conversion of a floating-point format parameter having high precision to a fixed-point format parameter having lower precision.” wherein the examiner interprets the conversion of a floating-point format parameter having high precision to a fixed-point format parameter having lower precision via quantization to be the same as processing the quantized artificial neural network (NN) by at least one of a computational cost bit quantization method. Parameters (e.g., weights) are quantized, then trained, and then the performance of the quantized NN is assessed (i.e., inference is made). Note, “forward bit quantization” refers to the process of converting full-precision (e.g., 32-bit floating-point) weights and activations into lower-precision, fixed-point representations for the forward (inference) pass. “Backward bit quantization” deals with the quantization of gradients computed during the backward pass (training phase).
Instead of performing backpropagation with full-precision gradients, the gradients are quantized to lower bitwidths; both are done in this case). Appuswamy, Jouppi, Guo, Kang, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 1 disclosed by Appuswamy, Jouppi, and Guo to include “a processor implemented method [that] includes performing training or an inference operation” on quantized parameter values taught by Kang. One would be motivated to do so to effectively train and test (i.e., perform inference with) the quantized network while staying within the amount of memory and processing power available, as suggested by Kang ([Kang, col 8, lines 26-36] “predetermined number of bits, e.g., the quantization may also be selectively implemented based on a determined status and processing of resources based on the processing performance”). Regarding claim 11, Appuswamy, Jouppi, and Guo teach The hardware of claim 1, (see rejection of claim 1). Appuswamy, Jouppi, and Guo do not teach further comprising: an output activation map cache configured to store a result value of convolution of the processing unit. Kang teaches further comprising: an output activation map cache configured to store a result value of convolution of the processing unit. ([Kang, col 10, lines 62-67] “The memory 220 is hardware configured to store various pieces of data processed in the neural network inference apparatus 20. For example, the memory 220 may store data that has been processed and data that is to be processed in the neural network inference apparatus 20.
Furthermore, the memory 220 may store applications and drivers that are to be executed by the neural network inference apparatus 20.”, and [Kang, col 6, lines 50-52] “In addition, weighted connections may further include kernels for convolutional layers and/or recurrent connections for recurrent layers.”, wherein the examiner interprets the “memory is hardware configured to store various pieces of data processed in the neural network inference apparatus” and “kernels for convolutional layers” to be the same as “an output activation map cache configured to store a result” because both concern storage of data processed by a neural network (including an activation map)). Appuswamy, Jouppi, Guo, Kang, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 1 disclosed by Appuswamy, Jouppi, and Guo to include the hardware configured to store various pieces of data processed by a NN taught by Kang. One would be motivated to do so to efficiently utilize kernels for convolutional and recurrent layers, as suggested by Kang ([Kang, col 6, lines 50-52] “In addition, weighted connections may further include kernels for convolutional layers and/or recurrent connections for recurrent layers.”). Regarding claim 12, Appuswamy, Jouppi, and Guo teach The hardware of claim 1, (see rejection of claim 1). Appuswamy, Jouppi, and Guo do not teach wherein the processing unit further includes a plurality of convolution processing units. Kang teaches wherein the processing unit further includes a plurality of convolution processing units.
([Kang, col 6, lines 50-52] “In addition, weighted connections may further include kernels for convolutional layers and/or recurrent connections for recurrent layers.”, wherein the examiner interprets kernels for convolutional layers to be the same as a plurality of convolution processing units, as both are directed to performing convolution operations). Appuswamy, Jouppi, Guo, Kang, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 1 disclosed by Appuswamy, Jouppi, and Guo to include the weighted connectivity taught by Kang. One would be motivated to do so to efficiently utilize kernels for convolutional and recurrent layers, as suggested by Kang ([Kang, col 6, lines 50-52] “In addition, weighted connections may further include kernels for convolutional layers and/or recurrent connections for recurrent layers.”).

Regarding claim 16, Appuswamy, Jouppi, and Guo teach The method of claim 15, (see rejection of claim 15). Appuswamy, Jouppi, and Guo do not teach further comprising: selecting at least one layer in which the final number of bits for the parameter is not determined among the plurality of layers, and repeatedly executing the bit quantizing to determine the final number of bits of the selected at least one layer in which the final number of bits is not determined. Kang teaches further comprising: selecting at least one layer in which the final number of bits for the parameter is not determined among the plurality of layers, and repeatedly executing the bit quantizing to determine the final number of bits of the selected at least one layer in which the final number of bits is not determined.
([Kang, col 2, lines 64-67, and col 3, lines 1-5] “The method may include converting the quantized parameter in the fixed-point format to the floating-point format based on processing conditions of a first layer of the neural network that receives the parameter in the floating-point format, providing the parameter in the floating-point format to the first layer, and performing the operation with the integer ALU to quantize the parameter in the floating-point format processed in the first layer back to a parameter in the fixed-point format.”, wherein the examiner interprets converting the quantized parameter from the fixed-point format to the floating-point format based on processing conditions of a first layer, and performing the operation with the integer ALU to quantize the parameter in the floating-point format processed in the first layer back to a parameter in the fixed-point format, to be the same as selecting at least one layer in which the final number of bits for the parameter is not determined and repeatedly executing the bit quantizing to determine the final number of bits, as both are directed to adjusting the quantization parameters for a particular layer). Appuswamy, Jouppi, Guo, Kang, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 15 disclosed by Appuswamy, Jouppi, and Guo to include the method for converting floating-point to fixed-point format and vice versa taught by Kang.
One would be motivated to do so to effectively quantize the NN using an ALU, as suggested by Kang ([Kang, col 2, lines 64-67, and col 3, lines 1-5] “performing the operation with the integer ALU to quantize the parameter in the floating-point format processed in the first layer back to a parameter in the fixed-point format”).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Appuswamy in view of Jouppi in view of Guo in view of Kang further in view of NPL reference “DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING” by Dally et al. (referred herein as Dally).

Regarding claim 4, Appuswamy, Jouppi, Guo, and Kang teach The hardware of claim 2, (see rejection of claim 2). Appuswamy, Jouppi, Guo, and Kang do not teach wherein the amount of computation and the amount of memory of the quantized artificial neural network are relatively reduced compared to those before quantization, and a number of bits of each data of the at least one parameter stored in the memory is reduced. Dally teaches wherein the amount of computation and the amount of memory of the quantized artificial neural network are relatively reduced compared to those before quantization, and a number of bits of each data of the at least one parameter stored in the memory is reduced. ([Dally, p. 3, sec. 3] “Network quantization and weight sharing further compresses the pruned network by reducing the number of bits required to represent each weight. We limit the number of effective weights we need to store by having multiple connections share the same weight, and then fine-tune those shared weights….
The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights”, wherein the examiner interprets reducing the number of bits required to represent each weight through quantization and weight sharing to be the same as reducing a number of bits of each data stored in memory and relatively reducing the computation and memory of the network). Appuswamy, Jouppi, Guo, Kang, Dally, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 2 disclosed by Appuswamy, Jouppi, Guo, and Kang to include the process of fine-tuning the weights taught by Dally. One would be motivated to do so to efficiently use quantized weights via binning, as suggested by Dally ([Dally, p. 3, sec. 3] “The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index”).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Appuswamy in view of Jouppi in view of Guo further in view of US-11836603-B2, by Ha et al. (referred herein as Ha).

Regarding claim 5, Appuswamy, Jouppi, and Guo teach The hardware of claim 1, (see rejection of claim 1). Appuswamy, Jouppi, and Guo do not teach wherein the memory includes at least one of a buffer memory, a register memory, or a cache memory. Ha teaches wherein the memory includes at least one of a buffer memory, a register memory, or a cache memory.
([Ha, col 9, lines 1-12] “The memory 120 may be DRAM, but the memory 120 is not limited thereto. The memory 120 may include at least one of a volatile memory or a nonvolatile memory. The nonvolatile memory includes read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable and programmable ROM (EEPROM), a flash memory, PRAM, magnetic RAM (MRAM), resistive RAM (RRAM), and ferroelectric RAM (FeRAM), or the like. The volatile memory includes DRAM, static RAM (SRAM), synchronous DRAM (SDRAM), PRAM, MRAM, RRAM, and ferroelectric RAM (FeRAM).”, wherein the examiner interprets the disclosed memory embodiments for storing data to be the same as the claimed memory including at least one of a buffer memory, a register memory, or a cache memory). Appuswamy, Jouppi, Guo, Ha, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 1 disclosed by Appuswamy, Jouppi, and Guo to include “The memory may be DRAM, but the memory is not limited thereto.” disclosed by Ha. One would be motivated to do so to efficiently and flexibly store data in volatile and nonvolatile memory, as suggested by Ha ([Ha, col 9, lines 1-12] “The memory may include at least one of a volatile memory or a nonvolatile memory. The nonvolatile memory includes read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM) …”).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Appuswamy in view of Jouppi in view of Guo in view of Kang further in view of Ha.

Regarding claim 13, Appuswamy, Jouppi, Guo, and Kang teach The hardware of claim 12, (see rejection of claim 12).
Appuswamy, Jouppi, Guo, and Kang do not teach wherein the processing unit further includes a tree adder configured to sum result values of convolution of each of the plurality of convolution processing units. Ha teaches wherein the processing unit further includes a tree adder configured to sum result values of convolution of each of the plurality of convolution processing units. ([Ha, col 7, lines 20-67] “Any of the layer 2 and layer 3 may be convolution layers, recurrent, or fully connected layers. … Each of the channels may thus be processed by the nodes, as respective computational units or processing elements, which receives input and output the output activation” and [Ha, col 18, lines 55-63] “Examples of hardware components that may be used to perform the operations described in this application where appropriate include controllers, sensors, generators, drivers, memories, comparators, arithmetic logic units, adders, subtractors, multipliers, dividers, integrators, and any other electronic components configured to perform the operations described in this application.”, wherein the examiner interprets “adders” to be the same as “a tree adder configured to sum result values of convolution”, as both are directed to summing result values produced by convolution processing units that are interconnected, layer by layer, similar to a ladder structure). Appuswamy, Jouppi, Guo, Kang, Ha, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 12 disclosed by Appuswamy, Jouppi, Guo, and Kang to include the fully connected convolution and recurrent layers as taught by Ha. One would be motivated to do so to efficiently perform arithmetic operations, as suggested by Ha ([Ha, col 18, lines 55-63] “hardware components that may be used to perform the operations ... adders, subtractors, multipliers, …. and any other electronic components configured to perform the operations described in this application.”).

Claims 15 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Appuswamy in view of Jouppi in view of Guo and further in view of NPL reference “DOREFA-NET: TRAINING LOW BITWIDTH CONVOLUTIONAL NEURAL NETWORKS WITH LOW BITWIDTH GRADIENTS” by Zhou et al. (referred herein as Zhou).

Regarding claim 15, Appuswamy, Jouppi, and Guo teach The method of claim 14, (see rejection of claim 1; claim 14 and claim 1 are analogous). Appuswamy, Jouppi, and Guo do not teach further comprising: determining the third bit size of the data representation for the at least one parameter of the selected layer satisfies the accuracy greater than or equal to the target value as a final number of bits for the at least one parameter of the selected layer, responsive to the accuracy of the artificial neural network being less than the target value using a fourth bit size for that at least one parameter that is less than the third bit size.
Zhou teaches: further comprising: determining the third bit size of the data representation for the at least one parameter of the selected layer satisfies the accuracy greater than or equal to the target value as a final number of bits for the at least one parameter of the selected layer, responsive to the accuracy of the artificial neural network being less than the target value using a fourth bit size for that at least one parameter that is less than the third bit size. ([Zhou, p. 10, sec 4] “we conduct extensive experiments and show that our quantized CNN models with 5-bit, 4-bit, 3-bit and even 2-bit ternary weights have improved or at least comparable accuracy against their full-precision baselines, including AlexNet, VGG-16, GoogleNet and ResNets.”, wherein the examiner interprets the demonstration that quantized CNN models maintain or improve accuracy using various low-bit quantizations to be the same as determining the final number of bits for a parameter based on achieving a target accuracy, as both are directed to selecting a bit precision that meets a required accuracy threshold). Appuswamy, Jouppi, Guo, Zhou, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 1 disclosed by Appuswamy, Jouppi, and Guo to include the multi-level bit quantization taught by Zhou. One would be motivated to do so to effectively quantize the NN without sacrificing accuracy, as suggested by Zhou ([Zhou, p. 10, sec. 4] “5-bit, 4-bit, 3-bit and even 2-bit ternary weights have improved or at least comparable accuracy against their full-precision baselines”).

Regarding claim 17, Appuswamy, Jouppi, and Guo teach The method of claim 14, (see rejection of claim 14).
Appuswamy, Jouppi, and Guo do not teach wherein the third bit size is 1 bit. Zhou teaches wherein the third bit size is 1 bit. ([Zhou, p. 2] “We explore the configuration space of bitwidth for weights, activations and gradients for DoReFa-Net. E.g., training a network using 1-bit weights, 1-bit activations and 2-bit gradients can lead to 93% accuracy on SVHN dataset.”, wherein the examiner interprets “training a network using 1-bit weights, 1-bit activations” to be the same as a [third] bit size being 1 bit, as both are directed to a configuration employing a 1-bit quantization scheme). Appuswamy, Jouppi, Guo, Zhou, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 14 disclosed by Appuswamy, Jouppi, and Guo to include the configuration space of bit width for weights, activations, and gradients taught by Zhou. One would be motivated to do so to effectively quantize the NN using a low number of bits (1 or 2 bits) while maintaining high accuracy, as suggested by Zhou ([Zhou, p. 1] “training a network using 1-bit weights, 1-bit activations and 2-bit gradients can lead to 93% accuracy”).

Regarding claim 18, Appuswamy, Jouppi, and Guo teach The method of claim 14, (see rejection of claim 1; claim 14 and claim 1 are analogous). Appuswamy, Jouppi, and Guo do not teach wherein the at least one parameter of the selected at least one layer includes at least one of weight data, feature map data, or activation map data. Zhou teaches wherein the at least one parameter of the selected at least one layer includes at least one of weight data, feature map data, or activation map data.
([Zhou, pages 1-2, introduction, and page 6, sec 2.7] “both weights and input activations of convolutional layers1 are binarized… We generalize the method of binarized neural networks to allow creating DoReFa-Net, a CNN that has arbitrary bitwidth in weights, activations, and gradients… For the first layer, the input is often an image, which may contain 8-bit features”, wherein the examiner interprets “both weights and input activations of convolutional layers1 are binarized… weights, activations, and gradients” and “8-bit features” to be the same as a layer that includes weight, feature map, and activation map data). Appuswamy, Jouppi, Guo, Zhou, and the instant application are analogous art because they are all directed to layers that include weight, feature map, and activation map data. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 14 disclosed by Appuswamy, Jouppi, and Guo to include the iterative quantization process taught by Zhou. One would be motivated to do so to effectively quantize the NN while compensating for accuracy loss, as suggested by Zhou ([Zhou, pages 1-2, introduction, and page 6, sec 2.7] “both weights and input activations of convolutional layers1 are binarized… We generalize the method of binarized neural networks to allow creating DoReFa-Net, a CNN that has arbitrary bitwidth in weights, activations, and gradients… For the first layer, the input is often an image, which may contain 8-bit features”).

Claims 6-8 are rejected under 35 U.S.C. 103 as being unpatentable over Appuswamy in view of Jouppi in view of Guo further in view of Dally.

Regarding claim 6, Appuswamy, Jouppi, and Guo teach The hardware of claim 1, (see rejection of claim 1).
Appuswamy, Jouppi, and Guo do not teach wherein the quantized artificial neural network includes a plurality of layers, wherein a size of a data bit of a data path through which data of a specific layer among the plurality of layers is transmitted is reduced in a unit of bits. Dally teaches wherein the quantized artificial neural network includes a plurality of layers, wherein a size of a data bit of a data path through which data of a specific layer among the plurality of layers is transmitted is reduced in a unit of bits. ([Dally, p. 1] “Quantization then reduces the number of bits that represent each connection from 32 to 5”, wherein the examiner interprets the phrase “a size of a data bit of a data path through which data of a specific layer among the plurality of layers is transmitted is reduced in a unit of bits” to be the same as “reduces the number of bits that represent each connection” because each layer’s data (i.e., the weights transmitted along the data paths) in a multi-layer network is represented with fewer bits (reduced by a unit of bits) compared to a conventional 32-bit implementation; this reduction in bit width is applied on a per-connection basis across the multi-layer network). Appuswamy, Jouppi, Guo, Dally, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 1 disclosed by Appuswamy, Jouppi, and Guo to include the process to reduce the number of bits taught by Dally. One would be motivated to do so to effectively reduce the number of bits significantly, as suggested by Dally ([Dally, p. 1] “Quantization then reduces the number of bits that represent each connection from 32 to 5…”).
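As an illustrative aside (not part of the record, and not the actual implementation of Dally or of the claimed hardware), the per-weight bit reduction Dally describes — replacing 32-bit weights with small indices into a table of shared values — can be sketched in Python. The uniform binning, array shapes, and bit count below are assumptions made only for this example:

```python
import numpy as np

def quantize_weights(weights: np.ndarray, bits: int) -> tuple[np.ndarray, np.ndarray]:
    """Map full-precision weights onto 2**bits shared levels (uniform bins).

    Returns (indices, levels): each weight is stored as a small integer
    index into a table of shared values, in the spirit of Deep
    Compression-style weight sharing.
    """
    levels = np.linspace(weights.min(), weights.max(), 2 ** bits)
    # Nearest shared level for every weight.
    indices = np.abs(weights[:, None] - levels[None, :]).argmin(axis=1)
    return indices.astype(np.uint8), levels

# 32-bit weights reduced to 5-bit indices into a 32-entry table.
w = np.random.randn(1000).astype(np.float32)
idx, table = quantize_weights(w, bits=5)
dequantized = table[idx]  # reconstructed (lossy) weights
```

Storing a 5-bit index per weight plus a 32-entry table is what yields the "32 to 5" bits-per-connection reduction quoted above; the fine-tuning of shared values described by Dally is omitted here for brevity.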
Regarding claim 7, Appuswamy, Jouppi, and Guo teach The hardware of claim 1, (see rejection of claim 1). Appuswamy, Jouppi, and Guo do not teach wherein in the quantized artificial neural network, bit quantization is executed to reduce a storage size of the memory configured to store the at least one parameter. Dally teaches wherein in the quantized artificial neural network, bit quantization is executed to reduce a storage size of the memory configured to store the at least one parameter. ([Dally, p. 12, sec. 9] “quantizing the network using weight sharing, and then applying Huffman coding. We highlight our experiments on AlexNet which reduced the weight storage by 35× without loss of accuracy”, wherein the examiner interprets “quantizing the network” to be the same as “bit quantization is executed to reduce a storage size of the memory configured to store the at least one parameter”, as both are directed to reducing storage requirements for network parameters, including at least one parameter). Appuswamy, Jouppi, Guo, Dally, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 1 disclosed by Appuswamy, Jouppi, and Guo to include the quantization process taught by Dally. One would be motivated to do so to effectively and robustly reduce the number of bits and hence storage significantly, as suggested by Dally ([Dally, p. 12, sec. 9] “We highlight our experiments on AlexNet which reduced the weight storage by 35× without loss of accuracy”).

Regarding claim 8, Appuswamy, Jouppi, and Guo teach The hardware of claim 1, (see rejection of claim 1).
Appuswamy, Jouppi, and Guo do not teach wherein the memory further includes at least one of a weight kernel cache or an input feature map cache. Dally teaches wherein the memory further includes at least one of a weight kernel cache or an input feature map cache. ([Dally, p. 2, sec 1] “trained quantization are able to compress the network without interfering each other, thus lead to surprisingly high compression rate. It makes the required storage so small (a few megabytes) that all weights can be cached on chip instead of going to off-chip DRAM which is energy consuming.”, wherein the examiner interprets “all weights can be cached on chip” to be the same as “at least one of a weight kernel cache or an input feature map cache,” as both terms are directed to enabling on-chip caching of weights). Appuswamy, Jouppi, Guo, Dally, and the instant application are analogous art because they are all directed to optimally quantizing an artificial neural network while maintaining a target performance metric. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the hardware of claim 1 disclosed by Appuswamy, Jouppi, and Guo to include the non-interfering quantization process for high NN compression taught by Dally. One would be motivated to do so to effectively compress the NN such that it can be cached on chip, as suggested by Dally ([Dally, p. 2, sec. 1] “…thus lead to surprisingly high compression rate. It makes the required storage so small (a few megabytes) that all weights can be cached on chip.”).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DEVAN KAPOOR whose telephone number is (703) 756-1434. The examiner can normally be reached Monday - Friday, 9:00 AM - 5:00 PM EST (times may vary).
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DEVAN KAPOOR/
Examiner, Art Unit 2126

/LUIS A SITIRICHE/
Primary Examiner, Art Unit 2126
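Stepping back from the record, the bit-quantization scheme at the center of these rejections — repeatedly lowering a parameter's bit width while measured accuracy stays at or above a target, as argued for claims 15-16 — can be sketched as follows. This is a minimal illustration under assumed interfaces (the `evaluate` callable, the starting width, and the toy accuracy table are hypothetical), not the claimed method itself:

```python
def search_final_bits(evaluate, target_accuracy: float, start_bits: int = 8) -> int:
    """Illustrative sketch of bit quantization under an accuracy constraint:
    repeatedly lower the bit width and keep the smallest width whose
    measured accuracy remains at or above the target.

    `evaluate(bits)` is an assumed caller-supplied callable returning the
    network's accuracy when its parameters are quantized to `bits` bits.
    """
    final_bits = start_bits
    for bits in range(start_bits, 0, -1):      # try 8, 7, ..., 1 bits
        if evaluate(bits) >= target_accuracy:  # constraint still satisfied
            final_bits = bits                  # accept the smaller width
        else:
            break                              # further reduction fails
    return final_bits

# Toy accuracy model: accuracy degrades as the bit width shrinks.
accuracy_by_bits = {8: 0.95, 7: 0.95, 6: 0.94, 5: 0.93, 4: 0.90, 3: 0.80, 2: 0.60, 1: 0.40}
print(search_final_bits(lambda b: accuracy_by_bits[b], target_accuracy=0.92))  # prints 5
```

The loop mirrors the claim language's fallback: when a smaller (fourth) bit size drops accuracy below the target, the previous (third) bit size is kept as the final number of bits.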

Prosecution Timeline

Dec 09, 2021 — Application Filed
Dec 09, 2021 — Response after Non-Final Action
Feb 21, 2025 — Non-Final Rejection — §101, §103
May 01, 2025 — Applicant Interview (Telephonic)
May 14, 2025 — Examiner Interview Summary
May 16, 2025 — Response Filed
Jul 18, 2025 — Final Rejection — §101, §103
Dec 23, 2025 — Request for Continued Examination
Jan 16, 2026 — Response after Non-Final Action
Feb 09, 2026 — Non-Final Rejection — §101, §103 (current)

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 11%
Grant Probability with Interview: 28% (+16.7%)
Median Time to Grant: 3y 3m
PTA Risk: High

Based on 9 resolved cases by this examiner. Grant probability derived from career allow rate.
