DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is responsive to the application filed on 06/03/2022. Claims 1-20 are pending in the application. Claims 1, 8 and 14 are independent claims.
Priority
Applicant's claim for the benefit of Chinese patent application Serial No. 202110679295.X, filed on June 18, 2021, is acknowledged.
Information Disclosure Statement
The information disclosure statements submitted on 06/03/2022, 02/14/2023, 11/03/2023, 08/20/2024 and 03/24/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The following title is suggested: NEURAL NETWORK PROCESSING UNIT, NEURAL NETWORK PROCESSING METHOD AND DEVICE FOR QUANTIZATION AND CONVOLUTION OPERATION.
Claim Objections
Claims 4 and 10 are objected to because of the following informalities:
Claims 4 and 10 recite ‘the shot type input data’; however, they should recite ‘the short type input data’. Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
“the quantizing unit is configured to obtain float type input data”, “the operation unit is configured to perform at least of a matrix-vector operation and a convolution operation” in claim 1;
“the main interface is configured to send a memory copy function to the DSP through the bus” in claim 3;
“the accumulator is configured to obtain a dot product result” in claims 6 and 12;
“an activating unit, configured to obtain an activation result” in claims 7 and 13;
“the DSP is configured to store float type input data”, “the PSRAM is configured to store network parameters”, “the quantizing unit is configured to obtain float type input data”, “the operation unit is configured to perform at least of a matrix-vector operation and a convolution operation” in claim 8.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may:
(1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or
(2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Since the claim limitation(s) invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, claims 1, 3, 6-8, 12 and 13 have been interpreted to cover the corresponding structure described in the specification that achieves the claimed function, and equivalents thereof.
A review of the specification shows that the following appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph limitations: an NPU in Figs. 1, 2 and 4-5, para. [0077].
If applicant wishes to provide further explanation or dispute the examiner’s interpretation of the corresponding structure, applicant must identify the corresponding structure with reference to the specification by page and line number, and to the drawing, if any, by reference characters in response to this Office action.
If applicant does not intend to have the claim limitation(s) treated under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may amend the claim(s) so that it/they will clearly not invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, or present a sufficient showing that the claim(s) recite(s) sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Claims 1-7 are directed to a neural network processing unit, claims 8-13 are directed to a processing device and claims 14-20 are directed to a method. Therefore, the claims are eligible under Step 1 as being directed to a machine, a machine and a process, respectively.
Independent claims 1, 8 and 14:
Step 2A Prong 1:
Claims recite:
quantize the float type input data to obtain quantized input data - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating data and generating data based on judgement, which is observing, evaluating and judging that is practically capable of being performed in the human mind with the assistance of pen and paper.
provide the quantized input data to the operation unit to obtain an operation result - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating data and generating data based on judgement, which is observing, evaluating and judging that is practically capable of being performed in the human mind with the assistance of pen and paper.
perform inverse quantization to the operation result output by the operation unit to obtain an inverse quantization result - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating data and generating data based on judgement, which is observing, evaluating and judging that is practically capable of being performed in the human mind with the assistance of pen and paper.
the operation unit is configured to perform at least of a matrix-vector operation and a convolution operation to the quantized input data to obtain the operation result of the quantized input data - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses a mathematical concept of a mathematical calculation of computing a matrix-vector operation and a convolution operation.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claims recite the following additional elements:
A neural network processing unit (NPU), comprising: a quantizing unit and an operation unit; A processing device, comprising: an NPU, a PSRAM and a DSP connected through a bus; A neural network processing method, applied to an NPU comprising a quantizing unit and an operation unit - These limitations amount to components of a general purpose computer that applies a judicial exception, by use of conventional computer functions (see MPEP § 2106.05(b)).
the quantizing unit is configured to obtain float type input data; obtaining by the quantizing unit float type input data - These steps are recited at a high level of generality and amount to mere data gathering, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are thus directed to the abstract idea.
Step 2B: The claims do not include additional elements that amount to significantly more than the judicial exception.
The additional elements:
A neural network processing unit (NPU), comprising: a quantizing unit and an operation unit; A processing device, comprising: an NPU, a PSRAM and a DSP connected through a bus; A neural network processing method, applied to an NPU comprising a quantizing unit and an operation unit - These limitations amount to components of a general purpose computer that applies a judicial exception, by use of conventional computer functions (see MPEP § 2106.05(b)).
the quantizing unit is configured to obtain float type input data; obtaining by the quantizing unit float type input data - These steps are recited at a high level of generality and amount to mere data gathering, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
Accordingly, these additional elements do not amount to significantly more than the judicial exception. As such, the claims are ineligible.
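For illustration only, the sequence recited in the independent claims (quantize float type input data, perform an integer matrix-vector or convolution operation on the quantized data, then inverse-quantize the operation result) can be modeled as below. The symmetric per-tensor scale, the int8 range and all numeric values are illustrative assumptions, not limitations of the claims:

```python
def quantize(xs, scale):
    # Quantize float values to the int8 range by scaling and rounding.
    return [max(-128, min(127, round(v / scale))) for v in xs]

def dequantize(q, scale):
    # Inverse quantization: multiply the integer value back by the scale.
    return q * scale

x = [0.6, -1.0, 2.54]            # float type input data
w = [2.54, 0.5, 0.26]            # float type network parameters
sx = max(abs(v) for v in x) / 127   # per-tensor scale (assumed symmetric scheme)
sw = max(abs(v) for v in w) / 127
xq, wq = quantize(x, sx), quantize(w, sw)
# Operation unit: integer dot product (the core of a matrix-vector or
# convolution operation performed on the quantized input data).
acc = sum(a * b for a, b in zip(xq, wq))
result = dequantize(acc, sx * sw)   # inverse quantization of the operation result
```

Performing the inner product on integers and deferring the multiplication by the scales to a single inverse-quantization step is the standard reason such pipelines avoid floating point arithmetic in the operation unit.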
Dependent claims 2, 9 and 15:
Step 2A Prong 1:
Claims recite:
obtain a first parameter for quantization and a second parameter for inverse quantization, based on the float type input data - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating data and generating data based on judgement, which is observing, evaluating and judging that is practically capable of being performed in the human mind with the assistance of pen and paper.
obtain a multiplied value by multiplying a float value to be quantized in the float type input data by the first parameter, and round the multiplied value into a numerical value to obtain numerical input data - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses a mathematical concept of a mathematical calculation of computing a numerical input data.
convert the operation result obtained by the operation unit into a float type result - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses a mathematical concept of a mathematical calculation of computing a float type result.
multiplying the float type result by the second parameter - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses a mathematical concept of a mathematical calculation.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claims recite the following additional elements:
the float type input data stored in a memory of a digital signal processor (DSP) - This step is recited at a high level of generality and amounts to mere data storing, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
send the numerical input data to the operation unit - This step is recited at a high level of generality and amounts to mere data transmitting, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
send the inverse quantization result to the memory of the DSP for storage - This step is recited at a high level of generality and amounts to mere data storing, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are thus directed to the abstract idea.
Step 2B: The claims do not include additional elements that amount to significantly more than the judicial exception.
The additional elements:
the float type input data stored in a memory of a digital signal processor (DSP) - which is a well-understood, routine, conventional activity similar to storing and retrieving information in memory, as described in MPEP § 2106.05(d)(II).
send the numerical input data to the operation unit - which is a well-understood, routine, conventional activity similar to storing and retrieving information in memory, as described in MPEP § 2106.05(d)(II).
send the inverse quantization result to the memory of the DSP for storage - which is a well-understood, routine, conventional activity similar to storing and retrieving information in memory, as described in MPEP § 2106.05(d)(II).
Accordingly, these additional elements do not amount to significantly more than the judicial exception. As such, the claims are ineligible.
Dependent claims 3 and 16:
Step 2A Prong 1: The claims recite the abstract ideas of claims 2 and 15.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claims recite the following additional elements:
wherein the NPU further comprises a main interface of a bus - These limitations amount to components of a general purpose computer that applies a judicial exception, by use of conventional computer functions (see MPEP § 2106.05(b)).
the main interface is configured to send a memory copy function to the DSP through the bus - This step is recited at a high level of generality and amounts to mere data transmitting, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
so as to access the memory of the DSP and obtain the float type input data stored in the memory of the DSP - This step is recited at a high level of generality and amounts to mere data gathering, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are thus directed to the abstract idea.
Step 2B: The claims do not include additional elements that amount to significantly more than the judicial exception.
The additional elements:
wherein the NPU further comprises a main interface of a bus - These limitations amount to components of a general purpose computer that applies a judicial exception, by use of conventional computer functions (see MPEP § 2106.05(b)).
the main interface is configured to send a memory copy function to the DSP through the bus - This step is recited at a high level of generality and amounts to mere data transmitting, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
so as to access the memory of the DSP and obtain the float type input data stored in the memory of the DSP - This step is recited at a high level of generality and amounts to mere data gathering, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
Accordingly, these additional elements do not amount to significantly more than the judicial exception. As such, the claims are ineligible.
Dependent claims 4, 10 and 17:
Step 2A Prong 1:
Claims recite:
the quantizing unit is configured to: convert the float type input data into a short type input data - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses a mathematical concept of a mathematical calculation of converting the float type input data into a short type input data.
the operation unit is configured to perform the convolution operation to the shot type input data - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses a mathematical concept of a mathematical calculation of performing the convolution operation to the short type input data.
Step 2A Prong 2 & Step 2B: No additional elements are recited, so the abstract idea is not integrated into a practical application and the claims do not amount to significantly more. As such, the claims are ineligible.
Dependent claims 5, 11 and 18:
Step 2A Prong 1: The claims recite the abstract ideas of claims 4, 10 and 17.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claims recite the following additional elements:
the NPU is connected to a random access memory (RAM) through a high-speed access interface - These limitations amount to components of a general purpose computer that applies a judicial exception, by use of conventional computer functions (see MPEP § 2106.05(b)).
the RAM is configured to transfer the short type input data to the RAM - This step is recited at a high level of generality and amounts to mere data transmitting, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are thus directed to the abstract idea.
Step 2B: The claims do not include additional elements that amount to significantly more than the judicial exception.
The additional elements:
the NPU is connected to a random access memory (RAM) through a high-speed access interface - These limitations amount to components of a general purpose computer that applies a judicial exception, by use of conventional computer functions (see MPEP § 2106.05(b)).
the RAM is configured to transfer the short type input data to the RAM - which is a well-understood, routine, conventional activity similar to receiving or transmitting data over a network, as described in MPEP § 2106.05(d)(II).
Accordingly, these additional elements do not amount to significantly more than the judicial exception. As such, the claims are ineligible.
Dependent claims 6, 12 and 19:
Step 2A Prong 1:
Claims recite:
perform a dot product operation to the at least part of the network parameters read within each cycle and the corresponding input data in the first register - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses a mathematical concept of a mathematical calculation of computing a dot product.
the accumulator is configured to obtain a dot product result and perform accumulation according to the dot product result so as to obtain the operation result of the convolution operation - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses a mathematical concept of a mathematical calculation of computing accumulation.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claims recite the following additional elements:
wherein the operation unit comprises a first register, a second register and an accumulator - These limitations amount to components of a general purpose computer that applies a judicial exception, by use of conventional computer functions (see MPEP § 2106.05(b)).
the first register is configured to read the short type input data from the RAM within a first cycle - This step is recited at a high level of generality and amounts to mere data gathering, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
the second register is configured to read at least part of network parameters stored in a pseudo static random access memory (PSRAM) within a plurality of cycles after the first cycle - This step is recited at a high level of generality and amounts to mere data gathering, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
to send the operation result of the convolution operation to a memory of a DSP for storage - This step is recited at a high level of generality and amounts to mere data storing, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are thus directed to the abstract idea.
Step 2B: The claims do not include additional elements that amount to significantly more than the judicial exception.
The additional elements:
wherein the operation unit comprises a first register, a second register and an accumulator - These limitations amount to components of a general purpose computer that applies a judicial exception, by use of conventional computer functions (see MPEP § 2106.05(b)).
the first register is configured to read the short type input data from the RAM within a first cycle - This step is recited at a high level of generality and amounts to mere data gathering, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
the second register is configured to read at least part of network parameters stored in a pseudo static random access memory (PSRAM) within a plurality of cycles after the first cycle - This step is recited at a high level of generality and amounts to mere data gathering, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
to send the operation result of the convolution operation to a memory of a DSP for storage - which is a well-understood, routine, conventional activity similar to storing and retrieving information in memory, as described in MPEP § 2106.05(d)(II).
Accordingly, these additional elements do not amount to significantly more than the judicial exception. As such, the claims are ineligible.
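At a high level, the dataflow recited in claims 6, 12 and 19 (the short type input data held in a first register, network parameters streamed into a second register over a plurality of cycles, and the per-cycle dot products summed by an accumulator) can be modeled as below; the chunk size read per cycle is an illustrative assumption:

```python
def convolution_mac(inputs, weights, chunk=4):
    """Model of the claimed dataflow: the first register holds the input data
    read in the first cycle; the second register reads part of the network
    parameters in each subsequent cycle; the accumulator sums the per-cycle
    dot products to produce the operation result of the convolution."""
    first_register = list(inputs)        # loaded once, within the first cycle
    accumulator = 0
    for start in range(0, len(weights), chunk):
        second_register = weights[start:start + chunk]   # one cycle's worth
        accumulator += sum(a * b for a, b in
                           zip(first_register[start:start + chunk],
                               second_register))
    return accumulator

# 1-D example: dot product of 8-element input and parameter vectors,
# streamed in two cycles of 4 parameters each
out = convolution_mac([1, 2, 3, 4, 5, 6, 7, 8], [1, 0, -1, 0, 1, 0, -1, 0])
```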
Dependent claims 7, 13 and 20:
Step 2A Prong 1:
Claims recite:
an activating unit, configured to obtain an activation result by performing activation using an activation function according to the operation result of the convolution operation stored in the DSP - Under its broadest reasonable interpretation in light of the specification, this limitation encompasses a mathematical concept of a mathematical calculation of computing an activation result.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claims recite the following additional elements:
provide the activation result to the DSP for storage - This step is recited at a high level of generality and amounts to mere data storing, a well-known form of insignificant extra-solution activity (see MPEP § 2106.05(g)).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are thus directed to the abstract idea.
Step 2B: The claims do not include additional elements that amount to significantly more than the judicial exception.
The additional elements:
provide the activation result to the DSP for storage - which is a well-understood, routine, conventional activity similar to storing and retrieving information in memory, as described in MPEP § 2106.05(d)(II).
Accordingly, these additional elements do not amount to significantly more than the judicial exception. As such, the claims are ineligible.
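The activation step recited in claims 7, 13 and 20 does not name a particular activation function; as a purely illustrative sketch, ReLU is used here:

```python
def relu(values):
    # Element-wise ReLU applied to the operation result of the convolution;
    # any standard activation function could stand in its place.
    return [max(0.0, v) for v in values]

# activation result computed from a convolution operation result,
# then provided to the DSP for storage
activation_result = relu([-1.5, 0.0, 2.5])
```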
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Vantrease et al. (hereinafter Vantrease), US 20190294413 A1.
Regarding independent claim 1, Vantrease teaches a neural network processing unit (NPU) (Fig. 5, 502; [0073]), comprising: a quantizing unit and an operation unit (Fig. 5, 524, 536, 528, 538); wherein,
the quantizing unit is configured to obtain float type input data (Fig. 7B; [0090] inputs, such as weights and test samples, that are represented by floating point values may be received at the input of the convolution layer at block 710. In some embodiments, the minimum and maximum values may be pre-calculated. For example, the weights are generally known at load time and their ranges can be stored as constants together with the weights. In some cases, the inputs to the convolution layer (e.g., inputs for images are usually RGB values within the range of 0.0 to 255.0) and the outputs of many activation functions may have known ranges too, and thus it may not be necessary to analyze the inputs to a convolution layer to determine the range. In some embodiments, the minimum and maximum values of the inputs, for example, the inputs that are the outputs of the previous layer, may not be known and may be determined at blocks 742 and 744. For example, the sum of two 8-bit numbers may have 9 bits, the product of two 8-bit numbers may have 16 bits, the sum of a series of 8-bit multiplications in the matrix multiplication may have more than 16 bits, such as 20 to 32 bits); quantize the float type input data to obtain quantized input data ([0091] At block 750, the floating point inputs may be quantized to signed or unsigned integers, such as 8-bit signed integers (INT8), 8-bit unsigned integers (UINT8), 16-bit signed integers (INT16), or 16-bit unsigned integers (UINT16)); provide the quantized input data to the operation unit to obtain an operation result ([0091] At block 760, the convolution or other matrix multiplication may be performed using the quantized integer); and perform inverse quantization to the operation result output by the operation unit to obtain an inverse quantization result ([0091] At block 770, the integer outputs of the convolution may be de-quantized based on the minimum and maximum values so that the floating point outputs at block 730 may be used by 
a subsequent layer (e.g., an activation layer) that may require floating point operations); and
the operation unit is configured to perform at least of a matrix-vector operation and a convolution operation to the quantized input data to obtain the operation result of the quantized input data ([0091] At block 760, the convolution or other matrix multiplication may be performed using the quantized integers; [0048]-[0049] FIGS. 3A and 3B illustrate the convolution operations performed on an input pixel array 320 using a filter 310 by a convolution layer in a convolutional neural network).
Regarding dependent claim 2, Vantrease teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Vantrease further teaches wherein when the operation unit is configured to perform the matrix-vector operation, the quantizing unit is configured to:
obtain a first parameter for quantization and a second parameter for inverse quantization, based on the float type input data stored in a memory of a digital signal processor (DSP) ([0071] DMA controller 516 may be configured to perform DMA operations to transfer data between neural network processor 502 and the host device. For example, the host device may store the instructions, input data, the weights, and other parameters of the neural network at memory 512. The host device may provide the memory addresses for the stored instructions, data, weights, and other parameters of the neural network to neural network processor 502 (e.g., in the form of memory descriptors). Neural network processor 502 may then obtain the stored instructions, data, weights, or other parameters of the neural network based on the memory addresses provided by the host device; [0092] Examiner note: SX is a scaling factor; 1/SX corresponds to the first parameter and SX corresponds to the second parameter; [0102] memory 825 may also store the scaling factors; [0106] In various implementations, the above described convolution operation or matrix multiplication may be performed in hardware, software, or a combination of software and hardware. For example, in some implementations, the above described convolution operation or matrix multiplication may be implemented in software that may be executed by, for example, a parallel processing unit, a vector processor, a digital signal processor, a graphic processing unit, a tensor processing unit, a network processing unit, an FPGA, an ASIC, etc.);
obtain a multiplied value by multiplying a float value to be quantized in the float type input data by the first parameter, and round the multiplied value into a numerical value to obtain numerical input data ([0094] Equation (13) shows Xq is the product of the float value X and the first parameter 1/SX; [0086] In some implementations, the quantization and data size reduction may be achieved by storing the minimum and maximum values of the network parameters for each layer, and then asymmetrically quantizing each float point value to a closest integer (e.g., a 8-bit integer) in a linear set of integers within a range (e.g., 0-255 for 8-bit unsigned integers));
send the numerical input data to the operation unit ([0091] At block 760, the convolution or other matrix multiplication may be performed using the quantized integers);
convert the operation result obtained by the operation unit into a float type result (At block 770, the integer outputs of the convolution may be de-quantized based on the minimum and maximum values so that the floating point outputs at block 730 may be used by a subsequent layer (e.g., an activation layer) that may require floating point operations); and
send the inverse quantization result obtained by multiplying the float type result by the second parameter to the memory of the DSP for storage ([0070] Memory 512 may also be used to store the output of neural network processor 502 (e.g., one or more image recognition decisions on the input images) or some intermediary data; [0094] Equation (13) shows X is the FP32 real value and SX is the second parameter).
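The quantize-operate-dequantize flow mapped above (scale a float value by the first parameter 1/SX, round to an integer, operate on the integers, then multiply the result by the second parameter SX) can be sketched as follows. The function names, the example scaling factor, and the 8-bit signed clamping range are illustrative assumptions, not Vantrease's actual implementation:

```python
def quantize(x: float, sx: float) -> int:
    # First parameter 1/SX: multiply the float value by 1/SX,
    # then round to the nearest integer (assumed 8-bit signed range).
    q = round(x * (1.0 / sx))
    return max(-128, min(127, q))

def dequantize(q: int, sx: float) -> float:
    # Second parameter SX: multiplying the integer result by SX
    # recovers an approximation of the FP32 real value.
    return q * sx

sx = 0.05                        # assumed scaling factor for illustration
x = 1.23                         # float value to be quantized
xq = quantize(x, sx)             # numerical (integer) input data
x_restored = dequantize(xq, sx)  # float type result sent back for storage
```

The round-trip error is bounded by half the scaling factor, which is why the dequantized output can still feed a subsequent floating-point layer.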
Regarding dependent claim 3, Vantrease teaches all the limitations as set forth in the rejection of claim 2 that is incorporated. Vantrease further teaches wherein the NPU further comprises a main interface of a bus (Fig. 11, 1108; [0126] In one example, the computing device 1100 may include processing logic 1102, a bus interface module 1108, memory 1110, and a network interface module 1112; [0130] The bus interface module 1108 may enable communication with external entities, such as a host device and/or other components in a computing system, over an external communication medium. The bus interface module 1108 may include a physical interface for connecting to a cable, socket, port, or other connection to the external communication medium), the main interface is configured to send a memory copy function to the DSP through the bus, so as to access the memory of the DSP and obtain the float type input data stored in the memory of the DSP ([0071] DMA controller 516 may be configured to perform DMA operations to transfer data between neural network processor 502 and the host device. For example, the host device may store the instructions, input data, the weights, and other parameters of the neural network at memory 512. The host device may provide the memory addresses for the stored instructions, data, weights, and other parameters of the neural network to neural network processor 502 (e.g., in the form of memory descriptors). Neural network processor 502 may then obtain the stored instructions, data, weights, or other parameters of the neural network based on the memory addresses provided by the host device. Neural network processor 502 may also store the results of computations (e.g., one or more image recognition decisions or intermediary data) at memory 512, and provide the memory addresses for the stored results to the host device).
Regarding dependent claim 4, Vantrease teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Vantrease further teaches wherein when the operation unit is configured to perform the convolution operation ([0091] At block 760, the convolution or other matrix multiplication may be performed using the quantized integers; [0048]-[0049] FIGS. 3A and 3B illustrate the convolution operations performed on an input pixel array 320 using a filter 310 by a convolution layer in a convolutional neural network),
the quantizing unit is configured to: convert the float type input data into a short type input data ([0085] One way to reduce the data size of the parameters for a neural network is to quantize the parameters in floating point format to shorter integers, such as 8-bit (i.e., one-byte) integers); and
the operation unit is configured to perform the convolution operation to the shot type input data ([0018] According to some embodiments, floating point data used for the computation during the inference using a neural network may be asymmetrically quantized (e.g., to 8-bit numbers) and pre-processed (e.g., shifted) before the matrix multiplication (e.g., convolution)).
Regarding dependent claim 5, Vantrease teaches all the limitations as set forth in the rejection of claim 4 that is incorporated. Vantrease further teaches wherein the NPU is connected to a random access memory (RAM) through a high-speed access interface, and the RAM is configured to transfer the short type input data to the RAM (Fig. 5; [0074] State buffer 522 may be configured to provide caching of data used for computations at computing engine 524. The data cached at state buffer 522 may include, for example, the input data and weights obtained from memory 512, output data from computing engine 524, and/or output data from post-processor 528. The caching may reduce the effect of memory access bottleneck (e.g., caused by the latencies at memory 512, DMA controller 516, interconnect 518, etc.) on the performance of computing engine 524. State buffer 522 may be an on-chip memory device and may include, for example, static random access memory (SRAM). In some embodiments, state buffer 522 may be partitioned based on the organization of computing engine 524. For example, state buffer 522 may include multiple SRAM banks, where each bank may be configured to store input data and weights for a row of computing engine 524).
Regarding dependent claim 6, Vantrease teaches all the limitations as set forth in the rejection of claim 5 that is incorporated. Vantrease further teaches wherein the operation unit comprises a first register, a second register and an accumulator ([0078] The operations of each PE of computing engine 600 may be synchronized to a clock signal to improve the interoperability between computing engine 600 and other components of the neural network processor (e.g., neural network processor 502). Each PE may also include sequential logic circuitries (e.g., registers, latches, flip-flops, state machines, etc.) to store input data, weights, and output data for the adder and multiplier circuitry, and to synchronize the flow of the data into and out of the circuitry);
the first register is configured to read the short type input data from the RAM within a first cycle ([0078] The sequential logic circuitry of each PE can be clocked by either the same clock signal or a replica of the clock signal, such that data may be synchronously shifted into and/or out of the PE sequentially during the clock cycles. For example, in a first clock cycle, a PE 620b of the second row may receive a first input data element of pixel group 612 as well as a partial sum comprising weighted first input data element of pixel group 610 from PE 620a of the first row);
the second register is configured to read at least part of network parameters stored in a pseudo static random access memory (PSRAM) within a plurality of cycles after the first cycle, and perform a dot product operation to the at least part of the network parameters read within each cycle and the corresponding input data in the first register ([0078] Within the first clock cycle, a PE 620b may multiply the input data element with a weight, add the multiplication product to the partial sum to generate an updated partial sum, and store the updated partial sum in an internal register. In the second clock cycle, PE 620b may forward the updated partial sum to a PE 620c on the third row below, which may perform the multiplication and accumulation to generate an updated partial sum. In the third clock cycle, PE 620c may forward the updated partial sum to a next PE on the fourth row below, which may perform the multiplication and accumulation to generate an updated partial sum. The updated partial sum may be propagated down along each column until it is output by PE 620d on the Mth row at the Mth clock cycle to an output buffer 630 (also referred to as a PSUM buffer)); and
the accumulator is configured to obtain a dot product result and perform accumulation according to the dot product result so as to obtain the operation result of the convolution operation, and to send the operation result of the convolution operation to a memory of a DSP for storage ([0049] convolution output value 332 may be the sum of multiplication results between weights in filter 310 and corresponding pixels in region 322 according to Σi=0 nxiwi, that is, a dot-product between a matrix representing filter 310 and a matrix representing pixel values of region 322; [0078] The updated partial sum may be propagated down along each column until it is output by PE 620d on the Mth row at the Mth clock cycle to an output buffer 630 (also referred to as a PSUM buffer)).
Regarding dependent claim 7, Vantrease teaches all the limitations as set forth in the rejection of claim 6 that is incorporated. Vantrease further teaches wherein the NPU further comprises:
an activating unit, configured to obtain an activation result by performing activation using an activation function according to the operation result of the convolution operation stored in the DSP, and provide the activation result to the DSP for storage (Fig. 5; [0080] Activation engine 528a may perform one or more activation (non-linear) functions, such as tan h, sigmoid, ReLU, etc., on the outputs of a convolution layer to generate the output data, and store the output data at state buffer 522).
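The element-wise activation step cited above (activation engine 528a applying tanh, sigmoid, ReLU, etc. to convolution-layer outputs) can be sketched as follows; the sample output values are hypothetical:

```python
import math

def relu(x: float) -> float:
    # ReLU: one of the non-linear activation functions named at [0080]
    return max(0.0, x)

def sigmoid(x: float) -> float:
    # Sigmoid: another activation option named at [0080]
    return 1.0 / (1.0 + math.exp(-x))

# Applying the activation element-wise to a convolution output row,
# as the activating unit would before the result is stored:
conv_out = [-2.0, 0.5, 3.0]        # hypothetical convolution results
activated = [relu(v) for v in conv_out]
```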
Regarding independent claim 8, claim 8 contains substantially similar limitations to those found in claim 1. Therefore, it is rejected for the same reason as claim 1 above. Vantrease further teaches a processing device, comprising: an NPU (Fig. 5, 502; [0073]), a PSRAM (Fig. 5, 512; [0070]) and a DSP connected through a bus (Fig. 5, 516; [0071]; Fig. 11, 1108; [0126]);
wherein the DSP is configured to store float type input data to be processed in an internal memory, and store operation results obtained by the NPU based on the input data ([0071] DMA controller 516 may be configured to perform DMA operations to transfer data between neural network processor 502 and the host device. For example, the host device may store the instructions, input data, the weights, and other parameters of the neural network at memory 512. The host device may provide the memory addresses for the stored instructions, data, weights, and other parameters of the neural network to neural network processor 502 (e.g., in the form of memory descriptors). Neural network processor 502 may then obtain the stored instructions, data, weights, or other parameters of the neural network based on the memory addresses provided by the host device. Neural network processor 502 may also store the results of computations (e.g., one or more image recognition decisions or intermediary data) at memory 512, and provide the memory addresses for the stored results to the host device);
the PSRAM is configured to store network parameters of a neural network ([0071] the host device may store the instructions, input data, the weights, and other parameters of the neural network at memory 512).
Regarding dependent claim 9, claim 9 contains substantially similar limitations to those found in claim 2. Therefore, it is rejected for the same reason as claim 2 above.
Regarding dependent claim 10, claim 10 contains substantially similar limitations to those found in claim 4. Therefore, it is rejected for the same reason as claim 4 above.
Regarding dependent claim 11, claim 11 contains substantially similar limitations to those found in claim 5. Therefore, it is rejected for the same reason as claim 5 above.
Regarding dependent claim 12, claim 12 contains substantially similar limitations to those found in claim 6. Therefore, it is rejected for the same reason as claim 6 above.
Regarding dependent claim 13, claim 13 contains substantially similar limitations to those found in claim 7. Therefore, it is rejected for the same reason as claim 7 above.
Regarding independent claim 14, claim 14 contains substantially similar limitations to those found in claim 1. Therefore, it is rejected for the same reason as claim 1 above.
Regarding dependent claim 15, claim 15 contains substantially similar limitations to those found in claim 2. Therefore, it is rejected for the same reason as claim 2 above.
Regarding dependent claim 16, claim 16 contains substantially similar limitations to those found in claim 3. Therefore, it is rejected for the same reason as claim 3 above.
Regarding dependent claim 17, claim 17 contains substantially similar limitations to those found in claim 4. Therefore, it is rejected for the same reason as claim 4 above.
Regarding dependent claim 18, claim 18 contains substantially similar limitations to those found in claim 5. Therefore, it is rejected for the same reason as claim 5 above.
Regarding dependent claim 19, claim 19 contains substantially similar limitations to those found in claim 6. Therefore, it is rejected for the same reason as claim 6 above.
Regarding dependent claim 20, claim 20 contains substantially similar limitations to those found in claim 7. Therefore, it is rejected for the same reason as claim 7 above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.
KANG et al. (US 20200242456 A1) discloses a neural network accelerator for improving the accuracy of calculation which decreases due to an error according to channel loop tiling and an operating method thereof.
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMY P HOANG whose telephone number is (469)295-9134. The examiner can normally be reached M-TH 8:30-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JENNIFER WELCH can be reached at 571-272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AMY P HOANG/Examiner, Art Unit 2143
/JENNIFER N WELCH/Supervisory Patent Examiner, Art Unit 2143