DETAILED ACTION
1. This communication is in response to the request for continued examination filed on April 18, 2025, for Application No. 17/210,050, in which claims 1-20 are presented for examination.
Notice of Pre-AIA or AIA Status
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
3. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 04/18/2025 has been entered.
Response to Arguments
4. The amendments filed on April 18, 2025 have been considered. Claims 1-5, 8-12, and 15-19 have been amended. Thus, Claims 1-20 are pending and presented for examination.
5. Applicant’s arguments filed April 18, 2025 with respect to the 35 U.S.C. 112(a) rejection have been fully considered and are persuasive. The 35 U.S.C. 112(a) rejection has been withdrawn.
6. Applicant's arguments filed April 18, 2025 with respect to the 35 U.S.C. 112(b) rejections regarding the recitation of “sensitivity to precision”, “high level of accuracy”, and “tolerate imprecision” in Claims 2, 9, 16, and respective dependents and the recitation of “high weight” and “low weight” in Claims 5, 12, 19, and respective dependents have been fully considered and are persuasive. Thus, the 35 U.S.C. 112(b) rejection with respect to these limitations has been withdrawn. However, Applicant’s arguments regarding the recitation of “efficiency”/“processing power efficiency” are not persuasive.
Applicant’s Arguments on Pgs. 1-2 of Arguments/Remarks state:
“Furthermore, the claims as amended recite "processing power efficiency" rather than merely "efficiency." The term "processing power efficiency" would be readily understood by one of ordinary skill in the art based on the specification and usage in the relevant technical field. Additionally, a person of ordinary skill in the art would be able to identify a variety of techniques that may be utilized to determine a processing power efficiency based on the present application.
For example, as described in the specification, techniques that may be implemented include comparing the processing power utilized to perform auxiliary operations such as "power consumed in peripheral circuitries like ADC and DAC" or "power consumed to transfer network parameters" with the processing power utilized to perform "useful tasks like performing the actual computation." See Specification at least at para. [[0056]] and [[0058]].”
Examiner respectfully disagrees. The limitation “processing power efficiency” remains indefinite, as neither the claim nor the specification provides a requisite standard or degree by which such a “processing power efficiency” is calculated, much less how such an efficiency is to be compared between accelerators. It is not clear, nor defined by the claim, what constitutes a “processing power efficiency” or how exactly such an efficiency may be calculated. Examiner asserts that a “processing power efficiency” may refer to any metric that relates to the ability of a neural network to achieve optimal results while utilizing minimal computational resources, including metrics such as accuracy, computational efficiency, usage of computational resources, etc. Furthermore, even the term “processing power” on its own is unclear, as the calculation of the “processing power” of an accelerator may draw on many different metrics (i.e., throughput, latency, operations per second, numerical precision, etc.). Although Applicant notes that the specification describes techniques for comparing the processing power utilized to perform different operations, there is still no disclosure of how the “processing power efficiency” is calculated. Instead, the specification recites that power is consumed to perform operations such as “transferring network parameters” or “performing computations”, which merely reiterates that power is consumed by the accelerators during neural network execution and does not explain or expand upon how this consumption of power is used to calculate an efficiency for both accelerators. This renders the claims indefinite.
Thus, the 35 U.S.C. 112(b) rejection is maintained.
7. Applicant’s arguments filed April 18, 2025 with respect to the 35 U.S.C. 103 rejections have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 112
8. The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
9. Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
10. The term “processing power efficiency” in Claims 1, 8, 15, and respective dependents is a relative term which renders the claims indefinite. The term “processing power efficiency” is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The claims recite “calculating a processing power efficiency of executing a plurality of layers […]” and “[…] being more efficient than the processing power efficiency […]”; however, it is not clear, nor defined by the claims, how this “processing power efficiency” is calculated. The claims do not recite any standard or threshold by which one accelerator may be “more efficient” in executing a plurality of layers than another accelerator. Furthermore, while amended dependent claim 2 recites that “calculating the processing power efficiency […] comprises determining a computational accuracy level of each layer of the plurality of layers […]” and amended dependent claim 3 recites “wherein the processing power efficiency is calculated for the digital accelerator based on an amount of processing power used to transfer the input data to the digital accelerator and to execute the computations using the digital accelerator […]”, there is still no threshold or requisite standard for how the actual “processing power efficiency” is calculated. Amended dependent claim 2 merely states that the calculation of the processing power efficiency “comprises determining a computational accuracy level […]” without significantly more; this does not further explain how the efficiency is calculated, but instead merely recites that an accuracy is determined without explaining how said accuracy is used to calculate the efficiency itself (the claim effectively equates efficiency to accuracy with no further explanation). Amended dependent claim 3 merely states that the processing power efficiency is calculated “based on” an amount of processing power, but again provides no further explanation of how the efficiency is calculated from a “processing power” that is itself undefined in the context of the claim language; it is unclear how the determination of “processing power” ultimately yields the “efficiency” metric recited as a “processing power efficiency” (i.e., is processing power equated to a metric such as latency, or to a metric such as throughput?). This renders the claims indefinite. For the purpose of examination, Examiner applies the broadest reasonable interpretation of the limitations “calculating a processing power efficiency […]” and “comparing the processing power efficiency […]”, such that a “processing power efficiency” may refer to any metric that relates to the ability of a neural network to achieve optimal results while utilizing minimal computational resources, including metrics such as accuracy, computational efficiency, usage of computational resources, etc.
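By way of illustration of the ambiguity discussed above (and not as a characterization of Applicant's disclosure), the following minimal sketch shows two plausible but divergent ways one of ordinary skill could compute a per-layer "processing power efficiency" from the kinds of quantities the specification mentions (power consumed for computation, data transfer, and peripheral circuitry). The function names, energy figures, and formulas are hypothetical assumptions and are not taken from the specification or the cited references; the same hypothetical figures rank the two accelerators differently under the two readings.
```python
# Hypothetical illustration only (not drawn from Applicant's specification or the
# cited references): two reasonable but divergent readings of a per-layer
# "processing power efficiency".

def efficiency_as_useful_fraction(compute_j, transfer_j, peripheral_j):
    """Reading 1: fraction of total energy spent on 'useful' computation (dimensionless)."""
    return compute_j / (compute_j + transfer_j + peripheral_j)

def efficiency_as_ops_per_joule(layer_ops, compute_j, transfer_j, peripheral_j):
    """Reading 2: operations completed per joule of total energy (ops/J)."""
    return layer_ops / (compute_j + transfer_j + peripheral_j)

# Hypothetical per-layer energy figures (joules) for one layer on each accelerator.
digital   = dict(layer_ops=1e9, compute_j=0.50, transfer_j=0.10, peripheral_j=0.05)
in_memory = dict(layer_ops=1e9, compute_j=0.20, transfer_j=0.05, peripheral_j=0.30)

# Reading 1 ranks the digital accelerator as "more efficient" (~0.77 vs. ~0.36),
# while Reading 2 ranks the in-memory accelerator higher (~1.5e9 vs. ~1.8e9 ops/J)
# for the very same figures, so the comparison depends on which metric is chosen.
print(efficiency_as_useful_fraction(digital["compute_j"], digital["transfer_j"], digital["peripheral_j"]))
print(efficiency_as_useful_fraction(in_memory["compute_j"], in_memory["transfer_j"], in_memory["peripheral_j"]))
print(efficiency_as_ops_per_joule(**digital))
print(efficiency_as_ops_per_joule(**in_memory))
```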
Claim Rejections - 35 USC § 103
11. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
12. Claims 1, 3-8, 10-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Manipatruni et al. (hereinafter Mani) (US PG-PUB 20200242459), in view of Knag et al. (hereinafter Knag) (US PG-PUB 20200097807).
Regarding Claim 1, Mani teaches a computer-implemented method for accelerating computations in applications, at least a portion of the method being performed by a computing device comprising one or more processors (Mani, Par. [0037], “As illustrated in FIG. 9, in an embodiment, method 900 for analog in-memory neural network processing commences at operation 910, by decoding and executing one or more instructions from an AI instruction set. In some embodiments, the decoding and executing is performed, at least in part, by the NPU 115.”, therefore, a method for accelerating computations being performed by a computing device (NPU, CPU, analog in-memory processor, etc.) is disclosed), the computer-implemented method comprising:
evaluating input data for computations to identify first data and second data of the input data (Mani, Par. [0051], “The analog in-memory AI processor is configured to perform analog in-memory computations based on the execution of an AI instruction set, operating on neural network weighting factors and input data provided by the CPU, as described previously.”, thus, input data is evaluated for computations), comprising:
calculating a processing power efficiency (Mani teaches the use of an “efficiency” but not explicitly calculating a processing power efficiency – See introduction of Knag reference below for teaching of calculating a processing power efficiency) of executing a plurality of layers of the computations in a digital accelerator (Mani, Figure 1, labels 110 and 115, which depict a central processing unit (CPU) and neural processing unit (NPU) respectively. Further, See Mani Claim 1 which describes how the NPU executes instructions from an AI instruction set and the analog AI processor performs computations based on a neural network layer) and in an in-memory computing accelerator (Mani, Figure 1, labels 130 and 140, which depict a memory (MRAM) and analog in-memory AI processor respectively);
comparing the processing power efficiency of executing each layer of the plurality of layers of the computations in the digital accelerator to the processing power efficiency of executing a corresponding layer of the plurality of layers of the computations in the in-memory computing accelerator (See introduction of Knag reference below for teaching of comparing the efficiency across accelerators);
identifying a portion of the input data as the first data (Mani, Par. [0021], “Digital access circuits 210 are also configured to receive input data associated with the NN layer. The input data can be input 125 from the CPU, or a subset of that input data associated with the NN layer. In some embodiments, the input data can be output from another (e.g., a previous) NN layer.”, therefore, a portion of the input data may be identified as first data to be executed by digital circuits/accelerators) based on the processing power efficiency of executing corresponding layers of the plurality of layers of the computations in the digital accelerator being more efficient than the processing power efficiency of executing the corresponding layers of the plurality of layers of the computations in the in-memory computing accelerator (Mani, Abstract, “The analog processing circuitry is configured to perform analog calculations on the stored weighting factors and the stored input data in accordance with the execution, by the NPU, of instruction from the AI instruction set. The AI instruction set includes instructions to perform dot products, multiplication, differencing, normalization, pooling, thresholding, transposition, and backpropagation training.”, therefore, the neural processing unit (NPU) is a digital accelerator which is deemed more efficient at executing operations such as dot products, multiplication, differencing, normalization, etc. than the analog in-memory accelerator which is configured only for performing analog calculations. Mani Figure 2 also depicts how this process may work regarding execution of a particular layer); and
identifying a portion of the input data as the second data (Mani, Par. [0014], “The analog in-memory computations operate on NN weighting factors and input data provided by the CPU.”, therefore, a portion of the input data may be identified as second data to be executed by the analog in-memory accelerator) based on the processing power efficiency of executing other corresponding layers of the plurality of layers of the computations in the in-memory computing accelerator being more efficient than the processing power efficiency of executing the other corresponding layers of the plurality of layers of the computations in the digital accelerator (Mani, Par. [0014], “The analog in-memory computations are performed in a parallel manner, as analog voltage values are read from the cells of memory circuits of the analog in-memory AI processor. That is to say, the arithmetic processing occurs in the memory circuits as a part of the data fetch. In some embodiments, 512 to 1024 calculations may be performed in parallel for each memory circuit.”, thus, the analog in-memory AI processor is an in-memory computing accelerator which is deemed more efficient at executing operations performed in a parallel manner than the digital accelerator, which is configured to operate digital circuitry for computations. Mani Figure 2 also depicts how this process may work regarding execution of a particular layer);
processing the first data using at least one digital accelerator as part of the computations (Mani, Claim 1, “a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set;”, therefore, the neural processing unit is configured to process the first data (input data, subset of input data for that layer, output from a previous layer) and execute instructions from an AI instruction set (for performing dot products, multiplication, differencing, normalization, pooling, thresholding, transposition, and backpropagation training) as part of the computations); and
processing the second data using at least one in-memory computing accelerator as part of the computations (Mani, Claim 1, “an AI processor coupled to the CPU, the AI processor to perform analog in-memory computations based on (1) neural network (NN) weighting factors provided by the CPU, (2) input data provided by the CPU, and (3) the AI instruction set executed by the NPU.”, therefore, the analog in-memory accelerator is configured to process the second data (weighting factors, input data, subset of input data) to perform analog in-memory computations).
While Mani teaches the consideration of a computational efficiency (See Par. [0014]), Mani does not explicitly disclose:
calculating a processing power efficiency of executing a plurality of layers of the computations in a digital accelerator and in an in-memory computing accelerator;
comparing the processing power efficiency of executing each layer of the plurality of layers of the computations in the digital accelerator to the processing power efficiency of executing a corresponding layer of the plurality of layers of the computations in the in-memory computing accelerator;
However, Knag teaches:
calculating a processing power efficiency of executing a plurality of layers of the computations in a digital accelerator and in an in-memory computing accelerator (Knag, Par. [0029], “In an embodiment, a compute near memory binary neural network accelerator with digital circuits achieves energy efficiencies comparable to or surpassing a compute near memory binary neural network accelerator with analog circuits. The compute near memory binary neural network accelerator with digital circuits is also more process scalable, robust to process, voltage, temperature (PVT) variations, and immune to circuit noise. The compute near memory binary neural network accelerator with digital circuits utilizes multiple circuit design techniques for energy efficient operation such as compute near memory design principles, multiple voltage domains with clock skew tolerant pipelining, Near Threshold Voltage operation, lightweight pipelining for maximum Near Threshold Voltage energy efficiency, and energy balanced memory/interconnect/compute utilizing highly parallel execution units.”, thus, a processing power efficiency of executing a plurality of layers (See Knag Par. [0039] for recitation of layer-by-layer execution) of computations in both a digital and analog accelerator is calculated. Knag also utilizes multiple different metrics (near threshold voltage, tera operations per second) that factor into the “efficiency” metric that is calculated – See 35 U.S.C. 112(b) rejection above regarding interpretation of “calculating a processing power efficiency”);
comparing the processing power efficiency of executing each layer of the plurality of layers of the computations in the digital accelerator to the processing power efficiency of executing a corresponding layer of the plurality of layers of the computations in the in-memory computing accelerator (Knag, Par. [0090], “The fully digital Compute Near Memory Binary Neural Network accelerator achieves energy efficiencies comparable to or surpassing analog based Binary Neural Network accelerators. The fully digital design is more process scalable, robust to PVT (process, voltage, temperature) variations, not vulnerable to circuit noise, and requires lower design effort. The Compute Near Memory Binary Neural Network accelerator works over a wide range of voltage operation points for energy and performance tradeoff. The fully digital Compute Near Memory Binary Neural Network accelerator also achieves high area efficiency (TOPS/mm2) largely because of the high area cost of passive analog components like capacitors and resistors as well as the requirement for larger devices to improve device matching.”, therefore, the processing power efficiencies of executing computations by each of the digital and analog accelerators are compared – Knag states that the fully digital accelerator is comparable to or surpasses analog accelerators and more specifically relates this comparison to the tera operations per second as related to area efficiency. Further supporting information regarding the area efficient design and recitation of layer-by-layer execution are disclosed in Knag Par. [0039]);
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for accelerating computations in applications, as disclosed by Mani, to include calculating a processing power efficiency of executing a plurality of layers of the computations in a digital accelerator and in an in-memory computing accelerator and comparing the processing power efficiency of executing each layer of the plurality of layers of the computations in the digital accelerator to the processing power efficiency of executing a corresponding layer of the plurality of layers of the computations in the in-memory computing accelerator, as disclosed by Knag. One of ordinary skill in the art would have been motivated to make this modification to enable more efficient processing by both the analog and digital accelerators for operations with varying computational complexity (Knag, Par. [0028-0029], “Also, analog circuits do not allow for arbitrarily large dot product sizes because precision limitations caused by circuit noise and device mismatch. A dot product is a scalar value that is the result of an operation of two equal length sequences of numbers called vectors. In an embodiment, a compute near memory binary neural network accelerator with digital circuits achieves energy efficiencies comparable to or surpassing a compute near memory binary neural network accelerator with analog circuits.”).
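As context for the broadest reasonable interpretation applied in the rejection above, the following minimal sketch illustrates one way the claimed per-layer comparison and partitioning of input data between a digital accelerator and an in-memory computing accelerator could be carried out. The Layer descriptor, the operations-per-joule metric, and the numeric values are hypothetical assumptions for illustration only and are not drawn from Mani, Knag, or Applicant's specification.
```python
# Hypothetical sketch (not drawn from Mani, Knag, or Applicant's specification) of
# the claimed per-layer comparison and partitioning step under the broadest
# reasonable interpretation: each layer's data goes to whichever accelerator is
# "more efficient" under one possible metric (operations per joule).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Layer:
    name: str
    ops: float                 # operations performed in the layer
    digital_energy_j: float    # assumed energy to run the layer on the digital accelerator
    in_memory_energy_j: float  # assumed energy to run the layer on the in-memory accelerator

def partition_layers(layers: List[Layer]) -> Tuple[List[str], List[str]]:
    first_data, second_data = [], []   # first -> digital accelerator, second -> in-memory
    for layer in layers:
        digital_eff = layer.ops / layer.digital_energy_j
        in_memory_eff = layer.ops / layer.in_memory_energy_j
        if digital_eff > in_memory_eff:
            first_data.append(layer.name)
        else:
            second_data.append(layer.name)
    return first_data, second_data

layers = [
    Layer("conv1", ops=2e8, digital_energy_j=0.08, in_memory_energy_j=0.05),
    Layer("fc1",   ops=5e7, digital_energy_j=0.01, in_memory_energy_j=0.02),
]
print(partition_layers(layers))  # (['fc1'], ['conv1'])
```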
Regarding Claim 3, Mani in view of Knag teaches the computer-implemented method of claim 1, wherein the input data includes network parameters and activations of a neural network and the computations relate to specific layers of the neural network (Mani, Figure 1, labels 120 and 125, which depict that the inputs include both input data and network parameters such as weights. Further, the computations relate to specific layers of the neural network, as the AI instruction set may be executed by the NPU to perform operations such as dot products, multiplication, normalization, pooling, backpropagation, etc. The inference pseudocode following Par. [0034] also describes the use of an activation function/corresponding activations), and
wherein the processing power efficiency is calculated for the digital accelerator based on an amount of processing power used to transfer the input data to the digital accelerator and to execute the computations using the digital accelerator and the processing power efficiency is calculated for the in-memory computing accelerator based on an amount of processing power used to transfer the input data to the in-memory computing accelerator and to execute the computations using the in-memory computing accelerator (Knag, Par. [0027], “A Binary Neural Network accelerator can use analog or digital circuits for computation. The analog circuit can include switch capacitors to perform analog addition to accumulate the results of binary multiplication which are passed through a comparator to digitize the analog value to a one-bit digital value.”, thus, the calculation of processing power efficiency for both the digital and analog circuits is based on an amount of processing power used to transfer data (See Par. [0005-0006] for recitation of data transfer) and to execute various computations (multiply and accumulate)).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
Regarding Claim 4, Mani in view of Knag teaches the computer-implemented method of claim 3, wherein:
calculating the processing power efficiency of executing the plurality of layers of the computations in the digital accelerator and in the in-memory computing accelerator comprises calculating a number of network parameters in each layer of the neural network (Knag, Par. [0039], “Area efficient design is also critical for energy efficiency. Reducing area reduces the distance of data movement between the sets of latches in latch-based Compute Near Memory array 204 and the wide vector inner product execution units 202. A wide vector inner product execution unit 202 has a higher compute density (Tera Operations Per Second (TOPS)/mm2) compared to an outer product execution unit. Vector latches are used in place of standard latches to further increase area efficiency of local weight memory storage. In addition, the memory footprint is reduced by reusing a local activation memory to store both intermediate inputs and output activations of a single layer. After a layer of operations is completed, previous input activations can be freed and reused to store the incoming output activations of a new layer. This recycling of the activation memory results in a reduction in activation memory proportional to the number of layers in the binary neural network to significantly reduce the size and cost of activation of memory access.”, thus, the efficiency is calculated by considering the number of parameters (inputs, weights, activations, etc.) in each layer),
the portion of the input data corresponding to the layers of the neural network having a larger number of network parameters is identified as the second data (Mani, Par. [0019], “The analog in-memory AI processor 140 is configured to receive weighting factors 120 and input data 125 (e.g., an image) from the CPU 110, for storage in the memory 130, through digital access circuits, and to perform analog neural network processing based on those weights and data.”, therefore, the portion of input data corresponding to a larger number of network parameters (i.e., weighting factors, inputs, etc.) is identified as second data processed by the analog accelerator. This is better illustrated by Figure 1 which depicts the analog accelerator accepting these values), and
the portion of the input data corresponding to the layers of the neural network having a smaller number of network parameters is identified as the first data (Mani, Par. [0019], “The results of the NN processing (e.g., an image classification or recognition) are provided back to the CPU 110 as outputs 150, also through digital access circuits.”, thus, the portion of input data corresponding to a smaller number of network parameters (i.e., outputs that are used as corresponding inputs for subsequent layer processing – see Mani Par. [0021] for further support on the iterative processing executed by the digital accelerators) is identified as first data processed by the digital accelerator. This is better illustrated by Figure 1 which depicts the digital accelerator accepting these values).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
Regarding Claim 5, Mani in view of Knag teaches the computer-implemented method of claim 3, wherein:
calculating the processing power efficiency of executing the plurality of layers of the computations in the digital accelerator and in the in-memory computing accelerator comprises calculating a number of times that network parameters are reused in each layer of the neural network (Knag, Par. [0044], “Data reuse and low data movement are critical for an energy efficient Binary Neural Network accelerator. High parallelism is required to create a system that balances the energy of memory access with computation. Data reuse is a way of describing how a memory access is shared across multiple operations to create a balanced system. When using Binary Neural Networks, the cost of memory access relative to computation is significantly increased because of the lower cost of computation.”, thus, parameter reuse is considered in the calculation of efficiency for both the analog and digital accelerators), and
each layer in the portion of the input that is identified as the first data has a higher weight of network parameter reuse than each layer in the portion of the input data that is identified as the second data (Knag, Par. [0035], “The Binary Neural Network accelerator 102 is implemented as a two level hierarchy with a static Random Access Memory 110 and controller 118 surrounded by an interleaved memory compute 112 that includes a latch-based Compute Near Memory array 204 interleaved with wide vector inner product execution units 202. The latch-based Compute Near Memory array 204 can also be referred to as a near memory latch array. A set of weights is stored in vector latches 208, 210 located in the inner product execution units 202. The set of weights can be reused many times over the course of the convolution operation. Storing two sets of weights in vector latches 208, 210 allows for 2 times more input data reuse compared to a single set of weights, and also reduces the input energy/switch-activity by a factor of 2.”, thus, the portion of input data with a higher weight of parameter reuse may be identified as first data to be processed by the digital accelerator).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
Regarding Claim 6, Mani in view of Knag teaches the computer-implemented method of claim 3, wherein the at least one digital accelerator and the at least one in-memory computing accelerator are configured to implement a same layer of the neural network (Mani, Par. [0021], “Digital access circuits 210 are configured to receive, from the CPU, weighting factors 120, or a subset of those weights associated with the NN layer. Digital access circuits 210 are also configured to receive input data associated with the NN layer. The input data can be input 125 from the CPU, or a subset of that input data associated with the NN layer. In some embodiments, the input data can be output from another (e.g., a previous) NN layer. Digital access circuits 310 are also configured to provide outputs 250 back to the CPU or to another (e.g., a next) NN layer.”, therefore, data is sent for execution layer-by-layer – hence, both accelerators are configured to implement a same layer of the neural network simultaneously).
Regarding Claim 7, Mani in view of Knag teaches the computer-implemented method of claim 1, wherein:
the at least one digital accelerator includes a first digital accelerator located on a first hybrid chip and a second digital accelerator located on a second hybrid chip, the at least one in-memory computing accelerator includes a first in-memory computing accelerator located on the first hybrid chip and a second in-memory computing accelerator located on the second hybrid chip (Mani, Par. [0015], “The disclosed techniques can be implemented, for example, in integrated circuitry on a common substrate, or a chip set. In one such example case, the techniques are implemented in the memory of a computing system or device such as an integrated circuit processor (e.g., on-chip memory or cache), although other embodiments will be apparent. The memory is configured to perform analog in-memory computations. In accordance with an embodiment, a hybrid AI processing system implementing the techniques includes a central processing unit (CPU) configured to execute instructions from a general-purpose instruction set and a neural processing unit (NPU), which may be integrated with the CPU, and is configured to execute instructions from an AI instruction set.”, therefore, both the digital accelerator and in-memory computing accelerator are implemented on a hybrid chip. Further, the digital accelerator may consist of both the CPU and NPU (shown by Figure 1) and the in-memory accelerator may consist of the memory (MRAM) and analog in-memory processor), and
the first and second hybrid chips are connected together by a shared bus or through a daisy chain connection (Mani, Par. [0044], “In some embodiments, platform 1000 may comprise any combination of a processor 110 including NPU 115, a CPU memory 1030, an analog in-memory AI processor 140 and associated memory 130 configured to perform analog in-memory neural network calculations, a network interface 1040, an input/output (I/O) system 1050, a user interface 1060, an imaging sensor 1090, and a storage system 1070. As can be further seen, a bus and/or interconnect 1092 is also provided to allow for communication between the various components listed above and/or other components not shown.”, thus, the chips may be connected by a shared bus).
Regarding Claim 8, Mani in view of Knag teaches one or more non-transitory computer-readable media comprising one or more computer-readable instructions that, when executed by one or more processors of a server device, cause the server device to perform a method for accelerating computations in applications (Mani, Par. [0036], “In still other embodiments, the methodology depicted can be implemented as a computer program product including one or more non-transitory machine-readable mediums that when executed by one or more processors cause the methodology to be carried out.”, therefore, one or more non-transitory computer-readable media comprising computer-readable instructions to be executed by one or more processors is disclosed. Further, Par. [0043] mentions that the platform to perform the processing may include a server system), the method comprising: […]
The rest of the claim language in Claim 8 recites substantially the same limitations as Claim 1, in the form of a non-transitory computer-readable media, therefore it is rejected under the same rationale.
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
Claim 10 recites substantially the same limitations as Claim 3 in the form of a non-transitory computer-readable media, therefore it is rejected under the same rationale.
Claim 11 recites substantially the same limitations as Claim 4 in the form of a non-transitory computer-readable media, therefore it is rejected under the same rationale.
Claim 12 recites substantially the same limitations as Claim 5 in the form of a non-transitory computer-readable media, therefore it is rejected under the same rationale.
Claim 13 recites substantially the same limitations as Claim 6 in the form of a non-transitory computer-readable media, therefore it is rejected under the same rationale.
Claim 14 recites substantially the same limitations as Claim 7 in the form of a non-transitory computer-readable media, therefore it is rejected under the same rationale.
Regarding Claim 15, Mani in view of Knag teaches a system for accelerating computations in applications (Mani, Figure 10, label 1000, which depicts a device platform/system), the system comprising: a memory storing programmed instructions (Mani, Figure 10, label 130, which depicts a memory); at least one digital accelerator (Mani, Figure 10, label 115, which depicts a neural processing unit (digital accelerator)); at least one in-memory computing accelerator (Mani, Figure 10, label 140, which depicts an analog in-memory AI processor); and a processor configured to execute the programmed instructions (Mani, Figure 10, label 110, which depicts a processor configured to execute programmed instructions) to: […]
The rest of the claim language in Claim 15 recites substantially the same limitations as Claim 1, in the form of a system, therefore it is rejected under the same rationale.
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
Claim 17 recites substantially the same limitations as Claim 3 in the form of a system, therefore it is rejected under the same rationale.
Claim 18 recites substantially the same limitations as Claim 4 in the form of a system, therefore it is rejected under the same rationale.
Claim 19 recites substantially the same limitations as Claim 5 in the form of a system, therefore it is rejected under the same rationale.
Claim 20 recites substantially the same limitations as Claim 7 in the form of a system, therefore it is rejected under the same rationale.
13. Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Manipatruni et al. (hereinafter Mani) (US PG-PUB 20200242459), in view of Knag et al. (hereinafter Knag) (US PG-PUB 20200097807), further in view of Kenney et al. (hereinafter Kenney) (US PG-PUB 20200272795).
Regarding Claim 2, Mani in view of Knag teaches the computer-implemented method of claim 1.
Mani in view of Knag does not explicitly disclose:
calculating the processing power efficiency of executing the plurality of layers of the computations in the digital accelerator and in the in-memory computing accelerator comprises determining a computational accuracy level of each layer of the plurality of layers, a computational accuracy of the digital accelerator, and a computational accuracy of the in-memory computing accelerator,
the portion of the input data that corresponds to the layers of the plurality of layers having a computational accuracy level greater than the computational accuracy of the in-memory computing accelerator is identified as the first data, and the portion of the input data that corresponds to the layers of the plurality of layers having a computational accuracy level less than the computational accuracy of the in-memory computing accelerator is identified as the second data.
However, Kenney teaches:
calculating the processing power efficiency of executing the plurality of layers of the computations in the digital accelerator and in the in-memory computing accelerator comprises determining a computational accuracy level of each layer of the plurality of layers, a computational accuracy of the digital accelerator, and a computational accuracy of the in-memory computing accelerator (Kenney, Par. [0158], “The inventors have further appreciated that representing digital values with large numbers of bits, while providing high degrees of representational accuracy, can lead to a significant increase in power consumption. Consider for example analog-to-digital converters (ADCs). The power consumption of an ADC depends, among other factors, upon the number of bits with which values are represented. The energy required for some ADCs grows exponentially with the number of output bits. In essence, there is a trade-off between representational accuracy and power consumption.”, therefore, a computational accuracy is determined for each layer (See Par. [0227] which mentions how error/accuracy is layer-dependent based on precision) with respect to both the analog and digital accelerators),
the portion of the input data that corresponds to the layers of the plurality of layers having a computational accuracy level greater than the computational accuracy of the in-memory computing accelerator is identified as the first data, and the portion of the input data that corresponds to the layers of the plurality of layers having a computational accuracy level less than the computational accuracy of the in-memory computing accelerator is identified as the second data (Kenney, Par. [0214], “Unlike accelerator 1-100, accelerator 1-700 is configured to perform mathematical operations in a single pass. Therefore, digital accumulator 1-110 is omitted. However, as described in detail further above, techniques for improving the accuracy of low-precision fixed-point representations may be used with accelerators arranged to perform multiple passes. In particular, multiple passes can be performed between different levels of precision of the input data and the values encoded in the analog processor.”, therefore, accelerator 1-100 (digital accelerator) may require multiple passes to improve accuracy of mathematical operations – hence, this data would be identified as first data to be assigned to the digital accelerator. Accelerator 1-700 (including a digital-to-analog converter for analog computations) may be used for low-precision (tolerating imprecision/lesser accuracy) mathematical operations – hence, this data would be identified as second data to be assigned to the analog accelerator).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for accelerating computations in applications per claim 1, as disclosed by Mani in view of Knag to include determining a computational accuracy level of each layer of the plurality of layers, a computational accuracy of the digital accelerator, and a computational accuracy of the in-memory computing accelerator, the portion of the input data that corresponds to the layers of the plurality of layers having a computational accuracy level greater than the computational accuracy of the in-memory computing accelerator is identified as the first data, and the portion of the input data that corresponds to the layers of the plurality of layers having a computational accuracy level less than the computational accuracy of the in-memory computing accelerator is identified as the second data, as disclosed by Kenney. One of ordinary skill in the art would have been motivated to make this modification to enable the hybrid analog-digital processor to obtain higher energy efficiencies and improved numerical accuracy for computations of varied precision (Kenney, Abstract, “Techniques for gain adjustment in a finite-sized hybrid analog-digital matrix processor are described which enable the system to obtain higher energy efficiencies, greater physical density and improved numerical accuracy. In some embodiments, these techniques enable maximization of the predictive accuracy of a GEMM-based convolutional neural network using low-precision data representations.”).
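As a further illustration of the interpretation applied to claim 2 above, the following minimal sketch shows one way layers could be routed between the accelerators by comparing each layer's required computational accuracy to the computational accuracy of the in-memory computing accelerator. The function name, threshold comparison, and accuracy figures are hypothetical assumptions and are not drawn from Kenney, Mani, Knag, or Applicant's specification.
```python
# Hypothetical sketch (not drawn from Kenney, Mani, Knag, or Applicant's
# specification) of the accuracy-based identification recited in claim 2 as
# interpreted above: layers whose required accuracy exceeds what the in-memory
# accelerator can provide are identified as first data (digital accelerator);
# the remaining layers are identified as second data (in-memory accelerator).
def identify_by_accuracy(required_accuracy_per_layer, in_memory_accuracy):
    first_data, second_data = [], []
    for layer, required in required_accuracy_per_layer.items():
        if required > in_memory_accuracy:
            first_data.append(layer)
        else:
            second_data.append(layer)
    return first_data, second_data

# Illustrative per-layer accuracy requirements and an assumed in-memory accuracy.
requirements = {"embedding": 0.90, "attention": 0.99, "classifier": 0.95}
print(identify_by_accuracy(requirements, in_memory_accuracy=0.93))
# -> (['attention', 'classifier'], ['embedding'])
```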
Claim 9 recites substantially the same limitations as Claim 2 in the form of a non-transitory computer-readable media, therefore it is rejected under the same rationale.
Claim 16 recites substantially the same limitations as Claim 2 in the form of a system, therefore it is rejected under the same rationale.
Conclusion
14. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Devika S Maharaj whose telephone number is (571)272-0829. The examiner can normally be reached Monday - Thursday 8:30am - 5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/D.S.M./
Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123