DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to communications filed on 01/08/2026. Claims 1-20 are pending and have been examined.
Information Disclosure Statement
The information disclosure statement (IDS) was filed on 01/08/2026. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 15-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 15 was amended to recite “the inference model associating a first weight parameter to the node that is based at least on a loss function that integrates the injected noise, the first weight parameter learned through training on a dataset that noisy as a result of an estimation of an intrinsic electrical noise of the analog multiply-and-accumulation circuit”. However, the specification does not support these features. The specification describes “during inference, when NN model(s) 120 execute on hardware accelerator 108, the intrinsic electrical noise effectively alters the data being analyzed. Thus, the weight parameters learned during training are not optimized for noisy data… adding stochastic (e.g., randomly determined) noise into the loss function used during the training of the NN model(s) 120. In particular, the intrinsic noise of hardware accelerator 108 may be modeled as noise generated at an output of ADC 206 of MAC circuit 200 thereof, which is an estimation of the intrinsic noise generated by the components of MAC circuit… injecting noise into an output value generated by certain nodes of the NN model(s) 120, where the injected noise emulates the noise generated at the output” (e.g. in paragraphs 61-62). It is noted that “the weight parameters learned during training are not optimized for noisy data” appears to be meant to illustrate an issue in which the training data input into a model does not have intrinsic electrical noise, while real-time (during inference) data input into the model does have “intrinsic electrical noise”, and thus the trained weights do not account for such noisy data. Paragraph 62 then begins by stating “adding stochastic (e.g., randomly determined) noise into the loss function”; note that this noise is stochastic or random, i.e. it does not estimate the intrinsic noise. The paragraph then states “the intrinsic noise of hardware accelerator 108 may be modeled as noise generated at an output of ADC 206 of MAC circuit 200 thereof, which is an estimation of the intrinsic noise generated by the components of MAC circuit… injecting noise into an output value generated by certain nodes of the NN model(s) 120, where the injected noise emulates the noise generated at the output”, but this estimation of intrinsic noise is added to outputs generated within the model, i.e. not to the loss function, and does not constitute training on a noisy dataset as described in paragraph 61 (“the data being analyzed”), where one of ordinary skill in the art understands a training dataset to refer to the input into the model. Furthermore, the specification is silent as to weights with respect to an estimation of intrinsic electrical noise. As such, the claim lacks written description. Due at least to their dependency upon claim 15, dependent claims 16-20 also fail to comply with the written description requirement.
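Solely to illustrate the distinction drawn above between noise injected into a node's output values and a stochastic term integrated into the loss function itself, the following sketch is provided. All names, values, and the noise model are hypothetical and are not drawn from the specification or the cited references:

```python
import numpy as np

rng = np.random.default_rng(0)

def node_forward(x, w, adc_noise_std):
    # Noise injected into the OUTPUT VALUE generated by a node, emulating
    # noise at the ADC output of a MAC circuit (as the specification is
    # characterized above); the noise term itself never appears in the
    # loss equation.
    y = x @ w
    return y + rng.normal(0.0, adc_noise_std, size=y.shape)

def loss_over_noisy_outputs(x, w, target, adc_noise_std):
    # The loss merely receives the noisy activations.
    y = node_forward(x, w, adc_noise_std)
    return float(np.mean((y - target) ** 2))

def loss_with_stochastic_term(x, w, target, noise_std):
    # By contrast, a loss function that itself integrates an added
    # stochastic term, which is how the amended claim language is read.
    y = x @ w
    return float(np.mean((y - target) ** 2)) + float(rng.normal(0.0, noise_std))
```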
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 15-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
As per claim 15, it is not clear how to interpret “training on a dataset that noisy as a result of an estimation” (e.g. “training on a dataset that is noisy as a result of an estimation”, “training on a dataset that is not noisy as a result of an estimation”, etc.). Due at least to their dependency upon claim 15, dependent claims 16-20 are also indefinite.
Response to Arguments
With respect to claims 1-14, applicant’s arguments with respect to the newly amended features have been considered but are moot in view of the new grounds of rejection. See Lee et al. (US 20230077987 A1) below for details. However, in response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). In this case, for example, Stevens is not relied upon to teach the estimation of power or the newly amended features including modifying a loss function based on the estimated power. It is noted that Chai teaches estimation of power and modifying a loss function (e.g. in paragraphs 37, 68, 72, 136, 145, and 147, “consume an estimated…Tflops/s… estimate of the parameters… cost function can be one of size, weight, power and cost… the cost function may be the loss function as described previously as Equation (34), where λ1 and λ2 are set based to optimize…power…of the hardware architecture… In reference to the loss function selection the lambda parameters (λ.sub.1, λ.sub.2 and λ.sub.3) in equation (34) based in hardware parameters P, this disclosure describes hardware parameters P that may be updated during training to affect the selection of the annealing constraints… selection of hardware parameters P during training allows for a pareto-optimal selection of hardware resources with respect to…power”). See also the newly cited Lee et al. (US 20230077987 A1) below for details. As such, the combination teaches the claimed features. Moreover, Chai is not relied upon to teach non-zero midterms. This is taught by Deng, which shows that it was well known for power consumption to be based on the number of non-zero weights/midterms of a DNN (which correspond to nodes) corresponding to MAC operations (e.g. in paragraphs 25 and 36, “Optimally reducing the number of non-zero weights, in turn, optimally reduces the computational burden… consume less power because the DNN includes fewer multiply and accumulate (MAC) operations” and figure 1). As such, the combination teaches the claimed features.
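For illustration of how the combined teachings are being applied, the following sketch shows a task loss augmented with a lambda-weighted power term that tracks the number of non-zero weights, schematically paralleling Chai's lambda-weighted cost terms and Deng's non-zero-weight/MAC-operation relationship. All names and the linear power model are hypothetical and are not drawn from the references:

```python
import numpy as np

def estimated_power(weights, energy_per_mac=1.0):
    # Per Deng's teaching as characterized above: power tracks the number
    # of non-zero weights, since fewer non-zero values means fewer MAC
    # operations and hence less power.
    return energy_per_mac * sum(int(np.count_nonzero(w)) for w in weights)

def combined_loss(task_loss, weights, lam=1e-4):
    # Per Chai/Lee as characterized above: a lambda-weighted hardware-cost
    # term is added to the task loss, so the loss equation includes a
    # value indicative of the estimated power.
    return task_loss + lam * estimated_power(weights)
```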
With respect to claims 15-20, applicant's arguments have been fully considered but they are not persuasive. In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., the intrinsic electrical noise estimate being implemented as part of the loss function) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). In this case, the intrinsic electrical noise is not required to be the same noise as the injected noise in the loss function. Cherupally teaches the first weight parameter learned through training on a dataset that noisy as a result of an estimation of an intrinsic electrical noise of the analog multiply-and-accumulation circuit (e.g. in paragraphs 6, 41, 65, 69, and 73, “IMC performs MAC computation… noise-aware DNN training is performed… hardware noise is injected [i.e. noisy training dataset] by emulating the IMC macro's dot-product computation and then using the conditional probability tables to transform the smaller chunks of dot-product values (i.e., partial sums) in a similar way to the actual IMC hardware [i.e. estimation of intrinsic electrical noise]… error from the loss function 18 is passed through a backward pass of the IMC engine 10 and used to update the weights w0, w1, w2, w3”). As such, the combination teaches the claimed features.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 7-10, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Stevens et al. (US 20220067513 A1) in view of Chai et al. (US 20200134461 A1), Lee et al. (US 20230077987 A1), Cherupally et al. (US 20220318628 A1), and Deng et al. (US 20190180184 A1).
As per independent claim 1, Stevens teaches a system, comprising:
at least one processor circuit (e.g. in paragraphs 42-43, “processor”); and
at least one memory that stores program code configured to be executed by the at least one processor circuit (e.g. in paragraphs 42-43, “non-transitory machine readable media comprising machine-executable instructions”), the program code comprising:
a neural network model trainer configured to: receive a configuration file that specifies characteristics of hardware including multiply-and-accumulation circuits utilized to implement nodes of a particular layer of a neural network (e.g. in paragraphs 32, 34, 42, 45, 88 and 90-92, ““Neural network” refers to an algorithm or computational system based on a collection of connected units or nodes called artificial neurons… neural networks…acquire differences during training (e.g., be trained to have different weights from one another)… comprise “layers” that perform operations on vector inputs to generate vector or scalar outputs… instructions...comprises settings and values (such as resistance, impedance, capacitance, inductance, current/voltage ratings, etc.)… each local processing element may utilize one or more collectors (e.g., small register-files): one in front of a weight buffer, another one in front of an accumulation buffer, and another in front of an input activation buffer… Each of the vector multiply-accumulate units 1202 includes a weight collector 1320 buffer having a configurable depth (e.g., number of distinct registers or addresses in a register file used by the vector multiply-accumulate units 1202 during computations) of WD and a width V×N×WP (WP is also called the weight precision)… Some or all of WD, WP, IAP, AD, and AP may be configurable”);
generate an inference model based at least on a training session of the neural network with features causing output values generated by the multiply-and-accumulation circuits to have reduced precision (e.g. in paragraphs 32 and 70, “inference on a trained model [i.e. generated] with less precise data representations to increase performance (improve throughput or latency per inference) and reduce computational energy expended per inference”),
but does not specifically teach the hardware including analog multiply-and-accumulation circuits and during a training session of the neural network: determine an estimate of an amount of power consumed by the hardware during execution thereof based at least on a number of non-zero midterms generated by the nodes; and modify a loss function of the neural network based at least on the estimate, the modified loss function comprising an equation that includes a value indicative of the estimate of the power consumed by the hardware and wherein the features include the modified loss function causing weight parameters of the inference model to have a sparse bit representation.
However, Chai teaches during a training session of a neural network: determine an estimate of an amount of power consumed by hardware during execution thereof and modify a loss function of the neural network based at least on power, the modified loss function comprising an equation that includes a value (e.g. in paragraphs 37, 68, 72, 136, 145, and 147, “consume an estimated…Tflops/s… estimate of the parameters… map the neural network software architecture to appropriate processors in the system architecture. Machine learning system 104 may use a cost function to select the best mapping (e.g., a best fit algorithm can be use). The cost function can be one of size, weight, power and cost… the cost function may be the loss function as described previously as Equation (34), where λ1 and λ2 are set based to optimize…power…of the hardware architecture… In reference to the loss function selection the lambda parameters (λ.sub.1, λ.sub.2 and λ.sub.3) in equation (34) based in hardware parameters P, this disclosure describes hardware parameters P that may be updated during training to affect the selection of the annealing constraints… selection of hardware parameters P during training allows for a pareto-optimal selection of hardware resources with respect to…power”), wherein features include a modified loss function causing weight parameters of the inference model to have a sparse bit representation (e.g. in paragraphs 42 and 72, “accommodate low precision weights… sparsity may refer to the use of low-precision weights, which may only take a limited (sparse) number of values… where low-precision weights 116 are constrained…machine learning system 104 may use the loss function”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Stevens to include the teachings of Chai because one of ordinary skill in the art would have recognized the benefit of optimizing a neural network,
but does not specifically teach the hardware including analog multiply-and-accumulation circuits, modify the loss function based at least on the estimate, the modified loss function comprising an equation that includes a value indicative of the estimate of the power consumed by the hardware, and power based at least on a number of non-zero midterms generated by the nodes.
However, Lee teaches modify a loss function based at least on an estimate of power consumed by hardware, the modified loss function comprising an equation that includes a value indicative of the estimate of the power consumed by the hardware (e.g. in paragraphs 75, 81, 94-95, 98-99, and 124, “the cost estimation network may predict hardware metrics using the cost function, and the cost function may be defined as a linear combination of the latency, the area, and the energy consumption, or may be defined as the combination and the product between the latency, the area, and the energy consumption… execution of multiple MAC (Multiply-Accumulate) operations, which are the most common operations in recent CNNs… cost estimation network may generate as output the three cost metrics of interest (i.e., latency, area, and energy consumption) based on the ground truth generated by the evaluation software… controlling λ.sub.E, λ.sub.L, and λ.sub.A, conditions for how to measure the balance between each cost metric may be set… dynamic energy consumption may mainly depend on the number of MAC operations”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Lee because one of ordinary skill in the art would have recognized the benefit of optimizing a neural network,
but does not specifically teach the hardware including analog multiply-and-accumulation circuits and power based at least on a number of non-zero midterms generated by the nodes.
However, Cherupally teaches hardware including analog multiply-and-accumulation circuits (e.g. in paragraphs 3, 6-7, 9, and 59, “multiply-and-accumulate (MAC)… (IMC)-based deep neural network (DNN) hardware is provided… performing the analog MAC computation”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Cherupally because one of ordinary skill in the art would have recognized the benefit of incorporating well-known types of operations (which also amounts to a simple substitution that yields predictable results [e.g. see KSR Int'l Co v. Teleflex Inc., 550 US 398, 82 USPQ2d 1385, 1396 (U.S. 2007) and MPEP § 2143(B)]),
but does not specifically teach power based at least on a number of non-zero midterms generated by the nodes.
However, Deng teaches power based at least on a number of non-zero midterms generated by nodes (e.g. in paragraphs 25 and 36, “Optimally reducing the number of non-zero weights, in turn, optimally reduces the computational burden… consume less power because the DNN includes fewer multiply and accumulate (MAC) operations” and figure 1). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Deng because one of ordinary skill in the art would have recognized the benefit of determining power based on relevant factors.
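As an aid to understanding the "non-zero midterms" limitation as mapped above, the following sketch counts the non-zero bit-level partial products of a single multiply. Reading "midterm" as a bit-level partial product is an assumption made solely for illustration; Deng is relied upon only for the broader teaching that power tracks non-zero values and MAC operations:

```python
def count_nonzero_midterms(w_int, x_int, bits=8):
    # Each pair of set bits (one weight bit, one activation bit) yields one
    # non-zero partial product ("midterm" under the assumed reading) in a
    # bit-level multiply; zero bits contribute no switching activity and
    # hence no dynamic power.
    return sum(((w_int >> i) & 1) & ((x_int >> j) & 1)
               for i in range(bits) for j in range(bits))

# e.g., count_nonzero_midterms(0b1010, 0b0110) == 4
# (two set weight bits times two set activation bits)
```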
As per claim 2, the rejection of claim 1 is incorporated and the combination further teaches wherein the particular layer comprises at least one of: a fully-connected layer; or a convolutional layer (e.g. Stevens, in paragraph 82, “convolutional and fully-connected layers”).
As per claim 3, the rejection of claim 1 is incorporated and the combination further teaches wherein the characteristics comprise at least one of: a bit width for input data provided as an input for each of the analog multiply-and-accumulation circuits; a bit width for a second weight parameter provided as an input for each of the analog multiply-and-accumulation circuits; a bit width for output data output by analog-to-digital converters of the analog multiply-and-accumulation circuits; or a vector size supported by the analog multiply-and-accumulation circuits (e.g. Stevens, in paragraphs 90-92, “weight collector 1320 buffer having a configurable depth (e.g., number of distinct registers or addresses in a register file used by the vector multiply-accumulate units 1202 during computations) of WD and a width V×N×WP... input activations have width IAP. Each of the vector multiply-accumulate units 1202 also includes an accumulation collector 1322 having a configurable operational depth AD and width N×AP… WP×N×V bits wide and is able to supply different weight vectors… values of V and N may be adjusted”).
As per claim 7, the rejection of claim 1 is incorporated, but Stevens does not specifically teach wherein the neural network model trainer is further configured to: inject noise into output values generated by the nodes, the injected noise emulating noise generated at outputs of analog-to-digital converters of the analog multiply-and-accumulation circuits, wherein the modified loss function incorporates the injected noise. However, Cherupally teaches a neural network model trainer configured to inject noise into output values generated by the nodes, the injected noise emulating noise generated at outputs of analog-to-digital converters of analog multiply-and-accumulation circuits (e.g. in paragraphs 6, 10-11, 57, and 63, “IMC performs MAC computation inside the on-chip memory (e.g., SRAM) by activating multiple/all rows of the memory array. The MAC result is represented by analog bitline voltage/current and subsequently digitized by an analog-to-digital converter (ADC) in the peripheral of the array… During DNN training, embodiments perform noise injection at the partial sum level, which matches with the crossbar structure of IMC hardware, and the injected noise data is directly based on measurements of actual IMC prototype chips… the injected noise is directly from IMC chip measurement results on the quantized ADC outputs for different partial sum (MAC) values”), wherein a modified loss function incorporates the injected noise (e.g. in paragraphs 41, 69, and 73, “IMC hardware noise-aware training… error from the loss function 18 is passed through a backward pass of the IMC engine 10 and used to update the weights w0, w1, w2, w3”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Cherupally because one of ordinary skill in the art would have recognized the benefit of improving neural network accuracy.
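For illustration of the noise-aware training mapped to Cherupally above, the following sketch injects emulated ADC-output noise into the node outputs during the forward pass, so that the loss computed over those outputs incorporates the injected noise and the resulting error is used to update the weights. The names, the Gaussian noise model, and the learning rate are hypothetical and are not Cherupally's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(x, w, target, adc_noise_std, lr=1e-2):
    # Forward pass with noise injected at the node outputs, emulating the
    # ADC-output noise of the analog MAC circuits.
    noise = rng.normal(0.0, adc_noise_std, size=(x.shape[0], w.shape[1]))
    y = x @ w + noise
    err = y - target
    # Because the loss is computed over the noisy outputs, the loss
    # incorporates the injected noise.
    loss = float(np.mean(err ** 2))
    # Backward pass: the error from the loss is used to update the weights.
    grad_w = 2.0 * x.T @ err / err.size
    return w - lr * grad_w, loss
```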
Claims 8-10 and 14 are the method claims corresponding to system claims 1-3 and 7, and are rejected for the same reasons set forth.
Claims 4-5 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Stevens et al. (US 20220067513 A1) in view of Chai et al. (US 20200134461 A1), Lee et al. (US 20230077987 A1), Cherupally et al. (US 20220318628 A1), and Deng et al. (US 20190180184 A1) and further in view of Mallinson (US 8766841 B2).
As per claim 4, the rejection of claim 1 is incorporated, but the combination does not specifically teach, as a whole, for each node of the nodes: determining the number of non-zero midterms generated by the node; determining a computational precision value of the node; combining the number of non-zero midterms generated by the node and the computational precision value of the node to generate a node estimate of an amount of power consumed by an analog multiply-and-accumulation circuit of the analog multiply-and-accumulation circuits corresponding to the node; and combining the node estimates to generate the estimate of the amount of power consumed by the analog multiply-and-accumulation circuits. However, the combination teaches, for each node of nodes, determining a number of non-zero midterms generated by the node (e.g. Chai in paragraphs 37, 68, 72, 136, 145, and 147, “estimate of the parameters… map the neural network software architecture to appropriate processors in the system architecture. Machine learning system 104 may use a cost function to select the best mapping (e.g., a best fit algorithm can be use). The cost function can be one of…power… the cost function may be the loss function as described previously as Equation (34), where λ1 and λ2 are set based to optimize…power…of the hardware architecture… selection of hardware parameters P during training allows for a pareto-optimal selection of hardware resources with respect to…power”; Deng, in paragraphs 25 and 36, “Optimally reducing the number of non-zero weights, in turn, optimally reduces the computational burden… consume less power because the DNN includes fewer multiply and accumulate (MAC) operations”, i.e. power determined by non-zero midterms, and figure 1 showing nodes) and features including an estimate of an amount of power consumed by analog multiply-and-accumulation circuits/a multiply-and-accumulation circuit of the analog multiply-and-accumulation circuits corresponding to a node (e.g. Stevens, in paragraphs 32, 43 and 79, ““Neural network” refers to an algorithm or computational system based on a collection of connected units or nodes… multiply-accumulate units 1202”; Chai, in paragraphs 37, 136, and 147, “consume an estimated…Tflops/s… estimate of the parameters… map the neural network software architecture to appropriate processors in the system architecture… selection of hardware parameters P during training allows for a pareto-optimal selection of hardware resources with respect to…power”), and Mallinson teaches determining a number of non-zero midterms generated by a node (e.g. in column 1 lines 40-67 and column 4 lines 35-47, “0.8, 0.5, 0.7, 0.6, 0.9, 0.7, 0.8, 0.6, 0.7, 0.8 [i.e. non-zero midterms]… a resistor of value 2R is similarly connected (typically in parallel), as well as resistors of values 4R, 8R and so forth”), determining a computational precision value of the node (e.g. in column 5 lines 36-48, “code control bits of a given significance… sets of three uppermost bits in each branch of FIG. 3 code for the 3 most significant control bits… control signal”), combining the number of non-zero midterms generated by the node and the computational precision value of the node to generate a node feature (e.g. in column 1 lines 40-57, column 3 lines 47-65, and column 5 lines 36-48, “used only one time to merge each of the segments after they are summed--into the final output… voltage present at the node”), and combining the node features to generate a sum of features (e.g. in column 1 lines 40-57, “multiple MDAC's to form a sum-of-products… voltage on that output node”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Mallinson because one of ordinary skill in the art would have recognized the benefit of facilitating relevant calculations.
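For illustration of the per-node combination recited in claim 4, the following sketch combines each node's non-zero midterm count with its computational precision into a node estimate and sums the node estimates into an overall estimate. The multiplicative combination and the energy unit are assumptions made solely for illustration:

```python
def node_power_estimate(nonzero_midterms, precision_bits, energy_unit=1.0):
    # Combine the node's non-zero midterm count with its computational
    # precision value to generate a node estimate of the power consumed by
    # the analog MAC circuit corresponding to the node.
    return energy_unit * nonzero_midterms * precision_bits

def total_power_estimate(per_node):
    # Combine the node estimates to generate the estimate of the amount of
    # power consumed by the analog multiply-and-accumulation circuits.
    return sum(node_power_estimate(m, p) for m, p in per_node)

# e.g., total_power_estimate([(12, 8), (7, 4)]) == 12*8 + 7*4 == 124
```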
As per claim 5, the rejection of claim 4 is incorporated and the combination further teaches wherein the computational precision value is based at least on a most significant bit of an output value generated by the node (e.g. Mallinson, in column 5 lines 36-48, “code control bits of a given significance… sets of three uppermost bits in each branch of FIG. 3 code for the 3 most significant control bits… control signal”).
Claims 11-12 are the method claims corresponding to system claims 4-5, and are rejected for the same reasons set forth.
Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Stevens et al. (US 20220067513 A1) in view of Chai et al. (US 20200134461 A1), Lee et al. (US 20230077987 A1), Cherupally et al. (US 20220318628 A1), and Deng et al. (US 20190180184 A1) and further in view of Abuhatzera et al. (US 20200320375 A1).
As per claim 6, the rejection of claim 1 is incorporated, but the combination does not specifically teach wherein the neural network model trainer is further configured to: apply a gradient descent optimization algorithm to the modified loss function during the training session to determine the weight parameters. However, Abuhatzera teaches a neural network model trainer is further configured to apply a gradient descent optimization algorithm to a modified loss function during a training session to determine weight parameters (e.g. in paragraph 15, “When training a machine learning model, such as a neural network or DNN, input data is transformed to some output, and a loss or error function is used to compare if the model predicts an output value close to an expected value. The amount of calculated error is then propagated back from the output to the inputs of the model using stochastic gradient descent (or another training algorithm) and the process repeats until the error is acceptably low enough or a maximum number of iterations is achieved. The parameters learned during this training process are the weights that connect each node”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Abuhatzera because one of ordinary skill in the art would have recognized the benefit of facilitating minimizing of error.
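For illustration of the teaching mapped to Abuhatzera above, the following sketch applies plain gradient descent to a supplied gradient of a (modified) loss function, iterating until the error is acceptably low or an iteration budget is reached. All names and the stopping criteria are hypothetical:

```python
import numpy as np

def gradient_descent(loss_grad, w0, lr=1e-2, steps=1000, tol=1e-6):
    # Generic gradient descent applied to a (modified) loss function:
    # step opposite the gradient and repeat until the gradient is small
    # enough or a maximum number of iterations is reached
    # (cf. Abuhatzera, paragraph 15, as quoted above).
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        g = loss_grad(w)
        if np.linalg.norm(g) < tol:
            break
        w = w - lr * g
    return w
```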
Claim 13 is the method claim corresponding to system claim 6, and is rejected for the same reasons set forth.
Claims 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Stevens et al. (US 20220067513 A1) in view of Cherupally et al. (US 20220318628 A1) and Hedge et al. (US 20200356858 A1).
As per independent claim 15, Stevens teaches a method, comprising:
receiving a configuration file that specifies characteristics of a multiply-and-accumulation circuit utilized to implement a node of a particular layer of a neural network (e.g. in paragraphs 32, 34, 42, 45, 88 and 90-92, ““Neural network” refers to an algorithm or computational system based on a collection of connected units or nodes called artificial neurons… neural networks…acquire differences during training (e.g., be trained to have different weights from one another)… comprise “layers” that perform operations on vector inputs to generate vector or scalar outputs… instructions...comprises settings and values (such as resistance, impedance, capacitance, inductance, current/voltage ratings, etc.)… each local processing element may utilize one or more collectors (e.g., small register-files): one in front of a weight buffer, another one in front of an accumulation buffer, and another in front of an input activation buffer… Each of the vector multiply-accumulate units 1202 includes a weight collector 1320 buffer having a configurable depth (e.g., number of distinct registers or addresses in a register file used by the vector multiply-accumulate units 1202 during computations) of WD and a width V×N×WP (WP is also called the weight precision)… Some or all of WD, WP, IAP, AD, and AP may be configurable”); and
generating an inference model based at least on the training session of the neural network (e.g. in paragraphs 32 and 70, “inference on a trained model [i.e. generated] with less precise data representations to increase performance (improve throughput or latency per inference) and reduce computational energy expended per inference”),
but does not specifically teach an analog multiply-and-accumulation circuit and during a training session of the neural network: injecting noise into an output value generated by the node, the injected noise being based at least on the characteristics specified by the configuration file, the injected noise emulating noise generated at an output of an analog-to-digital converter of the analog multiply-and-accumulation circuit; and the inference model associating a first weight parameter to the node that is based at least on a loss function that integrates the injected noise, the first weight parameter learned through training on a dataset that noisy as a result of an estimation of an intrinsic electrical noise of the analog multiply-and-accumulation circuit.
However, Cherupally teaches an analog multiply-and-accumulation circuit (e.g. in paragraphs 3, 6-7, 9, and 59, “multiply-and-accumulate (MAC)… (IMC)-based deep neural network (DNN) hardware is provided… performing the analog MAC computation”) and during a training session of a neural network, injecting noise into an output value generated by the node, the injected noise being based at least on characteristics, the injected noise emulating noise generated at an output of an analog-to-digital converter of the analog multiply-and-accumulation circuit (e.g. in paragraphs 6, 10-11, 57, 63, and 73, “IMC performs MAC computation inside the on-chip memory (e.g., SRAM) by activating multiple/all rows of the memory array. The MAC result is represented by analog bitline voltage/current and subsequently digitized by an analog-to-digital converter (ADC) in the peripheral of the array… During DNN training, embodiments perform noise injection at the partial sum level, which matches with the crossbar structure of IMC hardware, and the injected noise data is directly based on measurements of actual IMC prototype chips… the injected noise is directly from IMC chip measurement results on the quantized ADC outputs for different partial sum (MAC) values… three different activation/weight precision values of 1-bit, 2-bit, and 4-bit”), and a model associating a first weight parameter to the node that is based at least on the injected noise (e.g. in paragraphs 41, 69, and 73, “IMC hardware noise-aware training… error from the loss function 18 is passed through a backward pass of the IMC engine 10 and used to update the weights w0, w1, w2, w3”), the first weight parameter learned through training on a dataset that noisy as a result of an estimation of an intrinsic electrical noise of the analog multiply-and-accumulation circuit (e.g. in paragraphs 6, 41, 65, 69, and 73, “IMC performs MAC computation… noise-aware DNN training is performed… hardware noise is injected [i.e. noisy training dataset] by emulating the IMC macro's dot-product computation and then using the conditional probability tables to transform the smaller chunks of dot-product values (i.e., partial sums) in a similar way to the actual IMC hardware [i.e. estimation of intrinsic electrical noise]… error from the loss function 18 is passed through a backward pass of the IMC engine 10 and used to update the weights w0, w1, w2, w3”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Cherupally because one of ordinary skill in the art would have recognized the benefit of incorporating well-known types of operations (which also amounts to a simple substitution that yields predictable results [e.g. see KSR Int'l Co v. Teleflex Inc., 550 US 398, 82 USPQ2d 1385, 1396 (U.S. 2007) and MPEP § 2143(B)]) and/or improving neural network accuracy,
but the combination does not specifically teach a loss function that integrates the injected noise.
However, Hedge teaches a loss function that integrates injected noise (e.g. in abstract and paragraphs 204 and 208, “enhance privacy of edge weights by adding noise during a forward pass and a backward pass… important that noise is injected at various phases of the learning so that a malicious user, for example, is not able to effectively “reverse engineer” the graph (in particular, the edge weights or the existence of edges) using a large number of observed inductive learning outputs… computing one or more private gradients of the loss function by… adding (2) a noise matrix of the one or more noise matrices”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Hedge because one of ordinary skill in the art would have recognized the benefit of enhancing privacy.
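For illustration of the claim 15 flow as mapped above, the following sketch reads hypothetical MAC-circuit characteristics from a configuration structure and derives the injected-noise parameters from them. The specific fields and the alpha/least-significant-bit scaling are assumptions made solely for illustration and are not drawn from the specification or the cited references:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical configuration-file contents specifying MAC-circuit
# characteristics (cf. the characteristics enumerated in claim 17).
config = {
    "input_bit_width": 8,
    "weight_bit_width": 8,
    "adc_output_bit_width": 6,
    "alpha": 0.1,        # dominance level of the injected noise
    "vector_size": 64,
}

def injected_noise(shape, cfg):
    # Noise based at least on the configuration-file characteristics,
    # emulating noise at the ADC output; here the standard deviation is
    # scaled by alpha and one ADC least-significant-bit step (an assumed
    # relationship, not one given by the specification).
    lsb = 1.0 / (2 ** cfg["adc_output_bit_width"])
    return rng.normal(0.0, cfg["alpha"] * lsb, size=shape)
```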
As per claim 16, the rejection of claim 15 is incorporated and the combination further teaches wherein the particular layer comprises at least one of: a fully-connected layer; or a convolutional layer (e.g. Stevens, in paragraph 82, “convolutional and fully-connected layers”).
As per claim 17, the rejection of claim 15 is incorporated and the combination further teaches wherein the characteristics comprise at least one of: a bit width for input data provided as an input to the analog multiply-and-accumulation circuit; a bit width for a second weight parameter provided as an input to the analog multiply-and-accumulation circuit; a bit width for output data output by the analog-to-digital converter; an alpha parameter specifying a dominance level of the noise injected into the output value; or a vector size supported by the analog multiply-and-accumulation circuit (e.g. Stevens, in paragraphs 90-92, “weight collector 1320 buffer having a configurable depth (e.g., number of distinct registers or addresses in a register file used by the vector multiply-accumulate units 1202 during computations) of WD and a width V×N×WP... input activations have width IAP. Each of the vector multiply-accumulate units 1202 also includes an accumulation collector 1322 having a configurable operational depth AD and width N×AP… WP×N×V bits wide and is able to supply different weight vectors… values of V and N may be adjusted”).
As per claim 18, the rejection of claim 17 is incorporated and the combination further teaches wherein the noise injected into the output value is randomized in accordance with a distribution function (e.g. Cherupally, in paragraphs 60, 65, and 78, “distributions… fitted Gaussian model… returns a random float in [0, 1]… random samplings of ADC quantization outputs were performed from each probability table for random inputs”).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Stevens et al. (US 20220067513 A1) in view of Cherupally et al. (US 20220318628 A1) and Hedge et al. (US 20200356858 A1) and further in view of Aggarwal et al. (US 20090060095 A1).
As per claim 19, the rejection of claim 18 is incorporated and the combination further teaches wherein the distribution function is a normal distribution (e.g. Cherupally, in paragraphs 60, 65, and 78, “distributions… fitted Gaussian [i.e. normal] model… returns a random float in [0, 1]… random samplings of ADC quantization outputs were performed from each probability table for random inputs… injecting noise at the weight-level drawn from Gaussian distributions”), but does not specifically teach having a zero mean and a predetermined variance. However, Aggarwal teaches a distribution function having a zero mean and a predetermined variance (e.g. in paragraph 23, “example of such a distribution would be a gaussian kernel with kernel width h. This corresponds to the gaussian distribution with zero mean and variance h.sup.2”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Aggarwal because one of ordinary skill in the art would have recognized the benefit of incorporating well-known types of distributions (which also amounts to a simple substitution that yields predictable results [e.g. see KSR Int'l Co v. Teleflex Inc., 550 US 398, 82 USPQ2d 1385, 1396 (U.S. 2007) and MPEP § 2143(B)]).
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Stevens et al. (US 20220067513 A1) in view of Cherupally et al. (US 20220318628 A1), Hedge et al. (US 20200356858 A1), and Aggarwal et al. (US 20090060095 A1) and further in view of Chen et al. (US 20200097823 A1) and Sutherland et al. (US 5214745).
As per claim 20, the rejection of claim 19 is incorporated, but the combination does not specifically teach wherein the predetermined variance is based at least on the bit width for the output data that is outputted by the analog-to-digital converter and the alpha parameter.
However, the combination teaches output data that is outputted by the analog-to-digital converter (e.g. Cherupally, in paragraphs 6 and 52, “IMC performs MAC computation inside the on-chip memory (e.g., SRAM) by activating multiple/all rows of the memory array. The MAC result is represented by analog bitline voltage/current and subsequently digitized by an analog-to-digital converter (ADC) in the peripheral of the array… quantize the analog voltage/current into digital values”) and Chen teaches variance being based on a bit width for data (e.g. in abstract and paragraph 33, “variance values for the different master bit-widths that are evaluated for a current layer/channel”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Chen because one of ordinary skill in the art would have recognized the benefit of incorporating relevant features associated with variance (which also amounts to a simple substitution that yields predictable results [e.g. see KSR Int'l Co v. Teleflex Inc., 550 US 398, 82 USPQ2d 1385, 1396 (U.S. 2007) and MPEP § 2143(B)]),
but does not specifically teach wherein the predetermined variance is based at least on the alpha parameter.
However, the combination also teaches an output associated with noise injected into the output value (e.g. Cherupally, in paragraph 10, “embodiments perform noise injection at the partial sum level”) and Sutherland teaches variance being based at least on an alpha parameter specifying a dominance level associated with an output (e.g. in column 14 lines 60-66, “displays a varying dominance or magnitude (.lambda..sub.p) which statistically is inversely proportional to the pattern variance (eq. 23). In other words, the encoded patterns most similar to the input stimulus pattern [S]* produce the more dominant contribution within the generated response output”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Sutherland because one of ordinary skill in the art would have recognized the benefit of incorporating relevant features associated with variance (which also amounts to a simple substitution that yields predictable results [e.g. see KSR Int'l Co v. Teleflex Inc., 550 US 398, 82 USPQ2d 1385, 1396 (U.S. 2007) and MPEP § 2143(B)]).
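For illustration of claims 19-20 as mapped above, the following sketch draws zero-mean normally distributed noise whose predetermined variance is a function of the ADC output bit width and the alpha parameter. The particular functional form of the variance is an assumption made solely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def adc_noise(shape, adc_bits, alpha):
    # Zero-mean normal distribution (claim 19) whose predetermined
    # variance is based on the bit width of the ADC output data and the
    # alpha parameter (claim 20); the quadratic form below is an assumed
    # relationship, not one given by the specification.
    variance = alpha * (1.0 / (2 ** adc_bits)) ** 2
    return rng.normal(0.0, np.sqrt(variance), size=shape)
```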
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
For example,
He et al. (US 20230196103 A1) teaches “the weight precision of each layer in the neural network is tried to be reduced… the resource utilization rate and the performance are improved, and the power consumption is reduced” (e.g. in paragraph 205).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WONG whose telephone number is (571)270-1399. The examiner can normally be reached Monday-Friday 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TAMARA KYLE can be reached at (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/W.W/Examiner, Art Unit 2144 01/24/2026
/TAMARA T KYLE/Supervisory Patent Examiner, Art Unit 2144