Prosecution Insights
Last updated: April 19, 2026
Application No. 17/931,682

METHODS OF TRAINING DEEP NEURAL NETWORKS (DNN) USING SIGNAL NON-IDEALITIES AND QUANTIZATION ASSOCIATED WITH IN-MEMORY OPERATIONS AND RELATED DEVICES

Status: Final Rejection (§103)
Filed: Sep 13, 2022
Examiner: AHMED, SYED RAYHAN
Art Unit: 2126
Tech Center: 2100 — Computer Architecture & Software
Assignee: Arizona Board of Regents
OA Round: 2 (Final)

Grant Probability: 71% (Favorable)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 4y 4m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 71%, above average (5 granted / 7 resolved; +16.4% vs TC avg)
Interview Lift: +50.0%, a strong lift, among resolved cases with interview
Typical Timeline: 4y 4m average prosecution; 32 currently pending
Career History: 39 total applications across all art units

Statute-Specific Performance

§101: 32.6% (-7.4% vs TC avg)
§103: 50.0% (+10.0% vs TC avg)
§102: 6.7% (-33.3% vs TC avg)
§112: 9.4% (-30.6% vs TC avg)

Black line = Tech Center average estimate • Based on career data from 7 resolved cases

Office Action

§103
DETAILED ACTION

This Office Action is sent in response to the Applicant’s Communication received on 10/16/2025 for application number 17/931,682. The Office hereby acknowledges receipt of the following and placed of record in the file: Specification, Drawings, Abstract, Oath/Declaration, IDS, and Claims.

Claims 2, 4, 8-20, 22, 23, and 25 are canceled. Claims 1, 3, 5-7, 21, 24, and 26-33 are amended. Claims 1, 3, 5-7, 21, 24, and 26-38 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claims 3, 5, and 6 are objected to because of the following informalities:
Claim 3 should read “The method of claim 1…”
Claim 5 should read “The method of claim 3…”
Claim 6 should read “The method of claim 3…”
Appropriate correction is required.

Response to Arguments

In regards to the 35 USC 103 rejections, Applicant’s arguments with respect to claim(s) 1, 21, and 29 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1 is rejected under 35 U.S.C. 103 as being unpatentable over He et al. (Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack, published 2018), hereinafter He, in view of Zhou et al. (NOISY MACHINES: UNDERSTANDING NOISY NEURAL NETWORKS AND ENHANCING ROBUSTNESS TO ANALOG HARDWARE ERRORS USING DISTILLATION, published 2020), hereinafter Zhou, and Reisser et al. (US 20210073650 A1), hereinafter Reisser.

Regarding claim 1, He teaches, A method for strengthening a deep neural network (DNN) against adversarial attacks [Abstract, we explore to utilize the regularization characteristic of noise injection to improve DNN’s robustness against adversarial attack], the method comprising: training the DNN [Abstract, Training the network with Gaussian noise].

He teaches the above limitations of claim 1, including the DNN. However, He does not teach providing the neural network on in-memory computing (IMC) hardware; the noise coming from IMC hardware.
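The Gaussian noise injection the rejection attributes to He can be pictured with a short sketch. This is a minimal illustration rather than code from the record: the single fully connected layer, the noise scale `alpha`, and tying the noise standard deviation to the weight spread are all assumptions.

```python
import numpy as np

def pni_forward(x, W, alpha=0.1, rng=None):
    """Forward pass with parametric Gaussian noise injected into the
    weights (in the spirit of He's PNI): W_noisy = W + alpha * eta,
    with eta resampled independently for every inference."""
    rng = np.random.default_rng() if rng is None else rng
    eta = rng.normal(0.0, np.std(W) + 1e-12, size=W.shape)  # fresh noise sample
    return x @ (W + alpha * eta)

# Each call sees a different noise realization, which is what
# regularizes training against perturbations.
rng = np.random.default_rng(0)
x = np.ones((1, 4))
W = np.arange(8.0).reshape(4, 2)
y1 = pni_forward(x, W, alpha=0.1, rng=rng)
y2 = pni_forward(x, W, alpha=0.1, rng=rng)
```

In the full PNI scheme the scale of the injected noise is itself a trainable parameter; the fixed `alpha` here just keeps the sketch short.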
Zhou teaches, Providing (Sect 3, Figure 1, Deploying) the neural network (Sect 3, Figure 1, neural network layer) on in-memory computing (IMC) hardware including a crossbar array (Sect 3, Figure 1, on an analog in-memory crossbar) [Sect 3, Figure 1: Deploying a neural network layer, l, on an analog in-memory crossbar involves first flattening the filters for a given layer into weight matrix W1, which is then programmed into an array of NVM devices which provide differential conductances G1 for analog multiplication];

[Image: media_image1.png (reproduction of Zhou, Figure 1)]

Zhou is analogous to the claimed invention as they both relate to training neural networks with noisy data. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He’s teachings to incorporate the teachings of Zhou and provide the neural network on in-memory computing (IMC) hardware and the noise coming from IMC hardware [Zhou, Abstract] in order to achieve lower power consumption and higher accuracy.

He-Zhou does not teach injecting, during training, variations in analog partial sum signals measured from in-memory computing hardware of a same type into partial sum signals generated at a crossbar of IMC hardware; and quantizing the partial sum signals during training to a reduced resolution relative to corresponding full-precision partial sum signals.

Reisser teaches, Injecting (Para 0033, generate bit-line voltage variations), during training, variations in analog partial sum signals measured from in-memory computing hardware of a same type into partial sum signals generated at a crossbar of IMC hardware [Para 0022, such methods include accounting for effects of noise in the training of neural networks destined for use on a binary system so that the resultant neural network models are better suited for operations using CIM arrays.
In other words, the training process includes simulating noise effects of CIM devices; Para 0032, In order to train neural network models that are robust to CIM specific effects, a low-level circuit simulation (e.g. SPICE) of a CIM array is generated in order to have a low-level noise model, which is abstracted into a high-level differentiable CIM-array simulator. The CIM array simulation is then integrated into a CIM chip simulator; Para 0033, Monte Carlo simulations are used to generate bit-line voltage variations at each individual population count in [0, N] according to the hardware noise model; Para 0036, The simulation controls how input activations and elements of each layer's weight matrix are routed to the one or more CIM arrays. As described elsewhere herein, the algorithm 400 shows the splitting up, if necessary, of the input, the performance of XNOR operations, the injections of various types of noises, the conversions between the voltage domain and the population-count domain, the comparison with a threshold (digital or analog, depending on whether the input needs to be split up or not), and the output of feature map results; Para 0037, the matrix-vector operation is split across several CIM arrays and the partial population counts are digitized using the ADC in order to be summed in the digital domain. The training simulations may be used to adjust the design of the CIM chip by trying different array heights to compromise between reduced ADC use and reduced resolution and signal-to-noise ratio (SNR)];

and quantizing the partial sum signals during training to a reduced resolution relative to corresponding full-precision partial sum signals [Para 0037, the matrix-vector operation is split across several CIM arrays and the partial population counts are digitized using the ADC in order to be summed in the digital domain. The training simulations may be used to adjust the design of the CIM chip by trying different array heights to compromise between reduced ADC use and reduced resolution and signal-to-noise ratio (SNR); Para 0041, relaxed quantization uses the concrete distribution to sample weights while slowly annealing the variance of the distribution during training].

Reisser is analogous to the claimed invention as they both relate to CIM applications. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He and Zhou’s teachings to incorporate the teachings of Reisser and provide injecting variations and quantization in order to [Reisser, para 0016] make a system more efficient by reducing the power used.

Claim(s) 3 is rejected under 35 U.S.C. 103 as being unpatentable over He in view of Zhou and Reisser, and in further view of Kolter et al. (US 20220027723 A1), hereinafter Kolter.

Regarding claim 3, He-Zhou-Reisser teach the limitations of claim 1, including the DNN (see claim 1). Reisser further teaches, quantizing during training [Para 0041, relaxed quantization uses the concrete distribution to sample weights while slowly annealing the variance of the distribution during training]. Reisser is analogous to the claimed invention as they both relate to CIM applications. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He and Zhou’s teachings to incorporate the teachings of Reisser and provide injecting variations and quantization in order to [Reisser, para 0016] make a system more efficient by reducing the power used.
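Reisser's digitize-then-sum language can be made concrete with a toy uniform quantizer for a partial sum. This is a sketch only: the symmetric mid-tread code, the bit width, and the fixed full-scale range `ps_max` are assumptions, not details drawn from Reisser.

```python
import numpy as np

def quantize_partial_sum(ps, bits=4, ps_max=1.0):
    """Quantize an analog partial-sum value to a reduced resolution of
    `bits`, the way an ADC on a crossbar column might, relative to the
    full-precision value. Uses a symmetric uniform code with
    2**bits - 1 levels spanning [-ps_max, ps_max]."""
    ps = np.asarray(ps, dtype=float)
    levels = 2 ** bits - 1
    step = 2.0 * ps_max / levels          # quantization step size
    half = levels // 2
    return np.clip(np.round(ps / step), -half, half) * step

ps = np.linspace(-1.0, 1.0, 11)           # pretend analog partial sums
q = quantize_partial_sum(ps, bits=4, ps_max=1.0)
# reconstruction error stays within half a quantization step
```

Applying this inside the forward pass of training (rather than only at inference) is what exposes the network to the reduced-resolution behavior of the hardware.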
He-Zhou-Reisser does not teach wherein NN has weights stored in crossbar array of in-memory computing hardware and wherein quantizing partial sum signals is performed by analog-to-digital converters (ADCs) coupled to outputs of crossbar array.

Kolter further teaches, Wherein NN has weights stored in crossbar array of in-memory computing hardware [Para 0167, An analog circuit configuration called a crossbar network can be used for the purpose of matrix multiply and add operations. Such network (e.g., illustrated in FIG. 36)… Along each word line, multiple weight elements are placed at crossings with columns (bit lines). These weight elements are implemented by means of impedances (conductances), where each element is an integer W.sub.ij multiple of a unit conductance G, resulting in a conductance of G.Math.W.sub.ij.]; and wherein quantizing (Para 0169, digitized) partial sum signals is performed by (Para 0167, Each bit line… implements a summation node) analog-to-digital converters (ADCs) (Para 0169, analog-to-digital converter (ADC)) coupled to outputs of crossbar array [Fig 36; Para 0167, An analog circuit configuration called a crossbar network can be used for the purpose of matrix multiply and add operations. Such network (e.g., illustrated in FIG. 36) applies the integer neuron activation values… Along each word line, multiple weight elements are placed at crossings with columns (bit lines). These weight elements are implemented by means of impedances (conductances), where each element is an integer W.sub.ij multiple of a unit conductance G, resulting in a conductance of G.Math.W.sub.ij. Each bit line crosses multiple word lines with corresponding weights at their crossings and therefore implements a summation node to add the currents; Para 0169, This voltage V.sub.j is then digitized to an integer Y.sub.j by means of an analog-to-digital converter (ADC)].

Kolter is analogous to the claimed invention as they both relate to in-memory computing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He and Zhou’s teachings to incorporate the teachings of Kolter and provide storing neural network weights in a crossbar and quantizing neural network results in order to improve neural network robustness using in-memory computation.

Claim(s) 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over He in view of Zhou, Reisser, and Kolter, and in further view of Park et al. (Weighted-Entropy-Based Quantization for Deep Neural Networks, published 2017), hereinafter Park.

Regarding claim 5, He-Zhou-Reisser-Kolter teach the limitations of claim 3, including the partial sums (see claim 3). He-Zhou-Reisser-Kolter do not teach wherein quantizing comprises quantizing to 1-bit, 2-bit, 3-bit, or 4-bit values.

Park teaches, wherein quantizing (Sect 5.1.4, para 2, aggressive bitwidth optimization) comprises quantizing (Sect 5.1.4, para 2, quantization) to 1-bit, 2-bit, 3-bit, or 4-bit values (Sect 5.1.4, para 2, two to six bits for each layer) [Sect 5.1.4, para 2, We believe that even more aggressive bitwidth optimization could be possible by taking this layer-wise sensitivity to bitwidths into account during quantization. Exhaustive search of all possible combinations of bitwidths is impractical as there are too many of them even for small networks (e.g., AlexNet has 5^15 ≈ 3 × 10^10 possible configurations that use two to six bits for each layer)].

Park is analogous to the claimed invention as they both relate to quantization in DNNs. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He, Zhou, and Kolter’s teachings to incorporate the teachings of Park and provide aggressive quantization to reduce computational cost and model size quantizing to 1-bit, 2-bit, 3-bit, or 4-bit values [Park, Abstract] to achieve the desired accuracy.
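The crossbar read-out Kolter describes, paired with the deliberately low ADC bit widths of claims 5 and 6, can be sketched as follows. The conductance values, reference range `i_max`, and clipping behavior of the ADC model are illustrative assumptions, not taken from Kolter.

```python
import numpy as np

def crossbar_readout(x, G, adc_bits=2, i_max=4.0):
    """Idealized crossbar: each bit line sums the products of row inputs
    and cell conductances (Kirchhoff's current law gives the accumulation
    for free), then a per-column ADC digitizes the column current to
    `adc_bits` of resolution."""
    i_col = x @ G                         # analog multiply-and-accumulate per column
    levels = 2 ** adc_bits - 1
    step = 2.0 * i_max / levels
    half = levels // 2
    return np.clip(np.round(i_col / step), -half, half).astype(int)

x = np.array([1.0, 0.0, 1.0])             # row input voltages
G = np.array([[0.5, 2.0],
              [1.0, 0.0],
              [0.5, 2.0]])                # cell conductances
codes = crossbar_readout(x, G, adc_bits=2)  # one 2-bit code per column
```

The coarse ADC is where the claimed 1- to 4-bit quantization of partial sums would enter the signal chain.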
Regarding claim 6, He-Zhou-Reisser-Kolter teach the limitations of claim 3, including the partial sums (see claim 3). He-Zhou-Reisser-Kolter do not teach wherein quantizing comprises quantizing to 1-bit or 2-bit values.

Park teaches, wherein quantizing (Sect 5.1.4, para 2, aggressive bitwidth optimization) comprises quantizing (Sect 5.1.4, para 2, quantization) to 1-bit or 2-bit values (Sect 5.1.4, para 2, two to six bits for each layer) [Sect 5.1.4, para 2, We believe that even more aggressive bitwidth optimization could be possible by taking this layer-wise sensitivity to bitwidths into account during quantization. Exhaustive search of all possible combinations of bitwidths is impractical as there are too many of them even for small networks (e.g., AlexNet has 5^15 ≈ 3 × 10^10 possible configurations that use two to six bits for each layer)].

Park is analogous to the claimed invention as they both relate to quantization in DNNs. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He, Zhou, and Kolter’s teachings to incorporate the teachings of Park and provide aggressive quantization to reduce computational cost and model size quantizing to 1-bit or 2-bit values [Park, Abstract] to achieve the desired accuracy.

Claim(s) 7 is rejected under 35 U.S.C. 103 as being unpatentable over He in view of Zhou and Reisser, and in further view of Pan et al. (CN 110827283 A, see attached translation), hereinafter Pan.

Regarding claim 7, He-Zhou-Reisser teach the limitations of claim 1 including using noise of in-memory computing hardware (claim 1: Reisser, paras 0022 and 0033). He further teaches, adversarially training the DNN using measured noise [Abstract, Inspired by this classical method, we explore to utilize the regularization characteristic of noise injection to improve DNN’s robustness against adversarial attack. In this work, we propose Parametric-Noise-Injection (PNI)… embedded with adversarial training; Sect 1, para 5, The proposed PNI technique is to apply to inject layer-wise trainable Gaussian noise on various locations, including network input/activation/weights].

He-Zhou-Reisser teach the above limitations of claim 7, including adversarially training the DNN. He-Zhou-Reisser does not teach training neural network using a continually differentiable exponential linear unit (CELU) activation function.

Pan teaches, Training (Para 0063, training) neural network (Para 0002, neural network) using a continually differentiable exponential linear unit (CELU) activation function (Para 0063, CELU nonlinear activation function) [Para 0002, The present application relates to the field of deep learning technology… based on a convolutional neural network; Para 0063, The optimizer used is Adam, the initial learning rate is set to 0.001, batch gradient descent is used for error backpropagation, the batch size is set to 5, 4 cards are used for training, and the training time is 20 hours. Preferably, a densely connected convolution unit is used as the basic convolution processing unit, a convolution unit of size 3×3×3 is used, and then a CELU nonlinear activation function].

Pan is analogous to the claimed invention as they both relate to deep learning technology. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He, Zhou, and Reisser’s teachings to incorporate the teachings of Pan and provide training neural network using a CELU activation function [Barron (Continuously Differentiable Exponential Linear Units, published 2017), Abstract] in order to speed up and improve deep learning architectures via an ELU activation with continuously differentiable parameterization.

Claim(s) 21, 35, 36, and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou in view of He and Reisser.
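For reference, the CELU activation recited in claim 7 above has a compact closed form (Barron 2017): celu(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1)). A minimal sketch follows; the default alpha = 1 is the usual convention, not something fixed by the record.

```python
import numpy as np

def celu(x, alpha=1.0):
    """Continuously Differentiable Exponential Linear Unit.
    For alpha != 1, plain ELU has a derivative discontinuity at x = 0;
    CELU scales the exponent by 1/alpha to remove it, so the function
    is continuously differentiable for every alpha > 0."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, x) + np.minimum(0.0, alpha * np.expm1(x / alpha))
```

The continuity of the derivative at zero is the property the rejection's rationale invokes, since it is what distinguishes CELU from an ordinary ELU.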
Regarding claim 21, Zhou teaches, A method of training a neural network, the method comprising: an IMC crossbar circuit [Fig. 1, see illustration in claim 1] configured to store parameters of NN [Sect 3, para 1, programming each element of this matrix into a memristive cell in the crossbar array] and to perform analog multiply-and-accumulate operations to produce analog partial sum signals [Sect 3, para 1, Figure 1 illustrates… to perform analog multiplication… The memristive devices connect row with columns, where the row voltages are converted into currents scaled by the programmed conductance, G, to generate the currents i(yl)… The currents from each memristive device essentially add up for free where they are connected in the columns, according to Kirchhoff’s current law. Finally, the differential currents are converted to bipolar voltages, v(yl), which are then digitized before adding bias, and performing batch normalization and ReLU operations, which are not shown in Figure 1]; to provide a pre-trained neural network (Sect 5.1, para 2, pretrained model) in a memory system [Sect 3, Figure 1: Deploying a neural network layer, l, on an analog in-memory crossbar involves first flattening the filters for a given layer into weight matrix W1, which is then programmed into an array of NVM devices which provide differential conductances G1 for analog multiplication. A random Gaussian ΔW1 is used to model the inherent imprecision in analog computation; Sect 5.2, para 1, The teacher model is trained to an accuracy of 93.845%... The network is then retrained with noise injection to make it robust against noise. Retraining takes place for 150 epochs, the initial learning rate is 0.01 and decays with the same cosine profile. We performed two sets of retraining; Sect 5.1, para 2, The pretrained model without any retraining performs very poorly at inference time when noise is present. Retraining with Gaussian noise injection can effectively recover some accuracy] and updating parameters of the NN based at least in part on resolution-reduced computation results [Sect 5.1, para 6, In a given layer, the input activations are quantized before being multiplied by noisy weights.].

Zhou does not teach providing the deep neural network including a model, training the deep neural network using measured variations responsive to the training to provide a pre-trained deep neural network, measured variations being from idealities in signals generated by an in-memory computing crossbar array circuit.

He teaches, providing the DNN [Sect 3, para 2, The method that we propose to inject Gaussian noise to different components or locations within DNN] including a model [Sect 4.2.2, para 1, two trained neural network is taken as the source model (S) and target model (T).]; and training (Sect 1, para 6, trained) the DNN (Sect 1, para 6, DNN) using characterization data (Sect 1, para 6, injected noise) to provide a DNN [Sect 1, para 6, For each inference, the injected noise is independently sampled from the corresponding Gaussian distribution, where its mean and variance of this distribution is trained by gradient descent method as other parameters of DNN. To achieve the proper objective of the training, PNI is embedded with well-known adversarial training, where the injected noise (i.e., its mean and variance) will be optimized through end-to-end training instead of manual configuration.].

He is analogous to the claimed invention as they both relate to adversarial training of a neural network. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou’s teachings to incorporate the teachings of He and provide a deep neural network that is trained using varied measures [He, Abstract] in order to improve model robustness.
Zhou-He teach the above limitations of claim 21 including the DNN (He, Sect 1, para 6). Zhou-He do not teach neural network on in-memory computing (IMC) hardware; obtaining characterization data representing measured variations from reference values in the analog partial sum signals produced by the IMC crossbar circuit, the measured variations arising from physical variations of the IMC crossbar circuit; wherein the training comprises: modifying intermediate computation results of the DNN based at least in part on the characterization data, the intermediate computation results corresponding to aggregation-level outputs of the IMC crossbar circuit; and reducing a numerical resolution of the modified intermediate computation results to provide resolution-reduced computation results, the reducing being performed prior to accumulation of the resolution-reduced computation results into a full-sum output of a layer of the NN.

Reisser teaches, Neural network on in-memory computing (IMC) hardware [Para 0016, One newly emerging architecture that allows for a significant reduction in power used is the compute-in-memory (CIM) architecture. Some implementations of CIM devices use modified static random-access memory (SRAM) cells. Other implementations may use other types of memory cells (e.g., magnetoresistive RAM (MRAM) or resistive RAM (RRAM)). Exemplary electronic computing devices may contain single or multiple CIM arrays. In some embodiments, a CIM array may comprise an array of modified SRAM cells programmable to store weights of, for example, a corresponding CNN, where the cells are also configured to perform calculations with received input values]; obtaining characterization data representing measured variations from reference values in the analog partial sum signals produced by the IMC crossbar circuit, the measured variations arising from physical variations of the IMC crossbar circuit [Para 0022, such methods include accounting for effects of noise in the training of neural networks destined for use on a binary system so that the resultant neural network models are better suited for operations using CIM arrays. In other words, the training process includes simulating noise effects of CIM devices; Para 0032, In order to train neural network models that are robust to CIM specific effects, a low-level circuit simulation (e.g. SPICE) of a CIM array is generated in order to have a low-level noise model, which is abstracted into a high-level differentiable CIM-array simulator. The CIM array simulation is then integrated into a CIM chip simulator; Para 0033, The low-level circuit simulations include a CIM array of N word lines and a single bit line. The cell weights are randomly initialized to 0 or 1. Subsequently, all rows are activated in sequence by switching the corresponding word-line such that XNOR evaluates to 1. For each word-line activated in this way, the bit-line voltage corresponding to a pop-count from 0 to N is read out. After this bit-line voltage vs. population count characterization is done at a typical case, Monte Carlo simulations are used to generate bit-line voltage variations at each individual population count in [0, N] according to the hardware noise model; Para 0037, the matrix-vector operation is split across several CIM arrays and the partial population counts are digitized using the ADC in order to be summed in the digital domain.
The training simulations may be used to adjust the design of the CIM chip by trying different array heights to compromise between reduced ADC use and reduced resolution and signal-to-noise ratio (SNR)]; wherein the training comprises: modifying intermediate computation results of the DNN based at least in part on the characterization data, the intermediate computation results corresponding to aggregation-level outputs of the IMC crossbar circuit [Para 0037, the matrix-vector operation is split across several CIM arrays and the partial population counts are digitized using the ADC in order to be summed in the digital domain. The training simulations may be used to adjust the design of the CIM chip by trying different array heights to compromise between reduced ADC use and reduced resolution and signal-to-noise ratio (SNR)]; reducing a numerical resolution of the modified intermediate computation results to provide resolution-reduced computation results [Para 0037, the matrix-vector operation is split across several CIM arrays and the partial population counts are digitized using the ADC in order to be summed in the digital domain. The training simulations may be used to adjust the design of the CIM chip by trying different array heights to compromise between reduced ADC use and reduced resolution and signal-to-noise ratio (SNR)], the reducing being performed prior to accumulation of the resolution-reduced computation results into a full-sum output of a layer of the NN [Para 0028, The results of each column are summed and compared to a threshold to determine the binary output for the column, with the results for all the columns aggregated to form a 1×1×64 output tensor (e.g., output tensor 210), which forms a part of the output tensor 204 for the layer; Para 0043, During training, a is a Gaussian random variable and consequently, a reparameterization trick may be used to sample from a before rounding or a probabilistic alternative such as relaxed quantization may be used. Note that sampling in combination with a straight-through estimator may be advantageous as it avoids the computationally expensive probabilistic relaxation of relaxed quantization. Upon adding all quantized partial pre-activations, the transformation of equation (3) is undone and the probability of stochastic binary activations is formulated as a difference from a threshold θ].

Reisser is analogous to the claimed invention as they both relate to CIM applications. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou and He’s teachings to incorporate the teachings of Reisser and provide noise injection and quantization in order to improve model robustness by preventing overfitting.

Regarding claim 35, Zhou-He-Reisser teach the limitations of claim 21 including training the DNN using characterization data (Reisser, Sect 1, para 6).
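The Reisser para 0037 flow quoted in the claim-21 analysis above (split the matrix-vector operation across arrays, digitize each partial sum, then sum in the digital domain) can be sketched as follows. Tile height, bit width, and the full-scale range are illustrative assumptions, not parameters from the record.

```python
import numpy as np

def split_matvec(x, W, tile_rows=2, bits=4, ps_max=4.0):
    """Split a matrix-vector product across crossbar-sized row tiles.
    Each tile produces an analog partial sum, which is reduced in
    resolution (the ADC step) *before* being accumulated into the
    full-sum output of the layer in the digital domain."""
    levels = 2 ** bits - 1
    step = 2.0 * ps_max / levels
    half = levels // 2
    total = np.zeros(W.shape[1])
    for r in range(0, W.shape[0], tile_rows):
        ps = x[r:r + tile_rows] @ W[r:r + tile_rows]          # one tile's partial sum
        q = np.clip(np.round(ps / step), -half, half) * step  # reduced resolution
        total += q                                            # digital accumulation
    return total
```

The ordering matters for the claim language: resolution reduction happens per tile, prior to the accumulation into the layer's full-sum output, so quantization error enters once per tile rather than once per layer.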
Reisser further teaches, scaling magnitudes of the measured variations from reference values to correspond to deviations in the analog partial sum signals produced by the IMC crossbar circuit when operated at different supply voltages [Para 0031, the analog operations and conversions in a CIM device introduce various noises such as capacitor variation, thermal noise, and offset noise. The capacitor variation may be fixed per CIM array, but may depend on a particular population count for an operation. Thermal noise varies for each computation. Offset noise may be fixed per CIM array and be added at each activation. Reusing a CIM array, as in the sharing configuration, may introduce correlated noises of capacitor variation and offset. These various noise effects should be accounted for during training in order to generate an accurate model; Para 0032, In order to train neural network models that are robust to CIM specific effects, a low-level circuit simulation (e.g. SPICE) of a CIM array is generated in order to have a low-level noise model, which is abstracted into a high-level differentiable CIM-array simulator. The CIM array simulation is then integrated into a CIM chip simulator; Para 0033, The low-level circuit simulations include a CIM array of N word lines and a single bit line. The cell weights are randomly initialized to 0 or 1. Subsequently, all rows are activated in sequence by switching the corresponding word-line such that XNOR evaluates to 1. For each word-line activated in this way, the bit-line voltage corresponding to a pop-count from 0 to N is read out. After this bit-line voltage vs. population count characterization is done at a typical case, Monte Carlo simulations are used to generate bit-line voltage variations at each individual population count in [0, N] according to the hardware noise model; Para 0037, the matrix-vector operation is split across several CIM arrays and the partial population counts are digitized using the ADC in order to be summed in the digital domain. The training simulations may be used to adjust the design of the CIM chip by trying different array heights to compromise between reduced ADC use and reduced resolution and signal-to-noise ratio (SNR)].

Reisser is analogous to the claimed invention as they both relate to CIM applications. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou and He’s teachings to incorporate the teachings of Reisser and provide variations in order to protect systems against adversarial attacks.

Regarding claim 36, Zhou-He-Reisser teach the limitations of claim 21. Zhou further teaches storage cells within IMC crossbar circuit [Sect 3, para 1, Figure 1 illustrates how an arbitrary neural network layer, l… can be mapped to this hardware substrate… then programming each element of this matrix into a memristive cell in the crossbar array; See Fig. 1]. Zhou does not teach wherein modifying the intermediate computation results based at least in part on the characterization data comprises applying the measured variations from reference values in a spatially non-uniform pattern determined by physical locations of corresponding storage cells within the IMC crossbar circuit.
Reisser further teaches, wherein modifying the intermediate computation results based at least in part on the characterization data comprises applying the measured variations from reference values in a spatially non-uniform pattern determined by physical locations of corresponding within the IMC crossbar circuit [Para 0031, the analog operations and conversions in a CIM device introduce various noises such as capacitor variation, thermal noise, and offset noise. The capacitor variation may be fixed per CIM array, but may depend on a particular population count for an operation. Thermal noise varies for each computation. Offset noise may be fixed per CIM array and be added at each activation. Reusing a CIM array, as in the sharing configuration, may introduce correlated noises of capacitor variation and offset. These various noise effects should be accounted for during training in order to generate an accurate model; Para 0037, the matrix-vector operation is split across several CIM arrays and the partial population counts are digitized using the ADC in order to be summed in the digital domain. The training simulations may be used to adjust the design of the CIM chip by trying different array heights to compromise between reduced ADC use and reduced resolution and signal-to-noise ration (SNR)]. Regarding claim 38, Zhou-He-Reisser teach the limitations of claim 21. Reisser further teaches, wherein obtaining the characterization data comprises collecting the measured variations from reference values from a plurality of IMC crossbar circuits of a same type and combining the measured variations to generate a composite variation profile for use in the training [Para 0022, such methods include accounting for effects of noise in the training of neural networks destined for use on a binary system so that the resultant neural network models are better suited for operations using CIM arrays. 
In other words, the training process includes simulating noise effects of CIM devices; Para 0037, the matrix-vector operation is split across several CIM arrays and the partial population counts are digitized using the ADC in order to be summed in the digital domain. The training simulations may be used to adjust the design of the CIM chip by trying different array heights to compromise between reduced ADC use and reduced resolution and signal-to-noise ratio (SNR)]. Reisser is analogous to the claimed invention as they both relate to CIM applications. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He and Zhou’s teachings to incorporate the teachings of Reisser and provide collecting from a plurality of IMC crossbar circuits to generate a composite variation profile for use in the training in order to [Reisser, para 0037] adjust the design of the CIM chip by trying different array heights to compromise between reduced ADC use and reduced resolution and signal-to-noise ratio (SNR). Claim(s) 24, 26, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou in view of He and Reisser, and in further view of Chakraborty et al. (GENIEx: A Generalized Approach to Emulating Non-ideality in Memristive Xbars using Neural Networks, published 2020), hereinafter Chakraborty, and Kang et al. (US 12120331 B2), hereinafter Kang. Regarding claim 24, Zhou-He-Reisser teach the limitations of claim 21 including the DNN (He, Sect 3, para 2) and the characterization data representing measured variations from reference values (Reisser, para 0033).
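The Reisser noise model cited above lends itself to a short simulation sketch. The following Python is editorial illustration only; the array count, noise magnitudes, and function names are assumptions, not values from the record. Per the cited paragraphs, capacitor variation and offset are drawn once per CIM array, thermal noise is redrawn for each computation, and deviations from the ideal bit-line value are pooled across arrays of a same type into a composite variation profile (as recited in claim 38):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_bitline_noise(pop_counts, n_arrays=4, n_mc=200):
    """Monte Carlo sketch of Reisser-style bit-line variations.

    Capacitor variation and offset are drawn once (fixed per CIM array);
    thermal noise is redrawn for every computation.  Magnitudes are
    illustrative assumptions, not values from the record.
    """
    cap_var = rng.normal(1.0, 0.02, size=n_arrays)  # fixed per array, scales with pop count
    offset = rng.normal(0.0, 0.5, size=n_arrays)    # fixed per array, added at each activation
    samples = np.empty((n_mc, n_arrays, len(pop_counts)))
    for t in range(n_mc):
        thermal = rng.normal(0.0, 0.3, size=(n_arrays, len(pop_counts)))  # varies per computation
        samples[t] = cap_var[:, None] * pop_counts + offset[:, None] + thermal
    return samples

def composite_profile(samples, reference):
    """Pool deviations from the reference values across arrays of a same
    type into a composite variation profile (mean, std per pop count)."""
    dev = samples - reference
    return dev.mean(axis=(0, 1)), dev.std(axis=(0, 1))
```

The per-pop-count (mean, std) pairs stand in for the "composite variation profile" that would be applied during training simulations.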
He further teaches, wherein the training comprises: applying an adversarial input to an input of the DNN [Sect 3, para 1, The method that we propose to inject Gaussian noise to different components or locations within DNN]; incorporating data (Sect 3, para 1, η_{l,i}) into the partial sum signals (Sect 3, para 1, v_l in l-th layer of DNN) to provide varied partial sum signals (Eq. 5, ṽ_{l,i}) [Sect 3, para 1, The method that we propose to inject Gaussian noise to different components or locations within DNN can be mathematically described as: ṽ_{l,i} = v_{l,i} + α_i·η_{l,i} (Eq. 5), where v_{l,i} is the element of noise-free tensor v_l in l-th layer of DNN, and such v_l can be input/weight/inter-layer (i.e., activation) tensor in this work. η_{l,i} is the noise term…α_i is the coefficient that scales the magnitude of injected noise η_l.]; He is analogous to the claimed invention as they both relate to adversarial training of a neural network. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou’s teachings to incorporate the teachings of He and provide incorporating the measured variations from the idealities into the partial sum signals to provide varied partial sum signals in order to improve neural network robustness by further training the model with adversarial samples obtained from the circuit. Zhou-He-Reisser teaches the above limitations of claim 24 including the aggregation-level outputs of the IMC crossbar circuit (claim 21: Reisser, para 0037), the full-sum output of the corresponding layer (claim 21: Reisser, paras 0028 and 0043), the adversarial input (claim 24: He, Sect 3, para 1) and varied partial sum signals (claim 24: He, Eq. 5).
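He's Eq. 5 noise injection can be sketched in a few lines of Python. This is an illustrative reading of the cited passage; scaling η by the tensor's own standard deviation and the default value of α are editorial assumptions (in He's method the coefficient is learned during adversarial training):

```python
import numpy as np

rng = np.random.default_rng(1)

def inject_noise(v, alpha=0.1):
    """Sketch of He's Eq. 5: v~_{l,i} = v_{l,i} + alpha * eta_{l,i}.

    v may be an input, weight, or inter-layer (activation) tensor.
    Drawing eta from a Gaussian scaled by v's standard deviation is an
    editorial assumption for this sketch.
    """
    eta = rng.normal(0.0, v.std(), size=v.shape)
    return v + alpha * eta
```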
Zhou-He-Reisser do not teach wherein NN includes hidden layers that are configured to be inaccessible from outside the NN, wherein signals are current signals, generating outputs as analog partial sum current signals at the hidden layers responsive to input, the partial sum currents being analog, and converting the analog partial sum current signals to digital partial sum values prior to accumulation into output. Chakraborty further teaches, wherein the NN includes a hidden layer that is configured to be inaccessible from outside the NN [Fig. 4; GENIEx considers a two layer fully-connected neural network consisting of an input layer, a hidden layer and an output layer.], wherein signals are current signals [Sect 3, para 1, the output current in the j-th BL (for ideal crossbar) is the sum of currents through each NVM device in the corresponding column: I_j = ∑_i V_i G_ij], generating outputs as analog partial sum current signals (Sect 3, para 1, j-th BL (for ideal crossbar) is the sum of currents through each NVM) at the hidden layer (Fig. 4; a hidden layer) responsive to input (Sect 4, para 4, V, G combinations as inputs) [Sect 3, para 1, the output current in the j-th BL (for ideal crossbar) is the sum of currents through each NVM device in the corresponding column: I_j = ∑_i V_i G_ij. Thus, the currents from the M columns constitute the output vector of the MVM operation; Sect 4, para 4, GENIEx considers a two layer fully-connected neural network consisting of an input layer, a hidden layer and an output layer. For an N × N crossbar, the size of the neural network is given as: (N² + N) × P × N, where P is the number of neurons in the hidden layer.
The training set mentioned above is used to train the neural network by feeding V, G combinations as inputs and f_R(V, G) as the output; Sect 6, para 3, GENIEx has 500 hidden layer neurons and ReLU nonlinearity]; the partial sum currents being analog (Sect 5, para 2, we extract the analog computing aspect of crossbar hardware), and converting (Sect 5, para 2, applied… to produce) the analog partial sum current signals (Sect 5, para 2, crossbar's rows) to digital partial sum values (Sect 5, para 2, ADC outputs) prior to accumulation into output [Sect 5, para 2, we extract the analog computing aspect of crossbar hardware… A slice of input vector is shared by tiles in a row. Tiles in a column produce partial sums, which are added together to produce a slice of the convolution output… Within each step, an input stream is applied to a crossbar's rows to produce ADC outputs. Next, the shift-and-add units merge the ADC outputs of different weight slices. Eventually, the outputs of successive input streams go through shift-and-add units to produce the partial sums for a tile]. Chakraborty is analogous to the claimed invention as they both relate to deep learning accelerators.
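The Chakraborty flow mapped above (bit-line summation I_j = Σ_i V_i G_ij, ADC conversion of partial sums, then shift-and-add merging of bit slices) can be approximated with the following illustrative Python. Function names, bit widths, and the uniform-ADC model are editorial assumptions, not taken from the reference:

```python
import numpy as np

def ideal_crossbar_mvm(V, G):
    """I_j = sum_i V_i * G_ij: bit-line currents of an ideal crossbar."""
    return V @ G

def adc(x, bits):
    """Uniform ADC sketch: map non-negative partial sums onto 2**bits - 1
    levels; returns the digital codes and the analog value of one LSB."""
    levels = 2 ** bits - 1
    x = np.clip(x, 0.0, None)
    scale = x.max() if x.max() > 0 else 1.0
    return np.round(x / scale * levels), scale / levels

def bit_sliced_mvm(x_int, W_int, in_bits=2, adc_bits=8):
    """Shift-and-add over 1-bit input streams, ADC-quantizing each
    partial sum before accumulating in the digital domain."""
    acc = np.zeros(W_int.shape[1])
    for b in range(in_bits):
        stream = (x_int >> b) & 1                  # 1-bit input stream
        codes, lsb = adc(ideal_crossbar_mvm(stream, W_int), adc_bits)
        acc += (codes * lsb) * (1 << b)            # shift-and-add merge
    return acc
```

With a generous ADC resolution the shift-and-add result closely tracks the exact integer matrix-vector product; lowering `adc_bits` is where the quantization non-ideality enters.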
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He and Zhou’s teachings to incorporate the teachings of Chakraborty and provide wherein the neural network includes hidden layers that are configured to be inaccessible from outside the neural network, wherein signals are current signals, generating analog partial sum current signals at the hidden layers responsive to input, the partial sum currents being analog, and converting the analog partial sum current signals to digital partial sum values [Zhou, Abstract] in order to incorporate a neural network onto IMC hardware, accelerating the neural network to achieve lower power consumption and higher accuracy while abstracting complexities using hidden layers. Zhou-He-Reisser-Chakraborty do not teach a plurality of hidden layers. Kang teaches, a plurality of hidden layers [Col 3, lines 52-55, The convolutional neural network for encoding based on the FLASH in-memory computing array is a multi-layer neural network, including… a plurality of hidden layers]. Kang is analogous to the claimed invention as they both relate to in-memory computing with neural networks. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou, He, and Chakraborty’s teachings to incorporate the teachings of Kang and provide a plurality of hidden layers in order to improve the output by allowing for more nuanced decision making. Regarding claim 26, Zhou-He-Reisser-Chakraborty teach the limitations of claim 24 including the full-sum output of the corresponding layer (claim 21: Reisser, paras 0028 and 0043) and the varied analog partial sum current signals (claim 24: He, Eq. 5).
Chakraborty further teaches quantizing the analog partial sum current signals (Sect 5, para 2, crossbar's rows) to less than 3 bits (Sect 5, para 2, bit-slice (>= 1 bits)) for the digital partial sum values (Sect 5, para 2, ADC outputs) prior to accumulation into output [Sect 5, para 2, We will refer to a bit-slice (>= 1 bits) of inputs and weights as stream and slice, respectively. Within each step, an input stream is applied to a crossbar's rows to produce ADC outputs. Next, the shift-and-add units merge the ADC outputs of different weight slices. Eventually, the outputs of successive input streams go through shift-and-add units to produce the partial sums for a tile]. Chakraborty is analogous to the claimed invention as they both relate to deep learning accelerators. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou and He’s teachings to incorporate the teachings of Chakraborty and provide quantizing the analog partial sum current signals to less than 3 bits in order to achieve the desired accuracy. Regarding claim 27, Zhou-He-Reisser-Chakraborty teach the limitations of claim 24 including the full-sum output of the corresponding layer (claim 21: Reisser, paras 0028 and 0043) and the varied analog partial sum current signals (claim 24: He, Eq. 5). Chakraborty further teaches quantizing the analog partial sum current signals (Sect 5, para 2, crossbar's rows) to 1 bit (Sect 5, para 2, bit-slice (>= 1 bits)) for the digital partial sum values (Sect 5, para 2, ADC outputs) prior to accumulation into output [Sect 5, para 2, We will refer to a bit-slice (>= 1 bits) of inputs and weights as stream and slice, respectively. Within each step, an input stream is applied to a crossbar's rows to produce ADC outputs. Next, the shift-and-add units merge the ADC outputs of different weight slices. 
Eventually, the outputs of successive input streams go through shift-and-add units to produce the partial sums for a tile]. Chakraborty is analogous to the claimed invention as they both relate to deep learning accelerators. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou and He’s teachings to incorporate the teachings of Chakraborty and provide quantizing the analog partial sum current signals to 1 bit in order to achieve the desired accuracy. Claim(s) 28 is rejected under 35 U.S.C. 103 as being unpatentable over Zhou in view of He, Reisser, Chakraborty, and Kang, and in further view of Pan. Regarding claim 28, Zhou-He-Reisser-Chakraborty teach the limitations of claim 24 including generating the analog partial sum current signals at the hidden layers (see claim 24). Zhou-He-Reisser-Chakraborty do not teach layers including a continually differentiable exponential linear unit activation function. Pan teaches, Layer including a continually differentiable exponential linear unit activation function [Para 0063, a densely connected convolution unit is used as the basic convolution processing unit, a convolution unit of size 3×3×3 is used, and then a CELU nonlinear activation function and a BatchNorm layer are used for processing]. Pan is analogous to the claimed invention as they both relate to deep learning technology. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He, Zhou, and Chakraborty’s teachings to incorporate the teachings of Pan and provide training a neural network using a CELU activation function [Barron (Continuously Differentiable Exponential Linear Units, published 2017), Abstract] in order to speed up and improve deep learning architectures via an ELU activation with continuously differentiable parameterization.
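The CELU activation at issue in claim 28 has a simple closed form per Barron (2017); a minimal sketch:

```python
import numpy as np

def celu(x, alpha=1.0):
    """CELU (Barron, 2017): max(0, x) + min(0, alpha * (exp(x/alpha) - 1)).

    Unlike ELU with alpha != 1, its first derivative is continuous at
    x = 0 for every alpha > 0, which is the property the claim recites.
    """
    return np.maximum(0.0, x) + np.minimum(0.0, alpha * np.expm1(x / alpha))
```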
Claim(s) 29 is rejected under 35 U.S.C. 103 as being unpatentable over Kang in view of Bohnstingl et al. (WO 2021220069 A2), hereinafter Bohnstingl, Kolter, Chakraborty, Reisser, and Lees et al. (SEMULATOR: Emulating the Dynamics of Crossbar Array-based Analog Neural System with Regression Neural Networks, published 19 Jan 2021), hereinafter Lees. Regarding claim 29, Kang teaches, A pre-trained (After a plurality of trainings) NN device comprising: a processor device [Col 3, line 33, Fig. 1, the processor] configured to operate a neural network that is pre-trained [Col 2, lines 45-51, After a plurality of trainings, a convolutional neural network may extract feature images from an image. The extracted feature images are processed by the convolutional neural network, and a compressed image obtained by processing the extracted feature images may reflect original image features to a maximum extent, which effectively solve problems such as blocking effects and noises] using measured variations from idealities in signals [Col 4, lines 33-39, By converting weight values in a weight matrix of the convolutional neural network into binary numbers, the FLASH cell with the memory state of “0” is used to represent “0” in the binary weight values, and the FLASH cell with the memory state of “1” is used to represent “1” in the binary weight values, so that the in-memory computing array composed of the plurality of FLASH cells may represent the weight] generated by an external in-memory computing crossbar array circuit [Col 2, lines 52-60, FLASH in-memory computing array for encoding] to provide the pre-trained [Col 3, lines 1-7, The system and method for compressing the image based on the FLASH in-memory computing array of the present disclosure may construct and train the convolutional neural network for encoding and decoding based on a CPU/GPU, and may obtain a weight distribution of the convolutional neural network] neural network [Col 2, lines 52-60, The system and method 
for compressing an image based on a FLASH in-memory computing array of the present disclosure may execute a large number of matrix-vector multiplication operations in the convolutional neural network in the process of image encoding… so that the image compression may be accelerated at a hardware level, while greatly reducing energy and hardware resource consumption, which is of great significance to the image compression]; a memory system operatively coupled to the processor device [Col 3, lines 34-38, The control module is connected to the signal generation module, the convolutional neural network for encoding based on the FLASH in-memory computing array, the convolutional neural network for decoding based on the FLASH in-memory computing array, and the processor; Fig. 1]; and an in-memory computing memory (Col 5, lines 15-16, the FLASH in-memory computing array for decoding) operatively coupled to the processor device, the in-memory computing memory including an internal in-memory computing crossbar array circuit [Col 5, lines 15-16, A structure of the convolutional neural network for decoding based on the FLASH in-memory computing array… Col 5, lines 48-56, The quantized image is decoded by the convolutional neural network for decoding based on the FLASH in-memory computing array, so as to obtain a compressed image. The hardware implementation of this embodiment stores the weight values in the FLASH in-memory computing array, and uses the in-memory computing array for computing, eliminating random access to the weight values in the computing process, thereby achieving computing in memory].
Kang does not teach wherein the NN is a DNN, memory system storing instructions configured to provide operation of the pre-trained deep neural network, an internal in-memory computing crossbar array circuit configured to: generate analog partial sum signals representing multiply-and-accumulate results of the input data and weights stored in the crossbar array circuit, the analog partial sum signals having variations from idealities inherent to the crossbar array circuit; quantize the analog partial sum signals to less than five bits using quantizing circuits having predetermined threshold values; apply a continually differentiable exponential linear unit activation function to the quantized partial sum signals; and provide an output of the deep neural network based at least in part on the activated partial sum signals. Bohnstingl teaches, wherein the neural network is a deep neural network [Para 0003, Neural networks include… deep neural networks], memory system storing instructions configured to provide operation of the pre-trained deep neural network [Para 0054, a memristive crossbar structure 30 (with PCM cells) is used together with optimized read/write heads (24/22) to achieve an external memory for the controller 10 and its processing unit. The controller is aimed at executing a neural network, be it to train the latter or perform inferences based on the trained network. Such a neural network can thus be augmented with memory built on memristive devices 33]. Bohnstingl is analogous to the claimed invention as they both relate to in-memory computing using neural networks. 
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kang’s teachings to incorporate the teachings of Bohnstingl and provide the memory system storing instructions configured to provide operation of the pre-trained deep neural network in order to incorporate a deep neural network onto IMC hardware, accelerating the deep neural network to achieve lower power consumption and higher accuracy. Kang-Bohnstingl do not teach an internal in-memory computing crossbar array circuit configured to: generate analog partial sum signals representing multiply-and-accumulate results of the input data and weights stored in the crossbar array circuit, the analog partial sum signals having variations from idealities inherent to the crossbar array circuit; quantize the analog partial sum signals to less than five bits using quantizing circuits having predetermined threshold values; apply a continually differentiable exponential linear unit activation function to the quantized partial sum signals; and provide an output of the deep neural network based at least in part on the activated partial sum signals. Kolter teaches, an internal in-memory computing crossbar array circuit configured to: generate analog partial sum signals (Para 0167, current) representing multiply-and-accumulate results (Para 0167, written as a summation of all currents through the weight elements connected to it) of the input data (Para 0166, the data) and weights (Para 0167, weight elements) stored in the crossbar array circuit [Para 0166, the weights of the neural network are stationary and stored where the calculation occurs and therefore the data movement can be reduced greatly; Para 0167, An analog circuit configuration called a crossbar network can be used for the purpose of matrix multiply and add operations. Such network (e.g., illustrated in FIG. 
36) applies the integer neuron activation values, X_i, via digital-to-analog converters (DACs) through access rows (word lines). These word lines deploy analog voltages X_i·V_ref,DAC across the word lines, where V_ref,DAC is the reference voltage of the DAC. Along each word line, multiple weight elements are placed at crossings with columns (bit lines). These weight elements are implemented by means of impedances (conductances), where each element is an integer W_ij multiple of a unit conductance G, resulting in a conductance of G·W_ij. Each bit line crosses multiple word lines with corresponding weights at their crossings and therefore implements a summation node to add the currents. For the j-th bit line, this current can be written as a summation of all currents through the weight elements connected to it], Kang-Bohnstingl-Kolter do not teach the analog partial sum signals having variations from idealities inherent to the crossbar array circuit; quantize the analog partial sum signals to less than five bits using quantizing circuits having predetermined threshold values; apply a continually differentiable exponential linear unit activation function to the quantized partial sum signals; and provide an output of the deep neural network based at least in part on the activated partial sum signals. Chakraborty teaches, quantize the analog partial sum signals to less than five bits using quantizing circuits having predetermined threshold values [Sect 5, para 2, We will refer to a bit-slice (>= 1 bits) of inputs and weights as stream and slice, respectively. Within each step, an input stream is applied to a crossbar's rows to produce ADC outputs.
Next, the shift-and-add units merge the ADC outputs of different weight slices]; the analog partial sum signals having variations from idealities inherent to the crossbar array circuit [Sect 4, para 2, the output current vector from a real crossbar is non-ideal and can be expressed as a distorted MVM function: Inon-ideal = fD(V, G). Therefore, it represents multiplicative behavior between the input variables, V and G. The objective here is to model such non-ideality function fD(V, G) being input-dependent and having multiplicative behavior]. Chakraborty is analogous to the claimed invention as they both relate to deep learning accelerators. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou and He’s teachings to incorporate the teachings of Chakraborty and provide quantizing the analog partial sum current signals to less than 5 bits in order to achieve the desired accuracy. Kang-Bohnstingl-Kolter-Chakraborty do not teach apply a continually differentiable exponential linear unit activation function to the quantized partial sum signals; and provide an output of the deep neural network based at least in part on the activated partial sum signals. 
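The "quantize to less than five bits using quantizing circuits having predetermined threshold values" limitation can be illustrated with a digital stand-in: a bank of n fixed thresholds yields n + 1 output codes, so up to 15 thresholds keeps the digital partial sums under five bits. The threshold values below are assumptions for illustration, not from the record:

```python
import numpy as np

def threshold_quantize(i_partial, thresholds):
    """Quantize analog partial-sum values against predetermined, sorted
    thresholds: each value maps to the index of the first threshold it
    falls below, giving n + 1 codes for n thresholds."""
    return np.digitize(i_partial, thresholds)
```

In hardware the same role is played by a flash-ADC-style comparator bank whose reference voltages are the predetermined thresholds.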
Reisser teaches, apply an activation function to the quantized partial sum signals [Para 0030, The outputs for the CIM cells 314(j)(1)-314(j)(r) of a column j—in the form of corresponding capacitances—are summed by the corresponding bitline 312(j) and provided as an input indicative of population count to a corresponding ADC 308(j) in the ADC module 304… The outputs of the ADCs 308 are provided to digital processing module 313 for further processing, where the further processing may include… applying non-linearities]; and provide an output of the deep neural network based at least in part on the activated partial sum signals [Para 0033, For each word-line activated… the bit-line voltage corresponding to a pop-count from 0 to N is read out]. Reisser is analogous to the claimed invention as they both relate to CIM applications. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kang’s teachings to incorporate the teachings of Reisser and provide activation functions in order to allow the model to learn complex patterns. Kang-Bohnstingl-Kolter-Chakraborty-Reisser do not teach wherein the activation function is a continually differentiable exponential linear unit activation function. Lees teaches, wherein the activation function is a continually differentiable exponential linear unit activation function [Sect 3.2, para 4, the filters help neural networks not to suffer from the curse of dimension because the filters are shared for all tiles of input feature maps or activation maps; See Appendix A, neural network architecture]. Lees is analogous to the claimed invention as they both relate to deep neural networks.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kang’s teachings to incorporate the teachings of Lees and provide a continually differentiable exponential linear unit activation function in order to [Lees, sect 3.2, para 4] help neural networks with issues of dimension. Claim(s) 30-33 are rejected under 35 U.S.C. 103 as being unpatentable over Kang in view of Bohnstingl, Kolter, Chakraborty, Reisser, and Lees, and in further view of Dazzi et al. (Efficient Pipelined Execution of CNNs Based on In-Memory Computing and Graph Homomorphism Verification, published June 2021), hereinafter Dazzi. Regarding claim 30, Kang-Bohnstingl-Kolter-Chakraborty-Reisser teach the limitations of claim 29. Kang further teaches, wherein the external in-memory computing crossbar array circuit from which the pre-trained model was derived and the internal in-memory computing crossbar array circuit include about equal numbers of rows and columns of storage cells [Claim 1, wherein each layer in the convolutional neural network for encoding and each layer in the convolutional neural network for decoding comprises: an in-memory computing array based on FLASH, wherein the in-memory computing array based on FLASH comprises: a plurality of FLASH cells, a plurality of word lines, a plurality of source lines, a plurality of bit lines… wherein the in-memory computing array is composed of the plurality of FLASH cells… FLASH cells in each column are connected to the same word line, source electrodes of the FLASH cells in each column are connected to the same source line, drain electrodes of the FLASH cells in each row are connected to the same bit line, and a positive terminal and a negative terminal of each subtractor are respectively connected to two adjacent bit lines.]. Kang does not teach circuit using inference. 
Dazzi teaches, circuit using inference [Abstract, In-memory computing is an emerging computing paradigm enabling deep-learning inference at significantly higher energy efficiency and reduced latency. The essential idea is mapping the synaptic weights of each layer to one or more in-memory computing (IMC) cores. During inference, these cores perform the associated matrix-vector multiplications in place with O(1) time complexity, obviating the need to move the synaptic weights to additional processing units; Sect 10, para 1, In this section we present one hardware implementation of the 5PP topology. This hardware implementation includes the send and receive units for the IMC cores as well as the inter-core routing, and has been designed for an 8-by-8 array of IMC cores... The design of transmitter (TX) and receiver (RX) is completely carried out with Verilog-based digital circuit synthesis, and results in the physical TX and RX being inverters properly sized to drive the channels]. Dazzi is analogous to the claimed invention as they both relate to deep learning and neural network acceleration. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kang’s teachings to incorporate the teachings of Dazzi and provide a circuit using inference in order to enable high-speed predictions of unseen patterns. Regarding claim 31, Kang-Bohnstingl-Kolter-Chakraborty-Reisser teach the limitations of claim 30, including the internal in-memory computing crossbar array circuit (claim 29: Kang, Col 5, lines 15-16) and using inference (claim 30: Dazzi, Abstract, Sect 10). Kang does not teach quantizing circuits coupled to outputs of the rows and columns of the storage cells, the quantizing circuits configured to quantize analog partial sum current signals from the outputs to less than 5 bits. 
Chakraborty teaches, Quantizing (Sect 5, para 2, ADC) circuits (Sect 5, para 2, crossbar hardware) coupled to outputs of the rows (Sect 5, para 2, a crossbar's rows) and columns of the storage cells (Sect 5, para 2, slices), the quantizing circuits configured to quantize analog partial sum current signals (Sect 5, para 2, ADC outputs) from the outputs to less than 5 bits (Sect 5, para 2, (>= 1 bits)) [Sect 5, para 2, we extract the analog computing aspect of crossbar hardware… A slice of input vector is shared by tiles in a row. Tiles in a column produce partial sums, which are added together to produce a slice of the convolution output… We will refer to a bit-slice (>= 1 bits) of inputs and weights as stream and slice, respectively. Within each step, an input stream is applied to a crossbar's rows to produce ADC outputs. Next, the shift-and-add units merge the ADC outputs of different weight slices. Eventually, the outputs of successive input streams go through shift-and-add units to produce the partial sums for a tile]. Chakraborty is analogous to the claimed invention as they both relate to deep learning accelerators. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou and He’s teachings to incorporate the teachings of Chakraborty and provide quantizing circuits configured to quantize analog partial sum current signals from the outputs to less than 5 bits in order to achieve the desired accuracy. Regarding claim 32, Kang-Bohnstingl-Kolter-Chakraborty-Reisser teach the limitations of claim 30, including the internal in-memory computing crossbar array circuit (claim 29: Kang, Col 5, lines 15-16) and using inference (claim 30: Dazzi, Abstract, Sect 10). 
Kang does not teach quantizing circuits coupled to outputs of the rows and columns of the storage cells, the quantizing circuits configured to quantize analog partial sum current signals from the outputs to less than 3 bits. Chakraborty teaches, Quantizing (Sect 5, para 2, ADC) circuits (Sect 5, para 2, crossbar hardware) coupled to outputs of the rows (Sect 5, para 2, a crossbar's rows) and columns of the storage cells (Sect 5, para 2, slices), the quantizing circuits configured to quantize analog partial sum current signals (Sect 5, para 2, ADC outputs) from the outputs to less than 3 bits (Sect 5, para 2, (>= 1 bits)) [Sect 5, para 2, we extract the analog computing aspect of crossbar hardware… A slice of input vector is shared by tiles in a row. Tiles in a column produce partial sums, which are added together to produce a slice of the convolution output… We will refer to a bit-slice (>= 1 bits) of inputs and weights as stream and slice, respectively. Within each step, an input stream is applied to a crossbar's rows to produce ADC outputs. Next, the shift-and-add units merge the ADC outputs of different weight slices. Eventually, the outputs of successive input streams go through shift-and-add units to produce the partial sums for a tile]. Chakraborty is analogous to the claimed invention as they both relate to deep learning accelerators. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou and He’s teachings to incorporate the teachings of Chakraborty and provide quantizing circuits configured to quantize analog partial sum current signals from the outputs to less than 3 bits in order to achieve the desired accuracy. 
Regarding claim 33, Kang-Bohnstingl-Kolter-Chakraborty-Reisser teach the limitations of claim 30, including the internal in-memory computing crossbar array circuit (claim 29: Kang, Col 5, lines 15-16) and using inference (claim 30: Dazzi, Abstract, Sect 10). Kang does not teach quantizing circuits coupled to outputs of the rows and columns of the storage cells, the quantizing circuits configured to quantize analog partial sum current signals from the outputs to 1 bit. Chakraborty teaches, Quantizing (Sect 5, para 2, ADC) circuits (Sect 5, para 2, crossbar hardware) coupled to outputs of the rows (Sect 5, para 2, a crossbar's rows) and columns of the storage cells (Sect 5, para 2, slices), the quantizing circuits configured to quantize analog partial sum current signals (Sect 5, para 2, ADC outputs) from the outputs to 1 bit (Sect 5, para 2, (>= 1 bits)) [Sect 5, para 2, we extract the analog computing aspect of crossbar hardware… A slice of input vector is shared by tiles in a row. Tiles in a column produce partial sums, which are added together to produce a slice of the convolution output… We will refer to a bit-slice (>= 1 bits) of inputs and weights as stream and slice, respectively. Within each step, an input stream is applied to a crossbar's rows to produce ADC outputs. Next, the shift-and-add units merge the ADC outputs of different weight slices. Eventually, the outputs of successive input streams go through shift-and-add units to produce the partial sums for a tile]. Chakraborty is analogous to the claimed invention as they both relate to deep learning accelerators. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou and He’s teachings to incorporate the teachings of Chakraborty and provide quantizing circuits configured to quantize analog partial sum current signals from the outputs to 1 bit in order to achieve the desired accuracy. 
Claim(s) 34 is rejected under 35 U.S.C. 103 as being unpatentable over He in view of Zhou and Reisser, and in further view of Kang and Chen et al. (US 20240211536 A1), hereinafter Chen.

Regarding claim 34, He-Zhou-Reisser teach the limitations of claim 1 including the variations in analog partial sum signals measured from in-memory computing hardware device (Reisser, paras 0022, 0032, 0033, 0036 and 0037). He-Zhou-Reisser do not teach a prototype in-memory computing hardware device having an equal number of rows and columns as the crossbar array of in-memory computing hardware.

Kang further teaches, in-memory computing hardware device having an equal number of rows and columns as the crossbar array of in-memory computing hardware [Claim 1, wherein each layer in the convolutional neural network for encoding and each layer in the convolutional neural network for decoding comprises: an in-memory computing array based on FLASH, wherein the in-memory computing array based on FLASH comprises: a plurality of FLASH cells, a plurality of word lines, a plurality of source lines, a plurality of bit lines… wherein the in-memory computing array is composed of the plurality of FLASH cells… FLASH cells in each column are connected to the same word line, source electrodes of the FLASH cells in each column are connected to the same source line, drain electrodes of the FLASH cells in each row are connected to the same bit line, and a positive terminal and a negative terminal of each subtractor are respectively connected to two adjacent bit lines].

Kang is analogous to the claimed invention as they both relate to in-memory computing with neural networks. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He's teachings to incorporate the teachings of Kang and provide an equal number of rows and columns in order to maintain consistent mapping between the device and the crossbar.
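The rationale above, consistent mapping when the prototype and the crossbar share the same dimensions, can be sketched briefly. This is a minimal illustration written for this discussion, not from any cited reference: the dimensions, noise model, and the name `apply_measured_nonidealities` are assumptions.

```python
import numpy as np

# Hypothetical setup: a prototype array with the same number of rows and
# columns as the target crossbar yields one measured deviation per cell,
# so measurements map one-to-one onto the simulated crossbar during training.
rng = np.random.default_rng(0)

ROWS, COLS = 64, 64  # prototype matches crossbar dimensions exactly

# Per-cell deviations measured once from the prototype device (assumed Gaussian)
measured_variation = rng.normal(loc=0.0, scale=0.02, size=(ROWS, COLS))

def apply_measured_nonidealities(weights, variation):
    """Perturb ideal weights with the per-cell deviations measured on the
    prototype; shapes must match, which the equal row/column count guarantees."""
    assert weights.shape == variation.shape, "prototype and crossbar dimensions differ"
    return weights * (1.0 + variation)

ideal = rng.standard_normal((ROWS, COLS))
noisy = apply_measured_nonidealities(ideal, measured_variation)
```

If the prototype had a different shape, the per-cell measurements could not be applied one-to-one, which is one reading of why the equal-dimension limitation matters.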
He-Zhou-Reisser do not teach an in-memory computing hardware being a prototype. Chen teaches, an in-memory computing hardware being a prototype [Para 0012, To validate the compute in-memory architecture, a 16 Kb SRAM-IMC prototype performs MVM operation on 128-elements 5 b signed input vector and 128×64 ternary weight matrix to generate 64 5 b signed outputs in one computation cycle]. Chen is analogous to the claimed invention as they both relate to compute-in-memory architecture. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified He's teachings to incorporate the teachings of Chen and provide an in-memory computing hardware being a prototype in order to improve output results by having a flexible design to identify flaws early.

Claim(s) 37 is rejected under 35 U.S.C. 103 as being unpatentable over Zhou in view of He and Reisser, and in further view of Chakraborty and Rakin et al. (Bit-Flip Attack: Crushing Neural Network with Progressive Bit Search, published 2019), hereinafter Rakin.

Regarding claim 37, Zhou-He-Reisser teach the limitations of claim 21 including reducing a numerical resolution of the modified intermediate computation results, quantizing modified intermediate computation results while the measured variations from reference values are applied (Reisser, Paras 0028 and 0037), the parameters stored in the IMC crossbar circuit (Zhou, Sect 3, para 1), and the DNN (He, Sect 3, para 2). Zhou-He-Reisser do not teach quantizing to a single-bit resolution such that an increased number of bit-flip modifications is required to reduce classification accuracy of NN to a random-guess accuracy level. Chakraborty teaches, quantizing to a single-bit resolution [Sect 5, para 2, we extract the analog computing aspect of crossbar hardware… A slice of input vector is shared by tiles in a row.
Tiles in a column produce partial sums, which are added together to produce a slice of the convolution output… We will refer to a bit-slice (>= 1 bits) of inputs and weights as stream and slice, respectively. Within each step, an input stream is applied to a crossbar's rows to produce ADC outputs. Next, the shift-and-add units merge the ADC outputs of different weight slices. Eventually, the outputs of successive input streams go through shift-and-add units to produce the partial sums for a tile].

Chakraborty is analogous to the claimed invention as they both relate to deep learning accelerators. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou's teachings to incorporate the teachings of Chakraborty and provide quantizing to 1 bit in order to achieve the desired accuracy.

Zhou-He-Reisser-Chakraborty do not teach quantizing such that an increased number of bit-flip modifications is required to reduce classification accuracy of NN to a random-guess accuracy level. Rakin teaches, quantizing such that an increased number of bit-flip modifications is required to reduce classification accuracy of NN to a random-guess accuracy level [Abstract, Our proposed BFA utilizes a Progressive Bit Search (PBS) method which combines gradient ranking and progressive search to identify the most vulnerable bit to be flipped. With the aid of PBS, we can successfully attack a ResNet-18 fully malfunction (i.e., top-1 accuracy degrade from 69.8% to 0.1%) only through 13 bit-flips out of 93 million bits, while randomly flipping 100 bits merely degrades the accuracy by less than 1%; Sect 1, para 5, Our proposed BFA utilizes a Progressive Bit Search (PBS) method which combines gradient ranking and progressive search to identify the most vulnerable bit to be flipped.
With the aid of PBS, we can successfully attack a ResNet-18 fully malfunction (i.e., top-1 accuracy degrade from 69.8% to 0.1%) only through 13 bit-flips out of 93 million bits, while randomly flipping 100 bits merely degrades the accuracy by less than 1%].

Rakin is analogous to the claimed invention as they both relate to deep neural networks. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhou's teachings to incorporate the teachings of Rakin and provide quantizing such that an increased number of bit-flip modifications is required to reduce classification accuracy of NN to a random-guess accuracy level [Rakin, Abstract] in order to improve model accuracy.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SYED RAYHAN AHMED whose telephone number is (571)270-0286. The examiner can normally be reached Mon-Fri ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, David Yi, can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SYED RAYHAN AHMED/
Examiner, Art Unit 2126

/VAN C MANG/
Primary Examiner, Art Unit 2126

Prosecution Timeline

Sep 13, 2022
Application Filed
Jul 16, 2025
Non-Final Rejection — §103
Oct 16, 2025
Response Filed
Jan 24, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12450891
IMAGE CLASSIFIER COMPRISING A NON-INJECTIVE TRANSFORMATION
Granted Oct 21, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on the most recent grant.


Prosecution Projections

3-4
Expected OA Rounds
71%
Grant Probability
99%
With Interview (+50.0%)
4y 4m
Median Time to Grant
Moderate
PTA Risk
Based on 7 resolved cases by this examiner. Grant probability derived from career allow rate.
