Last updated: May 29, 2026
Application No. 16/428,760
TRAINING A NEURAL NETWORK USING SELECTIVE WEIGHT UPDATES

Non-Final OA §103
Filed
May 31, 2019
Examiner
VAUGHN, RYAN C
Art Unit
2125
Tech Center
2100 — Computer Architecture & Software
Assignee
Nvidia Corporation
OA Round
7 (Non-Final)
Interview Optional

— +19.8% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 62% grant rate with +19.8% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 241 resolved cases, 2023–2026
Examiner Intelligence

VAUGHN, RYAN C View full profile →
Grants 62% of resolved cases
Career Allowance Rate
149 granted / 241 resolved
+6.8% vs TC avg
Strong +20% interview lift
Without
With
+19.8%
Interview Lift
resolved cases with interview
Typical timeline
3y 9m
Avg Prosecution
29 currently pending
Career history
283
Total Applications
across all art units
Statute-Specific Performance

§101
18.1%
-21.9% vs TC avg
§103
60.2%
+20.2% vs TC avg
§102
2.7%
-37.3% vs TC avg
§112
11.9%
-28.1% vs TC avg
Black line = Tech Center average estimate • Based on career data from 241 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Claims 1-48 are presented for examination.

Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on February 20, 2026 has been entered.

Response to Amendment
Applicant’s amendments have obviated most, but not all, of the claim objections.  To the extent that an objection or rejection appears in the previous Office Action(s) but not this Office Action, that objection or rejection is withdrawn.  To the extent that it appears both in a previous Office Action(s) and this Office Action, the objection or rejection is maintained.

Claim Objections
Examiner objects to claims 1-22 and 30-48.
Claims 1, 8, 15, 30, and 37 are objected to because of the following informalities: “based, at least in part on, metadata” should be “based, at least in part, on metadata”.
The dependent claims are objected to for dependency on an objected-to base claim.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
Claims 1-2, 4, 8-9, 12-16, 18, 20-24, 30-31, 34, 36-38, 41-43, and 45-46 are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al. (US 20180075339) (“Ma”) in view of Davies et al. (US 20180174040) (“Davies”) and further in view of Kwant et al. (US 20190188538) (“Kwant”).
Regarding claim 1, Ma discloses “[o]ne or more processors, comprising: circuitry (invention can be implemented as a processor suitable for executing instructions stored on and/or provided by a memory coupled to the processor – Ma, paragraph 23) to:
determine a selected plurality of weights for training one or more neural networks based, at least in part[,] on[] … training steps skipped for two or more different weights in one or more previous training sessions of the one or more neural networks (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep and the weights are selected for update based on which neurons fire, which is at least partially dependent on which axons/neurons have fired in previous timesteps, i.e., which firing steps were skipped in previous training timesteps], and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates), Fig. 11 (disclosing that every weight matrix row having all zero lookup table values is skipped, i.e., if two or more weights correspond to rows in the matrix for which the LUT has all zero values, two or more weights will be skipped)); …
update the selected weight using the … weight update; and cause the neural networks to be trained using the updated selected weight (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep, and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates)).”
Ma appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses “two or more different numeric counts of training steps (subthreshold dynamics of leaky integrate-and-fire neuron model are described by a discrete-time dimensionless difference equation that includes a term si[t] corresponding to the count of spikes received [training steps] for time step t at synapse i [i.e., given multiple synapses and/or multiple timesteps, there will be multiple counts] – Davies, paragraphs 33-40) …; [and]
calculat[ing], using the one or more processors, an aggregated weight update for the selected weight by combining metadata corresponding to the selected weight and stored update information (a weight may be added to an appropriate weight accumulation counter [metadata corresponding to the selected weight] for an appropriate future time step; based on an aggregated weight input, a soma updates its activation state according to a spiking neuron model – Davies, paragraph 22; a learning cycle may be performed in response to a predefined passage of time steps, wherein the predefined passage of time steps represents a learning epoch – id. at paragraph 148 [stored update information = number of timesteps elapsed]; see also Fig. 10 [showing that a learning cycle is initiated after a learning epoch counter is expired and that the learning is based on all spikes that have arrived during that epoch]) ….”
Davies and the instant application both relate to machine learning and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate successive updates to the network, as disclosed by Davies, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase performance and energy efficiency by ensuring that learning only takes place only when there is a demand.  See Davies, paragraph 18.
Neither Ma nor Davies appears to disclose explicitly the further limitations of the claim.  However, Kwant discloses “determining [updates] … based, at least in part[,] on[] metadata comprising an indication of a number of skipped training steps (skip areas can be defined which indicate which areas of an image are not to be labeled or otherwise processed by the machine learning framework [skip areas = data comprising an indication of a number of skipped training steps]; specifying skip areas provides more accurate training data [i.e., updates are determined based on the data] –Kwant,  paragraph 37; image skip areas can be stored as metadata in a training database in association with the respective images of a machine learning training dataset – id. at paragraph 89) ….”
Kwant and the instant application both relate to machine learning using skipped training steps and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Davies to determine updates based on metadata indicating a number of skipped training steps, as disclosed by Kwant, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase the accuracy of the resulting model by ensuring that only training steps that are relevant to the ultimate outcome are considered.  See Kwant, paragraph 37.

Regarding claim 2, Ma/Davies/Kwant discloses that “the selected weight is updated as a result of determining that the selected weight is to be used in a current step of training of the one or more neural networks (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep, and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates)).”  

Regarding claim 4, Ma discloses that “the … weight update is calculated further based on a number of training steps performed between updating the selected weight (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the number of training steps that have elapsed since the last update/use of the weights is a factor to be taken into consideration when updating], and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates)).”  
Ma/Kwant appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses an “aggregated weight update (see mapping to this element in rejection of claim 1 supra) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Kwant to aggregate the weight updates over successive time steps, as disclosed by Davies, for the same reasons as given in the rejection of claim 1.

Regarding claim 8, Ma discloses “[a] system, comprising: 
one or more memories (invention can be embodied as a processor suitable for executing instructions and a memory coupled to the processor – Ma, paragraph 23) to store information to:
determine a selected plurality of weights for training one or more neural networks based, at least in part[,] on[] … training steps skipped for two or more different weights in one or more previous training sessions of the one or more neural networks (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep and the weights are selected for update based on which neurons fire, which is at least partially dependent on which axons/neurons have fired in previous timesteps, i.e., which firing steps were skipped in previous training timesteps], and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates), Fig. 11 (disclosing that every weight matrix row having all zero lookup table values is skipped, i.e., if two or more weights correspond to rows in the matrix for which the LUT has all zero values, two or more weights will be skipped)); …
update the selected weight using the … weight update; and cause the neural networks to be trained using the updated selected weight (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep, and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates)).”
Ma appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses “two or more different numeric counts of training steps (subthreshold dynamics of leaky integrate-and-fire neuron model are described by a discrete-time dimensionless difference equation that includes a term si[t] corresponding to the count of spikes received [training steps] for time step t at synapse i [i.e., given multiple synapses and/or multiple timesteps, there will be multiple counts] – Davies, paragraphs 33-40) …; [and]
calculat[ing] an aggregated weight update for the selected weight by combining metadata corresponding to the selected weight and stored update information that indicates one or more changes applied to the selected weight in one or more training steps of the one or more previous training sessions (a weight may be added to an appropriate weight accumulation counter for an appropriate time step; based on an aggregated weight input, a soma updates its activation state according to a spiking neuron model such as the Leaking Integrate and Fire (LIF) model – Davies, paragraph 22 [weight added to accumulation counter in current timestep = metadata corresponding to selected weight; weights added to accumulation counter in previous timesteps = stored update information indicating changes to the weight in previous training steps]; a learning cycle may be performed in response to a predefined passage of time steps, wherein the predefined passage of time steps represents a learning epoch – id. at paragraph 148; see also Fig. 10 [showing that a learning cycle is initiated after a learning epoch counter is expired and that the learning is based on all spikes that have arrived during that epoch], paragraphs 33-40 (disclosing that the LIF model is governed by difference equations comprising a sum of one term representing the output of the previous timestep and another term representing the sum of the weight updates)) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate successive updates to the network, as disclosed by Davies, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase performance and energy efficiency by ensuring that learning only takes place only when there is a demand.  See Davies, paragraph 18.
Neither Ma nor Davies appears to disclose explicitly the further limitations of the claim.  However, Kwant discloses “determin[ing updates] … based, at least in part[,] on[] metadata comprising an indication of … training steps skipped (skip areas can be defined which indicate which areas of an image are not to be labeled or otherwise processed by the machine learning framework [skip areas = data comprising an indication of a number of skipped training steps]; specifying skip areas provides more accurate training data [i.e., updates are determined based on the data] –Kwant,  paragraph 37; image skip areas can be stored as metadata in a training database in association with the respective images of a machine learning training dataset – id. at paragraph 89) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Davies to determine updates based on metadata indicating a number of skipped training steps, as disclosed by Kwant, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase the accuracy of the resulting model by ensuring that only training steps that are relevant to the ultimate outcome are considered.  See Kwant, paragraph 37.

Regarding claim 9, Ma/Davies/Kwant discloses that “the one or more memories include instructions that, if executed, cause the system to: 
forward propagate the updated selected weight resulting from the … weight update through the one or more neural networks to generate one or more outputs (all types of ANN need to be trained before performing inference and classification functions; typically, there are two distinct modes of ANN operations, feed-forward mode for inferences and classifications, and backpropagation for training or learning using the labeled training datasets – Ma, paragraph 38; neural network [containing the weights] can be trained on a labeled dataset [to generate an output], and if error occurs, the error data feedback for retraining can be iterated many times until the errors converge to a minimum – id. at paragraph 40); 
back-propagate the one or more outputs to update the one or more neural networks (all types of ANN need to be trained before performing inference and classification functions; typically, there are two distinct modes of ANN operations, feed-forward mode for inferences and classifications, and backpropagation for training or learning [updating] using the labeled training datasets – Ma, paragraph 38); and 
update a different neural network weight from the selected weight (the numbers of axons or neurons firing at a given timestep are relatively sparse, so only the rows of the weight matrices having axons or neurons firing may need to be updated at each timestep [since the axons and neurons firing differ at each timestep, it follows that in general different rows will be updated at each timestep] – Ma, paragraph 128).”   
Ma/Kwant appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses an “aggregated weight update (see mapping to this element in rejection of claim 1 supra) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Kwant to aggregate the weight updates over successive time steps, as disclosed by Davies, for the same reasons as given in the rejection of claim 1.

Regarding claim 12, Ma/Davies/Kwant discloses that “the information is updated after one or more epochs of training of the one or more neural networks (when one of the axons or neurons fires, the corresponding timestamp registers can be written with a value B and decremented until the value B reaches 0 – Ma, paragraphs 91-92; LTP/LTD curves are then used to determine synaptic weight updates based on a comparison between the two timestamp registers, which only occurs when an axon or neuron fires and neither timestamp is zero – id. at paragraphs 93-94 [i.e., the timestamp metadata are updated to the value B after a firing event occurring subsequent to a weight update/training epoch]).” 

Regarding claim 13, Ma/Davies/Kwant discloses that “the information indicates how many epochs1 of training have been skipped for respective weights of the plurality of weights (when one of the axons or neurons fires, corresponding timestamp registers Tpre or Tpost can be written with a value B and decremented at each timestep until the value B reaches 0 – Ma, paragraphs 91-92; a compare operation between these two registers can be triggered only when Tpre= B and/or Tpost = B and when neither Tpost nor Tpre = 0; the comparison triggers synaptic weight updates – id. at paragraphs 93-94 [so if the current value in the register is x, the number of timesteps since firing, that is, the number of timesteps since the last weight update [number of steps skipped], is B – x]).”  

Regarding claim 14, Ma/Davies discloses that the system “further compris[es] a vehicle (once trained, weights and parameters of a neural network can be transferred to application devices for deployment, such as self-driving cars or autonomous drones – Ma, paragraph 40).”

Regarding claim 15, Ma discloses “[a] method, comprising: 
determining a selected plurality of weights for training one or more neural networks based, at least in part[,] on[] … training steps skipped for two or more different weights in one or more previous training sessions of the one or more neural networks (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep and the weights are selected for update based on which neurons fire, which is at least partially dependent on which axons/neurons have fired in previous timesteps, i.e., which firing steps were skipped in previous training timesteps], and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates), Fig. 11 (disclosing that every weight matrix row having all zero lookup table values is skipped, i.e., if two or more weights correspond to rows in the matrix for which the LUT has all zero values, two or more weights will be skipped)); …
updating the selected weight using the … weight update; and training the one or more neural networks using the updated selected weight (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep, and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates)).”
Ma appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses “two or more different numeric counts of training steps (subthreshold dynamics of leaky integrate-and-fire neuron model are described by a discrete-time dimensionless difference equation that includes a term si[t] corresponding to the count of spikes received [training steps] for time step t at synapse i [i.e., given multiple synapses and/or multiple timesteps, there will be multiple counts] – Davies, paragraphs 33-40) …; [and]
calculating an aggregated weight update for the selected weight by combining metadata corresponding to the selected weight and stored update information that indicates one or more changes applied to the selected weight in one or more training steps of the one or more previous training sessions (a weight may be added to an appropriate weight accumulation counter for an appropriate time step; based on an aggregated weight input, a soma updates its activation state according to a spiking neuron model such as the Leaking Integrate and Fire (LIF) model – Davies, paragraph 22 [weight added to accumulation counter in current timestep = metadata corresponding to selected weight; weights added to accumulation counter in previous timesteps = stored update information indicating changes to the weight in previous training steps]; a learning cycle may be performed in response to a predefined passage of time steps, wherein the predefined passage of time steps represents a learning epoch – id. at paragraph 148; see also Fig. 10 [showing that a learning cycle is initiated after a learning epoch counter is expired and that the learning is based on all spikes that have arrived during that epoch], paragraphs 33-40 (disclosing that the LIF model is governed by difference equations comprising a sum of one term representing the output of the previous timestep and another term representing the sum of the weight updates)) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate successive updates to the network, as disclosed by Davies, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase performance and energy efficiency by ensuring that learning only takes place only when there is a demand.  See Davies, paragraph 18.
Neither Ma nor Davies appears to disclose explicitly the further limitations of the claim.  However, Kwant discloses “determining [updates] … based, at least in part[,] on[] metadata comprising an indication of … training steps skipped (skip areas can be defined which indicate which areas of an image are not to be labeled or otherwise processed by the machine learning framework [skip areas = data comprising an indication of a number of skipped training steps]; specifying skip areas provides more accurate training data [i.e., updates are determined based on the data] –Kwant,  paragraph 37; image skip areas can be stored as metadata in a training database in association with the respective images of a machine learning training dataset – id. at paragraph 89) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Davies to determine updates based on metadata indicating a number of skipped training steps, as disclosed by Kwant, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase the accuracy of the resulting model by ensuring that only training steps that are relevant to the ultimate outcome are considered.  See Kwant, paragraph 37.

Regarding claim 16, Ma/Davies/Kwant discloses that “the updated selected weight is to be used in a step of training of the one or more neural networks (all types of artificial neural network need to be trained [i.e., perform weight updates] before performing inference or classification functions; supervised learning can generate the best predictors (set of weights) [i.e., the weight information is used in training] – Ma, paragraph 38).”

Regarding claim 18, Ma/Davies/Kwant discloses “storing information indicating the … weight update for respective weights based on the … training steps skipped (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep], and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates)).”
Ma/Kwant appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses an “aggregated … update” and “two or more different numeric counts of training steps” (see mapping to these elements in rejection of claim 15 supra) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Kwant to aggregate the weight updates over successive time steps, as disclosed by Davies, for the same reasons as given in the rejection of claim 15.

Regarding claim 20, Ma/Davies/Kwant discloses that “a first portion of the selected weight corresponding to a first weight entry is updated as part of a first step of training and a different portion of the selected weight corresponding to a second weight entry is updated as part of a second step of training (values of a lookup table are looked up in accordance with calculating results of Tpost-Tpre, where Tpre is the timestamp of the selected axon and Tpost is the timestamp of each of the neurons; when any of the lookup values is non-zero, the corresponding weight matrix row is updated; when all the lookup values are zero, the system skips to the next matrix row – Ma, paragraph 140 [first step of training = step in which one or more of the lookup values is nonzero and the row [portion of selected weight corresponding to a first weight entry] is updated; second step of training = step in which one or more of the lookup values is nonzero and another row [portion of selected weight corresponding to a second weight entry] is updated]; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep], and in a worst case scenario every row will need to be updated in every 1 ms timestep).”

Regarding claim 21, Ma/Davies/Kwant discloses that “the different portion of the selected weight partially overlaps with the first portion of the selected weight (in a worst case scenario, if every row of the weight matrix needs to be updated in every timestep, an STDP row update read-modify-write finite state machine may take 153.6 microseconds which is approximately 15% of the 1 ms timestep; however, the numbers of axons or neurons firing are relatively sparse, only the rows having axons or neurons firing need to be updated at each timestep [since the extreme cases of full weight updates at every timestep and highly sparse weight updates at every timestep are both contemplated, it follows that the median case, in which some weights are updated in two consecutive timesteps and others are not, is also contemplated] – Ma, paragraphs 127-28).”  

Regarding claim 22, Ma/Davies/Kwant discloses “computing based, at least in part, on the information and an accumulated update of two or more skipped steps of training to update the selected weight (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]; paragraph 105 indicates that the five-step weight update procedure is repeated until all axon timestamps are compared and all weight updated are completed [so the training procedure contains five steps per axon timestamp]; paragraph 128 indicates that only rows of the weight matrix having axons or neurons firing need to be updated at each timestep [i.e., some steps of training are skipped]).”

Regarding claim 23, Ma discloses “[o]ne or more processors (invention can be implemented as a processor suitable for executing instructions stored on and/or provided by a memory coupled to the processor – Ma, paragraph 23), comprising circuitry to:
determine a selected plurality of weights for training one or more neural networks based, at least in part[,] on[] … training steps skipped for two or more different weights in one or more previous training sessions of the one or more neural networks (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep and the weights are selected for update based on which neurons fire, which is at least partially dependent on which axons/neurons have fired in previous timesteps, i.e., which firing steps were skipped in previous training timesteps], and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates), Fig. 11 (disclosing that every weight matrix row having all zero lookup table values is skipped, i.e., if two or more weights correspond to rows in the matrix for which the LUT has all zero values, two or more weights will be skipped)); …
update the selected weight using the … weight update; and infer information based, at least in part, on one or more neural networks to be trained using the updated selected weight (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep, and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates)).”
Ma appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses “two or more different numeric counts of training steps (subthreshold dynamics of leaky integrate-and-fire neuron model are described by a discrete-time dimensionless difference equation that includes a term si[t] corresponding to the count of spikes received [training steps] for time step t at synapse i [i.e., given multiple synapses and/or multiple timesteps, there will be multiple counts] – Davies, paragraphs 33-40) …; [and]
calculat[ing], using the one or more processors, an aggregated weight update for the selected weight by combining metadata corresponding to the selected weight and stored update information (a weight may be added to an appropriate weight accumulation counter [metadata corresponding to the selected weight] for an appropriate future time step; based on an aggregated weight input, a soma updates its activation state according to a spiking neuron model – Davies, paragraph 22; a learning cycle may be performed in response to a predefined passage of time steps, wherein the predefined passage of time steps represents a learning epoch – id. at paragraph 148 [stored update information = number of timesteps elapsed]; see also Fig. 10 [showing that a learning cycle is initiated after a learning epoch counter is expired and that the learning is based on all spikes that have arrived during that epoch]) ….” (a weight may be added to an appropriate weight accumulation counter for an appropriate future time step; based on an aggregated weight input, a soma updates its activation state according to a spiking neuron model – Davies, paragraph 22; a learning cycle may be performed in response to a predefined passage of time steps, wherein the predefined passage of time steps represents a learning epoch – id. at paragraph 148; see also Fig. 10 [showing that a learning cycle is initiated after a learning epoch counter is expired and that the learning is based on all spikes that have arrived during that epoch]) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate successive updates to the network, as disclosed by Davies, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase performance and energy efficiency by ensuring that learning only takes place only when there is a demand.  See Davies, paragraph 18.
Neither Ma nor Davies appears to disclose explicitly the further limitations of the claim.  However, Kwant discloses “determin[ing updates] … based, at least in part[,] on[] metadata comprising an indication of … training steps skipped (skip areas can be defined which indicate which areas of an image are not to be labeled or otherwise processed by the machine learning framework [skip areas = data comprising an indication of a number of skipped training steps]; specifying skip areas provides more accurate training data [i.e., updates are determined based on the data] –Kwant,  paragraph 37; image skip areas can be stored as metadata in a training database in association with the respective images of a machine learning training dataset – id. at paragraph 89) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Davies to determine updates based on metadata indicating a number of skipped training steps, as disclosed by Kwant, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase the accuracy of the resulting model by ensuring that only training steps that are relevant to the ultimate outcome are considered.  See Kwant, paragraph 37.

Regarding claim 24, the rejection of claim 23 is incorporated.  Ma further discloses that “the selected weight is updated as a result of determining that the selected weight is to be used in a current step of training of the one or more neural networks (all types of ANN need to be trained before performing inference or classification functions – Ma, paragraph 38; networks can be trained on labeled training datasets and, if error occurs, the error data feedback for retraining may be iterated many times until the errors converge to a minimum; the weights and parameters can then be transferred to actual application devices for deployment [i.e., each portion of weight information in each step of training is updated in response to the system determining that the weights need to be updated for training purposes] – id. at paragraph 40).”  

Regarding claim 30, Ma discloses “[a] system, comprising: 
one or more processors (invention can be implemented as a processor suitable for executing instructions stored on and/or provided by a memory coupled to the processor – Ma, paragraph 23) to:
determine a selected plurality of weights for training one or more neural networks based, at least in part[,] on[] … training steps skipped for two or more different weights in one or more previous training sessions of the one or more neural networks (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep and the weights are selected for update based on which neurons fire, which is at least partially dependent on which axons/neurons have fired in previous timesteps, i.e., which firing steps were skipped in previous training timesteps], and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates), Fig. 11 (disclosing that every weight matrix row having all zero lookup table values is skipped, i.e., if two or more weights correspond to rows in the matrix for which the LUT has all zero values, two or more weights will be skipped)); …
update the selected weight using the … weight update; and infer information using one or more neural networks to be trained using the updated selected weight (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep, and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates)); and
one or more memories to store the one or more neural networks (memory-centric neural network system includes semiconductor memory devices coupled to the processing unit and containing instructions executed by the processing unit – Ma, paragraph 9).”
Ma appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses “two or more different numeric counts of training steps (subthreshold dynamics of leaky integrate-and-fire neuron model are described by a discrete-time dimensionless difference equation that includes a term si[t] corresponding to the count of spikes received [training steps] for time step t at synapse i [i.e., given multiple synapses and/or multiple timesteps, there will be multiple counts] – Davies, paragraphs 33-40) …; [and]
calculat[ing], using the one or more processors, an aggregated weight update for the selected weight by combining metadata corresponding to the selected weight and stored update information (a weight may be added to an appropriate weight accumulation counter [metadata corresponding to the selected weight] for an appropriate future time step; based on an aggregated weight input, a soma updates its activation state according to a spiking neuron model – Davies, paragraph 22; a learning cycle may be performed in response to a predefined passage of time steps, wherein the predefined passage of time steps represents a learning epoch – id. at paragraph 148 [stored update information = number of timesteps elapsed]; see also Fig. 10 [showing that a learning cycle is initiated after a learning epoch counter is expired and that the learning is based on all spikes that have arrived during that epoch]) ….” (a weight may be added to an appropriate weight accumulation counter for an appropriate future time step; based on an aggregated weight input, a soma updates its activation state according to a spiking neuron model – Davies, paragraph 22; a learning cycle may be performed in response to a predefined passage of time steps, wherein the predefined passage of time steps represents a learning epoch – id. at paragraph 148; see also Fig. 10 [showing that a learning cycle is initiated after a learning epoch counter is expired and that the learning is based on all spikes that have arrived during that epoch]) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate successive updates to the network, as disclosed by Davies, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase performance and energy efficiency by ensuring that learning only takes place only when there is a demand.  See Davies, paragraph 18.
Neither Ma nor Davies appears to disclose explicitly the further limitations of the claim.  However, Kwant discloses “determin[ing updates] … based, at least in part[,] on[] metadata comprising an indication of … training steps skipped (skip areas can be defined which indicate which areas of an image are not to be labeled or otherwise processed by the machine learning framework [skip areas = data comprising an indication of a number of skipped training steps]; specifying skip areas provides more accurate training data [i.e., updates are determined based on the data] –Kwant,  paragraph 37; image skip areas can be stored as metadata in a training database in association with the respective images of a machine learning training dataset – id. at paragraph 89) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Davies to determine updates based on metadata indicating a number of skipped training steps, as disclosed by Kwant, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase the accuracy of the resulting model by ensuring that only training steps that are relevant to the ultimate outcome are considered.  See Kwant, paragraph 37.

Regarding claim 31, Ma/Davies/Kwant discloses that “the one or more neural networks are trained by at least further forward propagating the selected weight to determine one or more outputs (all types of ANN need to be trained before performing inference or classification functions; typically, there are two distinct modes of ANN operations, feed-forward mode for inferences and classifications [i.e., the determination of outputs] and backpropagation mode for training or learning using labeled training datasets – Ma, paragraph 38; new training data may be processed [forward propagated] in accordance with training data sets; if error occurs, the error data feedback for retraining [i.e., weight updating] can be iterated many times until the errors converge to a minimum and below a certain threshold of changes – id. at paragraph 40 [i.e., at each iteration of training, the updated weight information is multiplied with the input information and the result is forward propagated through the network until an output is obtained]; see also paragraphs 104-05 (describing the weight updating process)).”  

Regarding claim 34, Ma/Davies/Kwant discloses that “the information is updated after an epoch of training of the one or more neural networks (when one of the axons or neurons fires, the corresponding timestamp registers can be written with a value B and decremented until the value B reaches 0 – Ma, paragraphs 91-92; LTP/LTD curves are then used to determine synaptic weight updates based on a comparison between the two timestamp registers, which only occurs when an axon or neuron fires and neither timestamp is zero – id. at paragraphs 93-94 [i.e., the timestamp metadata are updated to the value B after a firing event occurring subsequent to a weight update/training epoch]).”  

Regarding claim 36, Ma/Davies/Kwant discloses that the system “further compris[es] an autonomous vehicle (once trained, weights and parameters of a neural network can be transferred to application devices for deployment, such as self-driving cars or autonomous drones – Ma, paragraph 40).”  

Regarding claim 37, Ma discloses “[a] method, comprising:
determining a selected plurality of weights for training one or more neural networks based, at least in part[,] on[] … training steps skipped for two or more different weights in one or more previous training sessions of the one or more neural networks (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep and the weights are selected for update based on which neurons fire, which is at least partially dependent on which axons/neurons have fired in previous timesteps, i.e., which firing steps were skipped in previous training timesteps], and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates), Fig. 11 (disclosing that every weight matrix row having all zero lookup table values is skipped, i.e., if two or more weights correspond to rows in the matrix for which the LUT has all zero values, two or more weights will be skipped)); …
updating the selected weight using the … weight update; and inferring information using one or more neural networks to be trained using the updated selected weight (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep, and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates)).”
Ma appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses “two or more different numeric counts of training steps (subthreshold dynamics of leaky integrate-and-fire neuron model are described by a discrete-time dimensionless difference equation that includes a term si[t] corresponding to the count of spikes received [training steps] for time step t at synapse i [i.e., given multiple synapses and/or multiple timesteps, there will be multiple counts] – Davies, paragraphs 33-40) …; [and]
calculating an aggregated weight update for the selected weight by combining metadata corresponding to the selected weight and stored update information (a weight may be added to an appropriate weight accumulation counter [metadata corresponding to the selected weight] for an appropriate future time step; based on an aggregated weight input, a soma updates its activation state according to a spiking neuron model – Davies, paragraph 22; a learning cycle may be performed in response to a predefined passage of time steps, wherein the predefined passage of time steps represents a learning epoch – id. at paragraph 148 [stored update information = number of timesteps elapsed]; see also Fig. 10 [showing that a learning cycle is initiated after a learning epoch counter is expired and that the learning is based on all spikes that have arrived during that epoch]) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate successive updates to the network, as disclosed by Davies, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase performance and energy efficiency by ensuring that learning only takes place only when there is a demand.  See Davies, paragraph 18.
Neither Ma nor Davies appears to disclose explicitly the further limitations of the claim.  However, Kwant discloses “determining [updates] … based, at least in part[,] on[] metadata comprising an indication of … training steps skipped (skip areas can be defined which indicate which areas of an image are not to be labeled or otherwise processed by the machine learning framework [skip areas = data comprising an indication of a number of skipped training steps]; specifying skip areas provides more accurate training data [i.e., updates are determined based on the data] –Kwant,  paragraph 37; image skip areas can be stored as metadata in a training database in association with the respective images of a machine learning training dataset – id. at paragraph 89) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Davies to determine updates based on metadata indicating a number of skipped training steps, as disclosed by Kwant, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase the accuracy of the resulting model by ensuring that only training steps that are relevant to the ultimate outcome are considered.  See Kwant, paragraph 37.

Regarding claim 38, Ma/Davies/Kwant discloses that “the selected weight is updated further according to how many steps of training have been skipped when the selected weight is updated (when one of the axons or neurons fires, corresponding timestamp registers Tpre or Tpost can be written with a value B and decremented at each timestep until the value B reaches 0 – Ma, paragraphs 91-92; a compare operation between these two registers can be triggered only when Tpre= B and/or Tpost = B and when neither Tpost nor Tpre = 0; the comparison triggers synaptic weight updates – id. at paragraphs 93-94 [so if the current value in the register is x, the number of timesteps since firing, that is, the number of timesteps since the last weight update [number of steps skipped], is B – x]).”

Regarding claim 41, Ma discloses that “the … weight update is used to skip an update of at least one step of training (values of a lookup table are looked up in accordance with calculating results of Tpost-Tpre, where Tpre is the timestamp of the selected axon and Tpost is the timestamp of each of the neurons; when any of the lookup values is non-zero, the corresponding weight matrix row is updated; when all the lookup values are zero, the system skips to the next matrix row – Ma, paragraph 140).”  Ma further discloses “updates corresponding to two or more skipped training steps (see mapping of Ma to this element in the rejection of claim 37 supra) ….”
Ma/Kwant appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses an “aggregated … update” and “aggregating updates … for a respective weight” (see mapping of this element in the rejection of claim 37 supra).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate the updates, as disclosed by Davies, for substantially the same reason as given in the rejection of claim 37.

Regarding claim 42, Ma/Davies/Kwant discloses that “a first portion of the selected weight is updated as part of a first step of training and a different portion of the selected weight is updated as part of a second step of training (values of a lookup table are looked up in accordance with calculating results of Tpost-Tpre, where Tpre is the timestamp of the selected axon and Tpost is the timestamp of each of the neurons; when any of the lookup values is non-zero, the corresponding weight matrix row is updated; when all the lookup values are zero, the system skips to the next matrix row – Ma, paragraph 140 [first step of training = step in which one or more of the lookup values is nonzero and the row [portion of weight information] is updated; second step of training = step in which all the lookup values are zero and the system skips to the next row and updates that row]; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used every timestep], and in a worst case scenario every row will need to be updated in every 1 ms timestep).”  

Regarding claim 43, Ma/Davies/Kwant discloses that “the different portion of the selected weight partially overlaps with the first portion of the selected weight (in a worst case scenario, if every row of the weight matrix needs to be updated in every timestep, an STDP row update read-modify-write finite state machine may take 153.6 microseconds which is approximately 15% of the 1 ms timestep; however, the numbers of axons or neurons firing are relatively sparse, only the rows having axons or neurons firing need to be updated at each timestep [since the extreme cases of full weight updates at every timestep and highly sparse weight updates at every timestep are both contemplated, it follows that the median case, in which some weights are updated in two consecutive timesteps and others are not, is also contemplated] – Ma, paragraphs 127-28; paragraphs 126-28 further disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used every timestep], and in a worst case scenario every row will need to be updated in every 1 ms timestep).”
Regarding claim 45, Ma, as modified by Davies and Kwant, discloses that “each of the two or more different numeric counts corresponds to a respective weight of the plurality of weights (subthreshold dynamics of leaky integrate-and-fire neuron model are described by a discrete-time dimensionless difference equation that includes a term si[t] corresponding to the count of spikes received for time step t at synapse i with weight wi [i.e., given multiple synapses, there will be multiple counts, each corresponding to a different weight] – Davies, paragraphs 33-40).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Kwant to maintain multiple counters each corresponding to a different weight, as disclosed by Davies, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would provide more fine-grained information about each separate weight than would be available if there were only one counter for all weights.  See Davies, paragraphs 33-40.
Regarding claim 46, Ma, as modified by Davies and Kwant, discloses that “a skipped training step comprises a training iteration during which no update is applied to the respective weight (Ma paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep and the weights are selected for update based on which neurons fire, so in some timesteps no update is applied to some weights], and in a worst case scenario every row will need to be updated in every 1 ms timestep).”  
Claims 3, 7, 11, 25-26, 29, 33, 35, 44, and 48 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Davies and Kwant and further in view of Kaskari et al. (US 20180232632) (“Kaskari”).
Regarding claim 3, the rejection of claim 1 is incorporated.  Ma further discloses that “the selected weight is updated based at least in part on: 
information indicating … updates of unused weights (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep], and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates)) ….”
Ma/Kwant appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses “two or more successive updates (see mapping of this element in the rejection of claim 1 supra) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate the updates, as disclosed by Davies, for substantially the same reason as given in the rejection of claim 1.
Neither Ma, Kwant, nor Davies appears to disclose explicitly the further limitations of the claim.  However, Kaskari discloses that “the weight is updated based at least in part on: …
momentum information to indicate how to update the selected weight (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between a momentum and a quantity Δweight(i – 1) [momentum information] – Kaskari, paragraph 47); 
a learning rate (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between an error signal received at corresponding weights or biases using backpropagation through time and a learning rate – Kaskari, paragraph 47); and 
a momentum coefficient (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between a momentum [momentum coefficient] and a quantity Δweight(i – 1) – Kaskari, paragraph 47).”
Kaskari and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to update the weight information using a momentum coefficient, further momentum information, and a learning rate, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 7, Ma discloses that “the … weight update is calculated further based, at least in part, on … the information indicating the … updates of the unused weights, to update the selectedweight (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update [information indicating how often the weights are updated], since weight updates occur via firing events and are thereby associated with the registers]; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep], and in a worst case scenario every row will need to be updated in every 1 ms timestep).” 
Ma/Kwant appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses an “aggregated … update” and “two or more successive updates” (see mapping of this element in the rejection of claim 1 supra).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate the updates, as disclosed by Davies, for substantially the same reason as given in the rejection of claim 1.
Neither Ma, Kwant, nor Davies appears to disclose explicitly the further limitations of the claim.  However, Kaskari discloses that “the … update … [is] calculated based, at least in part, on the momentum information (weights and biases are updated for each epoch according to the rule weight(i) = weight(i – 1) + update, where the update is equal to a quantity dependent, inter alia, on a momentum coefficient provided that this quantity is in bounds [since the weight is dependent on the weight at the previous time step, which is dependent on the weight at a previous time step, etc., the weight update up to epoch i is an accumulated update] – Kaskari, paragraph 47) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to base the weight update on momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 11, Ma, as modified by Davies, Kwant, and Kaskari, discloses that “the one or more memories are to store momentum information to indicate how to update the selected weight (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between a momentum and a quantity Δweight(i – 1) [momentum information] – Kaskari, paragraph 47).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to store momentum information to indicate how to update the weight information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 25, Ma discloses that “the selected weight is updated based, at least in part, on: 
information indicating the … weight update (Ma paragraphs 90-94 disclose a system for performing synaptic weight updates based on axon and neuron timestamps; when an axon fires, a timestamp register Tpre can be written with a value B, and decremented each timestep until reaching 0; when a neuron fires, a timestamp register Tpost is written with B and decremented until B reaches 0; if Tpre = B and/or Tpost = B and neither Tpre nor Tpost = 0, a compare operation between Tpre and Tpost is triggered, and Tpost-Tpre = 1 as shown in an LTP/LTD table signifies long term potentiation and results in a synaptic weight update; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., weights are not used every timestep], and in a worst case scenario every row will need to be updated in every 1 ms timestep; see also paragraphs 23 (disclosing a processor [circuit] for carrying out the method),  39-42 (disclosing that training takes place through synaptic weight updates)) ….”
Ma/Kwant appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses an “aggregated … update (see mapping of this element in the rejection of claim 23 supra) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate the updates, as disclosed by Davies, for substantially the same reason as given in the rejection of claim 23.
Neither Ma, Kwant, nor Davies appears to disclose explicitly the further limitations of the claim.  However, Kaskari discloses that “the selected weight is updated based, at least in part, on: …
momentum information to indicate how to update the selected weight (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between a momentum and a quantity Δweight(i – 1) [momentum information] – Kaskari, paragraph 47); 
a learning rate (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between an error signal received at corresponding weights or biases using backpropagation through time and a learning rate – Kaskari, paragraph 47); and 
a momentum coefficient (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, the product between a momentum [momentum coefficient] and a quantity Δweight(i – 1) – Kaskari, paragraph 47).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to update the weights based on learning rate and momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 26, Ma, as modified by Davies, Kwant, and Kaskari, discloses that “the learning rate and momentum coefficients are hyperparameters (in a backward pass of training a neural network, an adaptive learning rate algorithm may be used in which the weights are updated for the i-th epoch based on the formula weight(i) = weight (i – 1) + update, where the update value is equal to Xweight(i) if Xweight(i) is between a lower bound and an upper bound for the update of the weights, and Xweight(i) is based on, inter alia, a learning rate and a momentum, where the momentum may be set to m = 0.9 and the learning rate may be set to μ = 10-3 [i.e., they are not parameters set by training and thus are hyperparameters] – Kaskari, paragraph 47).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to update the weights based on learning rate and momentum hyperparameters, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 29, Ma discloses that “an accumulated update is calculated based, at least in part, on the … weight update (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep], and in a worst case scenario every row will need to be updated in every 1 ms timestep).”  
Ma/Kwant appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses an “aggregated … update (see mapping of this element in the rejection of claim 23 supra) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate the updates, as disclosed by Davies, for substantially the same reason as given in the rejection of claim 23.
Neither Ma, Kwant, nor Davies appears to disclose explicitly the further limitations of the claim.  However, Kaskari discloses that “an accumulated update is calculated based, at least in part, on the momentum information (weights and biases are updated for each epoch according to the rule weight(i) = weight(i – 1) + update, where the update is equal to a quantity dependent, inter alia, on a momentum coefficient provided that this quantity is in bounds [since the weight is dependent on the weight at the previous time step, which is dependent on the weight at a previous time step, etc., the weight update up to epoch i is an accumulated update] – Kaskari, paragraph 47) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies to calculate an accumulated update based on momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 33, Ma, as modified by Davies, Kwant, and Kaskari, discloses that “the selected weight is updated further based, at least in part, on momentum information to indicate how to update the selected weight (weights and biases connected to output layer of neural network are updated according to the equation weight(i) = weight(i – 1) + update and the update, when within bounds, is defined by an expression that includes a momentum term [i.e., the equation for weight update indicates how the update is done using momentum information] – Kaskari, paragraph 47).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to update the weight information using momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 35, the rejection of claim 33 is incorporated.  Ma further discloses that “an accumulated update is calculated based, at least in part, on the … weight update (axon timestamp registers register rolling timestamps of the last firing event of the axons, and neuron timestamp registers register rolling timestamps of the last firing event of the neurons; when an axon fires, a corresponding timestamp register Tpre can be written with a value B and is decremented by 1 until B reaches zero; when a neuron fires, a corresponding timestamp register Tpost can be written with a value B and decremented in each timestep until B reaches zero; a comparison operation between Tpre and Tpost can be triggered when an axon or neuron fires and neither Tpre nor Tpost equals zero; long-term potentiation and long-term depression curves determine synaptic weight updates; when Tpost – Tpre = 1, the synaptic weight can be considered “Strong LTP;” when Tpost – Tpre = 0, the synaptic weight can be considered “weak LTD” – Ma, paragraphs 90-94 [i.e., the timestamp registers that store the time since last firing also store the time since the last weight update, since weight updates occur via firing events and are thereby associated with the registers [containing timestamp metadata]]; paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep [i.e., the weights are not used at every timestep], and in a worst case scenario every row will need to be updated in every 1 ms timestep).”
Ma/Kwant appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses an “aggregated update (see mapping of this element in the rejection of claim 30 supra) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate the updates, as disclosed by Davies, for substantially the same reason as given in the rejection of claim 30.
Kaskari further discloses that “an accumulated update is calculated based, at least in part, on the momentum information (weights and biases are updated for each epoch according to the rule weight(i) = weight(i – 1) + update, where the update is equal to a quantity dependent, inter alia, on a momentum coefficient provided that this quantity is in bounds [since the weight is dependent on the weight at the previous time step, which is dependent on the weight at a previous time step, etc., the weight update up to epoch i is an accumulated update] – Kaskari, paragraph 47) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to calculate an accumulated update based on momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 44, the rejection of claim 37 is incorporated.  Ma further discloses “training steps skipped”, as shown in the rejection of claim 37.  Davies further discloses “two or more different numeric counts of training steps,” as shown in the rejection of claim 37.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma and Kwant to maintain two or more separate counts of training steps, as disclosed by Davies, for substantially the same reasons as given in the rejection of claim 37.
Neither Ma, Davies, nor Kwant appears to disclose explicitly the further limitations of the claim.  However, Kaskari discloses that the “momentum information is used to determine the aggregated weight update based on the … training steps … for the selected weight (weights and biases are updated for each epoch according to the rule weight(i) = weight(i – 1) + update, where the update is equal to a quantity dependent, inter alia, on a momentum coefficient provided that this quantity is in bounds [since the weight is dependent on the weight at the previous time step, which is dependent on the weight at a previous time step, etc., the weight update up to epoch i is an aggregated update] – Kaskari, paragraph 47) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to calculate an accumulated update based on momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Regarding claim 48, the rejection of claim 1 is incorporated.  Ma further discloses “training steps skipped”, as shown in the rejection of claim 1.  Ma further discloses that “the … weight update is applied when the selected weight is next determined to be used in a step of training (Ma paragraph 128 discloses that weight update [i.e., the determination that the weight is to be used in a training step] is only performed [applied] for rows of the weight matrix having axons or neurons firing [i.e., the selected rows]) ….” 
Davies further discloses an “aggregated weight update [that] aggregates updates,” as shown in the rejection of claim 1.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to aggregate weight updates, as disclosed by Davies, for substantially the same reasons as given in the rejection of claim 1.
Neither Ma, Davies, nor Kwant appears to disclose explicitly the further limitations of the claim.  However, Kaskari discloses “updat[ing] based on momentum accumulated during the training steps (weights and biases are updated for each epoch according to the rule weight(i) = weight(i – 1) + update, where the update is equal to a quantity dependent, inter alia, on a momentum coefficient provided that this quantity is in bounds [since the weight is dependent on the weight at the previous time step, the momentum is accumulated across time steps] – Kaskari, paragraph 47) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to calculate an accumulated update based on momentum information, as disclosed by Kaskari, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve performance of the network and increase the training convergence rate.  See Kaskari, paragraph 47.

Claims 5, 27, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Davies and Kwant and further in view of Krishnamurthy et al. (US 20190042910) (“Krishnamurthy”).
Regarding claim 5, Ma, as modified by Davies, Kwant, and Krishnamurthy, discloses that “one or more counters indicate the number of skipped training steps during the previous training session of the one or more neural networks (long-term potentiation on a synapse may be conducted using a replay spike; when a presynaptic spike is replayed after a fixed number of time steps T, the relative spike timing between a presynaptic spike and a replay spike is determined based solely on a postsynaptic spike history counter (e.g., the number of time-steps since the postsynaptic spike occurred); T is the maximum spike time difference beyond which the synaptic weight update is zero [so the counter indicates when a spike was last emitted, and the timing of the spikes are used for weight update, so a counter indicating the amount of time since the last spike is also a counter for how long ago the weight was updated because weight updates do not occur absent a spike] – Krishnamurthy, paragraph 39).”  
Krishnamurthy and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to include a counter indicating how many steps of training have elapsed since weight information was last updated, as disclosed by Krishnamurthy, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the system does not learn from events that occurred to far back in time to be significant.  See Krishnamurthy, paragraph 39.

Regarding claim 27, Ma, as modified by Davies, Kwant, and Krishnamurthy, discloses that “a counter indicates the number of skipped training steps during the previous training session of the one or more neural networks (long-term potentiation on a synapse may be conducted using a replay spike; when a presynaptic spike is replayed after a fixed number of time steps T, the relative spike timing between a presynaptic spike and a replay spike is determined based solely on a postsynaptic spike history counter (e.g., the number of time-steps since the postsynaptic spike occurred); T is the maximum spike time difference beyond which the synaptic weight update is zero [so the counter indicates when a spike was last emitted, and the timing of the spikes are used for weight update, so a counter indicating the amount of time since the last spike is also a counter for how long ago the weight was updated because weight updates do not occur absent a spike] – Krishnamurthy, paragraph 39).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to include a counter indicating how many steps of training have elapsed since the last weight update, as disclosed by Krishnamurthy, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the system does not learn from events that occurred to far back in time to be significant.  See Krishnamurthy, paragraph 39.

Regarding claim 40, Ma/Davies/Kwant/Krishnamurthy discloses that “a counter indicates the number of skipped training steps during the previous training session of the one or more neural networks (long-term potentiation on a synapse may be conducted using a replay spike; when a presynaptic spike is replayed after a fixed number of time steps T, the relative spike timing between a presynaptic spike and a replay spike is determined based solely on a postsynaptic spike history counter (e.g., the number of time-steps since the postsynaptic spike occurred); T is the maximum spike time difference beyond which the synaptic weight update is zero [so the counter indicates when a spike was last emitted, and the timing of the spikes are used for weight update, so a counter indicating the amount of time since the last spike is also a counter for how long ago the weight was updated because weight updates do not occur absent a spike] – Krishnamurthy, paragraph 39).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to include a counter indicating how many steps of training have elapsed since the last weight update, as disclosed by Krishnamurthy, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the system does not learn from events that occurred to far back in time to be significant.  See Krishnamurthy, paragraph 39.

Claims 6, 10, 28, and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Davies and Kwant and further in view of Vu et al. (US 20200372360) (“Vu”).
Regarding claim 6, Ma, as modified by Davies, Kwant, and Vu, discloses that “the selected weight is associated with an embedding vector2 (in order for a perceptron to generate a desired value, a learning rule is specified that specifies how to train the perceptron; the learning process proceeds by selecting an initial weight vector [embedding vector], and then a set of m pairs of inputs and target values is used successively to update the weight vector until the output of the perceptron is equal to or close to the target values for the output – Vu, paragraph 35).”  
Vu and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to associate the weight information with an embedding vector, as disclosed by Vu, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the network to manipulate multiple weights at once using linear algebraic expressions, thereby increasing efficiency.  See Vu, paragraph 35.

Regarding claim 10, Ma, as modified by Davies, Kwant, and Vu, discloses that “the information indicates how to update a plurality of embedding vectors used to train the one or more neural networks (in order for a perceptron to generate a desired value, a learning rule is specified that specifies how to train the perceptron; the learning process proceeds by selecting an initial weight vector [embedding vector], and then a set of m pairs of inputs and target values [metadata] is used successively to update the weight vector until the output of the perceptron is equal to or close to the target values for the output – Vu, paragraph 36).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to use data to indicate how to update the embedding vectors of the networks, as disclosed by Vu, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the updating of the weights of the network takes place according to a predefined procedure, thereby enhancing the predictability o the network operations.  See Vu, paragraph 36.

Regarding claim 28, Ma, as modified by Davies, Kwant, and Vu, discloses that “the selected weight is associated with an embedding vector (in order for a perceptron to generate a desired value, a learning rule is specified that specifies how to train the perceptron; the learning process proceeds by selecting an initial weight vector [embedding vector], and then a set of m pairs of inputs and target values is used successively to update the weight vector until the output of the perceptron is equal to or close to the target values for the output – Vu, paragraph 35).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to associate the weight information with an embedding vector, as disclosed by Vu, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the network to manipulate multiple weights at once using linear algebraic expressions, thereby increasing efficiency.  See Vu, paragraph 35.

Regarding claim 32, the rejection of claim 31 is incorporated.  Ma/Kwant appears not to disclose explicitly the further limitations of the claim.  However, Davies discloses an “aggregated … update (see mapping of this element in the rejection of claim 30 supra) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Kwant to aggregate the updates, as disclosed by Davies, for substantially the same reason as given in the rejection of claim 30.
Vu discloses that “the … weight update is represented using information that indicates how to update a plurality of embedding vectors used to train the one or more neural networks (in order for a perceptron to generate a desired value, a learning rule is specified that specifies how to train the perceptron; the learning process proceeds by selecting an initial weight vector [embedding vector], and then a set of m pairs of inputs and target values [metadata] is used successively to update the weight vector until the output of the perceptron is equal to or close to the target values for the output – Vu, paragraph 35).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to include data indicating how to update the weight vectors for the networks, as disclosed by Vu, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the updating of the weights of the network takes place according to a predefined procedure, thereby enhancing the predictability of the network operations.  See Vu, paragraph 36.

Claims 17 and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Davies and Kwant and further in view of Le Gallo-Bordeau et al. (US 20200293855) (“Le Gallo-Bordeau”).
Regarding claim 17, Ma, as modified by Davies, Kwant, and Le Gallo-Bordeau, discloses that “determining the updated selected weight is further based, at least in part, on a random or pseudo-random process (neural network weight update is calculated for a weight in each of a plurality of arrays storing a weight set; in some embodiments the weight updates can be computed for only a subset of weights, e.g., a randomly-selected subset – Le Gallo-Bordeau, paragraph 33).”  
Le Gallo-Bordeau and the instant application both relate to selective weight updating in neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to select the weight information to be used randomly or pseudo-randomly, as disclosed by Le Gallo-Bordeau, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would reduce the processing requirements of the system by requiring only that certain weights, as opposed to the entire weight set, be updated.  See Le Gallo-Bordeau, paragraph 33.

Regarding claim 39, the rejection of claim 37 is incorporated.  Le Gallo-Bordeau discloses that “determining the selected weight is further based, at least in part, on a random[] or pseudo-random[] process (neural network weight update is calculated for a weight in each of a plurality of arrays storing a weight set; in some embodiments the weight updates can be computed for only a subset of weights, e.g., a randomly-selected subset – Le Gallo-Bordeau, paragraph 33).”   It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to select the weights to be updated randomly, as disclosed by Le Gallo-Bordeau, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would reduce the processing requirements of the system by requiring only that certain weights, as opposed to the entire weight set, be updated.  See Le Gallo-Bordeau, paragraph 33.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Davies and further in view of Yang et al. (US 20200356803) (“Yang”).
Regarding claim 19, Ma, as modified by Davies, Kwant, and Yang, discloses “updating the selected weight by at least computing a gradient based at least in part on ground truth data and output data of the one or more neural networks (supervised learning algorithm may use forward propagation to generate a factor and overall scores, determine differences between the generated factor and overall scores [output data] to a ground truth factor and overall scores to estimate a loss function, use the differences to estimate a gradient of the loss function, and backpropagate [update] the differences to weights and biases of the system according to the estimate of the gradient of the loss function – Yang, paragraph 35).”  
Yang and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma/Davies/Kwant to generate the weight information by computing a gradient based on ground truth data and output data, as disclosed by Yang, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would provide a point of comparison to which the output of the network can be compared to determine how much to update the weights.  See Yang, paragraph 35.

Claim 47 is rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Davies and Kwant and further in view of Cao et al. (US 10387774) (“Cao”).
Regarding claim 47, the rejection of claim 1 is incorporated.  Ma further discloses that an action is taken “when the selected weight is updated (Ma Fig. 11 shows that, after the corresponding weight matrix row is updated with lookup table values [i.e., when the selected weight is updated], the system skips to the next matrix row and looks up other lookup table values [action taken in response]) ….”
Neither Ma, Davies, nor Kwant appears to disclose explicitly the further limitations of the claim.  However, Cao discloses that “a counter associated with the selected weight is reset [and] the selected weight is updated using a gradient (adapted CNN was trained using an error back-propagation with stochastic gradient descent and learned weights are applied to an SNN architecture; before each test image is presented to the SNN, the neurons were reset so their membrane voltage is zero and the counters in the spike counter [associated with a selected weight] were cleared – Cao, col. 10, l. 64-col. 11, l. 24).” 
Cao and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ma, Davies, and Kwant to employ counters that reset when certain conditions are satisfied, as disclosed by Cao, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that previous irrelevant information does not influence the current decision-making process of the network.  See  Cao, col. 10, l. 64-col. 11, l. 24. 

Response to Arguments
Applicant's arguments filed February 20, 2026 (“Remarks”) have been fully considered but they are not persuasive.
Applicant alleges that the Ma/Davies/Kwant combination fails to disclose using metadata that indicate two or more different numeric counts of training steps skipped for two or more different weights from one or more previous training sessions.  Remarks at 11-13.  However, the specification does not explicitly define “previous training sessions,” and the term may be broadly construed to mean any previous moment in the lifetime of the network during which training occurred, including weight updates for a previous timestep.  With that in mind, consider that the rejection is based on the combination of references, not on any one of the references standing alone.  Ma, for example, discloses determining skipped training steps for two or more weights in previous training sessions, since paragraphs 126-28 disclose that only the rows of weights having axons or neurons firing need to be updated at each timestep.  That is, to the extent that two or more rows of the weight matrix do not correspond to an axon or neuron that is actively firing, the updating of those weights is skipped for that time step.  Davies, meanwhile, discloses a system in which two or more different counters are maintained for two or more different weights, and Kwant discloses metadata that comprise an indication of skipped training steps.  Taken together, an ordinary artisan could combine these references to arrive at a system that uses metadata that indicate two or more different numeric counts of training steps skipped for two or more different weights from one or more previous training sessions for the reasons given in the rejection itself, which Applicant does not dispute.  One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849. The examiner can normally be reached M-R 7:00a-5:00p ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RYAN C VAUGHN/Primary Examiner, Art Unit 2125

        1 Applicant appears to be using terminology in a way that differs from its accepted meaning in the art.  Examiner understands the term “epoch” to refer to a cycle through the full training dataset.  See¸DeepAI, What is an Epoch?, https://deepai.org/machine-learning-glossary-and-terms/epoch.  However, Applicant appears to be using “epoch” more broadly to mean any training step or batch.  See, e.g., specification paragraph 49 (“…metadata [are] used to track how many steps (batches) of training have been skipped.”).  For purposes of examination, the term “epoch” will be construed to mean any step of training.
        2 Applicant appears to be using the term “embedding vector” in a way that differs from its accepted meaning in the art.  In common parlance, the term “embedding vector” refers to a numerical vector representation of a set of input data.  See Tripathi, What Are Vector Embeddings?, https://www.pinecone.io/learn/vector-embeddings/.  However, Applicant appears to be using the term to refer to a weight vector.  See specification paragraph 66 (“In at least one embodiment, set of embedding vectors 402 comprises weights (e.g., 256 weights) that are used to control and adjust behavior of neural network 406.”).  For purposes of examination, the term “embedding vector” will be deemed synonymous with a weight vector.
Read full office action
Prosecution Timeline

Show 23 earlier events
Apr 28, 2025
Applicant Interview (Telephonic)
Apr 28, 2025
Examiner Interview Summary
Aug 28, 2025
Response Filed
Oct 20, 2025
Final Rejection mailed — §103
Oct 30, 2025
Interview Requested
Feb 20, 2026
Request for Continued Examination
Mar 04, 2026
Response after Non-Final Action
Mar 26, 2026
Non-Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/684,254
Patent 12639610
QUANTUM-CLASSICAL HYBRID COMPUTER FOR CALCULATING ARITHMETIC FUNCTIONS USING FOURIER ANALYSIS
4y 2m to grant Granted May 26, 2026
17/453,330
Patent 12619598
DATA ALLOCATION WITH USER INTERACTION IN A MACHINE LEARNING SYSTEM
4y 6m to grant Granted May 05, 2026
17/872,118
Patent 12619860
METHOD AND DEVICE FOR CONTROLLING FIRING TIMING IN SPIKING NEURAL NETWORKS
3y 9m to grant Granted May 05, 2026
17/159,842
Patent 12608634
DISTRIBUTED QUANTUM FILE CONSOLIDATION
5y 2m to grant Granted Apr 21, 2026
17/304,163
Patent 12602448
PROGRESSIVE NEURAL ORDINARY DIFFERENTIAL EQUATIONS
4y 10m to grant Granted Apr 14, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

7-8
Expected OA Rounds
62%
Grant Probability
82%
With Interview (+19.8%)
3y 9m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 241 resolved cases by this examiner. Grant probability derived from career allowance rate.