Prosecution Insights
Last updated: April 19, 2026
Application No. 17/339,978

ONLINE TRAINING OF NEURAL NETWORKS

Final Rejection: §101, §103, §112
Filed: Jun 05, 2021
Examiner: NAULT, VICTOR ADELARD
Art Unit: 2124
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 4 (Final)

Grant Probability: 62% (Moderate)
Predicted OA Rounds: 5-6
Predicted Time to Grant: 3y 11m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 62% (grants 62% of resolved cases; 8 granted / 13 resolved; +6.5% vs TC avg)
Interview Lift: +83.3% (strong; among resolved cases with interview)
Avg Prosecution: 3y 11m (typical timeline); 30 applications currently pending
Total Applications: 43 across all art units (career history)

Statute-Specific Performance

§101: 29.1% (-10.9% vs TC avg)
§103: 40.4% (+0.4% vs TC avg)
§102: 7.5% (-32.5% vs TC avg)
§112: 21.4% (-18.6% vs TC avg)
Tech Center averages shown for comparison are estimates. Based on career data from 13 resolved cases.

Office Action

§101 §103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks

This Office Action is responsive to Applicant's Amendment filed on December 18, 2025, in which claims 1, 3-6, 8-13, and 15-23 have been amended. No claims have been newly cancelled. Claim 25 has been added. Claims 1 and 3-25 are currently pending.

Response to Arguments

With regard to the rejections of claims 8, 9, and 12 under 35 U.S.C. 112(b), Applicant's arguments that the claims as amended overcome the rejections are partially persuasive. Examiner acknowledges that the amendments to claims 8 and 9 resolve the previously noted source of indefiniteness, and thus the 112(b) rejections of claims 8 and 9 are withdrawn. However, Examiner has considered the arguments with respect to claim 12 and respectfully disagrees that the 112(b) rejection of claim 12 is overcome. Examiner acknowledges that both of the following formulas are recited within the instant application's specification: ϵt,θl = ((dstl / dst-1l)ϵt-1,θl + ((∂stl / ∂θl) + (∂stl / ∂yt-1l)(∂yt-1l / ∂θl))) and ϵt,θl = (∂stl / ∂θl). However, this does not change the fact that these formulas recite conflicting values for the computed variable ϵt,θl. Examiner notes that, according to paragraph [0064] of the specification, the first formula for ϵt,θl is intended for use with multi-layer RNNs, and, according to paragraph [0081], the second formula for ϵt,θl is intended for use with the stateless layers of hybrid neural networks, i.e. two different types of neural networks.

With regard to the rejections of claims 18, 19, and 22 under 35 U.S.C. 101 as being directed to non-statutory subject matter, Applicant has amended the claims to recite tangible hardware components; the claims now fall under the statutory category of a machine, and therefore the rejections are withdrawn.

With regard to the rejections of claims 1 and 3-24 under 35 U.S.C. 101 as being directed to abstract ideas, Examiner considers Applicant's arguments that the claims as amended overcome the 101 rejections persuasive, at least because the amended claims use a specialized hardware architecture to facilitate their operations, namely neuromorphic hardware comprising a crossbar array of resistive memory elements, which is a particular machine with specific elements under MPEP 2106.05(b).

With regard to the rejections of claims 1, 3-5, 8-10, 13-16, 18-22, and 24 under 35 U.S.C. 103 as being unpatentable over Bellec et al., "Biologically inspired alternatives to backpropagation through time for learning in recurrent neural nets" (Bellec-2019), in view of Bellec et al., "A solution to the learning dilemma for recurrent networks of spiking neurons" (Bellec-2020), Examiner finds Applicant's argument that the claims as amended overcome the rejections partially persuasive, though the arguments found persuasive are moot in view of the new grounds of rejection detailed below, necessitated by Applicant's amendments to the claims.
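For readability, the two specification formulas at issue in the claim 12 indefiniteness discussion above can be typeset side by side. This is a reconstruction from the inline notation used in the claim 12 rejection later in this action, not a verbatim reproduction of the specification's own typeset equations; the subscript and superscript placement is assumed:

    % Reconstructed from the inline notation in the claim 12 rejection below;
    % layout details are assumed, not copied from the specification images.
    \begin{align}
      \epsilon_{t,\theta}^{l} &= \frac{d s_t^l}{d s_{t-1}^l}\,\epsilon_{t-1,\theta}^{l}
        + \frac{\partial s_t^l}{\partial \theta^l}
        + \frac{\partial s_t^l}{\partial y_{t-1}^l}\,\frac{\partial y_{t-1}^l}{\partial \theta^l}
        && \text{(multi-layer RNNs, specification [0064])} \\
      \epsilon_{t,\theta}^{l} &= \frac{\partial s_t^l}{\partial \theta^l}
        && \text{(stateless layers of hybrid networks, specification [0081])}
    \end{align}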
Applicant argues on page 22 of the Remarks that “Bellec-2019 and Bellec-2020 are silent regarding disclosing the specifics of ‘the neuromorphic hardware having a crossbar array including a plurality of row lines, a plurality of column lines, and a plurality of junctions arranged between the plurality of row lines and the plurality of column lines, each junction of the plurality of junctions comprising a resistive memory element having a resistance state, the resistance states representing input weights and recursive weights of the neuronal units’”. Examiner acknowledges that neither Bellec-2019 nor Bellec-2020 teach most elements of the cited limitation, though Examiner notes that Bellec-2020 teaches “’input weights and recursive weights’”. Additionally, Applicant argues on pages 20 and 21 of the Remarks that “Bellec-2019 does not teach or suggest computation of two specific independent components i.e., a spatial gradient component and a temporal gradient component for each neuronal unit, at each time instance of the input signal, as recited in amended independent claim 1” (emphasis Applicant’s) and “the alleged updation of learning signals at every time instance in Bellec-2019 is not equivalent to updating two independent gradient components for each neuronal unit at each time instance of the input signal, as claimed” (emphasis Applicant’s). Examiner respectfully disagrees. Applicant appears to be arguing that, although Bellec-2019 does explicitly disclose updating a learning signal, corresponding to a spatial gradient component, at every time instance, Bellec-2019 does not teach that its eligibility traces, corresponding to temporal gradient components, are likewise updated at every time instance. Although Examiner acknowledges that Bellec-2019 is not as explicit about the frequency of eligibility trace updates as it is about learning signal updates, Examiner considers Bellec-2019 to adequately disclose that eligibility traces are updated at every time instance. As stated in the prior 103 rejection of claim 1, Bellec-2019 recites: ((Bellec-2019 Pg. 14) “To compute these gradients when processing the interval {t - Δt, …, t}, the eligibility traces and learning signals are computed solely from data available within that interval”. If the interval is every 1 ms, as Bellec-2019 states is used in some experiments for learning signals (at Bellec-2019 Fig. 1d and 1e), then the same interval is used for computing the eligibility traces, and thus the eligibility traces are computed every 1 ms, i.e. at each time instance, as well. Further evidence can be seen in Bellec-2019 Fig. 1b, where a learning signal Lt-1 and a subsequent learning signal Lt are computed, and therefore the learning signal is updated after a single time instance (as only a single time instance separates times t-1 and t). Even further evidence can be seen in Bellec-2019 Fig. 5, where eligibility traces and learning signals are computed for individual neuron states st-1, st, and st+1, and for neuron output zt. Although Bellec-2019 Fig. 1b and Fig. 5 do not explicitly disclose that learning signals are always updated at every time instance, they at least disclose that it is possible for the learning signal to be updated at every time instance. Given these elements, Examiner considers Bellec-2019 to adequately teach updating two independent gradient components for each neuronal unit at each time instance of the input signal. 
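The scheme described above, in which an eligibility trace and a learning signal are both refreshed at every time instance and merged into a running gradient estimate, can be illustrated with a minimal sketch. This is not Bellec-2019's code; the leaky dynamics, the form of the traces, and all names (online_update, alpha, and so on) are assumptions made only to show the per-time-step structure:

    import numpy as np

    def online_update(x_seq, y_target_seq, w_in, w_rec, lr=1e-3, alpha=0.9):
        # At every time step t: refresh an eligibility trace (temporal component)
        # and a learning signal (spatial component), then merge their product
        # into the running weight-gradient estimate.
        n = w_rec.shape[0]
        s = np.zeros(n)                 # internal neuron states
        z = np.zeros(n)                 # observable outputs
        e_in = np.zeros_like(w_in)      # eligibility traces, input weights
        e_rec = np.zeros_like(w_rec)    # eligibility traces, recurrent weights
        g_in = np.zeros_like(w_in)      # accumulated gradient, input weights
        g_rec = np.zeros_like(w_rec)    # accumulated gradient, recurrent weights

        for x, y_star in zip(x_seq, y_target_seq):
            z_prev = z
            s = alpha * s + w_in @ x + w_rec @ z_prev   # assumed leaky dynamics
            z = np.tanh(s)
            # temporal component: filtered record of presynaptic activity (assumed form)
            e_in = alpha * e_in + np.outer(np.ones(n), x)
            e_rec = alpha * e_rec + np.outer(np.ones(n), z_prev)
            # spatial component: error signal from the current time step only
            L = (z - y_star) * (1.0 - z ** 2)
            # online merge of the two independently computed components
            g_in += L[:, None] * e_in
            g_rec += L[:, None] * e_rec

        return w_in - lr * g_in, w_rec - lr * g_rec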
Additionally, Applicant argues on pages 21 and 22 of the Remarks that “Bellec-2020 describes that eligibility traces et (the alleged temporal gradient component) are computed forward in time, whereas learning signals Ltj (the alleged spatial gradient component) require backward gradient propagation and are approximated in e-prop applications. Computations that require opposite temporal directions (forward vs. backward) cannot occur simultaneously for the same time instance, nor are they computed independently per neuronal unit, as required by amended independent claim 1. Therefore, Bellec-2020's disclosure contradicts Examiner's assertion that eligibility traces et (the alleged temporal gradient component) and learning signals/approximation Ltj (the alleged spatial gradient component) are computed in parallel and independently”. Examiner respectfully disagrees that Bellec-2020 contradicts or teaches away from computation of eligibility traces (temporal gradient components) and learning signals (spatial gradient components) independently and in parallel. First, Examiner notes that Bellec-2020 discloses that (Bellec-2020 Pg. 4) “As the ideal value dE/dztj of the learning signal Ltj also captures influences that the current spike output ztj of neuron j may have on E via future spikes of other neurons, its precise value is in general not available at time t. We replace it by an approximation…this approximate learning signal Ltj only captures errors that arise at the current time step t”. That is, that the approximate learning signal that Bellec-2020 uses does not compute anything in a backward temporal direction, as doing so is impractical. This can be seen more clearly graphically in Bellec-2020 Fig. 6, where the true learning signal in Fig. 6c relies on the future value of variables, but the approximate learning signal in Fig. 6d relies only on the error gradient at the current time. Therefore, in Bellec-2020’s preferred embodiment, backwards temporal computation does not occur. Second, Examiner considers that even if Bellec-2020 used the true learning signal with backwards temporal computation, it is possible to compute both the learning signal and the eligibility trace simultaneously because they are done independently and in parallel. As can be seen in Bellec-2020 Fig. 6, both the true and approximate learning signal and the eligibility traces depend on separate values to be computed (i.e. they are computed independently), and are computed in parallel for neuron output zt. It is rather the opposite of what Applicant alleges, that if the learning signal and eligibility trace were not computed independently, that there would be an issue due to opposite temporal directions of computation. However, this is not the case, either in Bellec-2020 or in claim 1, and therefore there is no contradiction with Examiner’s prior statements. Claim Objection - Allowable Subject Matter Claim 11 has no outstanding rejection over the cited prior art. The closest identified art is Bellec et al. “Biologically inspired alternatives to backpropagation through time for learning in recurrent neural nets”. 
Within Bellec-2019, the equation for a learning signal, which falls under the broadest reasonable interpretation of a spatial gradient, is described by the following equations: [learning-signal equation images from Bellec-2019]. However, the reference does not disclose the equation as recited by claim 11: [claim 11 equation image]. As noted in the rejection of claim 8 under 35 U.S.C. 102(a)(1), the term “ztj” used within the reference corresponds to the term “yt”, as both are used for output signals. However, the output signal used within claim 11, “ytk”, is for a layer k, while “ztj” is for a neuron j. Further, the claimed equation uses multiplications, while equation 9 of the reference uses addition/summation. Further, the multiplication over the layers 1 to k within the claimed equation does not explicitly use the error gradient term “E”, while equation 9 of the reference does. For at least the above reasons, the reference does not teach the subject matter of claim 11. A search of the prior art with PE2E Search has been conducted. Besides patent databases, non-patent literature databases, such as IEEE Xplore, arxiv.org, and semanticscholar.com, have also been searched. The prior art searched and investigated in the patent and non-patent domains does not fairly teach or suggest the above limitation recited in claim 11. Therefore, claim 11 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Interpretation

The terms “spatial gradient” and “temporal gradient” are used frequently within the claims. Applicant states in the specification at [0052] that “according to embodiments the temporal gradients denoted et,θ in Equation 6 may be associated with eligibility traces and the spatial gradients denoted as Lt in Equation 7 may be associated with learning signals”. Therefore, “spatial gradient” and “temporal gradient” will be treated as related to the terms of art “learning signal” and “eligibility trace”, respectively. Within claim 16, the term “a hybrid network” is taken to be defined as “deep RNNs or SNNs…containing one or more layers of stateless neurons”, “for example sigmoid or softmax layers” [0081]. Within claims 10, 13, and 15, some or all of the variables used in the recited equations are not explicitly defined in the claim or in any parent claims. Definitions for the variables that are not defined in these claims have been taken from the specification.

Claim Rejections - 35 USC § 112(b)

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 12 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 12: Claim 12 recites the limitation “computing the temporal gradient component comprises computing: … ϵt,θl = ((dstl / dst-1l)ϵt-1,θl + ((∂stl / ∂θl) + (∂stl / ∂yt-1l)(∂yt-1l / ∂θl)))”. However, claim 12 later recites the limitation “and ϵt,θl = (∂stl / ∂θl)”. The two equations appear to provide conflicting formulas for the computation of the term ϵt,θl. Therefore, it is unclear which should be used, and thus the scope of the claim is indefinite. For examination purposes, ϵt,θl will be interpreted as equal to the first formula. Additionally, claim 12 recites the limitation “θ denotes the set of predefined training parameters of the neural network”; however, the term “set of predefined training parameters” is not previously mentioned within the claim or its parent claim, claim 1. The term “set of predefined training parameters” therefore lacks antecedent basis, and thus the scope of the claim is indefinite. For examination purposes, the limitation will be interpreted as reading “θ denotes a set of predefined training parameters of the neural network”.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-5, 8-10, 13-16, 18-22, 24, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Bellec et al., “Biologically inspired alternatives to backpropagation through time for learning in recurrent neural nets”, hereinafter Bellec-2019, in view of Bellec et al., “A solution to the learning dilemma for recurrent networks of spiking neurons”, hereinafter Bellec-2020, further in view of Eleftheriou et al. (International Patent Application Publication No. 2016/069334), hereinafter Eleftheriou.
Regarding claim 1, Bellec-2019 teaches A computer-implemented method for training a neural network, the neural network comprising two or more layers of neuronal units, ((Bellec-2019 Pg. 17) “For all algorithms, the parameters were kept identical to those defined in the Tensorflow tutorial, with two exceptions: firstly, the networks had a single layer of 200 LSTM units instead of two to simplify the implementation, and because the second did not seem to improve performance with this configuration;”; although Bellec-2019 does not use two layers in its experiments, it still discloses successfully applying its method to a two-layer network, albeit without performance improvement over one layer) wherein each neuronal unit has an internal state, ((Bellec-2019 Pg. 4) “We assume that each neuron j is at time t in an internal state stj ∈ Rd”) the computer-implemented method comprising: providing training data comprising an input signal and an expected output signal to the neural network; ((Bellec-2019 Pg. 29) “The networks received preprocessed audio at the input”, preprocessed audio is an input signal, (Bellec-2019 Pg.
29) “A softmax was applied to their output, which was used to compute the cross entropy error against the target label”, a target label is an expected output signal) computing, for each neuronal unit, a spatial gradient component ((Bellec-2019 Pg. 21, Fig. 5c) “The dependencies involved in the computation of the learning signal Ltj are shown in green.” (Bellec-2019 Pg. 21, Fig. 5c), Bellec-2019 Pg. 21, Fig. 5c shows that the dependencies include st and zt and the calculation of the learning signal includes how they interdepend, (Bellec-2019 Pg. 20) “We assume that the state at time t of each neuron j in the network can be described by an internal state vector stj ∈ Rd and an observable state ztj. The internal state includes internal variables of the neuron…The observable state is given by the output of the neuron“, it’s stated in Applicant’s specification that [0052] “the spatial gradients denoted as Lt in Equation 7 may be associated with learning signals” and [0017] “the spatial gradient components are components which take into consideration the spatial aspects of the neural network, in particular interdependencies between the individual neuronal units at each time instance”, a learning signal discloses interdependencies of the internal variables of a neuron, therefore it teaches a spatial gradient component as disclosed by the instant specification) at each time instance (Bellec-2019 Fig. 1d and Fig. 1e show learning signals updating every time instance (1 ms)) of the input signal; ((Bellec-2019 Pg. 29) “Every input step which represents the 10 ms preprocessed audio frame is fed to the LSNN network for 5 consecutive 1 ms steps”, the input is given to the neural network in 1 ms steps) PNG media_image6.png 654 641 media_image6.png Greyscale PNG media_image7.png 310 835 media_image7.png Greyscale computing, for each neuronal unit, a temporal gradient component ((Bellec-2019 Pg. 6) “The intuition for eligibility traces is that a synapse remembers some of its activation history”, (Bellec-2019 Pg. 2) “The error gradient is represented here as a sum over t of an eligibility trace etji until time t - which is independent from error signals - and a learning signal Ltj that reaches this synapse at time t, see equation (1). This can be interpreted as on online merging for every time step t of eligibility traces and learning signals”, an eligibility trace incorporates temporal information from previous time steps into the error gradient, therefore it teaches a temporal gradient component as disclosed by the instant specification) at each time instance of the input signal, (Bellec-2019 Fig. 1b shows that the eligibility traces, corresponding to temporal gradient components, are computed at subsequent time instances including a given time t and the previous time t-1, i.e. at each time instance) PNG media_image8.png 351 433 media_image8.png Greyscale wherein computing the temporal gradient component is based on parameters related to temporal dynamics of the neuronal units, ((Bellec-2019 Pg. 21, Fig. 5b), “The dependencies involved in the computation of the eligibility traces etji are shown in blue” Bellec-2019 Pg. 21, Fig. 5b shows that the dependencies include include st and zt, (Bellec-2019 Pg. 20) “We assume that the state at time t of each neuron j in the network can be described by an internal state vector stj ∈ Rd and an observable state ztj. 
The internal state includes internal variables of the neuron such as its activation or membrane potential…The dynamics of a neuron’s discrete-time state evolution is described by two functions M(s, z, x, θ) and f(s), where s is an internal state vector, z is a vector of the observable network state (i.e., outputs of all neurons in the network), x is the vector of inputs to the network, and θ denotes the vector of network parameters (i.e., synaptic weights)”, Bellec-2019 Pg. 20, Equations 11 and 12 show that st and zt are computed based on parameters related to temporal dynamics of the neurons, therefore eligibility traces are ultimately based on parameters related to temporal dynamics of the neurons) PNG media_image9.png 66 478 media_image9.png Greyscale PNG media_image10.png 74 534 media_image10.png Greyscale updating the temporal gradient component and the spatial gradient component for each neuronal unit at each time instance of the input signal; ((Bellec-2019 Pg. 14) “To compute these gradients when processing the interval {t - Δt, …, t}, the eligibility traces and learning signals are computed solely from data available within that interval”, Bellec-2019 Fig. 1d and Fig. 1e show learning signals updating every time instance (1 ms), additionally Bellec-2019 Fig. 1b shows an eligibility trace and a learning signal for subsequent times t-1 and t) and performing a classification task using the updated neuronal units, ((Bellec-2019 Pg. 27) “For the classification tasks solved with e-prop 1, we consider one readout neuron ytk per output class, and the network output at time t corresponds to the readout with highest voltage. To train the recurrent networks in this setup, we replace the mean squared error by the the cross entropy error”, neurons are updated during training, and training is completed before performing classification with a machine learning model, (Bellec-2019 Pg. 2) “We show that BPTT can be represented by a sum of products based on a new factorization or errors gradients...The error gradient is represented here as a sum over t of an eligibility trace etji...and a learning signal Ltj that reaches this synapse at time t...Because of the prominent role which forward propagation of eligibility traces play in the resulting approximations to BPTT we refer to these new algorithms as e-prop”, e-prop is the name of the algorithm in which neural units are updated using a temporal gradient component (eligibility traces) and a spatial gradient component (learning signals)) wherein the neural network is implemented on a neuromorphic hardware, ((Bellec-2019 Abstract) “these algorithms provide efficient methods for on-chip training of RSNNs in neuromorphic hardware”) Bellec-2020 teaches the following further limitations more explicitly than Bellec-2019: and wherein the temporal component is computed independently of and in parallel with the computing of the spatial gradient component; (Bellec-2020 Fig. 6, specifically Fig. 6b and Fig. 6d, show how the eligibility trace et, i.e. the temporal gradient component, and the spatial gradient component, i.e. 
the learning signal approximation Lt = ∂E/∂ztj, for neuron z at time t are computed with separate dependents, the previous neuron states for et and the current summed error for Lt, and are thus computed independently, and as they are computed at the same time for the same neuron they are also computed in parallel) PNG media_image11.png 808 873 media_image11.png Greyscale … input weights and recursive weights of the neuronal units ((Bellec-2020 Pg. 13) “Note also that the weight updates derived in the following for the recurrent weights Wrecji also apply to the inputs weights Winji”) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019 and Bellec-2020. Bellec-2019 teaches most claim limitations, but Bellec-2020 more explicitly shows that the temporal gradient component and the approximated spatial gradient component are computed independently of but in parallel with each other, and that the weights of the neurons include both input weights and recurrent weights. One of ordinary skill in the art would have motivation to combine Bellec-2019 and Bellec-2020, as Bellec-2020 teaches: (Bellec-2020 Pg. 2) “The gradient dE/dWji for the weight Wji of the synapse from neuron i to neuron j tells us how this weight should be changed in order to reduce E. It can in principle be estimated—in spite of the fact that the implicit discrete variable ztj is non-differentiable—with the help of a suitable pseudo derivative for spikes as in refs. 3,4. The key innovation is a rigorous proof (see “Methods”) that the gradient dE/dWji can be represented as a sum of products over the time steps t of the RSNN computation, where the second factor is just a local gradient that does not depend on E”. That is, that factoring a gradient into two independent components is necessary for estimation of the gradient in order to train the RSNN weights. Such a combination would be obvious. Eleftheriou teaches the following further limitations that neither Bellec-2019 nor Bellec-2020 teach: the neuromorphic hardware having a crossbar array including a plurality of row lines, a plurality of column lines, and a plurality of junctions arranged between the plurality of row lines and the plurality of column lines, ((Eleftheriou [0009]) “With these crossbar memory-cell arrays, each individual cell representing a synapse is connected between a respective pair of row and column lines of the array chip”, a cell connected between a pair of row and column lines in an array is at a junction between the lines in the array) each junction of the plurality of junctions comprising a resistive memory element ((Eleftheriou Abstract) “A neuromorphic synapse comprises a resistive memory cell”) having a resistance state, the resistance states representing [input weights and recursive] weights of the neuronal units ((Eleftheriou [0015]) “Synaptic weight can be modified by applying a programming signal to the cell to program the cell resistance”, a synapse is a neuronal unit, Bellec-2020 teaches weights that are specifically input weights and recursive weights) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou. Bellec-2019 and Bellec-2020 jointly teach a method for training a particular kind of neural network, including input weights and recurrent weights, as well as implementation of the neural network on neuromorphic hardware. 
Eleftheriou teaches a particular kind of neuromorphic hardware, including a crossbar array of resistive memory elements. One of ordinary skill in the art would have motivation to combine Bellec-2019, Bellec-2020, and Eleftheriou, as Bellec-2019 and Bellec-2020 already teach implementation on neuromorphic hardware, and substitution of the particular neuromorphic hardware of Eleftheriou predictably also allows for an efficient hardware implementation of the neural network. Such a combination would be obvious. Regarding claim 3, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 1, further comprising Bellec-2019 further teaches: updating a predefined set of training parameters of the neural network ((Bellec-2019 Pg. 16) “This is done in e-prop 3 by computing simultaneously the gradients (dE’ / dθ) and (dE’ / d𝚿) of an error function E’ that combines the error function E...These gradients are approximated to update the network parameters with any variant of stochastic gradient descent”) as a function of the spatial gradient component and the temporal gradient component (Bellec-2019 Pg. 20) “we show that the total derivative of the error function E with respect to the parameters can be written as a product of learning signals Ltj and eligibility traces etji”, as described in the quote above, the training parameters are updated as a function of the derivative of error function E, which is itself a function of the spatial and temporal gradient components, a learning signal is a spatial gradient component, and an eligibility trace is a temporal gradient component) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 3, claim 1, for reasons mentioned previously. All additional limitations in claim 3 are present in Bellec-2019, so no additional rationale for combination is required. Regarding claim 4, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 3, further comprising Bellec-2019 further teaches: updating the predefined set of training parameters of the neural network at predefined time instances ((Bellec-2019 Pg. 32) “For each batch of 256 sequences, the parameters are updated every Δt = 4 time steps”) as a function of the spatial gradient component and the temporal gradient component ((Bellec-2019 Pg. 20) “we show that the total derivative of the error function E with respect to the parameters can be written as a product of learning signals Ltj and eligibility traces etji”, as described in the quote above, the training parameters are updated as a function of the derivative of error function E, which is itself a function of the spatial and temporal gradient components. Note that despite the gulf in pages, both this quote and the one for the limitation above are both describing aspects of Bellec-2019’s e-prop 3 algorithm, a learning signal is a spatial gradient component, and an eligibility trace is a temporal gradient component) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 4, claim 3, for reasons mentioned previously. All additional limitations in claim 4 are present in Bellec-2019, so no additional rationale for combination is required. 
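Claims 3-5, as mapped above, differ mainly in when the accumulated product of the spatial and temporal components is applied to the parameters: at predefined time instances (compare Bellec-2019's "updated every Δt = 4 time steps") versus at every time instance. A hedged sketch of that difference follows; compute_L, compute_e, and delta_t are illustrative names, not claim language or Bellec-2019 code:

    import numpy as np

    def train_with_update_interval(steps, compute_L, compute_e, theta, lr=1e-3, delta_t=4):
        # Accumulate the per-step product L_t * e_t and apply the parameter update
        # only at predefined time instances (every delta_t steps); delta_t = 1
        # reduces to an update at every time instance.
        grad = np.zeros_like(theta)
        for t in range(steps):
            L_t = compute_L(t)          # spatial gradient component (learning signal)
            e_t = compute_e(t)          # temporal gradient component (eligibility trace)
            grad += L_t * e_t           # product form: dE/dtheta ~ sum_t L_t * e_t
            if (t + 1) % delta_t == 0:  # predefined update instances
                theta = theta - lr * grad
                grad = np.zeros_like(theta)
        return theta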
Regarding claim 5, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 4. Bellec-2019 further teaches: further comprising updating the predefined set of training parameters of the neural network ((Bellec-2019 Pg. 16) “This is done in e-prop 3 by computing simultaneously the gradients (dE’ / dθ) and (dE’ / d𝚿) of an error function E’ that combines the error function E...These gradients are approximated to update the network parameters with any variant of stochastic gradient descent”) as a function of the spatial gradient component and the temporal gradient component ((Bellec-2019 Pg. 20) “we show that the total derivative of the error function E with respect to the parameters can be written as a product of learning signals Ltj and eligibility traces etji”, as described in the quote above, the training parameters are updated as a function of the derivative of error function E, which is itself a function of the spatial and temporal gradient components, a learning signal is a spatial gradient component, and an eligibility trace is a temporal gradient component) Bellec-2020 further teaches: at each time instance ((Bellec-2020 Pg. 4) “This equation defines a clear program for approximating the network loss gradient through local rules for synaptic plasticity: change each weight Wji at step t proportionally to -Ltjetji, or accumulate these so-called tags in a hidden variable that is translated occasionally into an actual weight change”, weights are parameters) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 5, claim 4, for reasons mentioned previously. All additional limitations in claim 5 are present in Bellec-2019 or Bellec-2020, so no additional rationale for combination is required. Regarding claim 8, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 1, wherein Bellec-2019 further teaches: computing the spatial gradient component comprises computing: Lt := (∂Et / ∂yt) wherein (Bellec-2019 Equation 4, Pg. 6, as the alternative embodiment described in (Bellec-2019 Pg. 25) “The first approximation is to focus on the error at the present time t and ignore dependencies on future errors in the computation of the total derivative…As a result the total derivative (dE / dztj) is replaced by the partial derivative (∂E / ∂ztj) in equation (4)”, a learning signal is a spatial gradient component) PNG media_image3.png 148 1126 media_image3.png Greyscale t denotes a respective time instance; ((Bellec-2019 Pg. 7) “at the current time t”) Lt denotes the spatial gradient component at the respective time instance t ((Bellec-2019 Pg. 4) “learning signals Ltj”, a learning signal is a spatial gradient component) Et denotes a network error between the expected output signal and a current output signal at the respective time instance t; ((Bellec-2019 Pg. 4) “The general goal is to approximate the gradients of the network error function E”, (Bellec-2019 Pg. 25) “the error function is given by the mean squared error E = (1/2) Σt,k (ytk – y*,tk)2, with y*,tk being the target output at time t” and yt denotes the current output signal at the respective time instance t ((Bellec-2019 Pg. 
6) “the existence of a spike ztj”, a spike is an output signal in a spiking neural network) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 8, claim 1, for reasons mentioned previously. All additional limitations in claim 8 are present in Bellec-2019, so no additional rationale for combination is required. Regarding claim 9, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 1. Bellec-2019 further teaches: further comprising updating a predefined set of training parameters of the neural network ((Bellec-2019 Pg. 16) “This is done in e-prop 3 by computing simultaneously the gradients (dE’ / dθ) and (dE’ / d𝚿) of an error function E’ that combines the error function E...These gradients are approximated to update the network parameters with any variant of stochastic gradient descent”) as a function of the spatial gradient component and the temporal gradient component ((Bellec-2019 Pg. 20) “we show that the total derivative of the error function E with respect to the parameters can be written as a product of learning signals Ltj and eligibility traces etji”, as described in the quote above, the training parameters are updated as a function of the derivative of error function E, which is itself a function of the spatial and temporal gradient components, a learning signal is a spatial gradient component, and an eligibility trace is a temporal gradient component) wherein t denotes a respective time instance; ((Bellec-2019 Pg. 7) “at the current time t”) yt denotes a current output signal at the respective time instance t; ((Bellec-2019 Pg. 6) “the existence of a spike ztj”, a spike is an output signal in a spiking neural network) st denotes a unit state at the respective time instance t; ((Bellec-2019 Pg. 6) “the neuron state stj”, broadest reasonable interpretation of “unit” is one of the “neuronal units” recited in parent claim 1) θ denotes the predefined set of training parameters of the neural network; ((Bellec-2019 Pg. 2) “the synaptic weights θji”) Bellec-2020 further teaches: wherein computing the temporal gradient component comprises computing: et,θ := (dyt / dθ) (Bellec-2020 Pg. 3, Equation 2 computes their temporal gradient in the same manner as Applicant does, the derivative of the output signal at time t with respect to the training parameters. Note also that in Bellec-2020, the variable θ is renamed W, as in (Bellec-2020 Pg. 2) “synaptic weights W”) PNG media_image12.png 210 1024 media_image12.png Greyscale It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 9, claim 1, for reasons mentioned previously. All additional limitations in claim 9 are present in Bellec-2019 or Bellec-2020, so no additional rationale for combination is required. Regarding claim 10, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 9. Bellec-2019 further teaches: wherein updating the predefined set of training parameters comprises computing: (Δθ = αΣtLtet,θ) (Bellec-2019 Equation 7, Pg. 13) PNG media_image13.png 137 667 media_image13.png Greyscale wherein α denotes a learning rate ((Bellec-2019 Pg. 
13) “where η represents a fixed learning rate”) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 10, claim 9, for reasons mentioned previously. All additional limitations in claim 10 are present in Bellec-2019, so no additional rationale for combination is required. Regarding claim 13, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 1, further comprising Bellec-2019 further teaches: updating a predefined set of training parameters of the neural network ((Bellec-2019 Pg. 16) “This is done in e-prop 3 by computing simultaneously the gradients (dE’ / dθ) and (dE’ / d𝚿) of an error function E’ that combines the error function E...These gradients are approximated to update the network parameters with any variant of stochastic gradient descent”) as a function of the spatial gradient component and the temporal gradient component ((Bellec-2019 Pg. 20) “we show that the total derivative of the error function E with respect to the parameters can be written as a product of learning signals Ltj and eligibility traces etji”, as described in the quote above, the training parameters are updated as a function of the derivative of error function E, which is itself a function of the spatial and temporal gradient components, a learning signal is a spatial gradient component, and an eligibility trace is a temporal gradient component) wherein updating the predefined set of training parameters comprises computing: ((dE / dθl) = Σt[Ltlet,θl + R]), wherein R denotes a residual term (Substituting (dE / dzt’j) in as Ltj (since the two terms are equal as stated in Bellec-2019 Equation 4, Pg. 6) in Bellec-2019 Equation 1, Pg. 4, yields the equation ((dE / dθji) = Σt((∂E / ∂ztj)etji) + Σt((Σi(dE / dst’+1i)(∂st’+1i / ∂zt’j))etji)), where (∂E / ∂ztj) is an approximation for Ltj as described in Bellec-2019 Equation 4, Pg. 6, as the alternative embodiment described in (Bellec-2019 Pg. 25) “The first approximation is to focus on the error at the present time t and ignore dependencies on future errors in the computation of the total derivative…As a result the total derivative (dE / dztj) is replaced by the partial derivative (∂E / ∂ztj) in equation (4)”, and Σt((Σi(dE / dst’+1i)(∂st’+1i / ∂zt’j))etji) is a residual term R). It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 13, claim 1, for reasons mentioned previously. All additional limitations in claim 13 are present in Bellec-2019, so no additional rationale for combination is required. Regarding claim 14, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 13, wherein Bellec-2019 further teaches: the residual term R is approximated with a combination of eligibility traces and learning signals (Substituting (dE / dzt’j) in as Ltj (since the two terms are equal as stated in Bellec-2019 Equation 4, Pg. 6) in Bellec-2019 Equation 1, Pg. 4, yields the equation ((dE / dθji) = Σt((∂E / ∂ztj)etji) + Σt((Σi(dE / dst’+1i)(∂st’+1i / ∂zt’j))etji)), where (∂E / ∂ztj) is an approximation for Ltj as described in Bellec-2019 Equation 4, Pg. 6, as the alternative embodiment described in (Bellec-2019 Pg. 
25) “The first approximation is to focus on the error at the present time t and ignore dependencies on future errors in the computation of the total derivative…As a result the total derivative (dE / dztj) is replaced by the partial derivative (∂E / ∂ztj) in equation (4)”, and Σt((Σi(dE / dst’+1i)(∂st’+1i / ∂zt’j))etji) is a residual term R. R here is a sum of products of eligibility traces etji and part of the learning signals (Σi(dE / dst’+1i)(∂st’+1i / ∂zt’j))), and is therefore approximated with a combination of eligibility traces and learning signals. It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 14, claim 13, for reasons mentioned previously. All additional limitations in claim 14 are present in Bellec-2019, so no additional rationale for combination is required. Regarding claim 15, Bellec-2019 and Bellec-2020 jointly teach The computer-implemented method according to claim 1, further comprising Bellec-2019 further teaches: updating a predefined set of training parameters of the neural network ((Bellec-2019 Pg. 16) “This is done in e-prop 3 by computing simultaneously the gradients (dE’ / dθ) and (dE’ / d𝚿) of an error function E’ that combines the error function E...These gradients are approximated to update the network parameters with any variant of stochastic gradient descent”) as a function of the spatial and the temporal gradient components ((Bellec-2019 Pg. 20) “we show that the total derivative of the error function E with respect to the parameters can be written as a product of learning signals Ltj and eligibility traces etji”, as described in the quote above, the training parameters are updated as a function of the derivative of error function E, which is itself a function of the spatial and temporal gradient components, a learning signal is a spatial gradient component, and an eligibility trace is a temporal gradient component) wherein updating the network parameters comprises computing: (Δθl = αΣtLtlet,θl) (Bellec-2019 Equation 7, Pg. 13, moving θinit to the left side of the equation results in Δθ) PNG media_image14.png 149 637 media_image14.png Greyscale wherein α is a learning rate ((Bellec-2019 Pg. 13) “where η represents a fixed learning rate”) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 15, claim 1, for reasons mentioned previously. All additional limitations in claim 15 are present in Bellec-2019, so no additional rationale for combination is required. Regarding claim 16, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 1, wherein the neural network is selected from the group consisting of: Bellec-2019 further teaches: a recurrent neural network ((Bellec-2019 Pg. 20) “Our proposed learning algorithms for recurrent neural networks can be applied to a large class of spiking and non-spiking neural network models”) a hybrid network ((Bellec-2019 Pg. 27) “To train the recurrent networks in this setup…the output class distribution predicted by the network is given as πtk = softmax(ytk)”, as mentioned in the claim interpretation section, Applicant’s specification states that a deep recurrent network containing one or more softmax layers qualifies as a hybrid network) a spiking neural network ((Bellec-2019 Pg. 
20) “Our proposed learning algorithms for recurrent neural networks can be applied to a large class of spiking and non-spiking neural network models”) and a generic recurrent network, the generic recurrent network comprising long-short-term-memory units and gated recurrent units ((Bellec-2019 Pg. 4) “The learning rules that we describe can be applied to a variety of recurrent neural network models:…and networks of LSTM (long short-term memory) units”, (Bellec-2019 Pg. 24) “For LSTM units…One defines the network dynamics that involves the usual input, forget and output gates”) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 16, claim 1, for reasons mentioned previously. All additional limitations in claim 16 are present in Bellec-2019, so no additional rationale for combination is required. Regarding claim 18, Claim 18 recites a neural network that performs the function of the method of claim 1. Additionally, claim 18 discloses the neural network is configured to perform a method for training a neural network, with “a method” being identical to the method recited in claim 1. Bellec-2019 Pg. 12 Figure 3b teaches the use of a neural network to produce the learning signals used to train a neural network. All other limitations in claim 18 are substantially the same as those in claim 1, therefore the same rationale for rejection applies. PNG media_image15.png 566 467 media_image15.png Greyscale Regarding claim 19, Claim 19 discloses a neural network that implements the method of claim 5 with substantially the same limitation; therefore the same rationale for rejection applies. Regarding claim 20, Claim 20 recites a manufacture that performs the function of the method of claim 1. Additionally, claim 20 discloses: that the neural network being trained is specifically a recurrent neural network ((Bellec-2019 Pg. 4) “The learning rules that we describe can be applied to a variety of recurrent neural network models”) That the manufacture is specifically a computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by the neural network to cause the neural network to perform a method ((Bellec-2019 Pg. 17) “Here, we used the same implementation and baseline performance as provided freely by the Tensorflow tutorial on recurrent neural networks”, a Tensorflow implementation encompasses program instructions executable to cause a neural network to perform a method, that the instructions are on a computer readable storage medium is inherent) All other limitations in claim 20 are substantially the same as those in claim 1, therefore the same rationale for rejection applies. Regarding claim 21, Claim 21 discloses a computer program product that implements the method of claim 5 with substantially the same limitation; therefore the same rationale for rejection applies. Regarding claim 22, Claim 22 recites a computing system that performs the function of the method of claim 1. Specifically, claim 22 recites a computing system comprising at least one processor and a memory storing instructions that, when executed by the processor, cause the computing system to train a predefined set of training parameters of a neural network ((Bellec-2019 Pg. 
34) “Computations were primarily carried out on the Supercomputer JUWELS at Jülich Supercomputing Centre”, a supercomputer is a computing system comprising a processor and a memory). All other limitations in claim 22 are substantially the same as those in claim 1, therefore the same rationale for rejection applies. Regarding claim 24, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method of claim 1, further comprising Bellec-2019 further teaches: facilitating an efficient implementation on hardware accelerators ((Bellec-2019 Abstract) “We show that an online merging of locally available information during a computation with suitable top-down learning signals in real-time provides highly capable approximations to BPTT…The resulting new generation of learning algorithms for recurrent neural networks provides a new understanding of network learning in the brain that can be tested experimentally. In addition, these algorithms provide efficient methods for on-chip training of RSNNs in neuromorphic hardware”) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 24, claim 1, for reasons mentioned previously. All additional limitations in claim 24 are present in Bellec-2019, so no additional rationale for combination is required. Regarding claim 25, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method of claim 1 Bellec-2020 further teaches: wherein updating the temporal gradient component and the spatial gradient component for each neuronal unit at each time instance of the input signal comprises updating the input weights and/or recursive weights of the neuronal units ((Bellec-2020 Pg. 13) “Note also that the weight updates derived in the following for the recurrent weights Wrecji also apply to the inputs weights Winji”) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, and Eleftheriou for the parent claim of claim 25, claim 1, for reasons mentioned previously. All additional limitations in claim 24 are present in Bellec-2020, so no additional rationale for combination is required. Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Bellec-2019, in view of Bellec-2020, further in view of Eleftheriou, further in view of Wu et al. “Spatio-Temporal Backpropagation for Training High-Performance Spiking Neural Networks”, hereinafter Wu. Regarding claim 7, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 1, wherein: Wu teaches the following further limitation more explicitly than either Bellec-2019 or Bellec-2020 and that Eleftheriou does not teach: the spatial gradient component is based on connectivity parameters of the neural network ((Wu Pg. 3) “As we know, the efficiency of error BP for training DNNs greatly benefits from the iterative representation of gradient descent which yields the chain rule for layer-by-layer error propagation in the SD backward pass. This motivates us to propose an iterative LIF based SNNS in which the iterations occur in both the SD and TD”, Wu Pg. 3, Fig. 1 shows iterative gradients occur in the SD (spatial domain), i.e. 
there is a spatial gradient, and that the Spatial Domain consists of the connections between neurons, Applicant’s specification additionally states: [0017] “According to embodiments, the connectivity parameters may be defined as number or the set of transmission lines that allow for information exchange between individual neuronal units”) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, Eleftheriou, and Wu. Bellec-2019, Bellec-2020, and Eleftheriou jointly teach the method of claim 1, including a spatial gradient component. Wu teaches a spatial gradient based on the connectivity parameters of a neural network. One of ordinary skill would have motivation to combine Bellec-2019, Bellec-2020, Eleftheriou, and Wu to have the spatial gradient of Bellec-2019, Bellec-2020, and Eleftheriou be explicitly based off of connectivity parameters, as (Wu Fig. 1, Pg. 3) “In addition to the layer-by-layer spatial dataflow like ANNs, SNNs are known for”, and ANNs, or artificial neural networks, are the most common and well-known type of neural network. Claims 6, 12, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Bellec-2019, in view of Bellec-2020, further in view of Eleftheriou, further in view of Nugraha and Su “Analysis of Layer Efficiency and Layer Reduction on Pre-trained Deep Learning Models”, hereinafter Nugraha. Regarding claim 6, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 1. Bellec-2019 further teaches: wherein the spatial gradient component is computed ((Bellec-2019 Pg. 21, Fig. 5c) “The dependencies involved in the computation of the learning signal Ltj are shown in green”, a learning signal is a spatial gradient component) for [each of] the two or more layers of neuronal units ((Bellec-2019 Pg. 17) “the networks had a single layer of 200 LSTM units instead of two…because the second did not seem to improve performance”, Bellec-2019 does not teach each layer having its own gradient computed) at each time instance ((Bellec-2019 Pg. 14) “To compute these gradients when processing the interval {t - Δt, …, t}, the eligibility traces and learning signals are computed solely from data available within that interval”, Bellec-2019 Fig. 1d and Fig. 1e show learning signals updating every time instance (1 ms)) and the temporal gradient component is computed ((Bellec-2019 Pg. 21, Fig. 5b) “The dependencies involved in the computation of the eligibility traces etji are shown in blue”, an eligibility trace is a temporal gradient component) for [each of] the two or more layers of neuronal units ((Bellec-2019 Pg. 17) “the networks had a single layer of 200 LSTM units instead of two…because the second did not seem to improve performance”, Bellec-2019 does not teach each layer having its own gradient computed) at each time instance ((Bellec-2019 Pg. 14) “To compute these gradients when processing the interval {t - Δt, …, t}, the eligibility traces and learning signals are computed solely from data available within that interval”, Bellec-2019 Fig. 1d and Fig. 1e show learning signals updating every time instance (1 ms)) Nugraha teaches the following further limitation that neither Bellec-2019, nor Bellec-2020, nor Eleftheriou teaches: for each of the two or more layers ((Nugraha Pg. 
3) “Measure Grad-CAM by evaluating the mean gradient of each layer”) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, Eleftheriou, and Nugraha. Bellec-2019, Bellec-2020, and Eleftheriou jointly teach the method of claim 1, in which a spatial and a temporal gradient component are computed at every time instance. Nugraha teaches calculating a gradient for every layer. One of ordinary skill in the art would have motivation to combine Bellec-2019, Bellec-2020, Eleftheriou, and Nugraha to have both spatial and temporal gradients be computed for every layer, as doing so allows for the importance of each layer to be determined, as Nugraha teaches: (Nugraha Pg. 3) “we rank the layers using layer averaging. Layer averaging computes the average measurement of each layer to analyze the importance of the layers”. Regarding claim 12, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 1. Bellec-2019 further teaches: wherein computing the temporal gradient component comprises computing: et,θl = ((∂ytl / ∂stl)ϵt,θl + (∂ytl / ∂θl)) (Bellec-2019 Equation 3, Pg. 6) PNG media_image16.png 103 931 media_image16.png Greyscale ϵt,θl = ((dstl / dst-1l)ϵt-1,θl + ((∂stl / ∂θl) + (∂stl / ∂yt-1l)(∂yt-1l / ∂θl))) (Bellec-2019 Equation 2, Pg. 6, combined with (Bellec-2019, Pg. 5) “the internal neuron dynamics isolated from the rest of the network is described by Dt−1j = (∂M / ∂st-1j)(st-1j, zt−1, xt, θ)” and (Bellec-2019, Pg. 5) “the network dynamics is formalized through the equation stj = M(st-1j, zt-1, xt, θ)”) PNG media_image17.png 97 1006 media_image17.png Greyscale t denotes a respective time instance; ((Bellec-2019 Pg. 7) “at the current time t”) yt denotes a current output signal at time instance t; ((Bellec-2019 Pg. 6) “the existence of a spike ztj”, a spike is an output signal in a spiking neural network) st denotes a unit state at time instance t; ((Bellec-2019 Pg. 6) “the neuron state stj”, broadest reasonable interpretation of “unit” is one of the “neuronal units” recited in parent claim 1) θ denotes the set of predefined training parameters of the neural network; and ϵt,θl = (∂stl / ∂θl) ((Bellec-2019 Pg. 2) “the synaptic weights θji”) Nugraha teaches the following further limitation that neither Bellec-2019, nor Bellec-2020, nor Eleftheriou teaches: l denotes a respective layer (Equations 2 and 3 in Bellec are also computed without reference to layers) ((Nugraha Pg. 3) “Measure Grad-CAM by evaluating the mean gradient of each layer”) It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, Eleftheriou, and Nugraha. Bellec-2019, Bellec-2020, and Eleftheriou jointly teach the method of claim 1, in which the temporal gradient component is computed using a variant of the recited equation without reference to layers (as the gradients in Bellec-2019 are per-neuron). Nugraha teaches calculating a gradient for every layer. One of ordinary skill in the art would have motivation to combine Bellec-2019, Bellec-2020, Eleftheriou, and Nugraha to have the temporal gradients be computed for every layer, as doing so allows for the importance of each layer to be determined, as Nugraha teaches: (Nugraha Pg. 3) “we rank the layers using layer averaging. Layer averaging computes the average measurement of each layer to analyze the importance of the layers”. 
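To make the per-layer computation mapped for claim 12 concrete, the following sketch carries an ϵ and an e value for each layer using the recursion recited in claim 12 (claim 17 additionally includes a (∂ytl / ∂yt-1l)et-1,θl term, omitted here). The scalar derivatives and the jac callback are simplifying assumptions made only for illustration:

    import numpy as np

    def per_layer_traces(T, num_layers, jac):
        # For each layer l and time step t:
        #   eps_t = (ds_t/ds_{t-1}) * eps_{t-1} + ds_t/dtheta + (ds_t/dy_{t-1}) * (dy_{t-1}/dtheta)
        #   e_t   = (dy_t/ds_t) * eps_t + dy_t/dtheta
        # jac(t, l) returns the six scalar derivatives above (an assumption for this sketch).
        eps = np.zeros(num_layers)
        e = np.zeros(num_layers)
        history = []
        for t in range(T):
            for l in range(num_layers):
                ds_dsprev, ds_dtheta, ds_dyprev, dyprev_dtheta, dy_ds, dy_dtheta = jac(t, l)
                eps[l] = ds_dsprev * eps[l] + ds_dtheta + ds_dyprev * dyprev_dtheta
                e[l] = dy_ds * eps[l] + dy_dtheta
            history.append(e.copy())
        return history

    # Illustrative use with arbitrary scalar derivatives:
    # traces = per_layer_traces(T=10, num_layers=3,
    #                           jac=lambda t, l: (0.9, 0.1, 0.05, 0.2, 1.0, 0.0))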
Regarding claim 17, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computer-implemented method according to claim 1. Bellec-2019 further teaches:

updating a predefined set of training parameters of the neural network ((Bellec-2019 Pg. 16) “This is done in e-prop 3 by computing simultaneously the gradients (dE’ / dθ) and (dE’ / d𝚿) of an error function E’ that combines the error function E...These gradients are approximated to update the network parameters with any variant of stochastic gradient descent”) as a function of the spatial gradient component and the temporal gradient component ((Bellec-2019 Pg. 20) “we show that the total derivative of the error function E with respect to the parameters can be written as a product of learning signals Ltj and eligibility traces etji”; as described in the quote above, the training parameters are updated as a function of the derivative of error function E, which is itself a function of the spatial and temporal gradient components; a learning signal is a spatial gradient component, and an eligibility trace is a temporal gradient component)

computing the temporal gradient component comprises computing:

e_{t,θ}^{l} = (∂y_t^{l} / ∂s_t^{l}) ϵ_{t,θ}^{l} + (∂y_t^{l} / ∂θ^{l}) + (∂y_t^{l} / ∂y_{t-1}^{l}) e_{t-1,θ}^{l} (Bellec-2019 Equation 3, Pg. 6)

t denotes a respective time instance; ((Bellec-2019 Pg. 7) “at the current time t”)
y_t denotes a current output signal ((Bellec-2019 Pg. 6) “the existence of a spike ztj”, a spike is an output signal in a spiking neural network)
s_t denotes a current unit state ((Bellec-2019 Pg. 6) “the neuron state stj”, broadest reasonable interpretation of “unit” is one of the “neuronal units” recited in parent claim 1)
θ denotes trainable parameters of the network; ((Bellec-2019 Pg. 2) “the synaptic weights θji”)
and ϵ_{t,θ}^{l} = ∂s_t^{l} / ∂θ^{l} ((Bellec-2019 Pg. 18) “Instead of using the eligibility trace vectors as defined in equation (37) we used ϵtji = (∂stj / ∂θji)”)

Nugraha teaches the following further limitation that neither Bellec-2019, nor Bellec-2020, nor Eleftheriou teaches:

l denotes the respective layer (Equations 2 and 3 in Bellec are also computed without reference to layers) ((Nugraha Pg. 3) “Measure Grad-CAM by evaluating the mean gradient of each layer”)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, Eleftheriou, and Nugraha. Bellec-2019, Bellec-2020, and Eleftheriou jointly teach the method of claim 1, in which the temporal gradient component is computed using a variant of the recited equation without reference to layers (as the gradients in Bellec-2019 are per-neuron). Nugraha teaches calculating a gradient for every layer. One of ordinary skill in the art would have been motivated to combine Bellec-2019, Bellec-2020, Eleftheriou, and Nugraha to have the temporal gradients be computed for every layer, as doing so allows the importance of each layer to be determined, as Nugraha teaches: (Nugraha Pg. 3) “we rank the layers using layer averaging. Layer averaging computes the average measurement of each layer to analyze the importance of the layers”.
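As context for the claim 17 mapping, this is a minimal sketch of updating a trainable parameter as a function of a spatial component (learning signal) and a temporal component (eligibility trace), i.e., dE/dθ ≈ Σ_t L_t · e_t followed by a gradient-descent step. The squared-error loss, the toy dynamics carried over from the previous sketch, the learning rate, and all names are illustrative assumptions, not the method of the application or of the cited references.

    import numpy as np

    # Assumed loss E = 0.5 * sum_t (y_t - target_t)^2 on the toy unit above, so the
    # learning signal (spatial component) is L_t = dE/dy_t = y_t - target_t, and the
    # parameter update combines it with the temporal component e_t accumulated online.
    alpha, theta, lr = 0.9, 0.5, 0.05
    xs      = np.array([1.0, 0.2, -0.3, 0.7])
    targets = np.array([0.4, 0.5, 0.3, 0.6])

    for epoch in range(3):
        s, eps, grad = 0.0, 0.0, 0.0
        for x, target in zip(xs, targets):
            s = alpha * s + theta * x      # forward step, y_t = s_t
            eps = alpha * eps + x          # temporal component (eligibility trace)
            L = s - target                 # spatial component (learning signal)
            grad += L * eps                # dE/dtheta accumulated as sum_t L_t * e_t
        theta -= lr * grad                 # stochastic-gradient-style parameter update
        print(f"epoch {epoch}: theta={theta:.4f}  dE/dtheta={grad:.4f}")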
Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Bellec-2019, in view of Bellec-2020, further in view of Eleftheriou, further in view of Fouda et al. “Spiking Neural Networks for Inference and Learning: A Memristor-based Design Perspective”, hereinafter Fouda.

Regarding claim 23, Bellec-2019, Bellec-2020, and Eleftheriou jointly teach The computing system according to claim 22. Bellec-2019 further teaches:

further comprising a memristive [device] ((Bellec-2019 Pg. 19) “new 3-terminal memristive synapses (Yang et al., 2017) are likely to support an efficient combination of local eligibility traces with top-down error signals in the hardware”)

Fouda teaches the following further limitation that neither Bellec-2019 nor Bellec-2020 teaches, and more explicitly than Eleftheriou teaches:

a memristive memory array ((Fouda Pg. 3) “A minimum requirement for neural network applications is a memory to store the synaptic weight. Memristive devices can be used to realize these such synapses. A device is referred to as a memristive device if it exhibits pinched hysteresis behavior in the current-voltage plane which indicates a memory behavior in its resistance. Many physical devices exhibit memristive behaviors such as…resistive RAM (RRAM or ReRAM)”; according to Fouda Pg. 4, Fig. 2, the memristors are organized as a crossbar array)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Bellec-2019, Bellec-2020, Eleftheriou, and Fouda. Bellec-2019, Bellec-2020, and Eleftheriou jointly teach the computing system of claim 22, in which a recurrent neural network is taught, as well as that memristive devices are a viable way to implement it. Fouda teaches that a memristive memory array is a method of implementing a neural network. One of ordinary skill in the art would have been motivated to combine Bellec-2019, Bellec-2020, Eleftheriou, and Fouda to have the memristive device explicitly be a memristive memory array, specifically RRAM, as (Fouda Pg. 3) “RRAMs have a promising potential for neuromorphic applications due to high area density, stackability, and low write energy”.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Vinyals et al. (International Patent Application Publication No. 2017/201506) teaches techniques for determining synthetic gradients for subnetworks of a neural network, such as individual layers of the neural network, in order to update the parameters for that subnetwork.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VICTOR A NAULT whose telephone number is (703) 756-5745. The examiner can normally be reached M - F, 12 - 8.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/V.A.N./Examiner, Art Unit 2124
/Kevin W Figueroa/Primary Examiner, Art Unit 2124

Prosecution Timeline

Jun 05, 2021
Application Filed
Sep 16, 2024
Non-Final Rejection — §101, §103, §112
Jan 17, 2025
Response Filed
Mar 19, 2025
Final Rejection — §101, §103, §112
Jun 04, 2025
Examiner Interview Summary
Jun 04, 2025
Applicant Interview (Telephonic)
Aug 19, 2025
Request for Continued Examination
Aug 26, 2025
Response after Non-Final Action
Sep 22, 2025
Non-Final Rejection — §101, §103, §112
Dec 03, 2025
Interview Requested
Dec 09, 2025
Applicant Interview (Telephonic)
Dec 09, 2025
Examiner Interview Summary
Dec 18, 2025
Response Filed
Feb 18, 2026
Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579429
DEEP LEARNING BASED EMAIL CLASSIFICATION
2y 5m to grant Granted Mar 17, 2026
Patent 12566953
AUTOMATED PROCESSING OF FEEDBACK DATA TO IDENTIFY REAL-TIME CHANGES
2y 5m to grant Granted Mar 03, 2026
Patent 12561563
AUTOMATED PROCESSING OF FEEDBACK DATA TO IDENTIFY REAL-TIME CHANGES
2y 5m to grant Granted Feb 24, 2026
Patent 12468939
OBJECT DISCOVERY USING AN AUTOENCODER
2y 5m to grant Granted Nov 11, 2025
Patent 12446600
TWO-STAGE SAMPLING FOR ACCELERATED DEFORMULATION GENERATION
2y 5m to grant Granted Oct 21, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

5-6
Expected OA Rounds
62%
Grant Probability
99%
With Interview (+83.3%)
3y 11m
Median Time to Grant
High
PTA Risk
Based on 13 resolved cases by this examiner. Grant probability derived from career allow rate.
