Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Remarks
This Office Action is responsive to Applicants' Amendment filed on December 15, 2025, in which claims 1 and 16 are currently amended. Claims 2 and 15 are canceled. Claims 21 and 22 are newly added. Claims 1, 3-14, and 16-22 are currently pending.
Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1, 3-14, and 16-22 under 35 U.S.C. 102/103, as amended, have been fully considered but are not persuasive.
In response to Applicant’s argument on p. 7 of the Remarks submitted 12/15/2025 that the Examiner indicated the present amendments would overcome the cited art: the Examiner agrees that the amended limitation overcomes the previous citation. However, after further consideration, additional citations have been identified that disclose the amended claim limitation. The Examiner notes that the messages in the primary reference of Kirsch are explicitly abstractions of the inputs and outputs of the LSTM/RNN sub-networks ([p. 3] "messages are essentially inputs and outputs of sub-RNNs"), where the sub-RNN input and output computation explicitly comprises an interaction term containing a matrix C in Eqn. 3 (or equivalently U in Eqn. 7) that controls the mixing of states s_cai ([p. 2] "This computation describes A · B independent computation paths (A · B independent sub-RNNs) which we connect using an interaction term [see Eqn. 3, Σ s_cai C_ij] where C ∈ R^(N×N). This recursion constitutes the VS-ML RNN with VM := (W, C)"). Kirsch explicitly defines this in the forward and backward learning procedure and explicitly differentiates forward messages from backward messages ([p. 4] "Messages m→^(1) and m←^(K) are externally specified inputs"; [p. 3] "updated Equation 3 with distinct layers k is defined as […] In this case information would only flow into one direction which we can compensate for using an additional backward term […] s_bci^(k+1)·U_ij"). For at least these reasons, the Examiner believes it is reasonable and appropriate to maintain the rejection under 35 U.S.C. 102 in view of Kirsch, and similarly under 35 U.S.C. 103 in view of the respective combinations of Liu and Guo with Kirsch.
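For clarity, the interaction term relied upon above may be reconstructed from the quoted text of Kirsch (a paraphrase for the record; indices follow the quotations, with C the learned N×N mixing matrix):

```latex
% Reconstruction of the Eqn. 3 interaction term from the quoted text:
% each state component j of sub-RNN ab mixes the state components s_{cai}
% of the preceding sub-RNNs through the learned matrix C.
s_{abj} \leftarrow f_W\!\Big(s_{ab},\; \sum_{c}\sum_{i=1}^{N} s_{cai}\, C_{ij}\Big),
\qquad C \in \mathbb{R}^{N \times N}, \qquad \mathrm{VM} := (W, C)
```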
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 3-8, 10, 13, 14, 16-18, and 20-22 are rejected under 35 U.S.C. § 102(a)(1) as being anticipated by Kirsch (“Meta Learning Backpropagation And Improving It”, 2021).
FIG. 3 of Kirsch
Regarding claim 1, Kirsch teaches A computing system featuring a bi-directional artificial neural network, the computing system comprising:([p. 12 §C] "LSTM implementation We implement the VS-ML RNN using A · B LSTMs as in Equation 6 with forward and backward messages as described in Equation 8")
one or more processors; and one or more non-transitory computer-readable media that collectively store:([p. 13 §C.2] "Meta Training Meta training is done across 8 GPUs using distributed gradient descent, each running one trajectory of 500 online examples. The gradients on VM are then averaged across GPUs")
an artificial neural network comprising a plurality of neurons and configured to forward process input data in a forward direction and backward process feedback data in a backward direction opposite to the forward direction;([p. 3] "the VS-ML RNN implements an NN with complex neurons (here 2 neurons)" [p. 12 §C] "LSTM implementation We implement the VS-ML RNN using A · B LSTMs as in Equation 6 with forward and backward messages as described in Equation 8" See also FIG. 1 and FIG. 3)
wherein at least a first neuron of the plurality of neurons is configured to maintain a plurality of different states;([p. 3 §3.5] "we stack multiple VS-ML RNNs where their states are untied and their parameters are tied. 3 This corresponds to Figure 1c where the states s (k+1) of the second column of sub-RNNs are distinct from the first column s (k)")
wherein a machine-learned forward transform parameter set comprises one or more matrices ([p. 3] "messages are essentially inputs and outputs of sub-RNNs" [p. 2] "This computation describes A · B independent computation paths (A · B independent sub-RNNs) which we connect using an interaction term [see Eqn. 3, Σ s_cai C_ij] where C ∈ R^(N×N). This recursion constitutes the VS-ML RNN with VM := (W, C)" [p. 3] "Neuron activations correspond to messages m→ produced by the interaction term which are fed back into the sub-RNNs [...] We can also write Equation 3 as multiple sub-RNNs passing messages m→ : R^N → R^N′ to each other to determine the next state" Kirsch’s messages are an abstraction corresponding to the RNN input/output mechanics and are explicitly tied to the interaction terms; those terms are explicitly written as matrix-weighted mixing (via C and U). C and U therefore function as the transform parameter sets comprising matrices of machine-learned values. C is explicitly a matrix having dimensions N×N.)
comprising one or more learned parameter values that control an amount of mixing between each of the plurality of different states of the first neuron during forward processing; and([p. 5] "We do this by optimizing VM := {W,C} to (1) store a weight w and bias b as a subset of each state sab, (2) compute y = tanh(x)w + b to implement neural forward computation, and (3) update w and b according to the backpropagation" Kirsch explicitly learns matrix C through forward and backward propagation (learning), C being explicitly used to control an amount of mixing between states, as explicitly shown in the Eqn. 3 interaction term s_cai·C_ij.)
wherein a machine-learned backward transform parameter set comprises one or more matrices comprising one or more learned parameter values that control an amount of mixing between each of the plurality of different states of the first neuron during backward processing; and([p. 4] "If we then duplicate this RNN and connect them we obtain the VS-ML RNN from Equation 3 or Equation 7. This corresponds to connections in the opposite direction compared to Figure 1c. In the message passing view this is s_ab^(k) ← f(s_ab^(k), Σ_c m→(s_ca^(k−1)), Σ_c m←(s_bc^(k+1))) with a forward message function m→ : R^N → R^N′ and a backward message function m← : R^N → R^N″. We denote the resulting summed incoming messages to sub-RNN ab at layer k by m→^(k) ∈ R^(A(k)×N′) where m→_a^(k) := Σ_c m→(s_ca^(k−1)) and by m←^(k) ∈ R^(B(k)×N″) where m←_b^(k) := Σ_c m←(s_bc^(k+1)). Thus, our state update equation can also be written as [Eqn. 8]" m← is interpreted as the learned backward transform parameter. Messages are mixed between states according to the learned parameters. Because Kirsch applies separate forward and backward message functions to different directional inputs, these functions are interpreted as being implemented using different learned matrices having distinct machine-learned forward and backward transform parameter sets that determine how state information is combined, thereby controlling the amount of mixing between the plurality of neural state components.)
wherein the machine-learned forward transform parameter set and the machine-learned backward transform parameter set are distinct parameter sets([p. 4] "Messages m→^(1) and m←^(K) are externally specified inputs" [p. 3] "updated Equation 3 with distinct layers k is defined as […] In this case information would only flow into one direction which we can compensate for using an additional backward term […] s_bci^(k+1)·U_ij" Kirsch explicitly distinguishes the forward and backward parameter sets (m→^(1) and m←^(K)) as abstractions of the explicit modification of Eqn. 3 for bidirectional flow as shown in Eqn. 7. Kirsch further teaches that the system is implemented using LSTMs, and it is well understood that LSTMs transform input signals using learned weight matrices (as shown in Eqns. 3 and 7). Because Kirsch applies separate forward and backward message functions to different directional inputs, these functions are interpreted as being implemented using different learned matrices having distinct machine-learned forward and backward transform parameter sets that determine how state information is combined, thereby controlling the amount of mixing between the plurality of neural state components.)
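To illustrate the Examiner's reading of distinct forward and backward transform parameter sets (C in Eqn. 3 and U in Eqn. 7 of Kirsch), the mixing can be sketched as follows. This is an illustrative sketch only, not Kirsch's implementation; the names C_fwd and U_bwd, the tanh update, and all dimensions are assumptions chosen for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4          # per-sub-RNN state dimension (assumed)
num_units = 3  # sub-RNNs per layer (assumed)

# Distinct learned mixing matrices: one for forward messages, one for backward.
C_fwd = rng.standard_normal((N, N))
U_bwd = rng.standard_normal((N, N))

def state_update(s, s_prev_layer, s_next_layer):
    """Mix each unit's N-dimensional state with forward messages from the
    layer below (weighted by C_fwd) and backward messages from the layer
    above (weighted by U_bwd), analogous to the summed messages of Eqn. 8."""
    fwd = s_prev_layer.sum(axis=0) @ C_fwd   # sum over source units c
    bwd = s_next_layer.sum(axis=0) @ U_bwd
    return np.tanh(s + fwd + bwd)

s = rng.standard_normal((num_units, N))
s_below = rng.standard_normal((num_units, N))
s_above = rng.standard_normal((num_units, N))
s_new = state_update(s, s_below, s_above)
print(s_new.shape)  # (3, 4)
```

Because C_fwd and U_bwd are separate arrays, the forward and backward mixing are controlled by distinct parameter sets, which is the point of the mapping above.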
instructions that, when executed by the one or more processors, cause the computing system to execute the artificial neural network to forward process input data to generate a prediction for a task.([p. 5 §3.8] "To optimize each sub-RNN to compute y = tanh(x)w + b or a related activation function, we designate an element of the summed input message m→_a^(k) to correspond to x, and an element of the output message m→(s_ab^(k)) to the predicted y").
Regarding claim 3, Kirsch teaches The computing system of claim 1, wherein one or both of the machine-learned forward transform parameter set and the machine-learned backward transform parameter set have been learned by performance of a meta-learning technique.(Kirsch [p. 5 §3.7] "Having formalized the VS-ML RNN we can now use end-to-end meta learning to create LAs from scratch. In this setting, we simply optimize VM := {W, C} to minimize the sum of prediction losses over many time steps starting with a random state VL := s. This meta training is done using gradient descent.").
Regarding claim 4, Kirsch teaches The computing system of claim 1, wherein the machine-learned forward transform parameter set and the machine-learned backward transform parameter set are included in an update genome associated with the first neuron.(Kirsch [p. 4] "our state update equation can also be written as s_ab^(k) ← f_VM(s_ab^(k), m→_a^(k), m←_b^(k)). [...] Let θ be the parameters of an NN and h its activations, then in the most general case we can define FWs / LLRs by a variable update rule [...] In VS-ML RNNs θ and h can be represented by subsets of s, whereas φ is given by W and C, leading to a general RNN update" θ is interpreted as the update genome, which comprises all of the trainable parameters of the entire system and is explicitly represented by state subsets that are explicitly updated by the messages. See also Eqns. 11 and 12.).
Regarding claim 5, Kirsch teaches The computing system of claim 4, wherein the update genome is shared across the first neuron and one or more other neurons of the plurality of neurons.(Kirsch [p. 4] "our state update equation can also be written as s_ab^(k) ← f_VM(s_ab^(k), m→_a^(k), m←_b^(k)). [...] Let θ be the parameters of an NN and h its activations, then in the most general case we can define FWs / LLRs by a variable update rule [...] In VS-ML RNNs θ and h can be represented by subsets of s, whereas φ is given by W and C, leading to a general RNN update" θ comprises all of the trainable parameters of the entire system.).
Regarding claim 6, Kirsch teaches The computing system of claim 5, wherein the update genome is shared across all of the plurality of neurons.(Kirsch [p. 4] "Let θ be the parameters of an NN and h its activations" θ comprises all of the trainable parameters of the entire system. See also Eqns. 11 and 12.).
Regarding claim 7, Kirsch teaches The computing system of claim 4, wherein the update genome further comprises a machine-learned pre-synaptic transform parameter set that controls forward updates to a plurality of forward synaptic weights associated with the first neuron.(Kirsch [p. 4] "To implement backpropagation we optimize the VS-ML RNN to use and update weights w and biases b as part of the state sab in each sub-RNN. Inputs are pre-synaptic x and error e. Outputs are post-synaptic yˆ and error e" Pre-synaptic x interpreted as controlling forward synaptic weights.).
Regarding claim 8, Kirsch teaches The computing system of claim 4, wherein the update genome further comprises a machine-learned post-synaptic transform parameter set that controls backward updates to a plurality of backward synaptic weights associated with the first neuron.(Kirsch [p. 4] "To implement backpropagation we optimize the VS-ML RNN to use and update weights w and biases b as part of the state sab in each sub-RNN. Inputs are pre-synaptic x and error e. Outputs are post-synaptic yˆ and error e" Post-synaptic y interpreted as controlling backward synaptic weights.).
Regarding claim 10, Kirsch teaches The computing system of claim 4, wherein the update genome further comprises one or more of: a machine-learned neuron forget parameter set;(Kirsch [p. 5 §4] "Our implementation uses LSTMs and the message interpretation from Equation 8" LSTMs have forget gates by definition.)
a machine-learned neuron update parameter set;(Kirsch [p. 4] "our state update equation can also be written as s_ab^(k) ← f_VM(s_ab^(k), m→_a^(k), m←_b^(k)). [...] Let θ be the parameters of an NN and h its activations, then in the most general case we can define FWs / LLRs by a variable update rule [...] In VS-ML RNNs θ and h can be represented by subsets of s, whereas φ is given by W and C, leading to a general RNN update" θ comprises all of the trainable parameters of the entire system. See also Eqns. 11 and 12.)
a machine-learned synapses forget parameter set; and(Kirsch [p. 5 §4] "Our implementation uses LSTMs and the message interpretation from Equation 8" LSTMs have forget gates by definition. See also FIG. 3.)
a machine-learned synapses update parameter set.(Kirsch [p. 4] "Inputs are pre-synaptic x and error e. Outputs are post-synaptic yˆ and error e").
Regarding claim 13, Kirsch teaches The computing system of claim 1, wherein the artificial neural network is configured to simultaneously forward process the input data in the forward direction and backward process the feedback data in the backward direction.(Kirsch [p. 4] "If we then duplicate this RNN and connect them we obtain the VS-ML RNN from Equation 3 or Equation 7. This corresponds to connections in the opposite direction compared to Figure 1c. In the message passing view this is s_ab^(k) ← f(s_ab^(k), Σ_c m→(s_ca^(k−1)), Σ_c m←(s_bc^(k+1))) with a forward message function m→ : R^N → R^N′ and a backward message function m← : R^N → R^N″. We denote the resulting summed incoming messages to sub-RNN ab at layer k by m→^(k) ∈ R^(A(k)×N′) where m→_a^(k) := Σ_c m→(s_ca^(k−1)) and by m←^(k) ∈ R^(B(k)×N″) where m←_b^(k) := Σ_c m←(s_bc^(k+1)). Thus, our state update equation can also be written as [Eqn. 8]" See also FIG. 3 and Eqn. 8, which show that the state update is a function of both the forward and backward messages simultaneously.).
Regarding claim 14, Kirsch teaches The computing system of claim 1, wherein the feedback data comprises gradient-free feedback data.(Kirsch [p. 8] "we are interested in meta learning that does not rely on fixed gradient calculation in the inner loop." [p. 5] "Crucially, during meta testing no explicit gradient descent is used" [p. 6] "This allows for faster learning during meta testing compared to online backpropagation with Adam on the same dataset").
Regarding claim 16, Kirsch teaches One or more non-transitory computer-readable media that collectively store:([p. 13 §C.2] "Meta Training Meta training is done across 8 GPUs using distributed gradient descent, each running one trajectory of 500 online examples. The gradients on VM are then averaged across GPUs")
an artificial neural network comprising a plurality of neurons and configured to forward process input data in a forward direction and backward process feedback data in a backward direction opposite to the forward direction;([p. 3] "the VS-ML RNN implements an NN with complex neurons (here 2 neurons)" [p. 12 §C] "LSTM implementation We implement the VS-ML RNN using A · B LSTMs as in Equation 6 with forward and backward messages as described in Equation 8" See also FIG. 1 and FIG. 3)
wherein at least a first neuron of the plurality of neurons is configured to maintain a plurality of different states; and([p. 3 §3.5] "we stack multiple VS-ML RNNs where their states are untied and their parameters are tied. 3 This corresponds to Figure 1c where the states s (k+1) of the second column of sub-RNNs are distinct from the first column s (k)")
wherein a learned update genome is associated with at least the first neuron and comprises one or more machine-learned parameter sets that control operation of at least the first neuron; ([p. 4] "our state update equation can also be written as s_ab^(k) ← f_VM(s_ab^(k), m→_a^(k), m←_b^(k)). [...] Let θ be the parameters of an NN and h its activations, then in the most general case we can define FWs / LLRs by a variable update rule [...] In VS-ML RNNs θ and h can be represented by subsets of s, whereas φ is given by W and C, leading to a general RNN update" θ is interpreted as the update genome, which comprises all of the trainable parameters of the entire system and is explicitly represented by state subsets that are explicitly updated by the messages. See also Eqns. 11 and 12.)
wherein the one or more machine-learned parameter sets comprise one or more of: a machine-learned forward transform parameter set comprising one or more learned parameter values that control an amount of mixing between each of the plurality of different states of the first neuron during forward processing;([p. 4] "If we then duplicate this RNN and connect them we obtain the VS-ML RNN from Equation 3 or Equation 7. This corresponds to connections in the opposite direction compared to Figure 1c. In the message passing view this is s_ab^(k) ← f(s_ab^(k), Σ_c m→(s_ca^(k−1)), Σ_c m←(s_bc^(k+1))) with a forward message function m→ : R^N → R^N′ and a backward message function m← : R^N → R^N″. We denote the resulting summed incoming messages to sub-RNN ab at layer k by m→^(k) ∈ R^(A(k)×N′) where m→_a^(k) := Σ_c m→(s_ca^(k−1)) and by m←^(k) ∈ R^(B(k)×N″) where m←_b^(k) := Σ_c m←(s_bc^(k+1)). Thus, our state update equation can also be written as [Eqn. 8]" m→ is interpreted as the learned forward transform parameter. Messages are mixed between states according to the learned parameters. See also FIG. 3.)
a machine-learned backward transform parameter set comprising one or more learned parameter values that control an amount of mixing between each of the plurality of different states of the first neuron during backward processing;([p. 4] "If we then duplicate this RNN and connect them we obtain the VS-ML RNN from Equation 3 or Equation 7. This corresponds to connections in the opposite direction compared to Figure 1c. In the message passing view this is s_ab^(k) ← f(s_ab^(k), Σ_c m→(s_ca^(k−1)), Σ_c m←(s_bc^(k+1))) with a forward message function m→ : R^N → R^N′ and a backward message function m← : R^N → R^N″. We denote the resulting summed incoming messages to sub-RNN ab at layer k by m→^(k) ∈ R^(A(k)×N′) where m→_a^(k) := Σ_c m→(s_ca^(k−1)) and by m←^(k) ∈ R^(B(k)×N″) where m←_b^(k) := Σ_c m←(s_bc^(k+1)). Thus, our state update equation can also be written as [Eqn. 8]" m← is interpreted as the learned backward transform parameter. Messages are mixed between states according to the learned parameters.)
a machine-learned pre-synaptic transform parameter set that controls forward updates to a plurality of forward synaptic weights associated with the first neuron; and([p. 4] "To implement backpropagation we optimize the VS-ML RNN to use and update weights w and biases b as part of the state sab in each sub-RNN. Inputs are pre-synaptic x and error e. Outputs are post-synaptic yˆ and error e" Pre-synaptic x interpreted as controlling forward synaptic weights.)
a machine-learned post-synaptic transform parameter set that controls backward updates to a plurality of backward synaptic weights associated with the first neuron.([p. 4] "To implement backpropagation we optimize the VS-ML RNN to use and update weights w and biases b as part of the state sab in each sub-RNN. Inputs are pre-synaptic x and error e. Outputs are post-synaptic yˆ and error e" Post-synaptic y interpreted as controlling backward synaptic weights.).
Regarding claim 17, Kirsch teaches The one or more non-transitory computer-readable media of claim 16, wherein the plurality of different states comprise three or more different states.(Kirsch [p. 13] "Stability during meta testing In order to prevent exploding states during meta testing we also clip the LSTM state between −4 and 4. Bounded states in LSTMs In LSTMs the hidden state is bounded between (−1, 1). For learning algorithm cloning, we’d like to support weights and biases beyond this range. This can be circumvented by choosing a constant, here 4, by which we scale w and b down to store them in the context state. This is only relevant during learning algorithm cloning" The values −4, −1, 1, and 4 are four different states, and there are infinitely many values in the continuous range of −1 to 1.).
Regarding claim 18, Kirsch teaches The one or more non-transitory computer-readable media of claim 16, wherein the learned update genome has been learned by performance of a meta-learning technique.(Kirsch [p. 4] "Let θ be the parameters of an NN and h its activations" [p. 5 §3.7] "Having formalized the VS-ML RNN we can now use end-to-end meta learning to create LAs from scratch. In this setting, we simply optimize VM := {W, C} to minimize the sum of prediction losses over many time steps starting with a random state VL := s. This meta training is done using gradient descent." θ comprises all of the trainable parameters of the entire system, including the parameters learned by meta-learning. See also Eqns. 11 and 12.).
Regarding claim 20, Kirsch teaches The one or more non-transitory computer-readable media of claim 16, wherein the artificial neural network is configured to simultaneously forward process the input data in the forward direction and backward process the feedback data in the backward direction.(Kirsch [p. 4] "If we then duplicate this RNN and connect them we obtain the VS-ML RNN from Equation 3 or Equation 7. This corresponds to connections in the opposite direction compared to Figure 1c. In the message passing view this is s_ab^(k) ← f(s_ab^(k), Σ_c m→(s_ca^(k−1)), Σ_c m←(s_bc^(k+1))) with a forward message function m→ : R^N → R^N′ and a backward message function m← : R^N → R^N″. We denote the resulting summed incoming messages to sub-RNN ab at layer k by m→^(k) ∈ R^(A(k)×N′) where m→_a^(k) := Σ_c m→(s_ca^(k−1)) and by m←^(k) ∈ R^(B(k)×N″) where m←_b^(k) := Σ_c m←(s_bc^(k+1)). Thus, our state update equation can also be written as [Eqn. 8]" See also FIG. 3 and Eqn. 8, which show that the state update is a function of both the forward and backward messages simultaneously.).
Regarding claim 21, Kirsch teaches The computing system of claim 1, wherein: the machine-learned forward transform parameter set selects one or more of the plurality of different states of the first neuron to mix during forward processing; (Kirsch [p. 3] "Weights are multi-dimensional and a subset of the RNN state s" [p. 5] "We designate an element of the summed input message m→_a^(k) to correspond to x" [p. 3] "messages are essentially inputs and outputs of sub-RNNs" [p. 2] "This computation describes A · B independent computation paths (A · B independent sub-RNNs) which we connect using an interaction term [see Eqn. 3, Σ s_cai C_ij] where C ∈ R^(N×N). This recursion constitutes the VS-ML RNN with VM := (W, C)" Kirsch’s forward-direction mixing is explicitly implemented by a learned matrix C that weights and sums incoming state components from the previous layer)
and the machine-learned backward transform parameter set selects one or more of the plurality of different states of the first neuron to mix during backward processing.(Kirsch [p. 3] "we can compensate for using an additional backward term [see U_ij in Eqn. 7]" [p. 5] "An element of the summed input message m←_b^(k) is thus designated as the input error e" [p. 5] "information always also flows backward through m←" Kirsch adds an explicit backward-direction interaction term using U_ij, where the notation shows a plurality of backward weights (the coefficients of U) that weight and combine higher-layer state components s^(k+1) into the state update at layer k).
Regarding claim 22, Kirsch teaches The computing system of claim 21, wherein the computing system further comprises: a plurality of forward weights that modify a first state of the plurality of different states during forward processing, the mixing of the modified first state controlled by the machine-learned forward transform parameter set; (Kirsch [p. 3] "Weights are multi-dimensional and a subset of the RNN state s" [p. 5] "We designate an element of the summed input message m→_a^(k) to correspond to x" [p. 3] "messages are essentially inputs and outputs of sub-RNNs" [p. 2] "This computation describes A · B independent computation paths (A · B independent sub-RNNs) which we connect using an interaction term [see Eqn. 3, Σ s_cai C_ij] where C ∈ R^(N×N). This recursion constitutes the VS-ML RNN with VM := (W, C)" Kirsch’s forward-direction mixing is explicitly implemented by a learned matrix C that weights and sums incoming state components from the previous layer)
and a plurality of backward weights that modify a second state of the plurality of different states during backward processing, the mixing of the modified second state controlled by the machine-learned backward transform parameter set.(Kirsch [p. 3] "we can compensate for using an additional backward term [see U_ij in Eqn. 7]" [p. 5] "An element of the summed input message m←_b^(k) is thus designated as the input error e" [p. 5] "information always also flows backward through m←" Kirsch adds an explicit backward-direction interaction term using U_ij, where the notation shows a plurality of backward weights (the coefficients of U) that weight and combine higher-layer state components s^(k+1) into the state update at layer k).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 9, 11, and 19 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Kirsch and Liu (“Binarized LSTM Language Model”, 2018).
Regarding claim 9, Kirsch teaches The computing system of claim 7.
However, Kirsch does not explicitly teach wherein one or both of the machine-learned pre-synaptic transform parameter set and the machine-learned post-synaptic function comprise a binary mixing matrix.
Liu, in the same field of endeavor, teaches one or both of the machine-learned pre-synaptic transform parameter set and the machine-learned post-synaptic function comprise a binary mixing matrix. ([p. 3] "The training approach is similar to the methods proposed in (Courbariaux et al., 2016; Rastegari et al., 2016). At run-time, the input embedding and the output embedding are binarized matrices [...] In a binarized LSTM language model, all the matrices in the parameters are binarized, which can save much more memory space").
Kirsch and Liu are both directed to LSTMs; therefore, Kirsch and Liu are reasonably pertinent, analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Kirsch with the teachings of Liu by using a binarized LSTM. Liu provides additional motivation for the combination ([p. 8 §5] "words are represented in the form of binarized vectors, which only contain parameters of -1 or 1. For further compression, we binarize the long short-term memory language model combined with the binarized embeddings. Thus, the total memory usage can be significantly reduced"). This motivation for combination also applies to the remaining claims that depend on this combination.
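For context on the claimed binary mixing matrix, the deterministic sign binarization of the kind described in Liu's quoted passage can be sketched as follows. This is an illustrative sketch only, not Liu's implementation; the function name and the example matrix are assumptions:

```python
import numpy as np

def binarize(W):
    """Sign-binarize a real-valued weight matrix so every entry is -1 or +1,
    as in the binarized LSTM parameter matrices Liu describes."""
    return np.where(W >= 0, 1.0, -1.0)

# Example: a small real-valued matrix and its binarized counterpart.
W = np.array([[0.3, -1.2],
              [-0.05, 0.8]])
B = binarize(W)
# Every entry of B is -1.0 or 1.0; shape is unchanged.
```

Storing only the signs (one bit per entry) rather than full-precision floats is the source of the memory savings Liu relies on.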
Regarding claim 11, Kirsch teaches The computing system of claim 1.
However, Kirsch does not explicitly teach that one or both of the machine-learned forward transform parameter set and the machine-learned backward transform parameter set comprise a binary mixing matrix.
Liu, in the same field of endeavor, teaches one or both of the machine-learned forward transform parameter set and the machine-learned backward transform parameter set comprise a binary mixing matrix. ([p. 3] "The training approach is similar to the methods proposed in (Courbariaux et al., 2016; Rastegari et al., 2016). At run-time, the input embedding and the output embedding are binarized matrices [...] In a binarized LSTM language model, all the matrices in the parameters are binarized, which can save much more memory space").
Kirsch and Liu are both directed to LSTMs; therefore, Kirsch and Liu are reasonably pertinent, analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Kirsch with the teachings of Liu by using a binarized LSTM. Liu provides additional motivation for the combination ([p. 8 §5] "words are represented in the form of binarized vectors, which only contain parameters of -1 or 1. For further compression, we binarize the long short-term memory language model combined with the binarized embeddings. Thus, the total memory usage can be significantly reduced"). This motivation for combination also applies to the remaining claims that depend on this combination.
Regarding claim 19, Kirsch teaches The one or more non-transitory computer-readable media of claim 16.
However, Kirsch does not explicitly teach that one or both of the machine-learned forward transform parameter set and the machine-learned backward transform parameter set comprise binary mixing matrices.
Liu, in the same field of endeavor, teaches one or both of the machine-learned forward transform parameter set and the machine-learned backward transform parameter set comprise binary mixing matrices. ([p. 3] "The training approach is similar to the methods proposed in (Courbariaux et al., 2016; Rastegari et al., 2016). At run-time, the input embedding and the output embedding are binarized matrices [...] In a binarized LSTM language model, all the matrices in the parameters are binarized, which can save much more memory space").
Kirsch and Liu are both directed to LSTMs; therefore, Kirsch and Liu are reasonably pertinent, analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Kirsch with the teachings of Liu by using a binarized LSTM. Liu provides additional motivation for the combination ([p. 8 §5] "words are represented in the form of binarized vectors, which only contain parameters of -1 or 1. For further compression, we binarize the long short-term memory language model combined with the binarized embeddings. Thus, the total memory usage can be significantly reduced"). This motivation for combination also applies to the remaining claims that depend on this combination.
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Kirsch and Guo (US20180307753A1).
Regarding claim 12, Kirsch teaches The computing system of claim 1.
However, Kirsch does not explicitly teach wherein at least two of the plurality of different states operate over at least two different time scales.
Guo, in the same field of endeavor, teaches wherein at least two of the plurality of different states operate over at least two different time scales ([¶0034] "the electronic device 108 is configured to perform one or more operations based on a bidirectional long short term memory (LSTM) recurrent neural network (RNN) 149. The classifier circuit 128 may be configured to evaluate the samples 116 based on the bidirectional LSTM RNN 149, such as by performing multiple evaluations of the samples 116 using multiple time scales (also referred to herein as time horizons). For example, the classifier circuit 128 may be configured to perform a first evaluation of the samples 116 based on a first time scale, a second evaluation of the samples 116 based on a second time scale, and a third evaluation of the samples 116 based on a third time scale. In an illustrative example, the first time scale may correspond to 40 milliseconds (ms), the second time scale may correspond to 60 ms, and the third time scale may correspond to 120 ms. Other sample sizes, other time horizon sizes, or both, may be used in some implementations").
Kirsch and Guo are both directed toward LSTMs; therefore, Kirsch and Guo are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Kirsch with the teachings of Guo by operating over multiple time scales. Guo provides additional motivation for the combination ([¶0035] "using a relatively long time scale (e.g., one or more years) may assist in classifying a particular location of the plurality of geographic locations that is associated with a festival event. Alternatively or in addition, certain acoustic events may occur with a relatively short duration. As an illustrative example, a siren event may have a relatively short duration (e.g., as compared to a festival event). In this case, analyzing samples using a relatively short time scale (e.g., a time scale of several seconds) may assist in classifying a particular location of the plurality of geographic locations that is associated with a siren event").
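Examiner notes, purely for illustration and forming no part of the rejection of record, that Guo's multi-time-scale evaluation of a sample stream (¶0034: 40 ms, 60 ms, and 120 ms windows) can be sketched as follows (the function name, feature choice, and sample rate are hypothetical, not drawn from the reference):

```python
import numpy as np

def evaluate_at_time_scales(samples, sample_rate_hz, scales_ms=(40, 60, 120)):
    # Partition the sample stream into non-overlapping windows at each
    # time scale and compute one per-window feature (here: mean energy),
    # so the same samples are evaluated at multiple time horizons.
    results = {}
    for ms in scales_ms:
        win = int(sample_rate_hz * ms / 1000)   # samples per window
        n = len(samples) // win                 # whole windows available
        windows = samples[: n * win].reshape(n, win)
        results[ms] = (windows ** 2).mean(axis=1)
    return results

# Hypothetical usage: 1200 samples at an assumed 1 kHz sample rate yields
# 30, 20, and 10 windows at the 40 ms, 60 ms, and 120 ms scales.
res = evaluate_at_time_scales(np.ones(1200), sample_rate_hz=1000)
```

Each entry of the returned mapping corresponds to one of Guo's parallel evaluations at a distinct time horizon; in Guo the per-window evaluations feed the bidirectional LSTM RNN classifier rather than a fixed energy feature.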
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SIDNEY VINCENT BOSTWICK/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124