Last updated: May 29, 2026
Application No. 17/831,159
LTP-INDUCED ONLINE INCREMENTAL DEEP LEARNING

Non-Final OA §103
Filed
Jun 02, 2022
Examiner
ALABI, OLUWATOSIN O
Art Unit
2129
Tech Center
2100 — Computer Architecture & Software
Assignee
Actimize Ltd.
OA Round
1 (Non-Final)
This examiner grants 60% of cases after interview

— +22.6% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 209 resolved cases, 2023–2026
Examiner Intelligence

ALABI, OLUWATOSIN O View full profile →
Grants 60% of resolved cases
Career Allowance Rate
125 granted / 209 resolved
+4.8% vs TC avg
Strong +23% interview lift
Without
With
+22.6%
Interview Lift
resolved cases with interview
Typical timeline
3y 11m
Avg Prosecution
27 currently pending
Career history
247
Total Applications
across all art units
Statute-Specific Performance

§101
2.6%
-37.4% vs TC avg
§103
86.8%
+46.8% vs TC avg
§102
7.1%
-32.9% vs TC avg
§112
2.3%
-37.7% vs TC avg
Black line = Tech Center average estimate • Based on career data from 209 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Drawings
The drawings were received on 06/02/2022.  These drawings are acceptable.

Specification
The substitute specification filed 06/14/2022 has been entered.
Regarding the claims filed 06/14/2022, these are considered claim amendments given that they were filed subsequent to the originally filed claims dated 06/02/2026. See requirements per MPEP1893.01(a)(4): 
All "currently amended" claims must include markings to indicate the changes made relative to the immediate prior version of the claims: underlining to indicate additions, strike-through or double brackets for deletions (see 37 CFR 1.121(c)  for further details regarding the format of claim amendments). Applicants should note that, in an amendment to the claims filed in a national phase application, the status identifier "original" must be used for claims that had been presented on the international filing date and not modified or canceled. The status identifier "previously presented" must be used in any amendment submitted during the national phase for any claims added or modified under PCT Articles 19  or 34  in the international phase that were subsequently entered in the national phase. The status identifier "canceled" must be used in any amendment submitted during the national phase for any claims canceled under a PCT Article 19  or 34  amendment in the international phase and subsequently entered in the national phase. (emphasis added)

Please follow the noted guidance in all future filings regarding claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 4-5, 8-10, 12-13, 16-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mayer et al. (US 20210287089, hereinafter ‘May’) in view of Chen et al. (US 20180189645, hereinafter ‘Chen’).

Regarding independent claim 1, May teaches a machine learning system configured to induce neuron activity in a neural network of the machine learning system, the machine learning system comprising: a processor and a computer readable medium operably coupled thereto, the computer readable medium comprising a plurality of instructions stored in association therewith that are accessible to, and executable by, the processor, to perform selective neuron inducement operations which comprise: ([0014] In another aspect, the subject matter described in this specification can be embodied in a non-transitory computer-readable medium having instructions stored thereon that, when executed by one or more computer processors, cause the one or more computer processors to perform operations [a machine learning system configured to induce neuron activity in a neural network of the machine learning system, the machine learning system comprising: a processor and a computer readable medium operably coupled thereto, the computer readable medium comprising a plurality of instructions stored in association therewith that are accessible to, and executable by, the processor] including: oscillating a learning rate while performing a preliminary training of a neural network; determining, based on the preliminary training, a number of training epochs to perform for a subsequent training session; and training the neural network using the determined number of training epochs… [0021] In another aspect, the subject matter described in this specification can be embodied in a non-transitory computer-readable medium having instructions stored thereon that, when executed by one or more computer processors, cause the one or more computer processors to perform operations [a machine learning system configured to induce neuron activity in a neural network of the machine learning system, the machine learning system comprising: a processor and a computer readable medium operably coupled thereto, the computer readable medium comprising a plurality of instructions stored in association therewith that are accessible to, and executable by, the processor] including: providing a neural network and training data; determining, based on a size of the training data, one or more first hyperparameters including at least one of a mini-batch size or a dropout rate; determining, based on a type of predictive modeling problem to be solved using the neural network [executable by, the processor, to perform selective neuron inducement operations], one or more second hyperparameters including at least one of a learning rate, a batch normalization, a number of epochs, or an output activation function; and training the neural network using the training data, the one or more first hyperparameters, and the one or more second hyperparameters. )
selecting the neural network comprising a multilayer perceptron having activation functions and weights for activation of perceptrons in the multilayer perceptron and one or more predictive outputs by the perceptrons; (in As depicted in Fig. 1 and in [0003] Artificial neural networks (“neural networks”) are a family of computer models inspired by biological neural networks and can be used to estimate or approximate functions from a large number of unknown inputs [selecting the neural network comprising a multilayer perceptron having activation functions and weights for activation of perceptrons in the multilayer perceptron and one or more predictive outputs by the perceptrons]. Neural network models can be used for regression and/or classification. In one example involving classification, images of dogs can be collected and used to train a neural network model to recognize different dog breeds. When a new image of a dog is provided as input to the trained model, the model can provide a score indicating how closely the dog matches one or more of the breeds and/or can provide an identification of the breed. Neural networks can be used in self-driving cars, character recognition, image compression, stock market predictions, and other applications.[0004] A neural network model is based on a collection of connected units or nodes called neurons or perceptrons [selecting the neural network comprising a multilayer perceptron having activation functions and weights for activation of perceptrons in the multilayer perceptron and one or more predictive outputs by the perceptrons]. Connections  between the nodes [weights for activation of perceptrons in the multilayer perceptron] loosely resemble connections between neurons in a biological brain...; And in [0098] In various examples, the pre-processed data (e.g., training data, validation data, and/or prediction data) can be provided to a neural network model at an input layer [selecting the neural network comprising a multilayer perceptron… ]. Each neuron/perceptron in the model can be, in effect, a linear model, and outputs from the neurons can be sent through a non-linearity (e.g., a non-linear activation function) […having activation functions and weights for activation of perceptrons in the multilayer perceptron and one or more predictive outputs by the perceptrons], such as ReLU. All inputs at the input layer can be multiplied by weights and added to a bias of each neuron in a next layer (e.g., a first hidden layer). For example, each neuron in one layer can receive all neuron output of a previous layer, such that each neuron in the first hidden layer can receive all input data…)
performing an incremental learning cycle on the multilayer perceptron, wherein performing the incremental learning cycle comprises: (in [0122] In general, the learning rate can control or define how much the weights and/or biases of the neural network are adjusted at each training iteration based on an estimated prediction error [performing an incremental learning cycle on the multilayer perceptron, wherein performing the incremental learning cycle]. For example, a calculated loss function gradient (e.g., including a gradient of the loss function for each weight) at a given iteration or step can be multiplied by the learning rate to determine how much to update the weights of the network at that iteration [performing an incremental learning cycle on the multilayer perceptron, wherein performing the incremental learning cycle]…)
selecting a first input neuron to modify by strengthening first connections of the first input neuron to additional neurons in the multilayer perceptron; (in [0122] In general, the learning rate can control or define how much the weights and/or biases of the neural network are adjusted at each training iteration based on an estimated prediction error. For example, a calculated loss function gradient (e.g., including a gradient of the loss function for each weight) at a given iteration or step can be multiplied by the learning rate to determine how much to update the weights of the network at that iteration [selecting a first input neuron to modify by strengthening first connections of the first input neuron to additional neurons in the multilayer perceptron]… And in [0078] In various examples, each edge or connection 160 in the neural network 100 can be associated with a weight and/or bias that can be tuned during a neural network training process, which can enable the model to “learn” to recognize patterns that may be present in the input data 170. In general, a weight for a connection 160 between two neurons can increase [selecting a first input neuron to modify by strengthening first connections of the first input neuron to additional neurons in the multilayer perceptron] or decrease a “strength” (e.g., a contribution) for the connection 160. The weights can control how sensitive the network's predictions are to various features included in the input data 170. In various examples, neurons can have an activation function that controls how signals or values are sent to other connected neurons…[0080] ... For example, the training processes can repeatedly take a small batch of data (e.g., a mini-batch of training data), calculate a difference between predictions and actuals, and adjust weights (e.g., parameters within a neural network that transform input data within each of the network's hidden layers) [selecting a first input neuron to modify by strengthening first connections of the first input neuron to additional neurons in the multilayer perceptron; including a first layer of neurons having respective input data] in the model by a small amount, layer by layer, to generate predictions closer to actual values. Neural network models are flexible and allow for inclusion or composition of arbitrary functions…[0083] In general, the training module 214 can perform a training process in which model prediction errors are reduced by adjusting one or more parameters (e.g., weights and/or biases) for the model  [selecting a first input neuron to modify by strengthening first connections of the first input neuron to additional neurons in the multilayer perceptron; including a first layer of neurons having respective input data]. The training process can involve, for example performing a series of iterations in which (i) the training data is provided to the model, (ii) predictions are made based on the training data, (iii) errors between the predictions and actual values are determined, and (iv) the model is adjusted in an effort to reduce the errors. In some instances, the model is trained using mini-batches or subsets of the training data. For example, a mini-batch of training data can be provided to the model and the model can be adjusted based on the determined errors.)
selecting a second input neuron to modify by weakening second connections of the second input neuron to the additional neurons in the multilayer perceptron; (in [0122] In general, the learning rate can control or define how much the weights and/or biases of the neural network are adjusted at each training iteration based on an estimated prediction error. For example, a calculated loss function gradient (e.g., including a gradient of the loss function for each weight) at a given iteration or step can be multiplied by the learning rate to determine how much to update the weights of the network at that iteration [selecting a second input neuron to modify by weakening second connections of the second input neuron to the additional neurons in the multilayer perceptron]… And in [0078] In various examples, each edge or connection 160 in the neural network 100 can be associated with a weight and/or bias that can be tuned during a neural network training process, which can enable the model to “learn” to recognize patterns that may be present in the input data 170. In general, a weight for a connection 160 between two neurons can increase or decrease [selecting a second input neuron to modify by weakening second connections of the second input neuron to the additional neurons in the multilayer perceptron] a “strength” (e.g., a contribution) for the connection 160. The weights can control how sensitive the network's predictions are to various features included in the input data 170. In various examples, neurons can have an activation function that controls how signals or values are sent to other connected neurons…[0080] ... For example, the training processes can repeatedly take a small batch of data (e.g., a mini-batch of training data), calculate a difference between predictions and actuals, and adjust weights (e.g., parameters within a neural network that transform input data within each of the network's hidden layers) [selecting a second input neuron to modify by weakening second connections of the second input neuron to the additional neurons in the multilayer perceptron; including a second layer of neurons having respective input data] in the model by a small amount, layer by layer, to generate predictions closer to actual values. Neural network models are flexible and allow for inclusion or composition of arbitrary functions…[0083] In general, the training module 214 can perform a training process in which model prediction errors are reduced by adjusting one or more parameters (e.g., weights and/or biases) for the model  [selecting a second input neuron to modify by weakening second connections of the second input neuron to the additional neurons in the multilayer perceptron; including a second layer of neurons having respective input data]. The training process can involve, for example performing a series of iterations in which (i) the training data is provided to the model, (ii) predictions are made based on the training data, (iii) errors between the predictions and actual values are determined, and (iv) the model is adjusted in an effort to reduce the errors. In some instances, the model is trained using mini-batches or subsets of the training data. For example, a mini-batch of training data can be provided to the model and the model can be adjusted based on the determined errors.)
adjusting, based on the first input neuron and the second input neuron, activation function thresholds of the activation functions and the weights associated with the additional neurons for the first input neuron and the second input neuron in the multilayer perceptron; (in [0078] In various examples, each edge or connection 160 in the neural network 100 can be associated with a weight and/or bias that can be tuned during a neural network training process, which can enable the model to “learn” to recognize patterns that may be present in the input data 170. In general, a weight for a connection 160 between two neurons can increase or decrease a “strength” (e.g., a contribution) for the connection 160 [adjusting, based on the first input neuron and the second input neuron, … and the weights associated with the additional neurons for the first input neuron and the second input neuron in the multilayer perceptron]. The weights can control how sensitive the network's predictions are to various features included in the input data 170. In various examples, neurons can have an activation function that controls how signals or values are sent to other connected neurons. For example, the activation function can require a threshold value [adjusting, based on the first input neuron and the second input neuron, activation function thresholds of the activation functions] to be exceeded before a signal or value can be sent. In general, the activation function of a node can define a range for the output of the node, for a given input  or set of inputs [adjusting, based on the first input neuron and the second input neuron, activation function thresholds of the activation functions].)
and running batches of input data through the multilayer perceptron until a set constraint is met, (in [0128] In various examples, “number of epochs” can refer to a number of full passes made through the training data during the training process. For example, when the training data is divided into 100 mini-batches, and each mini-batch corresponds to one training iteration [and running batches of input data through the multilayer perceptron until a set constraint is met], a single pass through the training data (i.e., one epoch) can involve 100 iterations. Likewise, when the total number of epochs is set to five, training can involve five passes through the training data, for a total of 500 iterations [and running batches of input data through the multilayer perceptron until a set constraint is met].)
wherein a prediction is generated for each of the batches from input data from the neural network based on the adjusting. (in [0080] According to some embodiments, the neural network 100 can be trained using a set of training data (e.g., a subset of the input data 170) that includes one or more features and one or more actual values that can be compared with model predictions [wherein a prediction is generated for each of the batches from input data from the neural network based on the adjusting]. The training process can be a challenging task (e.g., involving use of an optimizer and back-propaga10tion) that requires a methodical approach and includes several complex operations. For example, the training processes can repeatedly take a small batch of data (e.g., a mini-batch of training data), calculate a difference between predictions and actuals, and adjust weights (e.g., parameters within a neural network that transform input data within each of the network's hidden layers) in the model by a small amount, layer by layer, to generate predictions closer to actual values. Neural network models are flexible and allow for inclusion or composition of arbitrary functions… [0083] In general, the training module 214 can perform a training process in which model prediction errors are reduced by adjusting one or more parameters (e.g., weights and/or biases) for the model. The training process can involve, for example performing a series of iterations [wherein a prediction is generated for each of the batches from input data from the neural network based on the adjusting] in which (i) the training data is provided to the model, (ii) predictions are made based on the training data, (iii) errors between the predictions and actual values are determined, and (iv) the model is adjusted in an effort to reduce the errors [wherein a prediction is generated for each of the batches from input data from the neural network based on the adjusting]. In some instances, the model is trained using mini-batches or subsets of the training data. For example, a mini-batch of training data can be provided to the model and the model can be adjusted based on the determined errors [and running batches of input data through the multilayer perceptron until a set constraint is met, wherein a prediction is generated for each of the batches from input data from the neural network based on the adjusting] .)
	Examiner notes that the claimed selection limitations is considered as the selected neural network for training and performing inferencing task, as noted above
	Alternative Chen discloses the selection of the neural network as including a neural network topology selected for training and performing inferencing task, in [0084] FIG. 8 depicts an example flow for mapping and accessing synapse weights in accordance with certain embodiments. The various operations may be performed by any suitable logic of a neuromorphic processor. At 802, neural network parameters are received. The neural network parameters may include a selection of a neural network topology type of a plurality of neural network topology types  [selecting the neural network comprising a multilayer perceptron having activation functions and weights for activation of perceptrons in the multilayer perceptron and one or more predictive outputs by the perceptron] that may be implemented by a neuromorphic computer. The neural network parameters may include any other suitable parameters. For example, the parameters may include size parameters, such as the number of neurons in the neural network, the number of layers in the neural network [multilayer perceptron having activation functions], the number of neurons per layer, and/or other suitable size parameters. As another example, the parameters may include synapse weight value parameters [weights for activation of perceptrons in the multilayer perceptron and one or more predictive outputs by the perceptron], such as the signs and magnitudes of synapse weights of the neural network. In various embodiments, the weight value parameters may be determined during a training process of a neural network that is performed, e.g., by one or more processors of a computer system and/or via other means. As another example, the parameters may include one or more neuron bias values to be applied to each neuron at each time-step of the neural network… [0085] At 804, a synapse memory mapping scheme is identified. The synapse memory mapping scheme may be identified based on the selected neural network topology type. In various embodiments, a plurality of neural network topology types that may be implemented by a neuromorphic processor include fully connected neural networks, sparsely connected neural networks, multi-layer perceptrons [neural network comprising a multilayer perceptron having activation functions and weights for activation of perceptrons in the multilayer perceptron and one or more predictive outputs by the perceptron], feed-forward neural networks, generative neural networks (e.g., Restricted Boltzmann Machines or Deep Belief Networks), …
Chen and May are analogous art because both involve developing information retrieval and modeling techniques using machine learning systems and algorithms.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for retrieving and processing information for configuring various neural network topologies to perform various data processing tasks as disclosed by Chen with the method of developing information retrieval and processing techniques for implementing methods and systems that automate the building, training, tuning, and interpretation of neural networks and other machine learning models as disclosed by May.
One of ordinary skill in the arts would have been motivated to combine the disclosed methods disclosed by Chen and May above; Doing so allow for allowing developing and implementing a neural network model that is optimal for a particular processing task, (Chen, 0052).


Regarding claim 2, the rejection of claim 1 is incorporated and May in combination with Chen teaches the machine learning system of claim 1, wherein the set constraint comprises a predetermined number of passes through each neuron within the multilayer perceptron. (in [0122] In general, the learning rate can control or define how much the weights and/or biases of the neural network are adjusted at each training iteration based on an estimated prediction error. For example, a calculated loss function gradient (e.g., including a gradient of the loss function for each weight) at a given iteration [wherein the set constraint comprises a predetermined number of passes through each neuron within the multilayer perceptron.] or step can be multiplied by the learning rate to determine how much to update the weights of the network at that iteration …; ..[0013] In certain implementations, the training data can include tabular data and/or heterogeneous data. Determining the number of training epochs can include: monitoring a prediction accuracy of the neural network during the preliminary training; and determining the number of training epochs based on a rate of change of the prediction accuracy over successive training iterations. Training the neural network can include generating a learning rate schedule based on the determined number of epochs, and the learning rate schedule can define values for the learning rate over the determined number of training epochs [wherein the set constraint comprises a predetermined number of passes through each neuron within the multilayer perceptron]...; And in [0083] In general, the training module 214 can perform a training process in which model prediction errors are reduced by adjusting one or more parameters (e.g., weights and/or biases) for the model. The training process can involve, for example performing a series of iterations in which (i) the training data is provided to the model, (ii) predictions are made based on the training data, (iii) errors between the predictions and actual values are determined, and (iv) the model is adjusted in an effort to reduce the errors. In some instances, the model is trained using mini-batches or subsets of the training data. For example, a mini-batch of training data can be provided to the model and the model can be adjusted based on the determined errors.
Examiner notes that each iteration passes respective data through each layer for making parameter adjustments over a predetermined number of pass (e.g. training epochs/iterations))

Regarding claim 4, the rejection of claim 1 is incorporated and May in combination with Chen teaches the machine learning system of claim 1, wherein the adjusting the activation function thresholds comprises adjusting a threshold of a first additional neuron selected from the additional neurons based on a first chance of the first additional neuron activating from the running the batches of the input data, wherein the first chance increases a likelihood that the first additional neuron outputs data. (in [0078] In various examples, each edge or connection 160 in the neural network 100 can be associated with a weight and/or bias that can be tuned during a neural network training process [wherein the first chance increases a likelihood that the first additional neuron outputs data learned during a training iteration], which can enable the model to “learn” to recognize patterns that may be present in the input data 170. In general, a weight for a connection 160 between two neurons can increase [wherein the set constraint comprises a predetermined number of passes through each neuron within the multilayer perceptron] or decrease a “strength” (e.g., a contribution) for the connection 160. The weights can control how sensitive the network's predictions are to various features included in the input data 170. In various examples, neurons can have an activation function that controls how signals or values are sent to other connected neurons [wherein the adjusting the activation function thresholds comprises adjusting a threshold of a first additional neuron selected from the additional neurons based on a first chance of the first additional neuron activating from the running the batches of the input data associated with a training iteration]. For example, the activation function can require a threshold value to be exceeded before a signal or value can be sent. In general, the activation function of a node can define a range for the output of the node, for a given input or set of inputs  [wherein the adjusting the activation function thresholds comprises adjusting a threshold of a first additional neuron selected from the additional neurons based on a first chance of the first additional neuron activating from the running the batches of the input data]. And the training process cycles from a first instance to a predetermined number of iteration cycles/epochs, in  [0122] In general, the learning rate can control or define how much the weights and/or biases of the neural network are adjusted at each training iteration based on an estimated prediction error. For example, a calculated loss function gradient (e.g., including a gradient of the loss function for each weight) at a given iteration [wherein the adjusting the activation function thresholds comprises adjusting a threshold of a first additional neuron selected from the additional neurons based on a first chance of the first additional neuron activating from the running the batches of the input data, wherein the first chance increases a likelihood that the first additional neuron outputs data] or step can be multiplied by the learning rate to determine how much to update the weights of the network at that iteration …; ..[0013] In certain implementations, the training data can include tabular data and/or heterogeneous data. Determining the number of training epochs can include: monitoring a prediction accuracy of the neural network during the preliminary training [wherein the first chance increases a likelihood that the first additional neuron outputs data]; and determining the number of training epochs based on a rate of change of the prediction accuracy over successive training iterations. Training the neural network can include generating a learning rate schedule based on the determined number of epochs, and the learning rate schedule can define values for the learning rate over the determined number of training epochs [wherein the adjusting the activation function thresholds comprises adjusting a threshold of a first additional neuron selected from the additional neurons based on a first chance of the first additional neuron activating from the running the batches of the input data, wherein the first chance increases a likelihood that the first additional neuron outputs data]...; And in [0083] In general, the training module 214 can perform a training process in which model prediction errors are reduced by adjusting one or more parameters (e.g., weights and/or biases) for the model [wherein the adjusting the activation function thresholds comprises adjusting a threshold of a first additional neuron selected from the additional neurons based on a first chance of the first additional neuron activating from the running the batches of the input data, wherein the first chance increases a likelihood that the first additional neuron outputs data]. The training process can involve, for example performing a series of iterations in which (i) the training data is provided to the model, (ii) predictions are made based on the training data, (iii) errors between the predictions and actual values are determined, and (iv) the model is adjusted in an effort to reduce the errors. In some instances, the model is trained using mini-batches or subsets of the training data. For example, a mini-batch of training data can be provided to the model and the model can be adjusted based on the determined errors. And in [0018] In another aspect, the subject matter described in this specification can be embodied in a system having one or more computer systems programmed to perform operations including: providing a neural network and training data; determining, based on a size of the training data, one or more first hyperparameters including at least one of a mini-batch size or a dropout rate; determining, based on a type of predictive modeling problem to be solved using the neural network, one or more second hyperparameters including at least one of a learning rate, a batch normalization, a number of epochs, or an output activation function; and training the neural network using the training data, the one or more first hyperparameters, and the one or more second hyperparameters.)

Regarding claim 5, the rejection of claim 4 is incorporated and May in combination with Chen and May teaches the machine learning system of claim 4, wherein the adjusting the activation function thresholds comprises adjusting the threshold of a second additional neuron selected from the additional neurons based on a second chance of the second additional neuron activating from the running the batches of the input data, wherein the second chance decreases a likelihood that the second additional neuron outputs data. (in [0078] In various examples, each edge or connection 160 in the neural network 100 can be associated with a weight and/or bias that can be tuned during a neural network training process [wherein the second chance decreases a likelihood that the second additional neuron outputs data learned during a training iteration], which can enable the model to “learn” to recognize patterns that may be present in the input data 170. In general, a weight for a connection 160 between two neurons can increase or decrease [wherein the second chance decreases a likelihood that the second additional neuron outputs data] a “strength” (e.g., a contribution) for the connection 160. The weights can control how sensitive the network's predictions are to various features included in the input data 170. In various examples, neurons can have an activation function that controls how signals or values are sent to other connected neurons [wherein the adjusting the activation function thresholds comprises adjusting the threshold of a second additional neuron selected from the additional neurons based on a second chance of the second additional neuron activating from the running the batches of the input data, associated with a training iteration]. For example, the activation function can require a threshold value to be exceeded before a signal or value can be sent. In general, the activation function of a node can define a range for the output of the node, for a given input or set of inputs  [wherein the adjusting the activation function thresholds comprises adjusting the threshold of a second additional neuron selected from the additional neurons based on a second chance of the second additional neuron activating from the running the batches of the input data, …]. And the training process cycles from a first instance to a predetermined number of iteration cycles/epochs, in  [0122] In general, the learning rate can control or define how much the weights and/or biases of the neural network are adjusted at each training iteration based on an estimated prediction error. For example, a calculated loss function gradient (e.g., including a gradient of the loss function for each weight) at a given iteration [wherein the adjusting the activation function thresholds comprises adjusting the threshold of a second additional neuron selected from the additional neurons based on a second chance of the second additional neuron activating from the running the batches of the input data,…] or step can be multiplied by the learning rate to determine how much to update the weights of the network at that iteration …; ..[0013] In certain implementations, the training data can include tabular data and/or heterogeneous data. Determining the number of training epochs can include: monitoring a prediction accuracy of the neural network during the preliminary training; and determining the number of training epochs based on a rate of change of the prediction accuracy over successive training iterations [wherein the second chance decreases a likelihood that the second additional neuron outputs data]. Training the neural network can include generating a learning rate schedule based on the determined number of epochs, and the learning rate schedule can define values for the learning rate over the determined number of training epochs [wherein the adjusting the activation function thresholds comprises adjusting the threshold of a second additional neuron selected from the additional neurons based on a second chance of the second additional neuron activating from the running the batches of the input data, wherein the second chance decreases a likelihood that the second additional neuron outputs data]...; And in [0083] In general, the training module 214 can perform a training process in which model prediction errors are reduced by adjusting one or more parameters (e.g., weights and/or biases) for the model [wherein the adjusting the activation function thresholds comprises adjusting the threshold of a second additional neuron selected from the additional neurons based on a second chance of the second additional neuron activating from the running the batches of the input data, wherein the second chance decreases a likelihood that the second additional neuron outputs data]. The training process can involve, for example performing a series of iterations in which (i) the training data is provided to the model, (ii) predictions are made based on the training data, (iii) errors between the predictions and actual values are determined, and (iv) the model is adjusted in an effort to reduce the errors. In some instances, the model is trained using mini-batches or subsets of the training data. For example, a mini-batch of training data can be provided to the model and the model can be adjusted based on the determined errors. And in [0018] In another aspect, the subject matter described in this specification can be embodied in a system having one or more computer systems programmed to perform operations including: providing a neural network and training data; determining, based on a size of the training data, one or more first hyperparameters including at least one of a mini-batch size or a dropout rate; determining, based on a type of predictive modeling problem to be solved using the neural network, one or more second hyperparameters including at least one of a learning rate, a batch normalization, a number of epochs, or an output activation function; and training the neural network using the training data, the one or more first hyperparameters, and the one or more second hyperparameters.)

Regarding claim 8, the rejection of claim 1 is incorporated and May in combination with Chen teaches the machine learning system of claim 1, wherein the incremental learning cycle is repeated with at least one or more different input neurons after selecting the first input neuron and the second input neuron for the adjusting. (in [0080] According to some embodiments, the neural network 100 can be trained using a set of training data (e.g., a subset of the input data 170) that includes one or more features and one or more actual values that can be compared with model predictions. The training process can be a challenging task (e.g., involving use of an optimizer and back-propagation) that requires a methodical approach and includes several complex operations. For example, the training processes can repeatedly take a small batch of data (e.g., a mini-batch of training data) [wherein the incremental learning cycle is repeated with at least one or more different input neurons after selecting the first input neuron and the second input neuron for the adjusting], calculate a difference between predictions and actuals, and adjust weights (e.g., parameters within a neural network that transform input data within each of the network's hidden layers) […after selecting the first input neuron and the second input neuron for the adjusting] in the model by a small amount, layer by layer, to generate predictions closer to actual values. Neural network models are flexible and allow for inclusion or composition of arbitrary functions… [0083] In general, the training module 214 can perform a training process in which model prediction errors are reduced by adjusting one or more parameters (e.g., weights and/or biases) for the model. The training process can involve, for example performing a series of iterations [wherein the incremental learning cycle is repeated with at least one or more different input neurons after selecting the first input neuron and the second input neuron for the adjusting] in which (i) the training data is provided to the model, (ii) predictions are made based on the training data, (iii) errors between the predictions and actual values are determined, and (iv) the model is adjusted in an effort to reduce the errors [wherein a prediction is generated for each of the batches from input data from the neural network based on the adjusting]. In some instances, the model is trained using mini-batches or subsets of the training data. For example, a mini-batch of training data can be provided to the model and the model can be adjusted based on the determined errors [wherein the incremental learning cycle is repeated with at least one or more different input neurons after selecting the first input neuron and the second input neuron for the adjusting] .)

Regarding claims 9 and 17, the limitations are similar to those in claim 1 and thus rejected under the same rationale.
Regarding claims 10 and 18, the limitations are similar to those in claim 2 and thus rejected under the same rationale.
Regarding claims 12 and 20, the limitations are similar to those in claim 4 and thus rejected under the same rationale.
Regarding claim 13, the limitations are similar to those in claim 5, and thus rejected under the same rationale.
Regarding claim 16, the limitations are similar to those in claim 8, and thus rejected under the same rationale.

Claims 3, 11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mayer et al. (US 20210287089, hereinafter ‘May’) in view of Chen et al. (US 20180189645, hereinafter ‘Chen’) in further view of Mixter (US 20210150364, hereinafter ‘Mix’).

	Regarding claim 3, the rejection of claim 2 is incorporated and May in combination with Chen teaches the machine learning system of claim 2, wherein the predetermined number of passes comprises a sum of forward passes and backpropagation passes through each neuron in the multilayer perceptron. (in [0134] In various examples, the hyperparameters used to train the neural network model can be adjusted or adapted over time according to one or more training schedules. For example, one or more of the hyperparameters used for training the neural network can be set to the initial hyperparameter values (e.g., as determined by the initial hyperparameter module 218) and then adjusted over time, as training progresses (e.g., using the hyperparameter adaptation module 220)... A variety of different optimizers can be used. In some instances, the optimizer can be or include a function that is executed to determine what the weights of the network should be after a back-propagation step (e.g., at each iteration) [wherein the predetermined number of passes comprises a sum of forward passes and backpropagation passes through each neuron in the multilayer perceptron]…)
Examiner notes that backpropagations requires forward passes including summing of model values. 
Mix expressly teaches backpropagations requires forward passes including summing of model values, in [0003] Artificial neural networks can be trained to implement artificially intelligent processes and functions that can predict many things. Artificial neural network training and prediction can be distilled down to simple multiply and accumulation operations [wherein the predetermined number of passes comprises a sum of forward passes…]. During prediction, also known as forward propagation, the sums of the multiply and accumulate operations are fed into activation functions that inject nonlinearity into the network. During training, also known as backpropagation [wherein the predetermined number of passes comprises a sum of forward passes and backpropagation passes through each neuron in the multilayer perceptron], the derivative of the activation functions along with the multiplied inputs and weights, and the resulting accumulated sums, are used to determine the perceptron output error [wherein the predetermined number of passes comprises a sum of forward passes … through each neuron in the multilayer perceptron]. It is this error that is used to adjust perceptron input weights allowing the network to be trained. [0004] In typical artificial neural network training regimes, the network is backpropagated for every set of input data and/or is based on whether the classifier made a correct prediction [wherein the predetermined number of passes comprises a sum of forward passes and backpropagation passes through each neuron in the multilayer perceptron]. Currently, there are several ways to determine whether a network made a correct prediction and should be backpropagated. For example, a mean squared error can be calculated for each set of input data that is forward propagated through the network [wherein the predetermined number of passes comprises a sum of forward passes and backpropagation passes through each neuron in the multilayer perceptron]. If the mean squared error meets a threshold, then the network is backpropagated…)
Mix, Chen and May are analogous art because both involve developing information retrieval and modeling techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for retrieving and processing information for implementing a system and method for backpropagating an artificial neural network, as disclosed by Mix with the method of developing information retrieval and processing techniques for implementing methods and systems that automate the building, training, tuning, and interpretation of neural networks and other machine learning models as collectively disclosed by Chen and May.
One of ordinary skill in the arts would have been motivated to combine the disclosed methods disclosed by Mix, Chen and May above; Doing so allows for the network to be trained by accumulating  sums used to determine the perceptron output error and making adjusting perceptron input weights, (Mix, 0003).
Regarding claim 11, the limitations are similar to those in claim 3, and thus rejected under the same rationale.
Regarding claim 19, the limitations are similar to those in claim 3, and thus rejected under the same rationale.

Claims 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Mayer et al. (US 20210287089, hereinafter ‘May’) in view of Chen et al. (US 20180189645, hereinafter ‘Chen’) in further view of Zhou (US 20210256423, hereinafter ‘Zhou’).

Regarding claim 6, the rejection of claim 1 is incorporated and May in combination with Chen teaches the machine learning system of claim 1, and wherein the running of the batches is performed with the neural network in real-time when streaming the batches of the input data from the one or more online data sources. (in [0060] As used herein, “image data” may refer to a sequence of digital images (e.g., video), a set of digital images, a single digital image, and/or one or more portions of any of the foregoing. A digital image may include an organized set of picture elements (“pixels”). Digital images may be stored in computer-readable file… .[0061] As used herein, “non-image data” may refer to any type of data other than image data, including but not limited to structured textual data, unstructured textual data, categorical data, and/or numerical data. As used herein, “natural language data” may refer to speech signals representing natural language, text (e.g., unstructured text) representing natural language, and/or data derived therefrom...[0062] As used herein, “time-series data” may refer to data collected at different points in time [and wherein the running of the batches is performed with the neural network in real-time when streaming the batches of the input data from the one or more online data source]. For example, in a time-series data set, each data sample may include the values of one or more variables sampled at a particular time. In some embodiments, the times corresponding to the data samples are stored within the data samples (e.g., as variable values) or stored as metadata associated with the data set. In some embodiments, the data samples within a time-series data set are ordered chronologically. In some embodiments, the time intervals between successive data samples in a chronologically-ordered time-series data set are substantially uniform…
May and Chen do not expressly teach wherein the batches of the input data are collected from one or more online data sources,
Zhou expressly teaches wherein the batches of the input data are collected from one or more online data sources, in [0031] In an embodiment, the historical sample data can be historical streaming sample data before the current streaming sample data. For example, the historical streaming sample data may include one or more batches of streaming sample data before the current streaming sample data [wherein the batches of the input data are collected from one or more online data sources]. In this case, the shallow learning model can be obtained through online training based on the historical streaming sample data. And in [0018] The embodiments of the present specification provide technical solutions for training a learning model based on streaming sample data [wherein the batches of the input data are collected from one or more online data sources]. In the present specification, the streaming sample data usually can include sample data continuously generated from a data sample source, for example, log files, online shopping data, game player activity data, and social networking site information data generated by a web application. The streaming sample data can also be referred to as real-time sample data whose time span is usually between hundreds of milliseconds and several seconds [wherein the running of the batches is performed with the neural network in real-time when streaming the batches of the input data from the one or more online data sources]. Model training performed based on the streaming sample data usually can also be considered as online learning. [0019] Specifically, in the technical solutions of the present specification, current streaming sample data can be received. Then, a current deep learning model can be trained based on the current streaming sample data [wherein the running of the batches is performed with the neural network in real-time when streaming the batches of the input data from the one or more online data sources]. Parameters of a shallow learning model can be used as initialization parameters of the current deep learning model, and the shallow learning model can be obtained through training based on historical sample data associated with the current streaming sample data. [0020] It can be seen that in the technical solutions, when the current deep learning model is trained based on the current streaming sample data [wherein the running of the batches is performed with the neural network in real-time when streaming the batches of the input data from the one or more online data sources], the parameters of the trained shallow learning model are used as the initialization parameters of the current deep learning model, so that convergence of the deep learning model can be accelerated to efficiently finish a model training process, and performance of the deep learning model can also be improved.
Zhou, Chen and May are analogous art because both involve developing information retrieval and modeling techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for retrieving and processing information for training learning models, as disclosed by Zhou with the method of developing information retrieval and processing techniques for implementing methods and systems that automate the building, training, tuning, and interpretation of neural networks and other machine learning models as collectively disclosed by Chen and May.
One of ordinary skill in the arts would have been motivated to combine the disclosed methods disclosed by Zhou, Chen and May above; Doing so allows for training a learning model based on streaming sample data, (Zhou, 0018).

Regarding claim 14, the limitations are similar to those in claim 6, and thus rejected under the same rationale.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Mayer et al. (US 20210287089, hereinafter ‘May’) in view of Chen et al. (US 20180189645, hereinafter ‘Chen’) in further view of Di Febbo et al.  (US 20180268256, hereinafter ‘Di’).

Regarding claim 7, the rejection of claim 1 is incorporated and May in combination with Chen teaches the machine learning system of claim 1, wherein, after the set constraint is met, the activation function thresholds and the weights of the first input neuron and the second input neuron are reset. (in [0146] Once all desired or necessary cycles have been performed, the preliminary training session can proceed to the warm-down phase 1020. This can involve, for example, resetting the model parameters [wherein, after the set constraint is met, the activation function thresholds and the weights of the first input neuron and the second input neuron are reset] (e.g., weights and/or biases) to values corresponding to when the model had a lowest error prior to performing the warm-down phase 1020.)
May and Chen do not expressly disclose resetting the model parameters random values
Di expressly discloses resetting the model parameters random values, in [0143] In operation 850, the training system 320 re-trains the convolutional neural network using the updated training set. In some embodiments, the training begins from scratch (e.g., with random initial values of the parameters) [reset to random values]...

Di, Chen and May are analogous art because both involve developing information retrieval and modeling techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for retrieving and processing information implementing neural network that can be trained to predict the output, as disclosed by Di with the method of developing information retrieval and processing techniques for implementing methods and systems that automate the building, training, tuning, and interpretation of neural networks and other machine learning models as collectively disclosed by Chen and May.
One of ordinary skill in the arts would have been motivated to combine the disclosed methods disclosed by Di, Chen and May above; Doing so allows for re-computing learned neural network parameters from the second training set using backpropagation, (Di, 0019 & 0143).
Regarding claim 15, the limitations are similar to those in claim 7, and thus rejected under the same rationale.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Shlomot (US 5781128): teaches in 5:24-34: The operation of an artificial neural network can be described as an input-output mapping and the mapping pattern is determined mainly by the weights, w, provided at the inputs of each perceptron. The most commonly used artificial neural networks are of the multi-layered, feed-forward type. Efficient algorithms, such as back-propagation algorithms, can be used for training such a network based on a labeled training set, i.e. a set of possible input vectors together with the desired output of the network for each vector.

Pugsley (US 20190042930): teaches in  [0095] … In general, the duplication takes a snapshot of the current state of the NSM corresponding to a neural network, whatever that state happens to be. However, in an example, it may be advantageous to “reset” the new copy to a more neutral, random state, similar to how neural networks are often initialized.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST..
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/OLUWATOSIN ALABI/              Primary Examiner, Art Unit 2129
Read full office action
Prosecution Timeline

Jun 02, 2022
Application Filed
Feb 13, 2026
Non-Final Rejection mailed — §103
May 13, 2026
Response Filed
Precedent Cases

Applications granted by this same examiner with similar technology

18/656,024
Patent 12639573
Method, System, and Computer Program Product for Embedding Compression and Regularization
2y 0m to grant Granted May 26, 2026
17/798,578
Patent 12632740
METHOD AND SYSTEM FOR MULTIMODAL CLASSIFICATION BASED ON BRAIN-INSPIRED UNSUPERVISED LEARNING
3y 9m to grant Granted May 19, 2026
18/093,594
Patent 12579409
IDENTIFYING SENSOR DRIFTS AND DIVERSE VARYING OPERATIONAL CONDITIONS USING VARIATIONAL AUTOENCODERS FOR CONTINUAL TRAINING
3y 2m to grant Granted Mar 17, 2026
18/802,747
Patent 12572814
ARTIFICIAL NEURAL NETWORK BASED SEARCH ENGINE CIRCUITRY
1y 6m to grant Granted Mar 10, 2026
18/196,986
Patent 12561570
METHODS AND ARRANGEMENTS TO IDENTIFY FEATURE CONTRIBUTIONS TO ERRONEOUS PREDICTIONS
2y 9m to grant Granted Feb 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

1-2
Expected OA Rounds
60%
Grant Probability
82%
With Interview (+22.6%)
3y 11m (~0m remaining)
Median Time to Grant
Low
PTA Risk
Based on 209 resolved cases by this examiner. Grant probability derived from career allowance rate.