Prosecution Insights
Last updated: April 19, 2026
Application No. 18/069,163

FIRST-TO-SATURATE SINGLE MODAL LATENT FEATURE ACTIVATION FOR EXPLANATION OF MACHINE LEARNING MODELS

Status: Non-Final OA (§103)
Filed: Dec 20, 2022
Examiner: MANG, VAN C
Art Unit: 2126
Tech Center: 2100 — Computer Architecture & Software
Assignee: Fair Isaac Corporation
OA Round: 1 (Non-Final)

Grant probability: 75% (Favorable)
Expected OA rounds: 1-2
Expected time to grant: 3y 10m
Grant probability with interview: 99%
Examiner Intelligence

Career allow rate: 75% (181 granted / 241 resolved), above average (+20.1% vs TC avg)
Interview lift: +26.9% on resolved cases with interview (strong)
Typical timeline: 3y 10m average prosecution; 31 applications currently pending
Career history: 272 total applications across all art units

Statute-Specific Performance

§101: 31.2% (-8.8% vs TC avg)
§103: 42.5% (+2.5% vs TC avg)
§102: 8.0% (-32.0% vs TC avg)
§112: 13.5% (-26.5% vs TC avg)

Based on career data from 241 resolved cases; Tech Center averages are estimates.
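The headline examiner statistics above follow directly from the raw counts. A minimal sketch of the arithmetic, using only the figures reported in this dashboard (the function name is ours, not from any analytics tool):

```python
# Recompute the headline examiner statistics from the raw counts reported
# above: 181 granted of 241 resolved, stated as +20.1% over the Tech Center
# average. Names are illustrative only.

def allowance_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage."""
    return 100.0 * granted / resolved

career_rate = allowance_rate(181, 241)   # ~75.1%, shown as 75% above
tc_average = career_rate - 20.1          # implied TC 2100 average, ~55.0%

print(f"Career allow rate: {career_rate:.1f}%")
print(f"Implied TC average: {tc_average:.1f}%")
```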

Office Action (§103)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1-7, 9-10, 13-18 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Glorot et al. (“Understanding the difficulty of training deep feedforward neural networks”) in view of Hoefler et al.
(“Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks”). Regarding claim 1 Glorot teaches …comprising: training, based at least on a plurality of training examples including a plurality of input features, (pg. 250 “set of the tiny-images dataset that contains 50,000 training examples (from which we extracted 10,000 as validation data) and 10,000 test examples. There are 10 classes corresponding to the main object in each image: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, or truck. The classes are balanced. Each image is in color, but is just 32 × 32 pixels in size, so the input is a vector of 32 × 32 × 3 = 3072 real values.” see input features on pg. 253 “Consider the hypothesis that we are in a linear regime at the initialization, that the weights are initialized independently and that the inputs features variances are the same (= V ar[x]). Then we can say that, with ni the size of layer i and x the network inpu”) a first machine learning model including at least one hidden node; (section 2.3 “We optimized feedforward neural networks with one to five hidden layers, with one thousand hidden units per layer, and with a softmax logistic regression for the output layer.”) determining, for each of the plurality of training examples and the at least one hidden node and based on the first machine learning model, (section 3.1 “We want to study possible saturation, by looking at the evolution of activations during training, and the figures in this section show results on the Shapeset-3 × 2 data, but similar behavior is observed with the other datasets. Figure 2 shows the evolution of the activation values (after the nonlinearity) at each hidden layer during training of a deep architecture with sigmoid activation functions. 
Layer 1 refers to the output of first hidden layer, and there are four hidden layers.”) a plurality of subsets of the plurality of input features including a minimum combination of the plurality of input features first to cause saturation of the at least one hidden node; (abstract “We first observe the influence of the non-linear activations functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, and explaining the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial.”) determining, for the at least one hidden node and based on the plurality of subsets of the plurality of input features for each of the plurality of training examples, (section 3.1 “We want to study possible saturation, by looking at the evolution of activations during training, and the figures in this section show results on the Shapeset-3 × 2 data, but similar behavior is observed with the other datasets. Figure 2 shows the evolution of the activation values (after the nonlinearity) at each hidden layer during training of a deep architecture with sigmoid activation functions. Layer 1 refers to the output of first hidden layer, and there are four hidden layers.”) a hidden node ordered saturation list including a subset of the plurality of subsets; (section 3 right col “We see that very quickly at the beginning, all the sigmoid activation values of the last hidden layer are pushed to their lower saturation value of 0. Inversely, the others layers have a mean activation value that is above 0.5, and decreasing as we go from the output layer to the input layer. 
We have found that this kind of saturation can last very long in deeper networks with sigmoid activations, e.g., the depthfive model never escaped this regime during training. The big surprise is that for intermediate number of hidden layers (here four), the saturation regime may be escaped. At the same time that the top hidden layer moves out of saturation, the first hidden layer begins to saturate and therefore to stabilize.”) and generating a ….trained machine learning model to determine an output for a training example of the plurality of training examples based on at least one input feature of the subset included in the hidden node ordered saturation list corresponding to the at least one hidden node, (section 3.1 “We want to study possible saturation, by looking at the evolution of activations during training, and the figures in this section show results on the Shapeset-3 × 2 data, but similar behavior is observed with the other datasets. Figure 2 shows the evolution of the activation values (after the nonlinearity) at each hidden layer during training of a deep architecture with sigmoid activation functions. Layer 1 refers to the output of first hidden layer, and there are four hidden layers. The graph shows the means and standard deviations of these activations. These statistics along with histograms are computed at different times during learning, by looking at activation values for a fixed set of 300 test examples.”) wherein the at least one input feature first causes saturation of the at least one hidden node for the training example. (Section 3.2 “As discussed above, the hyperbolic tangent networks do not suffer from the kind of saturation behavior of the top hidden layer observed with sigmoid networks, because of its symmetry around 0. However, with our standard weight initialization U h − √ 1 n , √ 1 n i , we observe a sequentially occurring saturation phenomenon starting with layer 1 and propagating up in the network, as illustrated in Figure 3. 
Why this is happening remains to be understood.”) Glorot does not teach a system comprising: at least one data processor; and at least one memory storing instructions, which when executed by the at least one processor result in operations… sparsely trained machine learning model…. Hoefler teaches a system comprising: at least one data processor; and at least one memory storing instructions, which when executed by the at least one processor result in operations… (pg. 5 “One of the main drivers behind the massive progress in deep learning between the 90’s and today was the nearly 1 million times increase in computational capability delivered by Moore’s law, Dennard scaling, and architectural specializations with GPUs and specialized machine learning accelerators. With the ending of those scaling laws and specialization opportunities, these developments will hit their natural limits and progress may stall. We see sparsity as potentially achieving a second significant “jump” in computational capability as, even with current methods, it promises to increase computational and storage efficiency by up to two orders of magnitude.”) sparsely trained machine learning model…. (section 2.4.3 “The fully-sparse training schedule starts with a sparse model and trains in the sparse regime where it may remove and add elements during the training process. Narasimha et al. (2008) showed early that this scheme can even outperform separate growing or pruning approaches for neuron-sparse training of simple MLPs. Evci et al. (2020a) achieve ResNet-50 performance for a fully sparse training schedule that is comparable to a fully-dense training but uses additional iterations. Weight-sparse training often uses complex hyperparameter settings and schedules. However, it could enable training of very high-dimensional models whose dense representations would simply not fit into the training devices. 
…Dynamic sparsity combines pruning and regrowth of elements during the training process, while static sparsity prunes once before the training starts and does not update the model structure during training”)

Glorot and Hoefler are analogous art because they are both directed to machine learning. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the training of deep feedforward neural networks of Glorot with the sparsity in deep learning of Hoefler. One of ordinary skill in the art would have been motivated to make this modification in order “to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice” as disclosed by Hoefler (abstract: “We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.”).

Regarding claim 15: Claim 15 recites analogous limitations to independent claim 1 and therefore is rejected on the same ground as independent claim 1.
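The "minimum combination of the plurality of input features first to cause saturation" limitation mapped in the claim 1 analysis above can be pictured concretely. The sketch below is our own illustrative reading of the claim language, not the applicant's disclosed method and not anything taught by Glorot or Hoefler: features are ordered by the magnitude of their weighted contribution to a hidden node's pre-activation, and the shortest prefix that drives the activation past a saturation threshold forms that example's subset. All names and the tanh/threshold choices are assumptions.

```python
# Illustrative sketch of a "first-to-saturate" feature subset for one hidden
# node, as one reading of claim 1. The tanh activation, the 0.95 threshold,
# and all names are assumptions for illustration only.
import math

def first_to_saturate(x, w, bias=0.0, threshold=0.95):
    """Return the minimal ordered feature subset whose cumulative weighted
    contribution first pushes tanh(pre-activation) past the threshold."""
    # Order features by contribution magnitude, largest first.
    order = sorted(range(len(x)), key=lambda i: -abs(x[i] * w[i]))
    pre = bias
    subset = []
    for i in order:
        pre += x[i] * w[i]
        subset.append(i)
        if abs(math.tanh(pre)) >= threshold:  # node saturates here
            return subset
    return subset  # never saturated: all features of the subset remain (cf. claim 5)

# Example: feature 2 contributes 3.0 * 1.5 = 4.5 and saturates the node alone.
print(first_to_saturate([0.5, 0.1, 3.0], [1.0, 2.0, 1.5]))  # → [2]
```

Remaining features after the saturating prefix are ignored, which is the behavior claim 4 recites.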
Regarding claim 20 Claim 20 recites analogous limitations to independent claim 1 and therefore is rejected on the same ground as independent claim 1. Regarding claim 2 Glorot in view of Hoefler teaches the system of claim 1. Glorot further teaches wherein the operations further include generating an explanation corresponding to at least one training example of the plurality of training examples, wherein the explanation includes an input feature-level contribution to the output. (“Figure 1: Top: Shapeset-3×2 images at 64×64 resolution. The examples we used are at 32×32 resolution. The learner tries to predict which objects (parallelogram, triangle, or ellipse) are present, and 1 or 2 objects can be present, yielding 9 possible classifications. Bottom: Small-ImageNet images at full resolution.”) Regarding claim 3 Glorot in view of Hoefler teaches the system of claim 2. Glorot further teaches wherein generating the explanation includes: determining the at least one input feature of the subset first causing saturation of the at least one hidden node for the training example; (abstract “We first observe the influence of the non-linear activations functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, and explaining the plateaus sometimes seen when training neural networks. 
We find that a new non-linearity that saturates less can often be beneficial.”) determining, for the at least one hidden node of the …trained machine learning model, a hidden node weight contribution to the output, wherein the hidden node weight contribution corresponds to the at least one input feature; (section 2.3 “We optimized feedforward neural networks with one to five hidden layers, with one thousand hidden units per layer, and with a softmax logistic regression for the output layer. The cost function is the negative log-likelihood − log P(y|x), where (x, y) is the (input image, target class) pair.” pg. 255 left col “For each network the learning rate is separately chosen to minimize error on the validation set. We can remark that on Shapeset-3 × 2, because of the task difficulty, we observe important saturations during learning, this might explain that the normalized initialization or the softsign effects are more visible.”) determining, for the at least one hidden node of the sparsely trained machine learning model, a relative importance of the at least one input feature of the subset based on the hidden node ordered saturation list, (pg. 255 left col “For each network the learning rate is separately chosen to minimize error on the validation set. We can remark that on Shapeset-3 × 2, because of the task difficulty, we observe important saturations during learning, this might explain that the normalized initialization or the softsign effects are more visible.” Also see pg. 253 “This will cause the variance of the back-propagated gradient to be dependent on the layer (and decreasing). 
The normalization factor may therefore be important when initializing deep networks because of the multiplicative effect through layers, and we suggest the following initialization procedure to approximately satisfy our objectives of maintaining activation variances and back-propagated gradients variance as one moves up or down the network”) the hidden node weight contribution, and a weight corresponding to the at least one input feature; (section 4.1 “This is not a new observation (Solla et al., 1988) but we find it important to stress here. We found that the plateaus in the training criterion (as a function of the parameters) are less present with the log-likelihood cost function. We can see this on Figure 5, which plots the training criterion as a function of two weights for a two-layer network (one hidden layer) with hyperbolic tangent units, and a random input and target signal. There are clearly more severe plateaus with the quadratic cost.”) and defining the input feature-level contribution to the output by at least aggregating a list of most important input features based on the relative importance of the at least one input feature for each subset of the plurality of subsets. (Pg. 255 left col “For each network the learning rate is separately chosen to minimize error on the validation set. We can remark that on Shapeset-3 × 2, because of the task difficulty, we observe important saturations during learning, this might explain that the normalized initialization or the softsign effects are more visible.” Also see pg. 253 “This will cause the variance of the back-propagated gradient to be dependent on the layer (and decreasing). 
The normalization factor may therefore be important when initializing deep networks because of the multiplicative effect through layers, and we suggest the following initialization procedure to approximately satisfy our objectives of maintaining activation variances and back-propagated gradients variance as one moves up or down the network”) Hoefler teaches sparsely trained machine learning model…. (section 2.4.3 “The fully-sparse training schedule starts with a sparse model and trains in the sparse regime where it may remove and add elements during the training process. Narasimha et al. (2008) showed early that this scheme can even outperform separate growing or pruning approaches for neuron-sparse training of simple MLPs. Evci et al. (2020a) achieve ResNet-50 performance for a fully sparse training schedule that is comparable to a fully-dense training but uses additional iterations. Weight-sparse training often uses complex hyperparameter settings and schedules. However, it could enable training of very high-dimensional models whose dense representations would simply not fit into the training devices. …Dynamic sparsity combines pruning and regrowth of elements during the training process, while static sparsity prunes once before the training starts and does not update the model structure during training”) Regarding claim 16 Claim 16 recites analogous limitations to dependent claim 3 and therefore is rejected on the same ground as dependent claim 3. Regarding claim 4 Glorot in view of Hoefler teaches the system of claim 3. Glorot further teaches wherein when saturation of the at least one hidden node for the training example occurs prior to reaching an end of the hidden node ordered saturation list, at least one remaining input feature of the subset is ignored. 
(Section 3.1 “We want to study possible saturation, by looking at the evolution of activations during training, and the figures in this section show results on the Shapeset-3 × 2 data, but similar behavior is observed with the other datasets. Figure 2 shows the evolution of the activation values (after the nonlinearity) at each hidden layer during training of a deep architecture with sigmoid activation functions. Layer 1 refers to the output of first hidden layer, and there are four hidden layers.”) Regarding claim 5 Glorot in view of Hoefler teaches the system of claim 3. Glorot further teaches wherein when saturation of the at least one hidden node for the training example fails to occur prior to reaching an end of the hidden node ordered saturation list, the at least one input feature includes all input features of the subset. (Section 2.3 “We optimized feedforward neural networks with one to five hidden layers, with one thousand hidden units per layer, and with a softmax logistic regression for the output layer. The cost function is the negative log-likelihood − log P(y|x), where (x, y) is the (input image, target class) pair. The neural networks were optimized with stochastic back-propagation on mini-batches of size ten, i.e., the average g…” see saturation on section 3.1 “We want to study possible saturation, by looking at the evolution of activations during training, and the figures in this section show results on the Shapeset-3 × 2 data, but similar behavior is observed with the other datasets. Figure 2 shows the evolution of the activation values (after the nonlinearity) at each hidden layer during training of a deep architecture with sigmoid activation functions. Layer 1 refers to the output of first hidden layer, and there are four hidden layers.”) Regarding claim 6 Glorot in view of Hoefler teaches the system of claim 1. 
Glorot further teaches wherein determining the ordered hidden node saturation list for the at least one hidden node includes determining a most frequently occurring subset of the plurality of subsets of the plurality of input features causing saturation of the at least one hidden node; and defining the ordered saturation list as the most frequently occurring subset of input features of the plurality of subsets of the plurality of input features. (Section 3.2 “As discussed above, the hyperbolic tangent networks do not suffer from the kind of saturation behavior of the top hidden layer observed with sigmoid networks, because of its symmetry around 0. However, with our standard weight initialization U h − √ 1 n , √ 1 n i , we observe a sequentially occurring saturation phenomenon starting with layer 1 and propagating up in the network, as illustrated in Figure 3. Why this is happening remains to be understood.” See “Figure 3: Top:98 percentiles (markers alone) and standard deviation (solid lines with markers) of the distribution of the activation values for the hyperbolic tangent networks in the course of learning. We see the first hidden layer saturating first, then the second, etc. Bottom: 98 percentiles (markers alone) and standard deviation (solid lines with markers) of the distribution of activation values for the softsign during learning. Here the different layers saturate less and do so together”) Regarding claim 17 Claim 17 recites analogous limitations to dependent claim 6 and therefore is rejected on the same ground as dependent claim 6. Regarding claim 7 Glorot in view of Hoefler teaches the system of claim 1. Glorot further teaches wherein the plurality of subsets of the plurality of input features causes hidden node saturation of the least one hidden node when a weight contribution of at least one of the plurality of subsets of the plurality of input features is greater than a predetermined saturation threshold. 
(Section 4.3 “Interestingly, as shown in Figure 9, these observations on the weight gradient of standard and normalized initialization change during training (here for a tanh network). Indeed, whereas the gradients have initially roughly the same magnitude, they diverge from each other (with larger gradients in the lower layers) as training progresses, especially with the standard initialization. Note that this might be one of the advantages of the normalized initialization, since having gradients of very different magnitudes at different layers may yield to ill-conditioning and slower training.”) Regarding claim 18 Claim 18 recites analogous limitations to dependent claim 7 and therefore is rejected on the same ground as dependent claim 7. Regarding claim 9 Glorot in view of Hoefler teaches the system of claim 6. Glorot further teaches wherein the weight is assigned during the training of the first machine learning model. (“Figure 5: Cross entropy (black, surface on top) and quadratic (red, bottom surface) cost as a function of two weights (one at each layer) of a network with two layers, W1 respectively on the first layer and W2 on the second, output layer” pg. 253 left col “The variances will be expressed with respect to the input, outpout and weight initialization randomness. Consider the hypothesis that we are in a linear regime at the initialization, that the weights are initialized independently and that the inputs features variances are the same (= V ar[x]). Then we can say that, with ni the size of layer i and x the network input,”) Regarding claim 10 Glorot in view of Hoefler teaches the system of claim 1. Glorot further teaches wherein the training includes inputting the plurality of input features for each of the plurality of training examples in a predetermined order or a random order. (Pg. 250 left col “We call this dataset the Shapeset-3 × 2 dataset, with example images in Figure 1 (top). 
Shapeset-3 × 2 contains images of 1 or 2 two-dimensional objects, each taken from 3 shape categories (triangle, parallelogram, ellipse), and placed with random shape parameters (relative lengths and/or angles), scaling, rotation, translation and grey-scale.”) Regarding claim 13 Glorot in view of Hoefler teaches the system of claim 1. Glorot further teaches wherein each of the plurality of training examples includes an input vector containing the plurality of input features. (Pg. 250 right col “…set of the tiny-images dataset that contains 50,000 training examples (from which we extracted 10,000 as validation data) and 10,000 test examples. There are 10 classes corresponding to the main object in each image: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, or truck. The classes are balanced. Each image is in color, but is just 32 × 32 pixels in size, so the input is a vector of 32 × 32 × 3 = 3072 real values.”) Regarding claim 14 Glorot in view of Hoefler teaches the system of claim 1. Glorot further teaches wherein the subset includes one or more input features of the plurality of input features. (Pg. 253 “The variances will be expressed with respect to the input, outpout and weight initialization randomness. Consider the hypothesis that we are in a linear regime at the initialization, that the weights are initialized independently and that the inputs features variances are the same (= V ar[x]). Then we can say that, with ni the size of layer i and x the network input,”) Claim 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Glorot et al. (“Understanding the difficulty of training deep feedforward neural networks”) in view of Hoefler et al. (“Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks”) and further in view of Sung et al. (“Ranking importance of input parameters of neural networks”). Regarding claim 8 Glorot in view of Hoefler teaches the system of claim 1. 
Glorot in view of Hoefler does not teach wherein determining the hidden node ordered saturation list of the at least one hidden node further includes ranking each input feature of the plurality of subsets of the plurality of input features based on at least one of a weight assigned to the input feature and a frequency of the input feature.

Sung teaches wherein determining the hidden node ordered saturation list of the at least one hidden node further includes ranking each input feature of the plurality of subsets of the plurality of input features based on at least one of a weight assigned to the input feature and a frequency of the input feature. (Section 6 “The main objective of this paper is to analyze the effectiveness of three methods of ranking the importance of input parameters of BPNs. We derived a sensitivity analysis formula for BPNs with two hidden layers and compare it with the method of fuzzy curves and the method based on MSE changes. A large class of modeling, control, and prediction problems in chemical, petroleum, and process engineering, etc., among which the CBQ problem appears fairly typical, is characterized by having a large number of input parameters of varying degrees of importance.”)

Glorot, Hoefler and Sung are analogous art because they are all directed to machine learning. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the training of deep feedforward neural networks of Glorot in view of Hoefler with the ranking of important input parameters using neural networks of Sung.
One of ordinary skill in the art would have been motivated to make this modification in order to provide a method or system for “identifying the important variables is a common issue since elimination of the unimportant inputs leads to a simplification” as disclosed by (Sung abstract “This paper addresses the issue of identifying important input parameters in building a multilayer, backpropagation network for a typical class of engineering problems. These problems are characterized by having a large number of input variables of varying degrees of importance; and identifying the important variables is a common issue since elimination of the unimportant inputs leads to a simplification of the problem and often a more accurate modeling or solution. We compare three different methods for ranking input importance: sensitivity analysis, fuzzy curves, and change of MSE (mean square error); and analyze their effectiveness. Simulation results based on experiments with simple mathematical functions as well as a real engineering problem are reported.”). Claim(s) 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Glorot et al. (“Understanding the difficulty of training deep feedforward neural networks”) in view of Hoefler et al. (“Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks”) and further in view of Gao et al. (“Neural Network Control of a Class of Nonlinear Systems With Actuator Saturation”). Regarding claim 11 Glorot in view of Hoefler teaches the system of claim 1. Glorot in view of Hoefler does not teach wherein the operations further include: determining a hidden node of the at least one hidden node is antipolarized based on a first proportion of the plurality of training examples meeting a positive saturation threshold and a second proportion of the plurality of training examples meeting a negative saturation threshold. 
Gao teaches wherein the operations further include: determining a hidden node of the at least one hidden node is antipolarized based on a first proportion of the plurality of training examples meeting a positive saturation threshold and a second proportion of the plurality of training examples meeting a negative saturation threshold. (Pg. 149 “Assuming ideal saturation, mathematically, the output of the actuator is given by… where is the chosen positive and is the negative saturation limits. If the control signal falls outside the range of the actuator, actuator saturation occurs and the control input can not be fully implemented by the device”)

Glorot, Hoefler and Gao are analogous art because they are all directed to machine learning. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the training of deep feedforward neural networks of Glorot in view of Hoefler with the neural network actuator saturation compensation of Gao. One of ordinary skill in the art would have been motivated to make this modification in order to provide a method or system that can “effectively compensate for the saturation nonlinearity in the presence of system uncertainty” as disclosed by Gao (abstract: “A neural net (NN)-based actuator saturation compensation scheme for the nonlinear systems in Brunovsky canonical form is presented. The scheme that leads to stability, command following, and disturbance rejection is rigorously proved and verified using a general “pendulum type” and a robot manipulator dynamical systems. Online weights tuning law, the overall closed-loop system performance, and the boundedness of the NN weights are derived and guaranteed based on Lyapunov approach. The actuator saturation is assumed to be unknown and the saturation compensator is inserted into a feedforward path.
Simulation results indicate that the proposed scheme can effectively compensate for the saturation nonlinearity in the presence of system uncertainty.”).

Claim(s) 12 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Glorot et al. (“Understanding the difficulty of training deep feedforward neural networks”) in view of Hoefler et al. (“Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks”) and further in view of Dudek et al. (“A Method of Generating Random Weights and Biases in Feedforward Neural Networks with Random Hidden Nodes”).

Regarding claim 12, Glorot in view of Hoefler teaches the system of claim 9. Glorot in view of Hoefler does not teach wherein the operations further comprise: replacing the at least one antipolarized hidden node with a first newly created hidden node and a second newly created hidden node, and wherein the determining the hidden node ordered saturation list of the at least one hidden node includes: determining, for the first newly created hidden node, a first hidden node ordered saturation list of the plurality of input features causing positive saturation of the at least one hidden node; and determining, for the second newly created hidden node, a second hidden node ordered saturation list of the plurality of input features causing negative saturation of the at least one hidden node.

Dudek teaches wherein the operations further comprise: replacing the at least one antipolarized hidden node with a first newly created hidden node and a second newly created hidden node (pg. 10 left col: “MQ - Modified Quickprop algorithm proposed in [21] that iteratively finds the appropriate parameters for the new hidden node added in the incremental procedure.
The parameters of MQ were set by authors [16] as follows: learning rate = 0.05, maximum iterative number = 200.”), and wherein the determining the hidden node ordered saturation list of the at least one hidden node includes: determining, for the first newly created hidden node, a first hidden node ordered saturation list of the plurality of input features causing positive saturation of the at least one hidden node (pg. 11-12: “SCN searches for the random parameter ranges for each new node added to the hidden layer. Thus, this ranges are optimized for each neuron. This translates into much better results than for fixed ranges, which are set without any scientific justification. But in the light of the considerations carried out in this work, assigning the same ranges for weights and biases is questionable.”); and determining, for the second newly created hidden node, a second hidden node ordered saturation list of the plurality of input features causing negative saturation of the at least one hidden node (pg. 10 right col: “This is a variant of IRVFL with random parameters generated with an inequality constraint from the adaptively selected scope [–λ, λ], ensuring the universal approximation property of the built randomized learner model. Among three algorithmic implementations of SCN, the most accurate one was chosen, signed SC-III in [16], where the output weights are recalculated all together through solving a global least squares problem each time a new hidden node is added. Sigmoidal activation function were used for the hidden nodes. The SCN parameters were selected by authors of [16] to ensure the best performance.”).

Glorot, Hoefler, and Dudek are analogous art because they are all directed to machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the training of deep feedforward neural networks of Glorot in view of Hoefler with the generation of random weights and biases in feedforward neural networks of Dudek. One of ordinary skill in the art would have been motivated to make this modification in order to provide an “improvement in approximation performance of the network,” as disclosed by Dudek (abstract: “In this work a method of generating random weights and biases is proposed. This method generates the parameters of the hidden nodes in such a way that nonlinear fragments of the activation functions are located in the input space regions with data and can be used to construct the surface approximating a nonlinear target function. The weights and biases are dependent on the input data range and activation function type. The proposed methods allows us to control the generalization degree of the model. These all lead to improvement in approximation performance of the network. Several experiments show very promising results”).

Regarding claim 19, claim 19 recites limitations analogous to those of dependent claim 12 and is therefore rejected on the same grounds as dependent claim 12.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG, whose telephone number is (571) 270-7598. The examiner can normally be reached Mon-Fri, 8:00 am-5:00 pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached at (571) 270-7519.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/VAN C MANG/
Primary Examiner, Art Unit 2126
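The “antipolarized” hidden-node limitation at the center of claims 11 and 12 describes a concrete computation: a node is antipolarized when one proportion of the training examples drives its activation past a positive saturation threshold and another proportion drives it past a negative saturation threshold. A minimal sketch of that check follows; the function name, the tanh-style saturation level, and the minimum-proportion cutoff are illustrative assumptions for this note, not details taken from the application or the cited art.

```python
import numpy as np

def is_antipolarized(activations, sat_level=0.99, min_proportion=0.25):
    """Flag a hidden node as antipolarized when meaningful proportions of the
    training examples drive it to BOTH saturation rails of a tanh-like unit.
    sat_level and min_proportion are illustrative values, not from the claims."""
    activations = np.asarray(activations)
    pos = np.mean(activations >= sat_level)    # first proportion: positive saturation
    neg = np.mean(activations <= -sat_level)   # second proportion: negative saturation
    return bool(pos >= min_proportion and neg >= min_proportion)

# Hypothetical tanh activations of one hidden node over 100 training examples:
# 40 saturate positively, 40 saturate negatively, 20 stay in the linear region.
acts = np.concatenate([np.full(40, 0.999), np.full(40, -0.999), np.zeros(20)])
print(is_antipolarized(acts))   # -> True: each rail is hit by >= 25% of examples
```

Claim 12 then replaces such a node with two newly created nodes, one associated with the input features causing positive saturation and one with those causing negative saturation, each carrying its own ordered saturation list.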

Prosecution Timeline

Dec 20, 2022
Application Filed
Dec 12, 2025
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591809
MACHINE LEARNING PLATFORM
2y 5m to grant Granted Mar 31, 2026
Patent 12591830
Machine Learning-Based Approach to Identify Software Components
2y 5m to grant Granted Mar 31, 2026
Patent 12586022
Machine Learning-Based Approach to Characterize Software Supply Chain Risk
2y 5m to grant Granted Mar 24, 2026
Patent 12579444
MACHINE LEARNING MODEL GENERATION AND UPDATING FOR MANUFACTURING EQUIPMENT
2y 5m to grant Granted Mar 17, 2026
Patent 12561555
NETWORK OF TENSOR TIME SERIES
2y 5m to grant Granted Feb 24, 2026
Based on the 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
75%
Grant Probability
99%
With Interview (+26.9%)
3y 10m
Median Time to Grant
Low
PTA Risk
Based on 241 resolved cases by this examiner. Grant probability derived from career allow rate.
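The headline figures above can be reproduced with simple arithmetic, assuming (this is a guess at the dashboard's method, not a documented formula) that the grant probability is the examiner's career allow rate and that the interview lift is added on top, capped for display:

```python
granted, resolved = 181, 241                 # career record shown on this page
base = granted / resolved                    # career allow rate, ~0.751
interview_lift = 0.269                       # +26.9% lift in resolved cases with interview
with_interview = min(base + interview_lift, 0.99)   # assumed additive and capped at 99%
print(f"{base:.0%} base, {with_interview:.0%} with interview")  # -> 75% base, 99% with interview
```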
