Prosecution Insights
Last updated: April 19, 2026
Application No. 17/221,962

DEEP NEURAL NETWORK WITH REDUCED PARAMETER COUNT

Final Rejection: §101, §103
Filed: Apr 05, 2021
Examiner: GODO, MORIAM MOSUNMOLA
Art Unit: 2148
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nokia Technologies Oy
OA Round: 4 (Final)
Grant Probability: 44% (Moderate)
OA Rounds: 5-6
To Grant: 4y 8m
With Interview: 78%

Examiner Intelligence

Career Allow Rate: 44% (30 granted / 68 resolved; -10.9% vs TC avg)
Interview Lift: +33.4% (allow rate for resolved cases with an interview vs. without)
Avg Prosecution: 4y 8m typical; 47 applications currently pending
Total Applications: 115 (career history, across all art units)

Statute-Specific Performance

§101: 16.1% (-23.9% vs TC avg)
§103: 56.7% (+16.7% vs TC avg)
§102: 12.7% (-27.3% vs TC avg)
§112: 12.9% (-27.1% vs TC avg)

Tech Center averages are estimates. Based on career data from 68 resolved cases.
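As a sanity check, the headline figures above follow from the raw counts shown in the panels (a quick illustrative computation, using only the numbers displayed above):

```python
# Counts taken from the Examiner Intelligence panel above.
granted, resolved = 30, 68

allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")   # ~44.1%, shown as 44%

# "-10.9% vs TC avg" implies a Tech Center 2100 average around 55%.
implied_tc_avg = allow_rate + 0.109
print(f"Implied TC 2100 average: {implied_tc_avg:.1%}")
```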

Office Action

§101, §103
DETAILED ACTION

1. This office action is in response to Application No. 17221962, filed on 11/17/2025. Claims 1-20 are presented for examination and are currently pending. Applicant's arguments have been carefully and respectfully considered.

Response to Arguments

2. On page 11 of the remarks, the Applicant argued that "The Applicant respectfully asks the Office to explain why the pending claims are not eligible if Example 39 was found to be eligible." The Applicant noted the similarity between claim 1 and Example 39 of the subject matter eligibility examples, where similar subject matter (i.e., training a neural network) was found to be eligible under Step 2A, Prong 1.

It is noted that although the claims in Example 39 are directed to training a neural network, the claims in Example 39 of the subject matter eligibility examples do not recite any abstract ideas as analyzed and, as a result, are eligible. The present claimed invention, however, recites "perform a growth process by building up from the initial subset of the connections …", "randomly select additional connections …", "reset the weights …", and "determine a training accuracy …". These limitations are all abstract ideas as analyzed in this office action, and thus the claims are not eligible.

On page 12 of the remarks, the Applicant argued that "The Applicant submits that at least the training of connections (i.e., the initial subset of the connections and the additional connections) of a DNN as in claim 1 'does not recite a mental process because the steps are not practically performed in the human mind.' Claim 1 provides an improvement to the training of a DNN as the apparatus discovers a smaller subnetwork in the DNN having smaller parameter counts, which can be trained more efficiently (e.g., lower processing requirements) and have lower storage requirements."
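The limitations quoted in the arguments above describe an iterative grow-reset-train-measure loop. As a toy illustration only (the connection set, the accuracy proxy, and all names here are hypothetical stand-ins, not taken from the application or the record):

```python
import random

random.seed(0)
N_CONNECTIONS = 100
# Initialize the DNN by randomly assigning initial values to connection weights.
init_weights = {c: random.gauss(0.0, 1.0) for c in range(N_CONNECTIONS)}

ACC_THRESHOLD = 0.5
GROWTH = 10  # hypothetical number of connections added per growth iteration


def train_and_score(subset):
    """Toy proxy: reset weights to initial values, 'train', return accuracy.

    Here accuracy simply tracks subset size; a real implementation would
    train the subnetwork on a dataset and evaluate it.
    """
    _ = {c: init_weights[c] for c in subset}  # weights reset to initial values
    return len(subset) / N_CONNECTIONS


# Randomly select an initial subset whose accuracy is below the threshold.
subset = set(random.sample(range(N_CONNECTIONS), 20))

# Growth process: add random remaining connections, reset, retrain, and
# repeat until the candidate subset reaches the accuracy threshold.
while train_and_score(subset) < ACC_THRESHOLD:
    remaining = list(set(range(N_CONNECTIONS)) - subset)
    subset |= set(random.sample(remaining, min(GROWTH, len(remaining))))

qualified_subset = subset
print(len(qualified_subset))  # → 50: first size whose toy accuracy >= 0.5
```

With this toy accuracy proxy the loop stops at the first candidate holding half the connections; the contested question in the action is not how such a loop runs but whether its steps, so characterized, are mental processes.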
Also on page 12, the Applicant argued that "The Applicant submits that the training of the connections as in claim 1 provides a practical application, such as an improvement to the technical field of machine learning." On page 13 of the remarks, the Applicant argued that "This trainable subnetwork that is discovered can be trained more efficiently (e.g., lower processing requirements) and has lower storage requirements compared to the 'full' DNN. This is an improvement to the technical field of machine learning, indicating practical application in claim 1."

The above arguments are not persuasive because the judicial exception alone cannot provide an improvement rendering the claims eligible; rather, the improvement must be provided by the additional element(s) in combination with the recited judicial exception. See MPEP 2106.05(a). Furthermore, the improvement argued by the Applicant (that the apparatus discovers a smaller subnetwork in the DNN with a smaller parameter count, which can be trained more efficiently and has lower storage requirements) is not persuasive because the discovered subnetwork is not applied to any application. To achieve the lower processing and storage requirements, the discovered subnetwork must be implemented to realize the improvements argued above, and the claimed invention does not recite how the discovered subnetwork is implemented to achieve them.

On page 14 of the remarks, the Applicant argued that "The improvements to the technical field of machine learning/DNNs are evident from the benefits provided by the apparatus of claim 1. One technical benefit of claim 1 is the qualified subset (i.e., subnetwork) discovered from the DNN has a reduced parameter count (i.e., less weights) compared to the DNN but still has an acceptable training accuracy (see pages 13-14, lines 31-2). Thus, the qualified subset has far lower storage and processing requirements than the full DNN, and may have applications in resource-limited devices, such as a mobile phone or a robot (see page 2, lines 8-10). Another technical benefit is discovery of the qualified subset (i.e., subnetwork) from the DNN overcomes problems associated with long and processing-intensive training of DNNs with large parameter counts (see page 9, lines 16-19). Thus, training of a subnetwork discovered from a DNN may be more efficient," and that "the Applicant submits that claim 1 as a whole provides improvements to the technical field of machine learning/DNNs, and indeed has practical application. For at least these reasons, the Applicant submits that claim 1 is eligible at least under Step 2A, Prong Two."

The Applicant's argument above is not persuasive because the implementation of the qualified subset from the DNN is missing from the claimed invention. The improvement argued by the Applicant requires the discovered subnetwork to be applied, and the applications mentioned by the Applicant for implementing the qualified subset are not claimed. As a result, the claimed invention neither integrates the abstract idea into a practical application nor amounts to significantly more than the judicial exception.

The Applicant's arguments regarding the prior art rejections have been considered but are moot in view of the new grounds of rejection. The Examiner is withdrawing the rejections in the previous office action of 08/18/2025 because the Applicant's amendments necessitated the new grounds of rejection presented in this office action. Furthermore, Laszlo in view of Faibish has been applied to the independent claims.

Claim Rejections - 35 USC § 101

35 U.S.C.
101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

3. Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: Independent claim 1 is directed to an apparatus, and falls into one of the four statutory categories.

Step 2A, Prong 1: Claim 1 recites the following abstract ideas: initialize a Deep Neural Network (DNN) by randomly assigning initial values to weights associated with connections of the DNN (Mental process directed to initializing the weight values of the connections, which can be done by evaluating values of the weights and making a judgement of what values to assign to the weights); randomly select an initial subset of the connections from the DNN based on an initial percentage of a total number of the connections such that a training accuracy of the initial subset is below an accuracy threshold (Mental process directed to randomly selecting connections of the DNN based on an initial number of connections; this can be performed by evaluating the initial number of connections and making a judgement on which connections to select randomly); and perform a growth process by building up from the initial subset of the connections with remaining connections from the DNN not included in the initial subset to discover a qualified subset of the connections, wherein for the growth process (Mental process directed to growing a neural network by adding connections to create another set of neural network connections; this can be done by evaluating the initial subset connections and additional connections and making a judgement of the qualified subset): randomly select additional connections from the remaining connections of the DNN (Mental process directed to randomly selecting connections of the DNN based on an initial number of connections; this can be performed by evaluating the initial number of connections and making a judgement on which connections to select randomly); add the additional connections to the initial subset to form a candidate subset of the connections (Mental process directed to growing a neural network by adding connections to create a subset of neural network connections; this can be done by evaluating the initial subset connections and additional connections and making a judgement on the qualified subset); reset the weights of the connections in the candidate subset to the initial values (Mental process directed to resetting the weights of the connections in the candidate subset; this can be done by evaluating the weight connections and making a judgement on the resetting of the weights); determine a training accuracy of the candidate subset (Mental process directed to determining the training accuracy of the candidate subset; this can be done by observing the accuracy of the subset and making a judgement on the accuracy), wherein the growth process is performed until the candidate subset comprises the qualified subset having a training accuracy that reaches the accuracy threshold (Mental process of performing a growth process, which is done by evaluating the training accuracy and making a judgement on when the training accuracy reaches a threshold).

Step 2A, Prong 2: Claim 1 recites the following additional elements: at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to (this limitation is directed to merely using a computer (processor) as a tool to perform an abstract idea; this does not integrate the abstract idea into a practical application, see MPEP 2106.05(f)): train the candidate subset of the connections based on a training dataset to adjust the weights of the connections in the candidate subset (this limitation is directed to mere instructions to apply a judicial exception; this does not integrate the abstract idea into a practical application, see MPEP 2106.05(f)).

Step 2B: Claim 1 recites the following additional elements: at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to (this limitation is directed to merely using a computer (processor) as a tool to perform an abstract idea; this does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)): train the candidate subset of the connections based on a training dataset to adjust the weights of the connections in the candidate subset (this limitation is directed to mere instructions to apply a judicial exception; this does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)).

4. Dependent claim 2 is directed to an apparatus, and falls into one of the four statutory categories. Claim 2 recites the following abstract ideas: perform a pruning process to remove a portion of the connections from the qualified subset to generate a pruned subset of the connections (Mental process directed to removing some connections from the network; this can be performed by evaluating the connections and making a judgement on which connections will be removed), wherein the pruned subset after training with the training dataset has a training accuracy that reaches the accuracy threshold (Mental process of evaluating the training accuracy and making a judgement of when the training accuracy reaches a threshold). Claim 2 does not recite any additional elements.

5.
Dependent claim 3 is directed to an apparatus, and falls into one of the four statutory categories. Claim 3 recites the following abstract ideas: reset the weights of the connections in the qualified subset to the initial values (Mental process directed to setting the weight to a value; this can be performed by making a judgement of setting the value of the weight to zero); remove a portion of the connections from the qualified subset based on a prune percentage (Mental process directed to removing some connections based on a percentage value; this can be performed by making a judgement on which portion of connections from the subset to prune using the percentage).

Under Step 2A, Prong 2, claim 3 recites the following additional limitations: train the qualified subset of the connections based on the training dataset to adjust the weights of the connections in the qualified subset based on the training dataset (this limitation is directed to updating a trained neural network, which is mere instructions to apply an abstract idea; this does not integrate the abstract idea into a practical application, see MPEP 2106.05(f)).

Under Step 2B, claim 3 recites the following additional limitations: train the qualified subset of the connections based on the training dataset to optimize the weights of the connections in the qualified subset based on the training dataset (this limitation is directed to updating a trained neural network, which is mere instructions to apply an abstract idea; this does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)).

6. Dependent claim 4 is directed to an apparatus, and falls into one of the four statutory categories. Claim 4 recites the following abstract ideas: perform multiple iterations of the pruning process (Mental process directed to removing some connections from the network multiple times; this can be performed by evaluating the connections and making a judgement on which connections will be removed). Claim 4 does not recite any additional elements.

7. Dependent claim 5 is directed to an apparatus, and falls into one of the four statutory categories. Claim 5 recites the following abstract ideas: for the growth process, use a binary search to identify the target percentage for the qualified subset (Mental process directed to using a binary search algorithm for identification of a percentage value for the growth process; this can be performed by making a judgement on choosing the target percentage for the growth process).

Under Step 2A, Prong 2, claim 5 recites the following additional limitations: wherein the qualified subset contains a target percentage of the connections of the DNN (this limitation is directed to a particular type or source of data, which is a field of use; this does not integrate the abstract idea into a practical application, see MPEP 2106.05(h)); and the instructions when executed by the at least one processor cause the apparatus at least to (this limitation is directed to merely using a computer (processor) as a tool to perform an abstract idea; this does not integrate the abstract idea into a practical application, see MPEP 2106.05(f)).

Under Step 2B, claim 5 recites the following additional limitations: wherein the qualified subset contains a target percentage of the connections of the DNN (this limitation is directed to a particular type or source of data, which is a field of use; this does not amount to significantly more than the judicial exception, see MPEP 2106.05(h)); and the instructions when executed by the at least one processor cause the apparatus at least to (this limitation is directed to merely using a computer (processor) as a tool to perform an abstract idea; this does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)).

8. Dependent claim 6 is directed to an apparatus, and falls into one of the four statutory categories.
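The pruning recited in the dependent claims discussed above (remove a portion of the connections by a prune percentage, reset the remaining weights to their initial values, retrain, and repeat while the pruned subset still reaches the accuracy threshold) can be sketched as a toy loop. All names, the starting subset, and the accuracy proxy are hypothetical, not from the application:

```python
import random

random.seed(0)
init_weights = {c: random.gauss(0.0, 1.0) for c in range(100)}
ACC_THRESHOLD = 0.5
PRUNE_PCT = 0.2  # hypothetical prune percentage per iteration


def retrain_and_score(subset):
    """Toy proxy for reset-to-initial-values plus retraining.

    Accuracy here simply tracks subset size; a real implementation would
    retrain the pruned subnetwork and evaluate it on a dataset.
    """
    _ = {c: init_weights[c] for c in subset}  # weights reset to initial values
    return len(subset) / 100


qualified = set(range(80))  # pretend this came from the growth process
pruned = set(qualified)

# Multiple iterations of pruning: keep shrinking while the pruned subset,
# after retraining, still reaches the accuracy threshold.
while True:
    keep = int(len(pruned) * (1 - PRUNE_PCT))
    candidate = set(random.sample(sorted(pruned), keep))
    if retrain_and_score(candidate) < ACC_THRESHOLD:
        break  # keep the last subset that still met the threshold
    pruned = candidate

print(len(pruned))  # → 51 with this toy proxy: 80 → 64 → 51, then 40 fails
```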
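The binary search over the target percentage recited for claim 5 (and elaborated as steps (a)-(i) for claim 6) can be sketched as ordinary interval bisection. This is a toy illustration with a stand-in accuracy function; the bounds and threshold are hypothetical, not from the record:

```python
ACC_THRESHOLD = 0.5


def accuracy_at(pct):
    """Toy proxy: accuracy of a candidate subset holding `pct` of the
    connections. A real implementation would form the candidate subset,
    reset its weights, train it, and measure accuracy."""
    return pct


lo = 0.10  # (a) the initial percentage as the lower bound
hi = 0.90  # (b) an upper-bound percentage larger than the initial percentage

for _ in range(20):          # repeat (c)-(i) to converge on the target
    mid = (lo + hi) / 2      # (c) intermediate percentage for the interval
    # (d)-(g): form the candidate at `mid`, reset weights, train, and score.
    if accuracy_at(mid) < ACC_THRESHOLD:
        lo = mid             # (h) narrow to the upper half of the interval
    else:
        hi = mid             # (i) narrow to the lower half of the interval

target_pct = hi
print(round(target_pct, 3))  # → 0.5: the smallest toy percentage that passes
```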
Claim 6 recites the following abstract ideas: (a) identify the initial percentage as a lower bound of a search interval for the binary search (Mental process directed to using a binary search algorithm for identification of a percentage; this can be performed by evaluating the initial percentage and making a judgement on which percentage to choose as a lower bound); (b) select an upper bound of the search interval for the binary search as an upper bound percentage that is larger than the initial percentage (Mental process directed to using a binary search algorithm for selection of a percentage value as an upper bound; this can be performed by making a judgement to choose an upper bound); (c) select an intermediate percentage for the search interval (Mental process directed to using a binary search algorithm for selection of an intermediate percentage value; this can be performed by making a judgement on which intermediate percentage to choose); (d) add the additional connections from the DNN to the initial subset based on the intermediate percentage to form the candidate subset of the connections (Mental process directed to adding connections from the DNN to another subset of connections, which can be done by evaluating the intermediate percentage and making a judgement to add additional connections); (e) reset the weights of the connections in the candidate subset to the initial values (Mental process directed to setting the weights to a value, which can be done by making a judgement to set the weight connections to zero); (g) determine a training accuracy of the candidate subset (Mental process directed to determining an accuracy value; this is done by evaluating the training accuracy of the candidate subset); (h) narrow the search interval to an upper half of the search interval when the training accuracy is below the accuracy threshold (Mental process directed to binary searching an interval to an upper bound given a threshold; this can be done by evaluating the training accuracy and making a judgement whether the training accuracy is below the accuracy threshold); and (i) narrow the search interval to a lower half of the search interval when the training accuracy meets the accuracy threshold (Mental process directed to binary searching an interval to a lower bound given a threshold; this can be done by evaluating the training accuracy and making a judgement whether the training accuracy meets the accuracy threshold).

Under Step 2A, Prong 2, claim 6 recites the following additional limitations: the instructions when executed by the at least one processor cause the apparatus at least to (this limitation is directed to merely using a computer (processor) as a tool to perform an abstract idea; this does not integrate the abstract idea into a practical application, see MPEP 2106.05(f)): (f) train the candidate subset of the connections based on the training dataset to adjust the weights of the connections in the candidate subset (this limitation is directed to updating a trained neural network, which is mere instructions to apply an abstract idea; this does not integrate the abstract idea into a practical application); repeat (c)-(i) to converge on the target percentage (this limitation is directed to performing an iterative mental process and repeating the additional elements, which does not integrate the abstract idea into a practical application).

Under Step 2B, claim 6 recites the following additional limitations: the instructions when executed by the at least one processor cause the apparatus at least to (this limitation is directed to merely using a computer (processor) as a tool to perform an abstract idea; this does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)): (f) train the candidate subset of the connections based on the training dataset to adjust the weights of the connections in the candidate subset (this limitation is directed to updating a trained neural network, which is mere instructions to apply an abstract idea; this does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)); repeat (c)-(i) to converge on the target percentage (this limitation is directed to performing an iterative mental process and repeating the additional elements, which does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)).

9. Dependent claim 7 is directed to an apparatus, and falls into one of the four statutory categories. Claim 7 recites the following abstract ideas: select a growth percentage (Mental process directed to selecting a growth percentage, which can be performed by making a judgement on choosing the percentage for the growth); add the additional connections to the initial subset based on the growth percentage to form the candidate subset of the connections (Mental process of growing a neural network; this can be performed by evaluating the initial connections and making a judgement on adding the additional connections using the growth percentage); reset the weights of the connections in the candidate subset to the initial values (Mental process directed to setting the weights, which can be done by making a judgement of setting the weights to zero); determine a training accuracy of the candidate subset (Mental process directed to determining a training accuracy value, which can be performed by evaluating the training accuracy of the subset); identify the candidate subset as the qualified subset when the training accuracy reaches the accuracy threshold (Mental process directed to identification of a subset given a threshold; this can be performed by evaluating the training accuracy of the subset and making a judgement when the training accuracy reaches the threshold); and initiate another iteration of the growth process when the training accuracy is below the accuracy threshold (Mental process of continuing the growth process by evaluating the training accuracy and judging whether the training accuracy is below the threshold).

Under Step 2A, Prong 2, claim 7 recites the following additional limitations: train the candidate subset of the connections based on the training dataset to adjust the weights of the connections in the candidate subset to optimize the weights of the connections in the candidate subset based on the training dataset (this limitation is directed to updating a trained neural network, which is mere instructions to apply an abstract idea; this does not integrate the abstract idea into a practical application, see MPEP 2106.05(f)).

Under Step 2B, claim 7 recites the following additional limitations: train the candidate subset of the connections based on the training dataset to adjust the weights of the connections in the candidate subset to optimize the weights of the connections in the candidate subset based on the training dataset (this limitation is directed to updating a trained neural network, which is mere instructions to apply an abstract idea; this does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)).

10. Dependent claim 8 is directed to an apparatus, and falls into one of the four statutory categories. Claim 8 recites the following abstract ideas: select the initial percentage based on a ratio of a size of the training dataset and a total number of the weights in the DNN (Mental process of selecting a percentage based on a ratio of the size of the dataset to the total number of weights; this can be performed by evaluating that ratio and making a judgement to choose a percentage).

Under Step 2A, Prong 2, claim 8 recites the following additional limitations: the instructions when executed by the at least one processor cause the apparatus at least to (this limitation is directed to merely using a computer (processor) as a tool to perform an abstract idea; this does not integrate the abstract idea into a practical application, see MPEP 2106.05(f)).

Under Step 2B, claim 8 recites the following additional limitations: the instructions when executed by the at least one processor cause the apparatus at least to (this limitation is directed to merely using a computer (processor) as a tool to perform an abstract idea; this does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)).

11. Independent claim 9 is directed to a method, and falls into one of the four statutory categories. Claim 9 is substantially similar to claim 1 and is rejected in the same manner, with the same reasoning applying.

12. Dependent claim 10 is directed to a method, and falls into one of the four statutory categories. Claim 10 is substantially similar to claim 2 and is rejected in the same manner, with the same reasoning applying.

13. Dependent claim 11 is directed to a method, and falls into one of the four statutory categories. Claim 11 is substantially similar to claim 3 and is rejected in the same manner, with the same reasoning applying.

14. Dependent claim 12 is directed to a method, and falls into one of the four statutory categories. Claim 12 is substantially similar to claim 4 and is rejected in the same manner, with the same reasoning applying.

15.
Dependent claim 13 is directed to a method, and falls into one of the four statutory categories. Claim 13 is substantially similar to claim 5 and is rejected in the same manner, with the same reasoning applying.

16. Dependent claim 14 is directed to a method, and falls into one of the four statutory categories. Claim 14 is substantially similar to claim 6 and is rejected in the same manner, with the same reasoning applying.

17. Dependent claim 15 is directed to a method, and falls into one of the four statutory categories. Claim 15 is substantially similar to claim 7 and is rejected in the same manner, with the same reasoning applying.

18. Dependent claim 16 is directed to a method, and falls into one of the four statutory categories. Claim 16 is substantially similar to claim 8 and is rejected in the same manner, with the same reasoning applying.

19. Independent claim 17 is directed to a machine, and falls into one of the four statutory categories. Claim 17 is substantially similar to claim 1 and is rejected in the same manner, with the same reasoning applying. Claim 17 further recites "a non-transitory computer readable medium comprising program instructions that, when executed by an apparatus, cause the apparatus to perform at least the following:"; this limitation is directed to merely using a computer (apparatus) as a tool to perform an abstract idea. This limitation does not integrate the abstract idea into a practical application and does not amount to significantly more. See MPEP 2106.05(f).

20. Dependent claim 18 is directed to a machine, and falls into one of the four statutory categories. Claim 18 is substantially similar to claim 2 and is rejected in the same manner, with the same reasoning applying.

21. Dependent claim 19 is directed to a machine, and falls into one of the four statutory categories. Claim 19 is substantially similar to claim 6 and is rejected in the same manner, with the same reasoning applying.

22. Dependent claim 20 is directed to a machine, and falls into one of the four statutory categories. Claim 20 is substantially similar to claim 7 and is rejected in the same manner, with the same reasoning applying.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

23. Claims 1, 9, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Laszlo et al. (US20210201115, filed 07/01/2021) in view of Faibish (US20210125053, filed 10/25/2019).

Regarding claim 1, Laszlo teaches an apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor (FIG. 15 is a block diagram of an example computer system 1500 that can be used to perform operations described previously. The system 1500 includes a processor 1510, a memory 1520, a storage device 1530, and an input/output device 1540. Each of the components 1510, 1520, 1530, and 1540 can be interconnected, for example, using a system bus 1550.
The processor 1510 is capable of processing instructions for execution within the system 1500 [0276]), cause the apparatus at least to: initialize a Deep Neural Network (DNN) by randomly assigning initial values to weights associated with connections of the DNN (and the weight values corresponding to the connections in the architecture 302 may be determined randomly. For example, the weight value corresponding to each connection in the architecture 302 may be randomly sampled from a predetermined probability distribution, e.g., a standard Normal (N(0,1)) probability distribution [0154]; neural network architecture 302 [0132]. The Examiner notes randomly sampling of weight values corresponding to each connection indicates neural network architecture 302 is initialized); randomly select an initial subset of the connections from the DNN (randomly sampling a plurality of current graphs from the set of current graphs [0075]; the transformation engine 304 may randomly sample a set of node pairs from the graph (i.e., where each node pair specifies a first node and a second node) [0134]; The structure of the graph 108 may be used to specify the architecture of the brain emulation neural network 100. For example, each node of the graph 108 may mapped to an artificial neuron … in the brain emulation neural network 100. Further, each edge of the graph 108 may be mapped to a connection between artificial neurons, layers, or groups of layers in the brain emulation neural network 100 [0111]) based on an initial percentage of a total number of the connections (the sampling engine 830 may select (e.g., randomly sample) a set of current graphs 832 from the population of graphs 828 [0238]. 
The Examiner notes edges in population of graphs 828 is the initial percentage of a total number of the connections) such that a training accuracy of the initial subset is below an accuracy threshold (determining, for each sampled graph, a performance measure on the machine learning task of a neural network having a neural network architecture that is specified by the sampled graph [0075]; In some implementations, updating the set of current graphs based on the performance measures of the sampled graphs includes … having a performance measure that does not satisfy a threshold from the set of current graphs [0076]); and perform a growth process (Input and output artificial neurons that are added to the architecture 302 may be connected to the other neurons in the architecture in any of a variety of ways [0157]. The Examiner notes growth process is by adding connections) by building up from the initial subset of the connections (generating one or more new graphs based on the randomly sampled graphs [0077]) with remaining connections from the DNN not included in the initial subset (and updating the set of current graphs based on the performance measures of the sampled graphs [0075]) to discover a qualified subset of the connections (to determine a respective quality measure 834 corresponding to each of the sampled graphs 832 [0238]), wherein for the growth process (Input and output artificial neurons that are added to the architecture 302 may be connected to the other neurons in the architecture in any of a variety of ways [0157]. 
The Examiner notes growth process is by adding connections), the instructions when executed by the at least one processor cause the apparatus at least to (one or more computers; and one or more storage devices communicatively coupled to the one or more computers, where the one or more storage devices store instructions that, when executed by the one or more computers, cause the one or more computers to perform the operations of the method [0083]): randomly select additional connections from the remaining connections of the DNN (The random modifications may include, e.g., adding or removing edges between randomly selected pairs of nodes in the graph [0237]); add the additional connections to the initial subset (add randomly mutated (i.e., modified) copies of the sampled graphs 832 [0239]; Mutating a graph refers to making a random change to the graph, e.g., by randomly adding … edges or nodes from the graph [0208]) to form a candidate subset of the connections (After a final iteration of the plurality of iterations, each current graph in the set of current graphs is identified as a candidate graph [0075]); train the candidate subset of the connections (candidate graph 802 is received by Evaluation Engine 804, Fig. 8A; The architecture search system 800 may use an evaluation engine 804 to determine a quality measure 806 for each candidate graph 802 that characterizes the performance of the neural network architecture specified by the candidate graph on the machine learning task [0203]) based on a training dataset (The evaluation engine 804 may measure the performance of a neural network on a machine learning task, e.g., by training the neural network on a set of training data 814 [0211], Fig. 
8A) to adjust the weights of the connections in the candidate subset (updating the candidate graph to cause the candidate graph to satisfy a corresponding constraint [0073]; The current values of the transformation operation parameters are updated based at least in part on the performance measure [0082]; parameter (weight) values [0181]); and determine a training accuracy of the candidate subset (In some implementations, determining a performance measure on a machine learning task of a neural network having a neural network architecture that is specified by a candidate graph [0078]), wherein the growth process is performed until the candidate subset comprises the qualified subset having a training accuracy that reaches the accuracy threshold (In some implementations, selecting a final neural network architecture for performing the machine learning task based on the performance measures includes selecting the neural network architecture specified by the candidate graph associated with the highest performance measure [0079]; … to achieve a threshold level of performance (e.g., prediction accuracy) [0145]). Laszlo does not explicitly teach resetting the weights of the connections in the candidate subset to the initial values. Faibish teaches initialize a Deep Neural Network (DNN) by randomly assigning initial values to weights associated with connections of the DNN (The step 1204 may be performed prior to training the neural network using a training data set. The initialization processing of the step 1204 may include specifying initial values … For example, initial values may be specified for the weights applied to the synaptic connections or inputs to the neurons … Additionally, values may be specified for one or more other parameters affecting the neural network [0090], Fig. 12); and reset the weights of the connections in the candidate subset to the initial values (NN1 1302 may be reset or reinitialized.
Such resetting or reinitializing NN1 1302 may include reinitializing the weights and bias values of NN1 1302 to be as they were prior [0129]); It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Laszlo to incorporate the teachings of Faibish for the benefit of detecting that one or more of the threshold conditions to reconfigure the NN undergoing retraining include adding a node to the first NN [0172] to enable accurate prediction of the desired outputs for particular corresponding inputs (Faibish [0087]).

Regarding claim 9, claim 9 is similar to claim 1. It is rejected in the same manner and for the same reasons.

Regarding claim 17, claim 17 is similar to claim 1. It is rejected in the same manner and for the same reasons. Further, Laszlo teaches a non-transitory computer readable medium comprising program instructions that, when executed by an apparatus, cause the apparatus to perform at least the following (one or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform the operations of the method of any preceding aspect [0084]).

24. Claims 2-4, 10-12, 18 are rejected under 35 U.S.C. 103 as being unpatentable over Laszlo et al. (US20210201115 filed 07/01/2021) in view of Faibish (US20210125053 filed 10/25/2019) and further in view of Dai et al. (Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks, arXiv:1905.10952v1 [cs.NE] 27 May 2019, hereinafter “Dai NPL”).

Regarding claim 2, Laszlo and Faibish teach the apparatus of claim 1; however, they do not teach the limitation of claim 2.
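For orientation (an illustrative sketch only, not part of the record or of any cited reference), the growth-with-reset loop recited in claim 1 can be expressed roughly as follows. The function name, the parameter names, and the `train_and_score` callback standing in for "train the masked network and return its training accuracy" are all hypothetical:

```python
import numpy as np

def grow_subnetwork(n_connections, init_frac, grow_frac, accuracy_threshold,
                    train_and_score, rng=None):
    # Hypothetical sketch: weights live in a flat vector; a boolean mask
    # marks which connections belong to the current subset.
    if rng is None:
        rng = np.random.default_rng(0)
    init_weights = rng.standard_normal(n_connections)  # random init, e.g. N(0, 1)
    mask = np.zeros(n_connections, dtype=bool)
    first = rng.choice(n_connections, size=int(init_frac * n_connections),
                       replace=False)
    mask[first] = True  # initial random subset of connections

    while True:
        # Reset the active subset's weights to their initial values, then
        # (placeholder) train the subset and measure training accuracy.
        weights = np.where(mask, init_weights, 0.0)
        if train_and_score(weights, mask) >= accuracy_threshold:
            return mask  # qualified subset: accuracy threshold reached
        dormant = np.flatnonzero(~mask)
        if dormant.size == 0:
            return mask  # no remaining connections to add
        n_extra = min(dormant.size, max(1, int(grow_frac * n_connections)))
        extra = rng.choice(dormant, size=n_extra, replace=False)
        mask[extra] = True  # growth step: randomly add more connections
```

The key point the claim turns on is the reset inside the loop: each candidate subset is retrained from the remembered initial values rather than from the previously trained weights.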
Dai NPL teaches wherein the instructions when executed by the at least one processor cause the apparatus at least to (We implement our framework using PyTorch on Nvidia GeForce GTX 1060 GPU (with 1.708 GHz frequency and 6 GB memory) and Tesla P100 GPU (with 1.329 GHz frequency and 16 GB memory), pg. 6, left col., section 5): perform a pruning process to remove a portion of the connections from the qualified subset to generate a pruned subset of the connections (In the pruning process, we remove a connection w by setting its value as well as the value of its corresponding mask to 0 if and only if the following condition is satisfied: [pruning-condition equation image omitted] where β is a pre-defined pruning ratio. Typically, we use 3 ≤ β ≤ 5 in our experiments. Note that connection pruning is an iterative process. In each iteration, we prune the weights that have the smallest values (e.g., smallest 5%), pg. 5, right col. section 4.3.1), wherein the pruned subset after training with the training dataset has a training accuracy that reaches the accuracy threshold (and retrain the network to recover its accuracy. Once the desired accuracy is achieved, we start the next pruning iteration, pg. 5, right col. section 4.3.1).
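The iterative magnitude pruning for which Dai NPL is cited (prune the smallest-magnitude weights, retrain to recover accuracy, repeat) might be sketched as below. This is a non-authoritative simplification: `retrain_and_score` is a hypothetical stand-in for the retraining step, and the stopping rule is reduced to a single accuracy check per iteration:

```python
import numpy as np

def iterative_prune(weights, mask, prune_frac, accuracy_threshold,
                    retrain_and_score, max_iters=10):
    # Hypothetical sketch of iterative magnitude pruning: each iteration
    # zeroes the smallest-magnitude active weights and their mask entries,
    # then retrains (placeholder) and checks that accuracy is recovered.
    weights, mask = weights.copy(), mask.copy()
    for _ in range(max_iters):
        active = np.flatnonzero(mask)
        if active.size == 0:
            break
        k = max(1, int(prune_frac * active.size))
        # indices of the k smallest-magnitude active weights
        smallest = active[np.argsort(np.abs(weights[active]))[:k]]
        mask[smallest] = False   # remove connection: clear its mask entry ...
        weights[smallest] = 0.0  # ... and zero its weight
        if retrain_and_score(weights, mask) < accuracy_threshold:
            break  # accuracy not recovered: stop pruning further
    return weights, mask
```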
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Laszlo and Faibish to incorporate the teachings of Dai NPL for the benefit of an incremental learning framework based on a grow-and-prune neural network synthesis paradigm which improves accuracy, shrinks network size, and significantly reduces the additional training cost for incoming data (Dai NPL).

Regarding claim 3, Laszlo, Faibish and Dai NPL teach the apparatus of claim 2. Dai NPL teaches wherein for the pruning process, the instructions when executed by the at least one processor cause the apparatus at least to (We implement our framework using PyTorch on Nvidia GeForce GTX 1060 GPU (with 1.708 GHz frequency and 6 GB memory) and Tesla P100 GPU (with 1.329 GHz frequency and 16 GB memory), pg. 6, left col., section 5): reset the weights of the connections in the qualified subset to the initial values (In the pruning process, we remove a connection w by setting its value as well as the value of its corresponding mask to 0 if and only if the following condition is satisfied: [pruning-condition equation image omitted] where β is a pre-defined pruning ratio. Typically, we use 3 ≤ β ≤ 5 in our experiments. Note that connection pruning is an iterative process. In each iteration, we prune the weights that have the smallest values (e.g., smallest 5%), pg. 5, right col. section 4.3.1); train the qualified subset of the connections (Connection growth and parameter training are interleaved in the growth phase, where we periodically conduct connection growth during training, pg. 4, right col., first para.) based on the training dataset to adjust the weights of the connections in the qualified subset (In the incremental learning experiments, we start with one part to train the initial model for subsequent updates. We then add one part as new data each time in the incremental learning scenario.
For each update, we perform growth on new data and all data for 15 epochs and 20 epochs in the growth phase, respectively); and remove a portion of the connections from the qualified subset based on a prune percentage (we remove a connection w by setting its value as well as the value of its corresponding mask to 0 if and only if the following condition is satisfied: [pruning-condition equation image omitted] where β is a pre-defined pruning ratio. Typically, we use 3 ≤ β ≤ 5 in our experiments. Note that connection pruning is an iterative process. In each iteration, we prune the weights that have the smallest values (e.g., smallest 5%), pg. 5, right col. section 4.3.1). The same motivation to combine dependent claim 2 applies here.

Regarding claim 4, Laszlo, Faibish and Dai NPL teach the apparatus of claim 3. Dai NPL teaches wherein the instructions when executed by the at least one processor cause the apparatus at least to (We implement our framework using PyTorch on Nvidia GeForce GTX 1060 GPU (with 1.708 GHz frequency and 6 GB memory) and Tesla P100 GPU (with 1.329 GHz frequency and 16 GB memory), pg. 6, left col., section 5): perform multiple iterations of the pruning process (Note that connection pruning is an iterative process, pg. 5, right col. section 4.3.1). The same motivation to combine dependent claim 2 applies here.

Regarding claim 10, claim 10 is similar to claim 2. It is rejected in the same manner and for the same reasons.

Regarding claim 11, claim 11 is similar to claim 3. It is rejected in the same manner and for the same reasons.

Regarding claim 12, claim 12 is similar to claim 4. It is rejected in the same manner and for the same reasons.

Regarding claim 18, claim 18 is similar to claim 2. It is rejected in the same manner and for the same reasons.

25. Claims 7, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Laszlo et al.
(US20210201115 filed 07/01/2021) in view of Faibish (US20210125053 filed 10/25/2019) in view of Dai et al. (Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks, arXiv:1905.10952v1 [cs.NE] 27 May 2019, hereinafter “Dai NPL”) and further in view of Dai et al. (US20210133540 filed 03/14/2019).

Regarding claim 7, Laszlo and Faibish teach the apparatus of claim 1. Laszlo teaches wherein for the growth process, the instructions when executed by the at least one processor cause the apparatus at least to (Input and output artificial neurons that are added to the architecture 302 may be connected to the other neurons in the architecture in any of a variety of ways [0157]. The Examiner notes growth process is by adding connections): identify the candidate subset as the qualified subset when the training accuracy reaches the accuracy threshold (identifying one or more of the sampled graphs having the highest performance measures [0077]); initiate another iteration of the growth process when the training accuracy is below the accuracy threshold (In some implementations, updating the set of current graphs based on the performance measures of the sampled graphs includes … having a performance measure that does not satisfy a threshold from the set of current graphs [0076]). Faibish teaches reset the weights of the connections in the candidate subset to the initial values (NN1 1302 may be reset or reinitialized. Such resetting or reinitializing NN1 1302 may include reinitializing the weights and bias values of NN1 1302 to be as they were prior [0129]). The same motivation to combine independent claim 1 applies here.
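The gradient-percentile growth policy that Dai is cited for in claim 7 (activate a dormant connection iff its gradient magnitude exceeds the (100α)th percentile of all gradient magnitudes) could look roughly like this; the function name and calling convention are hypothetical illustrations, not drawn from the reference:

```python
import numpy as np

def grow_by_gradient(mask, grads, alpha):
    # Hypothetical sketch of a gradient-percentile growth step: a dormant
    # connection is activated iff its gradient magnitude exceeds the
    # (100*alpha)-th percentile of all gradient magnitudes.
    threshold = np.percentile(np.abs(grads), 100.0 * alpha)
    new_mask = mask.copy()
    new_mask[~mask & (np.abs(grads) > threshold)] = True
    return new_mask
```

Unlike the random growth of claim 1, this selects which connections to add by how much each dormant connection would reduce the loss.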
Laszlo and Faibish do not explicitly teach select a growth percentage; add additional connections to the initial subset based on the growth percentage to form a candidate subset of the connections; train the candidate subset of the connections based on the training dataset to optimize the weights of the connections in the candidate subset; determine a training accuracy of the candidate subset. Dai teaches select a growth percentage (Growth policy: Activate a dormant ω in W iff |ω.grad| is larger than the (100α)th percentile of all elements in |W.grad| [0056]); add additional connections to the initial subset based on the growth percentage to form a candidate subset of the connections (Then each dormant connection whose gradient magnitude |ω.grad|=|∂L/∂w| surpasses the (100α)th percentile of the gradient magnitudes of its corresponding weight matrix is activated [0059]); train the candidate subset of the connections based on the training dataset (The network growth phase allows a CNN to grow neurons, connections, and feature maps, as necessary, during training [0052]; The AN4 dataset is used to evaluate the performance of the DeepSpeech2 architecture. It contains 948 training utterances [0084]; H-LSTMs were GP-trained for image captioning and speech recognition applications [0098]) to adjust the weights of the connections in the candidate subset (It is shown how the mask Msk and weight matrix W is updated in the gradient-based growth and magnitude-based pruning process in the methodology in FIGS. 5 [0061]); determine a training accuracy of the candidate subset (growth phase 40, a leaky ReLU is adopted as the activation function for H * in Eq. (4). A reverse slope s of 0.01 is chosen in one embodiment. Then, for the activation function shift 42, all of the activation functions are changed from leaky ReLU to ReLU while keeping the weights unchanged. This may incur a minor accuracy drop.
The network is retrained to recover performance [0064]); It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Laszlo and Faibish to incorporate the teachings of Dai for the benefit of using grow-and-prune (GP) training which is employed to iteratively adjust the hidden layers through gradient-based growth and magnitude-based pruning of connections (Dai [0034]).

Regarding claim 15, claim 15 is similar to claim 7. It is rejected in the same manner and for the same reasons.

Regarding claim 20, claim 20 is similar to claim 7. It is rejected in the same manner and for the same reasons.

26. Claims 5, 6, 13, 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Laszlo et al. (US20210201115 filed 07/01/2021) in view of Faibish (US20210125053 filed 10/25/2019) in view of Dai et al. (US20210133540 filed 03/14/2019) and further in view of Peranandam et al. (US20200202214 filed 12/20/2018).

Regarding claim 5, Laszlo and Faibish teach the apparatus of claim 1; however, they do not teach the limitations of claim 5. Dai teaches wherein: the qualified subset contains a target percentage of the connections of the DNN (For compactness, an accuracy threshold for both GP training and the pruning-only process is set to 10.52% [0091]); and for the growth process (The network growth phase allows a CNN to grow neurons, connections, and feature maps, as necessary, during training [0052]), the instructions when executed by the at least one processor cause the apparatus at least (a processor configured to perform a method for generating an optimal hidden-layer long short-term memory (H-LSTM) architecture is disclosed [0013]) to use a search to identify the target percentage for the qualified subset (Thus, it enables automated search in the architecture space [0052]).
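The claimed search for the target percentage in claims 5 and 6 amounts to ordinary bisection over the fraction of connections kept. A minimal sketch, assuming a hypothetical `train_at(pct)` callback that forms a candidate subset at percentage `pct`, resets its weights to the initial values, trains it, and returns the training accuracy:

```python
def binary_search_percentage(lower, upper, accuracy_threshold, train_at, iters=20):
    # Hypothetical sketch of the claimed binary search: keep the upper half
    # of the interval when accuracy falls short of the threshold (more
    # connections needed), the lower half when it is met (try fewer).
    best = upper
    for _ in range(iters):
        mid = (lower + upper) / 2.0
        if train_at(mid) >= accuracy_threshold:
            best, upper = mid, mid  # threshold met: remember and shrink
        else:
            lower = mid             # threshold missed: grow the subset
    return best
```

Bisection is sensible here because (under the assumed model) training accuracy is monotone in the connection percentage, so the smallest qualifying percentage can be bracketed in logarithmically many candidate trainings.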
Laszlo, Faibish and Dai do not explicitly teach a binary search to identify the target percentage for the qualified subset. Peranandam teaches a binary search to identify the target percentage for the qualified subset (Within each iteration, the example process 700 includes a binary search of the ranking (operation 708) of neurons, wherein, in this example, a rank of 1 identifies a highly enable neuron 705 and a rank of 100 identifies the lowest enable neurons 707 [0074]). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Laszlo, Faibish and Dai to incorporate the teachings of Peranandam for the benefit of a DNN optimization methodology that eliminates neurons through neuron analysis to reduce computing power and resource requirements of a trained DNN while balancing accuracy requirements (Peranandam [0001]).

Regarding claim 6, Laszlo, Faibish, Dai and Peranandam teach the apparatus of claim 5. Peranandam teaches wherein for the binary search, the instructions when executed by the at least one processor cause the apparatus at least to (In the binary search of the ranking (operation 708), the process 700 also includes calculating half of the interval rank [0075]): (a) identify the initial percentage as a lower bound of a search interval for the binary search (In one embodiment, to designate some of the lower ranked neurons for removal the processing system is configured to identify the lowest ranked neurons that fall within a predetermined neuron reduction limit [0023]; a reduction limit 703 (e.g., 40%) is input to set a binary search limit (operation 704) for identifying the maximum number of neurons that may be eliminated to produce a lean DNN [0074]; The half interval rank 709 can be determined by identifying the lower limit 711 … of the neuron ranking 715 [0075]); (b) select an upper bound of the search interval for the binary search as an upper bound
percentage that is larger than the initial percentage (The half interval rank 709 can be determined by identifying the … upper limit 713 of the neuron ranking 715 [0075]; After calculating half of the interval rank, the example process includes setting as a new DNN (e.g., DNNnew), the active DNN (e.g., DNNact) wherein a traversal of DNNact is performed to eliminate Neurons Ni where the Rank(Ni)>=Half int. Rank 709 (operation 710)[0076]); (c) select an intermediate percentage for the search interval (In the binary search of the ranking (operation 708), the process 700 also includes calculating half of the interval rank [0075]; 5% is the accuracy threshold that has been set in this example [0078]); (d) add the additional connections from the DNN to the initial subset based on the intermediate percentage to form the candidate subset of the connections (In other words, the neurons with a rank equal to or higher than the neuron at the half interval rank 709 are eliminated from the DNN to form a new DNN. After neuron elimination, an accuracy evaluation of DNNnew is performed (operation 712) [0076]); (e) reset the weights of the connections in the candidate subset to the initial values (Patterns are presented to the network via an input layer 506, which communicates to one or more hidden layers 508 where the actual processing is done via a system of weighted connections. The activation function 504 identifies weights that are applied to inputs to the associated neuron to generate an output [0068]; The process 800 may further include retraining the DNN with the removed neurons using a data training set used to train the DNN before the removal of neurons. 
[0087]); (f) train the candidate subset of the connections based on the training dataset to adjust the weights of the connections in the candidate subset (using a data training set used to train the DNN before the removal of neurons [0026]; The removal of neurons and the accuracy check is performed iteratively to allow the example neuron elimination selection module 318 to remove just enough neurons to stay within the accuracy threshold limit [0060]); (g) determine a training accuracy of the candidate subset (and perform an accuracy analysis (operation 322) to ensure that the removal of neurons does not result in the accuracy of the lean DNN 304 falling outside of an accuracy threshold limit [0060]); (h) narrow the search interval to an upper half of the search interval when the training accuracy is below the accuracy threshold (In the binary search of the ranking (operation 708), the process 700 also includes calculating half of the interval rank. The half interval rank 709 can be determined by identifying the … upper limit 713 of the neuron ranking 715 [0075]; If the accuracy drop is less than 5% (yes at decision 714), then the Upper limit 713 for the next iteration is set at the half interval rank 709 from the last iteration and the DNNact for the next iteration is set to be equal to the DNNnew for the last iteration and a new iteration is begun with operation 708 (operation 715) [0078]); and (i) narrow the search interval to a lower half of the search interval when the training accuracy meets the accuracy threshold (In the binary search of the ranking (operation 708), the process 700 also includes calculating half of the interval rank. 
The half interval rank 709 can be determined by identifying the lower limit 711 … of the neuron ranking 715 [0075]; If the accuracy drop is not less than 5% (no at decision 714), then the Lower limit 711 for the next iteration is set at the half interval rank 709 from the last iteration and a new iteration is begun with operation 708 (operation 717) [0078]); and repeat (c)-(i) to converge on the target percentage (After completion of the preset number of iterations (e.g., 20), operation 706 concludes and the DNNact for the last iteration is output as the lean DNN (operation 716). The process 700 then stops (718) [0079]). The same motivation to combine dependent claim 5 applies here.

Regarding claim 13, claim 13 is similar to claim 5. It is rejected in the same manner and for the same reasons.

Regarding claim 14, claim 14 is similar to claim 6. It is rejected in the same manner and for the same reasons.

Regarding claim 19, claim 19 is similar to claim 6. It is rejected in the same manner and for the same reasons.

27. Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Laszlo et al. (US20210201115 filed 07/01/2021) in view of Faibish (US20210125053 filed 10/25/2019) in view of Dai et al. (Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks, arXiv:1905.10952v1 [cs.NE] 27 May 2019, hereinafter “Dai NPL”) and further in view of Timofejevs et al. (US20210406662 filed 03/09/2021).

Regarding claim 8, Laszlo and Faibish teach the apparatus of claim 1; however, they do not teach the limitations of claim 8. Dai NPL teaches wherein the instructions when executed by the at least one processor cause the apparatus at least to: select the initial percentage based on a ratio of a size of the training dataset (In Table 2, the initial model is trained on 90% of the MNIST training data. New data and all data refer to the remaining 10% of training data and the entire MNIST training set, respectively.
To reach the same target accuracy of 98.67%, our proposed method only requires 15 and 20 training epochs first on new data and then on all data, respectively. Since the number of training instances in new data is 10× smaller than in all data, pg. 5, left col. first para.). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Laszlo and Faibish to incorporate the teachings of Dai NPL for the benefit of an incremental learning framework based on a grow-and-prune neural network synthesis paradigm which improves accuracy, shrinks network size, and significantly reduces the additional training cost for incoming data (Dai NPL). Laszlo, Faibish and Dai NPL do not explicitly teach select a total number of the weights in the DNN. Timofejevs teaches select a total number of the weights in the DNN (Number of layers in T-NN constructed by means of the algorithm Neuron2TNN1 is h=┌logNK┐. The total number of weights in T-NN is: [total-weight-count equation image omitted] [0224]. The Examiner notes that the total number of weights that is calculated is selected). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Laszlo, Faibish and Dai NPL to incorporate the teachings of Timofejevs for the benefit of hardware implementations that consume less than 50 milliwatts of power (Timofejevs [0003]).

Regarding claim 16, claim 16 is similar to claim 8. It is rejected in the same manner and for the same reasons.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 8am-5pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle T Bechtold can be reached on (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.G./Examiner, Art Unit 2148
/MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148

Prosecution Timeline

Apr 05, 2021
Application Filed
Jul 19, 2024
Non-Final Rejection — §101, §103
Oct 18, 2024
Applicant Interview (Telephonic)
Oct 18, 2024
Examiner Interview Summary
Oct 25, 2024
Response Filed
Feb 08, 2025
Final Rejection — §101, §103
Apr 16, 2025
Response after Non-Final Action
Jun 14, 2025
Request for Continued Examination
Jun 20, 2025
Response after Non-Final Action
Aug 11, 2025
Non-Final Rejection — §101, §103
Nov 13, 2025
Examiner Interview Summary
Nov 13, 2025
Applicant Interview (Telephonic)
Nov 17, 2025
Response Filed
Jan 24, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602586
SUPERVISORY NEURON FOR CONTINUOUSLY ADAPTIVE NEURAL NETWORK
2y 5m to grant Granted Apr 14, 2026
Patent 12530583
VOLUME PRESERVING ARTIFICIAL NEURAL NETWORK AND SYSTEM AND METHOD FOR BUILDING A VOLUME PRESERVING TRAINABLE ARTIFICIAL NEURAL NETWORK
2y 5m to grant Granted Jan 20, 2026
Patent 12511528
NEURAL NETWORK METHOD AND APPARATUS
2y 5m to grant Granted Dec 30, 2025
Patent 12367381
CHAINED NEURAL ENGINE WRITE-BACK ARCHITECTURE
2y 5m to grant Granted Jul 22, 2025
Patent 12314847
TRAINING OF MACHINE READING AND COMPREHENSION SYSTEMS
2y 5m to grant Granted May 27, 2025
Based on 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
44%
Grant Probability
78%
With Interview (+33.4%)
4y 8m
Median Time to Grant
High
PTA Risk
Based on 68 resolved cases by this examiner. Grant probability derived from career allow rate.
