Prosecution Insights
Last updated: April 19, 2026
Application No. 17/688,654

COMPILER-BASED NEURON-AWARE DEEP NEURAL NETWORK ENSEMBLE TRAINING

Status: Final Rejection (§103, §112)
Filed: Mar 07, 2022
Examiner: WELCH, JENNIFER N
Art Unit: 2143
Tech Center: 2100 — Computer Architecture & Software
Assignee: Northwestern University
OA Round: 2 (Final)
Grant Probability: 75% (Favorable)
Expected OA Rounds: 3-4
Projected Time to Grant: 4y 8m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 75%, above average (249 granted / 334 resolved; +19.6% vs Tech Center average)
Interview Lift: +29.1%, a strong lift (allow rate among resolved cases with an interview vs. without)
Typical Timeline: 4y 8m average prosecution; 24 applications currently pending
Career History: 358 total applications across all art units
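
Editor's note: the dashboard reports the with-interview rate (99%) and the lift (+29.1%) but not the without-interview base. Assuming the lift is measured in percentage points, the base is implied by the two displayed figures:

99.0% (with interview) - 29.1 points (interview lift) = 69.9% (without interview)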

Statute-Specific Performance

§101: 16.8% (-23.2% vs TC avg)
§103: 40.6% (+0.6% vs TC avg)
§102: 16.3% (-23.7% vs TC avg)
§112: 18.5% (-21.5% vs TC avg)
Tech Center averages are estimates. Based on career data from 334 resolved cases.

Office Action

§103, §112
DETAILED ACTION

Remarks

Claims 1-20 have been examined and rejected. This Office action is responsive to the amendment filed on 10/28/2025, which has been entered in the above-identified application.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding claim 1, claim 1 recites "wherein the compiler ensembles the DNN through analyzing, identifying and removing any inter-network neuron redundancy in the N networks to obtain both a savings in training time constraints and a reduction in an original memory footprint". The relationship between these elements is unclear. It is unclear whether the limitation is intended to recite ensembling through 1) analyzing, 2) identifying, and 3) removing any inter-network redundancy, or ensembling through 1) analyzing any inter-network redundancy, 2) identifying the any inter-network redundancy, and 3) removing the any inter-network redundancy. It is further unclear which previous limitation "to obtain" is intended to modify (ensembles, analyzing, identifying, removing). For the purposes of examination, this limitation is interpreted as: wherein the compiler ensembles the DNN through a first step of analyzing, a second step of identifying and a third step of removing any inter-network neuron redundancy in the N networks, wherein a savings in training time constraints and a reduction in an original memory footprint are obtained.

Regarding claims 1 and 11, the claims recite "identify and remove dead neurons from the DNN". It is unclear whether the limitations are intended to recite steps of 1) identify, 2) remove dead neurons, or 1) identify dead neurons, 2) remove the dead neurons. It is further unclear whether the limitations are intended to recite "remove, from the DNN, dead neurons" or "remove dead neurons, wherein the dead neurons are from the DNN". For the purposes of examination, this limitation is interpreted as: identify dead neurons and remove, from the DNN, the dead neurons.

Regarding claims 1 and 11, the claims recite "additional DNNs of the N network other than the main DNN and the peer DNN". It is unclear how "the N network other than the main DNN and the peer DNN" is intended to relate to the previously recited N networks. For the purposes of examination, this limitation is interpreted as: additional DNNs of an N network other than the main DNN and the peer DNN.

Regarding claim 11, claim 11 recites "a plurality of neurons ni... nx, wherein each neuron ni being a computation node in a DNN" and "each neuron of the plurality of neurons ni... nx being a computation node comprised in the N networks". It is unclear how "each neuron ni" relates to "each neuron of the plurality of neurons ni... nx" and whether they are the same or different computation nodes in the same or different networks. For the purposes of examination, these limitations are interpreted as: a plurality of neurons, wherein each neuron of the plurality of neurons is a computation node.

Regarding claim 11, claim 11 further recites "utilize the plurality of inputs i...I to train the N networks to ensemble the DNN through analyzing, identifying and removing inter-network neuron redundancy in the N networks to obtain". The relationship between these elements is unclear. It is unclear whether "to ensemble" refers to the inputs or to the N networks. It is unclear whether the analyzing, identifying and removing refer to the utilizing, the training, or the ensembling. It is further unclear whether inter-network neuron redundancy is intended to refer only to the removing, or whether it is intended to also refer to the analyzing and the identifying. It is unclear to which previous limitation or limitations "to obtain" is intended to refer. For the purposes of examination, this limitation is interpreted as: utilize the plurality of inputs i...I to train the N networks, ensembling the DNN, a first step of analyzing, a second step of identifying, a third step of removing inter-network neuron redundancy in the N networks, and obtaining.

Regarding claims 2-10 and 12-20, claims 2-10 and 12-20 are also rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for depending on an indefinite parent claim. Claims 3 and 13 recite similar "the analyzing, identifying and removing of the inter-network neuron redundancy" limitations, which are likewise rejected and interpreted. Claims 9 and 19 recite similar "the identifying and the removing of the inter-network neuron redundancy" limitations, which are likewise rejected and interpreted.

Regarding claims 4 and 14, claims 4 and 14 recite "the first step of eliminating a redundancy to train the DNN through the dead neuron analysis (DNA) and the dead neuron elimination (DNE) to logically partition the N networks into at least the main DNN and the peer DNN". It is unclear how this limitation is intended to relate to the previously recited first step of eliminating redundancy from the main DNN and a peer DNN through a dead neuron analysis (DNA) and a dead neuron elimination (DNE). It is further unclear how "to logically partition" is intended to relate to the eliminating redundancy and the training. For the purposes of examination, this limitation is interpreted as: carrying out a second first step of eliminating redundancy, training the DNN through the dead neuron analysis (DNA) and the dead neuron elimination (DNE), and logically partitioning the N networks into at least the main DNN and the peer DNN.

Regarding claims 5 and 15, claims 5 and 15 recite "the main DNN that is free from the dead neurons". It is unclear whether this limitation is intended to refer to the previously recited main DNN or a different main DNN. For the purposes of examination, this limitation is interpreted as: a second main DNN that is free from the dead neurons.

Claims 5 and 15 further recite "subjected to neuron dependence analysis (NDA) to identify connected neurons that are caused to fire together according to different neuron activation functions provided to a fraction of the plurality of inputs". It is unclear to which previous limitation "according to different neuron activation functions" is intended to refer. It is also unclear to which previous limitation "provided to a fraction of the plurality of inputs" is intended to refer. For the purposes of examination, this limitation is interpreted as: subjected to neuron dependence analysis (NDA) to identify connected neurons that are caused to fire together, wherein different neuron activation functions are provided to a fraction of the plurality of inputs.

Claims 5 and 15 further recite "connected neurons that are caused to fire together" and "the connected neurons that fire together are linked together by an edge in a neuron dependence graph (NDG)". The relationship between these elements is unclear. For the purposes of examination, these limitations are interpreted as: first connected neurons that are caused to fire together, and second connected neurons that fire together are linked together by an edge in a neuron dependence graph (NDG).

Regarding claims 8 and 18, claims 8 and 18 recite "the output combining function as inputs". The claims do not previously recite an output combining function as inputs. For the purposes of examination, this limitation is interpreted as: an output combining function as an input.

Regarding claims 9 and 19, claims 9 and 19 recite "a combination of Python code written to interact with a tensor flow stack and C++ code written to implement the identifying and the removing of the inter-network neuron redundancy". It is unclear whether "combination" refers only to the Python code, or whether it is intended to refer to both the Python code and the C++ code. It is unclear whether "written" refers to the combination or to the Python code. It is unclear whether the Python code is written to interact with both the tensor flow stack and the C++ code. For the purposes of examination, this limitation is interpreted as: Python code and C++ code written to implement the identifying and removing of the inter-network neuron redundancy, wherein the Python code is written to interact with a tensor flow stack.

Regarding claims 10 and 20, claims 10 and 20 recite "the plurality of neurons ni... nx comprised in the DNN". It is unclear whether "the plurality of neurons ni... nx comprised in the DNN" refers to any of the previously recited neurons of the parent claims. For the purposes of examination, this limitation is interpreted as: a second plurality of neurons ni... nx comprised in the DNN.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 3, 11, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Alhalabi et al. (Alhalabi, Besher, Mohamed Medhat Gaber, and Shadi Basurra. "EnSyth: A pruning approach to synthesis of deep learning ensembles." 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). IEEE, 2019), hereinafter Alhalabi, in view of Han et al. (Han, Song, et al. "Learning both weights and connections for efficient neural network." Advances in Neural Information Processing Systems 28 (2015)), hereinafter Han, in further view of Bian et al. (Bian, Yijun, et al. "Sub-Architecture Ensemble Pruning in Neural Architecture Search." arXiv e-prints (2019): arXiv-1910, published 10/01/2019), hereinafter Bian.

Regarding claim 11, Alhalabi teaches the claim comprising:

A system for training a Deep Neural Network ensemble, comprising: a plurality of neurons ni... nx, wherein each neuron ni being a computation node in a DNN; and at least a processor in a computer that executes program code comprising a compiler which is stored in a non-transitory computer-readable medium, wherein the compiler is enabled to configure a Deep Neural Network (DNN) into N networks, including at least a main DNN of the N networks and a peer DNN of the N networks, to perform training steps, comprising (Alhalabi 1, In this paper, we propose EnSyth, an approach to synthesise deep learning ensembles from a baseline model, leading to boosting up its performance in terms of accuracy and inference time; 3 C, Fig. 1 illustrates EnSyth, it could be summarised as: • train a baseline model (plurality of neurons shown); • prune the baseline model with different values of the hyperparameters to formulate the solution's space; • synthesise deep learning ensembles that belong to the existing space; • apply backward elimination on the composed ensembles to select the best performing ensembles; 3 D, Our goal is to have accurate ensembles that have the lowest possible number of models without compromising the predictability of the ensembles; 4 C, Our experiment has been conducted on: OS: Ubuntu Desktop 18.04.1 LTS, CPU: Intel KVM 64bit 2,400 GHz, CASH: 16384 KB, RAM: 32 GB, Python: 3.6.7, TensorFlow: 1.10.0, Keras: 2.2.4);

receive a plurality of inputs i...I by the plurality of neurons ni... nx, wherein each neuron of the plurality of neurons ni... nx being a computation node comprised in the N networks (Alhalabi 1, EnSyth works by applying multiple sets of pruning methods by varying the corresponding hyperparameters. This, in turn, leads to a pool of diverse pruned deep learning models. Using such a pool to form deep learning ensembles will have a powerful potential of boosting up the accuracy because of the awarded diversity from varying different values for the pruning method's hyperparameters. The number of possible ensembles that can be formed is 2^m − 1, where m is the number of pruned models; 3, we first introduce the topology of the feed-forward neural network models, then we explain the pruning method [37] that has been used to generate the compressed deep learning models; 3 A, the training of a network is done using xp training examples; Suppose X ∈ R^(N×P) is a one-dimensional matrix that represents the training samples as X = [x1, ..., xP], and L is a layer in the network; 3 B, Net-Trim is a post-processing pruning framework; this means it prunes a network after a training process; 3 C, Synthesising sets of diverse compressed models into ensemble predictions is a critical element in our approach because ensemble learning will not only allow to create better classifiers; Let m be a pruned model generated by Net-Trim, and the decision of the model mi about a class wj is defined as yi,j ∈ {0, 1}, where i = 1, 2, ..., N (N is the number of the classifiers) and j = 1, 2, ..., C (C is the number of classes); if mi predicted a class wj correctly then yi,j = 1, otherwise yi,j = 0. We use plurality voting [38] as a simple ensemble learning technique to synthesise the classifiers); and

utilize the plurality of inputs i...I to train the N networks to ensemble the DNN through analyzing, identifying and removing any inter-network neuron redundancy in the N networks to obtain both a savings in training time constraints and a reduction in an original memory footprint (Alhalabi abs., Deep neural networks have achieved state-of-art performance in many domains including computer vision, natural language processing and self-driving cars. However, they are very computationally expensive and memory intensive, which raises significant challenges when it comes to deploying or training them on strict-latency applications or resource-limited environments; we generate a set of diverse compressed deep learning models using different hyperparameters for a pruning method, after that we utilize ensemble learning to synthesise the outputs of the compressed models to compose a new pool of classifiers; 1, elimination of the impact of compression on deep learning models, by producing compressed models with better predictability measures; 2, most of those pruning techniques depend on the famous L1 and L2 regularisation [5] [6] which take a very long time to cover; 3 B, Net-Trim removes the redundant connections and redirects the processing to a small group of important connections; 3 C, we suggest a backward elimination scheme to remove the models with reduced predictability levels and find the optimal combinations that achieve a better result; 3 D, Our goal is to have accurate ensembles that have the lowest possible number of models without compromising the predictability of the ensembles; Fig. 1 illustrates EnSyth, it could be summarised as: • train a baseline model; • prune the baseline model with different values of the hyperparameters to formulate the solution's space; • synthesise deep learning ensembles that belong to the existing space; • apply backward elimination on the composed ensembles to select the best performing ensembles; 4 D, The following is a summary of statistics about the size of the 36 pruned models, where only the weights and biases are saved as compressed numpy arrays [42]: 1) CIFAR-10: Max: 4.2 MB, Min: 1.6 MB, Avg: 2.94 MB (baseline: 4.8 MB); 2) CIFAR-5: Max: 4.2 MB, Min: 1.3 MB, Avg: 2.89 MB (baseline: 4.8 MB); 3) MNIST-FASHION: Max: 7.7 MB, Min: 0.88 MB, Avg: 2.56 MB (baseline: 7.7 MB); 4 I, As shown in Table II, all of the solution space models have a smaller number of trainable parameters than the baseline model in the three data sets; 4 J, with a small number of models in an ensemble, the classifier can achieve high predictability levels. This is particularly interesting when it comes to deploying those ensembles on smartphones and Internet of Things (IoT) devices, because the ensemble size is small compared to the baseline model).
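
Editor's note: the plurality-voting scheme quoted above from Alhalabi (each pruned model mi casts a vote yi,j for a class, and the majority wins) is compact enough to sketch. This is an editorial illustration of the cited technique, not code from the application or the office action; the model objects and their predict() method are hypothetical stand-ins.

import numpy as np

def plurality_vote(models, x, num_classes):
    # Each pruned model m_i casts one vote for the class it predicts for input x.
    votes = np.zeros(num_classes, dtype=int)
    for m in models:
        votes[m.predict(x)] += 1   # hypothetical predict() -> class index in [0, num_classes)
    return int(np.argmax(votes))   # the class with the most votes wins

# Alhalabi's solution space: with m pruned models there are 2**m - 1 possible
# non-empty ensembles (e.g., 2**36 - 1, roughly 6.9e10, for the 36 pruned
# models reported in section 4 D).
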
However, Alhalabi fails to expressly disclose wherein the compiler is configured to identify and remove dead neurons from the DNN.

In the same field of endeavor, Han teaches: wherein the compiler is configured to identify and remove dead neurons from the DNN (Han 1, The model size reduction from pruning also facilitates storage and transmission of mobile applications incorporating DNNs; 3, Our pruning method employs a three-step process, as illustrated in Figure 2, which begins by learning the connectivity via normal network training; 3.5, After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining phase automatically arrives at the result where dead neurons will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons will be automatically removed during retraining).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein the compiler is configured to identify and remove dead neurons from the DNN, as suggested in Han, into Alhalabi. Doing so would be desirable because neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy (see Han abs.). Neural networks are prone to suffer the vanishing gradient problem [25] as the networks get deeper, which makes pruning errors harder to recover for deep networks (see Han 3.3). Neural networks have become ubiquitous. While these large neural networks are very powerful, their size consumes considerable storage, memory bandwidth, and computational resources. For embedded mobile applications, these resource demands become prohibitive. Our goal in pruning networks is to reduce the energy required to run such large networks so they can run in real time on mobile devices. The model size reduction from pruning also facilitates storage and transmission of mobile applications incorporating DNNs. To achieve this goal, we present a method to prune network connections in a manner that preserves the original accuracy (see Han 1).
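
Editor's note: Han's dead-neuron criterion (section 3.5, quoted above) is mechanical: after weight pruning, a neuron whose incoming weights are all zero, or whose outgoing weights are all zero, cannot affect the loss and can be dropped. A minimal sketch for one fully connected layer follows; it illustrates the cited criterion only and is not the application's DNA/DNE implementation.

import numpy as np

def find_dead_neurons(w_in, w_out):
    # w_in: (fan_in, n) weights into the layer's n neurons.
    # w_out: (n, fan_out) weights out of those neurons.
    no_input = ~np.any(w_in != 0, axis=0)    # every incoming weight pruned to zero
    no_output = ~np.any(w_out != 0, axis=1)  # every outgoing weight pruned to zero
    return np.where(no_input | no_output)[0]

def remove_dead_neurons(w_in, w_out):
    dead = find_dead_neurons(w_in, w_out)
    keep = np.setdiff1d(np.arange(w_in.shape[1]), dead)
    return w_in[:, keep], w_out[keep, :]     # drop the dead units' columns/rows
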
However, Alhalabi in view of Han fails to expressly disclose wherein the compiler is configured to identify a set of semantically-equivalent neurons between the main DNN of the N networks and the peer DNN of the N networks, wherein the compiler is configured to extract, from the main DNN of the N networks, the set of semantically-equivalent neurons that were identified, wherein the compiler is configured to form a common sub-network based on the set of semantically-equivalent neurons that were extracted from the main DNN of the N networks, and wherein the compiler is configured to link the common sub-network to additional DNNs of the N network other than the main DNN and the peer DNN, wherein, when training the additional DNNs, the compiler is configured to refrain from training parameters of the common sub-network.

In the same field of endeavor, Bian teaches these limitations (Bian p. 1, To tackle the NAS ensemble pruning problems, we seek diverse sub-ensemble architectures in a smaller size yet still with comparable performance to the original ensemble architecture without pruning; our proposed method would lead to distinct deeper architectures than the original ensemble; p. 2, We propose a NAS ensemble pruning method to seek sub-ensemble architectures in a smaller size, benefiting from an essential characteristic, i.e., diversity in ensemble learning; AdaNet attempts to train multiple weak subarchitectures with less computation cost to comprise powerful neural architectures inspired by ensemble methods; Given an ensemble architecture f(x) = Σ_{1 ≤ k ≤ l} w_k · h_k(x) ∈ F searched by ensemble NAS methods such as AdaNet, and a training set S = {(x1, y1), ..., (xm, ym)} where all training instances are assumed to be drawn i.i.d. (independent and identically distributed) (multiple weak subarchitectures trained independently); p. 3, Figure 1, layers in blue and green indicate the input and output layer, respectively; prunes the less valuable sub-architectures based on certain criteria during the searching process (lines 10-11 in Algorithm 1); p. 4, To consider accuracy and diversity simultaneously, we propose another strategy, named "Pruning by Information Entropy (PIE)"; To reveal the redundancy between two sub-architectures (wi and wj) in the ensemble architecture, the normalized variation of information (Zadeh et al. 2017), VI(wi, wj) = 1 − I(wi; wj) / H(wi, wj) (Eq. 12), is used to indicate the diversity between them; 7, an ensemble pruning method named "Sub-Architecture Ensemble Pruning in Neural Architecture Search (SAEP)" to reduce the redundant sub-architectures; ensemble models usually benefit from diverse individual learners; we target the ensemble learning methods in NAS and propose SAEP to reduce the redundant sub-architectures during the searching process (as described and shown in Fig. 1, similar subarchitectures are pruned, with the remaining substructures linked). Three solutions are proposed as the guiding criteria in SAEP that reflect the characteristics of the ensemble architecture (i.e., PRS, PAP, and PIE) to prune the less valuable subarchitectures. Experimental results indicate that SAEP could guide diverse sub-architectures to create sub-ensemble architectures in a smaller size yet still with comparable performance to the ensemble architecture that is not pruned. Besides, PIE might lead to distinct deeper sub-architectures if diversity is not sufficient).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the semantically-equivalent neuron identification and extraction, the common sub-network formation and linking, and the refraining from training parameters of the common sub-network, as suggested in Bian, into Alhalabi in view of Han. Doing so would be desirable because neural architecture search (NAS) is gaining more and more attention in recent years due to its flexibility and its remarkable capability of reducing the burden of neural network design. To achieve better performance, however, the searching process usually costs massive computation, which might not be affordable to researchers and practitioners. While recent attempts have employed ensemble learning methods to mitigate the enormous computation, an essential characteristic of diversity in ensemble methods is missed out, causing more similar sub-architectures to be gathered and potential redundancy in the final ensemble architecture. To bridge this gap, we propose a pruning method for NAS ensembles, named "Sub-Architecture Ensemble Pruning in Neural Architecture Search (SAEP)." It targets to utilize diversity and achieve sub-ensemble architectures in a smaller size with comparable performance to the unpruned ensemble architectures (see Bian abs.). SAEP might lead to distinct deeper and more effective architectures than the original one if the degree of diversity is not sufficient, which could be a bonus of pruning. Experimental results demonstrate the effectiveness of the proposed method in largely reducing the number of subarchitectures in ensemble architectures and increasing diversity while maintaining the final performance. Our proposed pruning criteria could also be generalized upon other ensemble methods, which could be interesting for future exploration (see Bian p. 2).
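
Editor's note: the claimed step of refraining from training the common sub-network's parameters corresponds, in mainstream frameworks, to freezing shared layers while training each additional ensemble member. The Keras sketch below is an editorial analogy under that assumption, not the applicant's compiler and not Bian's SAEP implementation.

import tensorflow as tf

# Shared sub-network: trained once, then frozen so later training cannot update it.
common = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu")])
common.trainable = False  # refrain from training the common sub-network's parameters

# An additional ensemble member links the frozen sub-network to its own trainable head.
inputs = tf.keras.Input(shape=(32,))
features = common(inputs)                      # reused, receives no gradient updates
outputs = tf.keras.layers.Dense(10)(features)  # only this head is trained
peer = tf.keras.Model(inputs, outputs)
peer.compile(optimizer="adam",
             loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
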
Regarding claim 1, claim 1 contains substantially similar limitations to those found in claim 11, the only difference being utilization by the compiler (Alhalabi abs., we generate a set of diverse compressed deep learning models using different hyperparameters for a pruning method; after that we utilize ensemble learning to synthesise the outputs of the compressed models to compose a new pool of classifiers; 1, elimination of the impact of compression on deep learning models, by producing compressed models with better predictability measures; 2, most of those pruning techniques depend on the famous L1 and L2 regularisation [5] [6] which take a very long time to cover; 3 B, Net-Trim removes the redundant connections and redirects the processing to a small group of important connections; 3 C, we suggest a backward elimination scheme to remove the models with reduced predictability levels and find the optimal combinations that achieve a better result; 3 D, Our goal is to have accurate ensembles that have the lowest possible number of models without compromising the predictability of the ensembles; Fig. 1 illustrates EnSyth, it could be summarised as: • train a baseline model; • prune the baseline model with different values of the hyperparameters to formulate the solution's space; • synthesise deep learning ensembles that belong to the existing space; • apply backward elimination on the composed ensembles to select the best performing ensembles; 4 D, The following is a summary of statistics about the size of the 36 pruned models, where only the weights and biases are saved as compressed numpy arrays [42]: 1) CIFAR-10: Max: 4.2 MB, Min: 1.6 MB, Avg: 2.94 MB (baseline: 4.8 MB); 2) CIFAR-5: Max: 4.2 MB, Min: 1.3 MB, Avg: 2.89 MB (baseline: 4.8 MB); 3) MNIST-FASHION: Max: 7.7 MB, Min: 0.88 MB, Avg: 2.56 MB (baseline: 7.7 MB); see also 4 I). Consequently, claim 1 is rejected for the same reasons.

Regarding claim 2, Alhalabi in view of Han in further view of Bian teaches all the limitations of claim 1, further comprising: wherein the plurality of inputs i...I comprises one or a combination of: a tensor flow graph describing the DNN, a training dataset, a validation dataset, an ensemble cardinality and an ensemble combining function (Alhalabi 1, EnSyth works by applying multiple sets of pruning methods by varying the corresponding hyperparameters. This, in turn, leads to a pool of diverse pruned deep learning models. Using such a pool to form deep learning ensembles will have a powerful potential of boosting up the accuracy because of the awarded diversity from varying different values for the pruning method's hyperparameters. The number of possible ensembles that can be formed is 2^m − 1, where m is the number of pruned models; 3, we first introduce the topology of the feed-forward neural network models, then we explain the pruning method [37] that has been used to generate the compressed deep learning models; 3 A, the training of a network is done using xp training examples; Suppose X ∈ R^(N×P) is a one-dimensional matrix that represents the training samples as X = [x1, ..., xP], and L is a layer in the network; 3 B, Net-Trim is a post-processing pruning framework; this means it prunes a network after a training process; 3 C, Synthesising sets of diverse compressed models into ensemble predictions is a critical element in our approach because ensemble learning will not only allow to create better classifiers; Let m be a pruned model generated by Net-Trim, and the decision of the model mi about a class wj is defined as yi,j ∈ {0, 1}, where i = 1, 2, ..., N (N is the number of the classifiers) and j = 1, 2, ..., C (C is the number of classes); if mi predicted a class wj correctly then yi,j = 1, otherwise yi,j = 0. We use plurality voting [38] as a simple ensemble learning technique to synthesise the classifiers; 4 C, Our experiment has been conducted on: OS: Ubuntu Desktop 18.04.1 LTS, CPU: Intel KVM 64bit 2,400 GHz, CASH: 16384 KB, RAM: 32 GB, Python: 3.6.7, TensorFlow: 1.10.0, Keras: 2.2.4).

Regarding claim 12, claim 12 contains substantially similar limitations to those found in claim 2. Consequently, claim 12 is rejected for the same reasons.

Regarding claim 3, Alhalabi in view of Han in further view of Bian teaches all the limitations of claim 2, further comprising: wherein the analyzing, identifying and removing of the inter-network neuron redundancy comprises: using a portion of the plurality of inputs i...I to train at least the main DNN and the peer DNN, wherein the compiler carrying out a first step of eliminating redundancy from the main DNN and the peer DNN (Alhalabi 1, elimination of the impact of compression on deep learning models, by producing compressed models with better predictability measures; 2, most of those pruning techniques depend on the famous L1 and L2 regularisation [5] [6] which take a very long time to cover; 3 B, Net-Trim removes the redundant connections and redirects the processing to a small group of important connections; 3 C, we suggest a backward elimination scheme to remove the models with reduced predictability levels and find the optimal combinations that achieve a better result; 3 D, Our goal is to have accurate ensembles that have the lowest possible number of models without compromising the predictability of the ensembles; Fig. 1 illustrates EnSyth, it could be summarised as: • train a baseline model; • prune the baseline model with different values of the hyperparameters to formulate the solution's space; • synthesise deep learning ensembles that belong to the existing space; • apply backward elimination on the composed ensembles to select the best performing ensembles).

However, Alhalabi fails to expressly disclose through a dead neuron analysis (DNA) and a dead neuron elimination (DNE).
In the same field of endeavor, Han teaches: through a dead neuron analysis (DNA) and a dead neuron elimination (DNE) (Han 1, The model size reduction from pruning also facilitates storage and transmission of mobile applications incorporating DNNs; 3, Our pruning method employs a three-step process, as illustrated in Figure 2, which begins by learning the connectivity via normal network training; 3.5, After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining phase automatically arrives at the result where dead neurons will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons will be automatically removed during retraining).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated through a dead neuron analysis (DNA) and a dead neuron elimination (DNE), as suggested in Han, into Alhalabi in view of Bian. Doing so would be desirable because neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy (see Han abs.). Neural networks are prone to suffer the vanishing gradient problem [25] as the networks get deeper, which makes pruning errors harder to recover for deep networks (see Han 3.3). Neural networks have become ubiquitous. While these large neural networks are very powerful, their size consumes considerable storage, memory bandwidth, and computational resources. For embedded mobile applications, these resource demands become prohibitive. Our goal in pruning networks is to reduce the energy required to run such large networks so they can run in real time on mobile devices. The model size reduction from pruning also facilitates storage and transmission of mobile applications incorporating DNNs. To achieve this goal, we present a method to prune network connections in a manner that preserves the original accuracy (see Han 1).

Regarding claim 13, claim 13 contains substantially similar limitations to those found in claim 3. Consequently, claim 13 is rejected for the same reasons.

Claims 4-10 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Alhalabi in view of Han, in further view of Bian, and in further view of Goldsborough (Goldsborough, Peter. "A tour of tensorflow." arXiv preprint arXiv:1610.01178 (2016)).
Regarding claim 4, Alhalabi in view of Han in further view of Bian teaches all the limitations of claim 3, further comprising: wherein the portion of the plurality of inputs i...I comprises: tensor flow, the training dataset and the validation dataset as inputs by the compiler to carry out the first step of eliminating redundancy to train the DNN to logically partition the N networks into at least the main DNN and the peer DNN, wherein redundant neurons that do not contribute to any of unique outputs Ui... UI have been eliminated in both the peer DNN and in the main DNN (Alhalabi 1, elimination of the impact of compression on deep learning models, by producing compressed models with better predictability measures; 2, most of those pruning techniques depend on the famous L1 and L2 regularisation [5] [6] which take a very long time to cover; 3 A, the training of a network is done using xp training examples; Suppose X ∈ R^(N×P) is a one-dimensional matrix that represents the training samples as X = [x1, ..., xP], and L is a layer in the network; 3 B, Net-Trim is a post-processing pruning framework; this means it prunes a network after a training process; Net-Trim removes the redundant connections and redirects the processing to a small group of important connections; 3 C, we suggest a backward elimination scheme to remove the models with reduced predictability levels and find the optimal combinations that achieve a better result; 3 D, Our goal is to have accurate ensembles that have the lowest possible number of models without compromising the predictability of the ensembles; Fig. 1 illustrates EnSyth, it could be summarised as: • train a baseline model; • prune the baseline model with different values of the hyperparameters to formulate the solution's space; • synthesise deep learning ensembles that belong to the existing space; • apply backward elimination on the composed ensembles to select the best performing ensembles; 4 C, TensorFlow; 4 E, Ideally, the elimination process should be performed on the validation set, rather than the testing set; The next stage of this work is going to utilise multi-objective optimisation methods on the validation sets to find even more accurate and smaller ensembles).

Han further teaches: through the dead neuron analysis (DNA) and the dead neuron elimination (DNE), wherein dead neurons that do not contribute to any of unique outputs Ui... UI have been eliminated through the DNA and the DNE (Han 1, The model size reduction from pruning also facilitates storage and transmission of mobile applications incorporating DNNs; 3, Our pruning method employs a three-step process, as illustrated in Figure 2, which begins by learning the connectivity via normal network training; 3.5, After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining phase automatically arrives at the result where dead neurons will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons will be automatically removed during retraining).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated through the dead neuron analysis (DNA) and the dead neuron elimination (DNE), wherein dead neurons that do not contribute to any of unique outputs Ui... UI have been eliminated through the DNA and the DNE, as suggested in Han, into Alhalabi. Doing so would be desirable because neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy (see Han abs.). Neural networks are prone to suffer the vanishing gradient problem [25] as the networks get deeper, which makes pruning errors harder to recover for deep networks (see Han 3.3). Neural networks have become ubiquitous. While these large neural networks are very powerful, their size consumes considerable storage, memory bandwidth, and computational resources. For embedded mobile applications, these resource demands become prohibitive. Our goal in pruning networks is to reduce the energy required to run such large networks so they can run in real time on mobile devices. The model size reduction from pruning also facilitates storage and transmission of mobile applications incorporating DNNs. To achieve this goal, we present a method to prune network connections in a manner that preserves the original accuracy (see Han 1).

However, Alhalabi in view of Han in further view of Bian fails to expressly disclose the tensor flow graph.

In the same field of endeavor, Goldsborough teaches: the tensor flow graph (Goldsborough abs., Deep learning is a branch of artificial intelligence employing deep neural network architectures that has significantly advanced the state-of-the-art in computer vision, speech recognition, natural language processing and other domains. In November 2015, Google released TensorFlow, an open source deep learning software library for defining, training and deploying machine learning models; 3, machine learning algorithms may be expressed in its dataflow graph language; TensorFlow graphs are assigned to available hardware units in a local as well as distributed environment; 3 A, In TensorFlow, machine learning algorithms are represented as computational graphs; Fig. 6, an example of how common subgraph elimination; 5, The core feature of TensorBoard is the lucid visualization of computational graphs).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the tensor flow graph, as suggested in Goldsborough, into Alhalabi in view of Han in further view of Bian. Doing so would be desirable because the major benefit of representing an algorithm in the form of a graph is not only the intuitive (visual) expression of dependencies between units of a computational model, but also the fact that the definition of a node within the graph can be kept very general (see Goldsborough 3 A). Deep learning models often employ neural networks with a highly complex and intricate structure. For example, [19] reports of a deep convolutional network based on the Google Inception model with more than 36,000 individual units, while [8] states that certain long short-term memory (LSTM) architectures can span over 15,000 nodes. To maintain a clear overview of such complex networks, facilitate model debugging and allow inspection of values on various levels of detail, powerful visualization tools are required. TensorBoard, a web interface for graph visualization and manipulation built directly into TensorFlow, is an example of such a tool (see Goldsborough 5).
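
Editor's note: Goldsborough's observation that TensorFlow expresses algorithms as dataflow graphs can be made concrete in a few lines. The sketch below uses the modern tf.function tracing API (Goldsborough's 2016 paper predates it and describes the TF1-era graph API); it is an editorial illustration, not text from the office action.

import tensorflow as tf

@tf.function  # traces the Python function into a TensorFlow dataflow graph
def dense_relu(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

# Inspect the traced graph: it contains MatMul, AddV2 and Relu nodes
# (plus placeholders for the inputs), i.e., the computation as a graph.
graph = dense_relu.get_concrete_function(
    tf.TensorSpec([None, 4]), tf.TensorSpec([4, 3]), tf.TensorSpec([3])).graph
print([op.type for op in graph.get_operations()])
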
Regarding claim 14, claim 14 contains substantially similar limitations to those found in claim 4. Consequently, claim 14 is rejected for the same reasons.

Regarding claim 5, Alhalabi in view of Han in further view of Bian in further view of Goldsborough teaches all the limitations of claim 4, further comprising: wherein the main DNN is subjected to neuron dependence analysis (NDA) to identify connected neurons that are caused to fire together according to different neuron activation functions provided to a fraction of the plurality of inputs i...I, wherein the connected neurons that fire together are linked together by an edge in a neuron dependence graph (NDG) (Alhalabi 1, elimination of the impact of compression on deep learning models, by producing compressed models with better predictability measures; 2, most of those pruning techniques depend on the famous L1 and L2 regularisation [5] [6] which take a very long time to cover; 3 A, the training of a network is done using xp training examples; Suppose X ∈ R^(N×P) is a one-dimensional matrix that represents the training samples as X = [x1, ..., xP], and L is a layer in the network; 3 B, Net-Trim is a post-processing pruning framework; this means it prunes a network after a training process; Net-Trim removes the redundant connections and redirects the processing to a small group of important connections; 3 C, we suggest a backward elimination scheme to remove the models with reduced predictability levels and find the optimal combinations that achieve a better result; 3 D, Our goal is to have accurate ensembles that have the lowest possible number of models without compromising the predictability of the ensembles; Fig. 1 illustrates EnSyth, it could be summarised as: • train a baseline model; • prune the baseline model with different values of the hyperparameters to formulate the solution's space; • synthesise deep learning ensembles that belong to the existing space; • apply backward elimination on the composed ensembles to select the best performing ensembles; 4 C, TensorFlow; 4 E, Ideally, the elimination process should be performed on the validation set, rather than the testing set; The next stage of this work is going to utilise multi-objective optimisation methods on the validation sets to find even more accurate and smaller ensembles).
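
Editor's note: the claim language maps neurons that "fire together" to edges of a neuron dependence graph (NDG). The office action does not reproduce the application's NDA algorithm, so the sketch below is only a hypothetical reading of that claim language: record binarized activations over a fraction of the inputs and link neurons whose firing patterns coincide.

import numpy as np

def build_ndg(activations, threshold=0.0):
    # activations: (num_inputs, num_neurons), recorded on a fraction of the inputs.
    fired = activations > threshold            # binarize each neuron's firing events
    n = fired.shape[1]
    ndg = np.zeros((n, n), dtype=bool)         # adjacency matrix of the NDG
    for i in range(n):
        for j in range(i + 1, n):
            if np.array_equal(fired[:, i], fired[:, j]):  # fire together on all inputs
                ndg[i, j] = ndg[j, i] = True              # link with an edge
    return ndg
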
Han further teaches: wherein the main DNN that is free from the dead neurons (Han 1, The model size reduction from pruning also facilitates storage and transmission of mobile applications incorporating DNNs; 3, Our pruning method employs a three-step process, as illustrated in Figure 2, which begins by learning the connectivity via normal network training; 3.5, After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining phase automatically arrives at the result where dead neurons will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons will be automatically removed during retraining).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein the main DNN that is free from the dead neurons, as suggested in Han, into Alhalabi in view of Bian in further view of Goldsborough. Doing so would be desirable because neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy (see Han abs.). Neural networks are prone to suffer the vanishing gradient problem [25] as the networks get deeper, which makes pruning errors harder to recover for deep networks (see Han 3.3). Neural networks have become ubiquitous. While these large neural networks are very powerful, their size consumes considerable storage, memory bandwidth, and computational resources. For embedded mobile applications, these resource demands become prohibitive. Our goal in pruning networks is to reduce the energy required to run such large networks so they can run in real time on mobile devices. The model size reduction from pruning also facilitates storage and transmission of mobile applications incorporating DNNs. To achieve this goal, we present a method to prune network connections in a manner that preserves the original accuracy (see Han 1).

Regarding claim 15, claim 15 contains substantially similar limitations to those found in claim 5. Consequently, claim 15 is rejected for the same reasons.

Regarding claim 6, Alhalabi in view of Han in further view of Bian in further view of Goldsborough teaches all the limitations of claim 5, further comprising: wherein the first step of eliminating the redundancy further comprising performing (Alhalabi 1, elimination of the impact of compression on deep learning models, by producing compressed models with better predictability measures; 2, most of those pruning techniques depend on the famous L1 and L2 regularisation [5] [6] which take a very long time to cover; 3 A, the training of a network is done using xp training examples; Suppose X ∈ R^(N×P) is a one-dimensional matrix that represents the training samples as X = [x1, ..., xP], and L is a layer in the network; 3 B, Net-Trim is a post-processing pruning framework; this means it prunes a network after a training process; Net-Trim removes the redundant connections and redirects the processing to a small group of important connections; 3 C, we suggest backw

Prosecution Timeline

Mar 07, 2022: Application Filed
Jul 18, 2025: Non-Final Rejection — §103, §112
Oct 28, 2025: Response Filed
Nov 14, 2025: Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585984: POINT-OF-INTEREST RECOMMENDATION (granted Mar 24, 2026; 2y 5m to grant)
Patent 12585929: Layered Gradient Accumulation and Modular Pipeline Parallelism for Improved Training of Machine Learning Models (granted Mar 24, 2026; 2y 5m to grant)
Patent 12581159: METHOD AND APPARATUS FOR OPTIMIZING VIDEO PLAYBACK START, DEVICE AND STORAGE MEDIUM (granted Mar 17, 2026; 2y 5m to grant)
Patent 12541282: LEARNING USER INTERFACE (granted Feb 03, 2026; 2y 5m to grant)
Patent 12530106: STACKED MEDIA ELEMENTS WITH SELECTIVE PARALLAX EFFECTS (granted Jan 20, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 75%
Grant Probability with Interview: 99% (+29.1%)
Median Time to Grant: 4y 8m
PTA Risk: Moderate
Based on 334 resolved cases by this examiner. Grant probability is derived from the career allow rate.
