Prosecution Insights
Last updated: May 29, 2026
Application No. 17/148,619

METHOD AND APPARATUS WITH NEURAL NETWORK DATA PROCESSING

Final Rejection §101 §103 §112
Filed: Jan 14, 2021
Priority: May 22, 2020 (continuation of 63/028,680 +1 more)
Examiner: KARTHOLY, REJI P
Art Unit: 2143
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Snu R&Db Foundation
OA Round: 4 (Final)
Grant Probability: 65% (Moderate)
OA Rounds: 5-6
Est. Remaining: 0m
With Interview: 99%

Examiner Intelligence

Career Allowance Rate: 65% (grants 65% of resolved cases; 99 granted / 153 resolved; +9.7% vs TC avg)
Interview Lift: +72.1% (strong; resolved cases with interview vs. without)
Typical timeline: 3y 1m average prosecution; 15 currently pending
Career history: 172 total applications across all art units

Statute-Specific Performance

§101: 2.4% (-37.6% vs TC avg)
§103: 94.9% (+54.9% vs TC avg)
§102: 1.9% (-38.1% vs TC avg)
§112: 0.5% (-39.5% vs TC avg)
TC avg = Tech Center average estimate. Based on career data from 153 resolved cases.

Office Action

DETAILED ACTION

This Office Action is in response to Applicant's Response filed on 07/21/2025 for the above identified application.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

The amendment filed on 07/21/2025 has been entered. Claims 1, 17, and 21 have been amended. Claims 1-21 are pending in the application.

Claim Objection

Claim 21 is objected to because of the following informalities: Claim 21 recites the terms “the channels” and “the batch normalization parameters”, which have no antecedent basis. Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-21 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.

Independent Claims 1 and 17 recite “performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition”, and independent Claim 21 recites “perform the inference operation including object verification or recognition using a result of performing a calculation based on the input data using the determined channel; and authenticate the object associated with the input data based on a result of performing the inference operation including the object verification or recognition”. The background/related art section of the specification (see [0003]) describes that "Technology may perform user authentication.. may be based on neural network.. neural network may be used to output recognition result.." (underlining added for emphasis). User authentication is briefly mentioned in the background section as noted above and nowhere else in the specification. That is not the same as "performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition."
Performing an inference operation using a result of a calculation using the determined portion of channels, and authenticating an object associated with input data based on the performed inference operation, is not disclosed in the specification. At best, the specification suggests performing object verification or recognition in relation to the input data. Therefore, the language - performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition - constitutes new matter (see also 37 C.F.R. 1.121(f), MPEP 608.04, 706.03(o)). For the purposes of examination, the Examiner will interpret this limitation as: performing object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network.

Claims 2-16 and 18-20 are also rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as being dependent on parent claims failing to comply with the written description requirement.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1

Claims 1-15 are directed to a method and claims 17-21 are directed to an apparatus/device. Thus, the claims fall within one of the statutory categories (process and machine) and are eligible under Step 1.

Step 2A Prong 1 Independent Claims

Claims 1 and 17 recite: determining a portion of channels to be used for calculation among channels of a neural network based on importance values respectively corresponding to the channels of the neural network, wherein the importance values are determined based on cumulative distribution functions (CDFs) of the channels determined, using batch normalization parameters; performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters; performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition, wherein the determining the portion of channels comprises: setting a predetermined threshold according to a required lightweight degree; and determining a channel to be deactivated by comparing an importance value of each channel to the threshold - these limitations encompass mental processes and mathematical concepts, such as a user drawing a tree of a neural network, selecting a subset of labeled channels of the network to be used for making evaluation and/or judgement based on some priority value using mathematical calculation and/or relationship, performing normalization using mathematical calculation and/or relationship, making evaluation and/or judgement related to the input data (i.e., verification or recognition) using the results of calculations performed based on the decided-upon channels, making evaluation and/or judgement to select a threshold according to a required proportion/lightweight degree, and deciding to not use a labeled channel by comparing the priority value to the threshold. Thus, the claims recite an abstract idea that falls under the “Mental Processes” and “Mathematical Concepts” groupings.

Claim 21 recites: determine a channel included in a neural network to be a channel to be used for performing an inference operation based on an important value of the channel determined using a cumulative distribution function (CDF) of the channel learned using a mask having continuous values by applying a logistic function to the CDF; perform normalization on the channels by determining a mean and a standard deviation of the channels and perform a transformation on the normalization using the batch normalization parameters; perform the inference operation including object verification or recognition using a result of performing a calculation based on the input data using the determined channel; and authenticate the object associated with the input data based on a result of performing the inference operation including the object verification or recognition; set a predetermined threshold according to a required lightweight degree; and determine a channel to be deactivated by comparing an importance value of each channel to the threshold - these limitations encompass mental processes and mathematical concepts, such as a user drawing a tree of a neural network, selecting a subset of labeled channels of the network to be used for making evaluation and/or judgement based on some priority value using mathematical calculation and/or relationship, performing normalization using mathematical calculation and/or relationship, making evaluation and/or judgement related to the input data (i.e., verification or recognition) using the results of calculations performed based on the decided-upon channels, making evaluation and/or judgement to select a threshold according to a required proportion/lightweight degree, and deciding to not use a labeled channel by comparing the priority value to the threshold. Thus, the claim recites an abstract idea that falls under the “Mental Processes” and “Mathematical Concepts” groupings.

Step 2A Prong 2 Independent Claims

Additional elements

Claims 1 and 17: receiving input data including an object - this limitation amounts to insignificant extra-solution activity of mere data gathering (see MPEP § 2106.05(g)).
processor-implemented neural network data processing - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).
Claim 17: a neural network data processing apparatus, the apparatus comprising: a processor - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).
Claim 21: receive input data including an object - this limitation amounts to insignificant extra-solution activity of mere data gathering (see MPEP § 2106.05(g)).
a neural network data processing electronic device, the electronic device comprising: a processor - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).

Accordingly, these additional elements do not integrate the judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to the abstract idea.

Step 2B Independent Claims

Additional elements

Claims 1 and 17: receiving input data including an object - this limitation amounts to insignificant extra-solution activity of mere data gathering, which is well-understood, routine, and conventional activity (see MPEP § 2106.05(d), “receiving/transmitting data”).
processor-implemented neural network data processing - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).
Claim 17: a neural network data processing apparatus, the apparatus comprising: a processor - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).
Claim 21: receive input data including an object - this limitation amounts to insignificant extra-solution activity of mere data gathering, which is well-understood, routine, and conventional activity (see MPEP § 2106.05(d), “receiving/transmitting data”).
a neural network data processing electronic device, the electronic device comprising: a processor - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).

Accordingly, these additional elements do not amount to significantly more than the judicial exception. As such, the claims are patent ineligible.

Step 2A Prong 1 Dependent Claims

Claims 2 and 18: the number of channels in the determined portion varies based on a lightweight degree of the neural network - this limitation encompasses mathematical concepts and/or mental processes.
Claim 3: the lightweight degree is a proportion of the channels of the neural network to be used for calculation, and the lightweight degree is determined based on any one or any combination of a memory usage, a processing speed, and a processing time of an apparatus - this limitation encompasses mathematical concepts.
Claim 4: the determining comprises determining a portion of channels satisfying the lightweight degree based on an order of the importance values of the channels of the neural network - this limitation encompasses mathematical concepts and/or mental processes.
Claim 5: the order of the importance values is an order from greatest to least among the importance values - this limitation encompasses mental processes.
Claim 6: the determining comprises determining a current channel included in the neural network to be a channel to be used for the calculation, in response to an importance value of the current channel being greater than a threshold - this limitation encompasses mathematical concepts and/or mental processes.
Claim 7: the determining comprises determining to deactivate the current channel such that the channel is not used for the calculation, in response to the importance value of the current channel being less than or equal to the threshold - this limitation encompasses mathematical concepts and/or mental processes.
Claim 8: the threshold is determined based on a lightweight degree of the neural network - this limitation encompasses mathematical concepts and/or mental processes.
Claim 9: the importance value of the current channel is a probability value corresponding to a degree of influence on the calculation for the input data in response to the current channel being deactivated - this limitation encompasses mathematical concepts and/or mental processes.
Claims 10 and 20: the importance values respectively corresponding to the channels are determined based on cumulative distribution functions (CDFs) of the channels determined by a process of training the neural network - this limitation encompasses mathematical concepts and/or mental processes.
Claim 11: the determining of the portion of channels comprises: determining a binary mask based on the CDFs and a threshold; and determining the portion of channels to be used for the calculation based on the determined binary mask - this limitation encompasses mathematical concepts and/or mental processes.
Claim 12: parameters of the CDFs are learned using a mask having continuous values in the form of a logistic function, in a process of training the neural network - this limitation encompasses mathematical concepts and/or mental processes.
Claim 13: in the process of training, a differentiable soft mask is determined using a Gumbel-softmax function, and backward propagation training is performed based on the soft mask - this limitation encompasses mathematical concepts and/or mental processes.
Claim 15: determining comprises determining channels to be used for calculation for each of hidden layers of the neural network - this limitation encompasses mental processes.
Claim 19: for the determining, determine a current channel included in the neural network to be a channel to be used for the calculation, in response to an importance value of the current channel being greater than a threshold, and the threshold is determined based on a lightweight degree required for the neural network - this limitation encompasses mathematical concepts and/or mental processes.

Thus, the claims recite the abstract idea.

Step 2A Prong 2 Dependent Claims

Additional elements

Claim 14: the neural network is a convolutional neural network, and hidden layers of the convolutional neural network include a convolutional layer, a batch normalization layer, and a rectified linear unit (ReLU) layer - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components and a generic class of algorithms to apply the judicial exception (see MPEP § 2106.05(f)). This limitation can also be viewed as generally linking the judicial exception to the technological environment of neural networks (see MPEP § 2106.05(h)).
Claim 16: non-transitory computer-readable storage medium storing instructions that, when executed by a processor, configure the processor to perform the method - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components and a generic class of algorithms to apply the judicial exception (see MPEP § 2106.05(f)).

Accordingly, these additional elements do not integrate the judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to the abstract idea.

Step 2B Dependent Claims

Additional elements

Claim 14: the neural network is a convolutional neural network, and hidden layers of the convolutional neural network include a convolutional layer, a batch normalization layer, and a rectified linear unit (ReLU) layer - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components and a generic class of algorithms to apply the judicial exception (see MPEP § 2106.05(f)). This limitation can also be viewed as generally linking the judicial exception to the technological environment of neural networks (see MPEP § 2106.05(h)).
Claim 16: non-transitory computer-readable storage medium storing instructions that, when executed by a processor, configure the processor to perform the method - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components and a generic class of algorithms to apply the judicial exception (see MPEP § 2106.05(f)).

Accordingly, these additional elements do not amount to significantly more than the judicial exception. As such, the claims are patent ineligible.
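The claimed channel-selection steps analyzed above can be illustrated with a short Python sketch. These steps are: importance values derived from per-channel CDFs using batch normalization parameters, batch normalization of the channels, a threshold set according to the required lightweight degree, and deactivation of any channel whose importance does not exceed the threshold. The Office Action does not say how the CDFs are computed. Purely as an assumption for illustration, the sketch treats each channel's batch-normalized output as Gaussian with mean β and standard deviation |γ|, and it takes the importance value to be the probability that this output is positive (one minus the CDF at zero). The remaining choices are also hypothetical and are not drawn from the application or the cited art. These are all function names, the reading of the lightweight degree as a fraction of channels to keep (claim 3 describes it as a proportion of channels used for calculation), and the logistic soft mask (a loose stand-in for the continuous mask of claims 12 and 21).

```python
import math
from typing import List


def batch_normalize(x: List[float], gamma: float, beta: float, eps: float = 1e-5) -> List[float]:
    """Normalize one channel with its mean and standard deviation, then apply gamma * z_hat + beta."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # population variance, as in batch normalization
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]


def normal_cdf(z: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def channel_importance(gamma: float, beta: float) -> float:
    """ASSUMPTION (hypothetical): model the BN output as N(beta, gamma**2) and use
    P(output > 0) = 1 - CDF(0) as the channel's importance value."""
    if gamma == 0.0:
        return 1.0 if beta > 0.0 else 0.0  # degenerate case: the output is the constant beta
    return 1.0 - normal_cdf((0.0 - beta) / abs(gamma))


def threshold_for_keep_ratio(importances: List[float], keep_ratio: float) -> float:
    """Choose a threshold so that about keep_ratio of the channels (hypothetical reading of the
    lightweight degree) have importance strictly above it. Ties at the threshold can change the count."""
    ranked = sorted(importances, reverse=True)
    k = max(1, min(len(ranked), round(keep_ratio * len(ranked))))
    return ranked[k] if k < len(ranked) else -math.inf


def binary_mask(importances: List[float], threshold: float) -> List[int]:
    """1 = channel used for the calculation; 0 = channel deactivated (importance <= threshold)."""
    return [1 if v > threshold else 0 for v in importances]


def logistic_soft_mask(importances: List[float], threshold: float, temperature: float = 0.1) -> List[float]:
    """HYPOTHETICAL continuous relaxation of binary_mask using a logistic function."""
    return [1.0 / (1.0 + math.exp(-(v - threshold) / temperature)) for v in importances]


if __name__ == "__main__":
    gammas = [1.0, 1.0, 1.0, 1.0]
    betas = [2.0, 0.5, -0.5, -2.0]
    imps = [channel_importance(g, b) for g, b in zip(gammas, betas)]
    t = threshold_for_keep_ratio(imps, keep_ratio=0.5)
    print([round(v, 3) for v in imps])  # [0.977, 0.691, 0.309, 0.023]
    print(binary_mask(imps, t))          # [1, 1, 0, 0]
```

With a keep ratio of 0.5, the two channels with the largest importance values stay active. The other two have importance less than or equal to the threshold and are deactivated, which mirrors the "greater than" and "less than or equal to" split recited in claims 6 and 7.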
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-11 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US 2019/0362235 A1, hereinafter Xu) in view of Teig et al. (US 11,900,238 B1, hereinafter Teig), further in view of Lin et al. (US 2020/0234128 A1, hereinafter Lin).

Regarding Claim 1, Xu teaches a processor-implemented neural network data processing method ([0024] FIG. 1 illustrating a system that includes various devices (e.g., 120, 125, 130, 135) capable of utilizing machine learning models in the course of their operation; devices utilize neural network models in connection with detecting persons, animals, or objects within their respective environments and/or conditions, characteristics, and events within these environments based on sensor data generated at the devices; [0027] system for use in performing preprocessing on existing neural network models to adapt and prepare the models for distribution to and use by resource-constrained devices; the preprocessing system implements a network pruner tool, implemented in hardware- and/or software-based logic on the preprocessing system; the preprocessing system includes processors), the method comprising:
receiving input data including an object ([0024] devices utilize neural network models in connection with detecting persons, animals, or objects within their respective environments; devices include vehicles, drones, robots, and other devices, which possess autonomous navigation capabilities, allowing the devices to detect attributes and conditions within physical space, plan paths within the environment, avoid collisions, and interact with things within the environment utilizing one or more sensors; the data generated from these sensors may be provided as an input to a machine learning model, such as a neural network model (e.g., convolutional neural network (CNN), deep neural network (DNN), spiking neural network (SNN), etc.); [0029] a preprocessing system provides the network pruner tool as a service to resource-constrained system 125; a query or request is submitted to the preprocessing system identifying a particular neural network model and requesting that the model be pruned);
determining a portion of channels to be used for calculation among channels of a neural network based on importance values respectively corresponding to the channels of the neural network ([0024] devices, such as 120, 125, 130, 135, utilize neural network models in connection with detecting persons, animals, or objects within their respective environments and/or conditions, characteristics, and events within these environments based on sensor data generated at the devices; [0028] the network pruner tool provides functionality to perform both coarse-grained neural network pruning to prune channels, kernels, or nodes from the neural network model as well as more fine-grained neural network pruning to prune individual weights from the model; [0030] the coarse-grained pruning logic block of the network pruner tool identifies the relative importance of various channels, kernels, and/or nodes of a neural network (i.e., importance values respectively corresponding to the channels of the neural network) and iteratively prunes the model to first remove those portions of the neural network determined to be less important; provides test data as an input to the pruned neural network and performs a test on the pruned neural network to determine an accuracy value for the pruned neural network - thus, determining a portion of channels/unpruned channels to be used to perform test/accuracy calculation);
performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition (based on the 112(a) discussion above, this limitation is interpreted as: performing object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network. [0024] devices, such as 120, 125, 130, 135, utilize neural network models in connection with detecting persons, animals, or objects within their respective environments and/or conditions, characteristics, and events within these environments based on sensor data generated at the devices (i.e., inference operation including object verification or recognition); the data generated from these sensors is provided as an input to a machine learning model, such as a neural network model, from which one or more outputs may be generated that cause actuators of the device (e.g., 125, 130, 135) to autonomously direct movement of the device within the environment; [0030] provides test data as an input to the pruned neural network (i.e., using the determined portion of channels/unpruned channels of the neural network) and performs a test on the pruned neural network to determine an accuracy value for the pruned neural network based on one or more outputs generated from the test data input to the pruned neural network (i.e., performing calculation); [0052] when the fine-tuning is completed, the pruned (or thinned or sparse) neural network model is ready for use and deployment on resource-constrained computing systems - thus, the deployed pruned network model is used to perform inference operation including object verification or recognition based on the output generated/calculations performed based on the input),
wherein the determining the portion of channels comprises: setting a predetermined threshold according to a required lightweight degree ([0045]-[0047] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank-orders the channels of the layer based on the relative importance or sensitivity of that channel; an initial prune is defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network); in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights - thus, the channels are ranked based on their importance and those with importance less than 30% (i.e., threshold) of the ranked channels are selected for pruning); and
determining a channel to be deactivated by comparing an importance value of each channel to the threshold ([0045]-[0047] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank-orders the channels of the layer based on the relative importance or sensitivity of that channel; an initial prune is defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network); in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights - thus, the channels are ranked based on their importance and those with importance less than 30% (i.e., threshold according to lightweight degree/percentage of channels) of the ranked channels are pruned (i.e., determining channel to be deactivated)).

However, Xu fails to expressly teach wherein the importance values are determined based on cumulative distribution functions (CDFs) of the channels determined using batch normalization parameters.
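The Xu pruning relied on above can be sketched in Python under stated assumptions. As characterized in the citations to [0045]-[0047], Xu ranks a layer's channels by the sum of the absolute values of their weights, selects the lowest-ranked 30% for pruning, and, per the same paragraphs, raises the percentage while the pruned network still retains sufficient accuracy. The sketch below is hypothetical and is not Xu's code. It covers a single layer, uses the illustrative function names `rank_channels_by_l1`, `channels_to_prune`, and `iterative_prune`, and takes a caller-supplied `evaluate` callback as a stand-in for Xu's test-data accuracy check.

```python
from typing import Callable, List, Sequence


def rank_channels_by_l1(weights: Sequence[Sequence[float]]) -> List[int]:
    """Return channel indices from least to most important, scoring each channel
    by the sum of the absolute values of its weights."""
    scores = [sum(abs(w) for w in channel) for channel in weights]
    return sorted(range(len(weights)), key=lambda i: scores[i])


def channels_to_prune(weights: Sequence[Sequence[float]], prune_pct: int) -> List[int]:
    """Select the lowest-ranked prune_pct percent of channels (rounded down)."""
    n_prune = prune_pct * len(weights) // 100  # integer math avoids float rounding surprises
    return sorted(rank_channels_by_l1(weights)[:n_prune])


def iterative_prune(weights: Sequence[Sequence[float]],
                    evaluate: Callable[[List[int]], float],
                    min_accuracy: float,
                    start_pct: int = 30,
                    step_pct: int = 5) -> List[int]:
    """Start at start_pct and raise the pruning percentage while evaluate(pruned) stays at or
    above min_accuracy. Returns the last acceptable pruned set ([] if even start_pct fails)."""
    accepted: List[int] = []
    pct = start_pct
    while pct <= 100:
        candidate = channels_to_prune(weights, pct)
        if evaluate(candidate) < min_accuracy:
            break
        accepted = candidate
        pct += step_pct
    return accepted


if __name__ == "__main__":
    layer = [[0.1, -0.1], [1.0, 2.0], [0.5, 0.5], [-3.0, 0.0], [0.05, 0.0],
             [0.2, 0.2], [1.5, -1.5], [0.3, 0.0], [2.0, 2.0], [0.01, 0.01]]
    print(channels_to_prune(layer, 30))  # [0, 4, 9]

    def toy_accuracy(pruned: List[int]) -> float:
        # Hypothetical stand-in for running test data through the pruned network.
        return 0.95 if len(pruned) <= 4 else 0.80

    print(iterative_prune(layer, toy_accuracy, min_accuracy=0.9))  # [0, 4, 7, 9]
```

In this sketch, the cut-off is a rank percentage (a count of channels) applied to the L1-ranked list.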
In the same field of endeavor, Teig teaches wherein the importance values are determined based on cumulative distribution functions (CDFs) of the channels determined using batch normalization parameters (column 20, line 65 to column 21, line 10 and lines 21 to 24 - VIB moves layer-by-layer to identify portions of the network (e.g., nodes, edges, or even entire filters) that are not passing important information; VIB introduces probabilistic (e.g., Gaussian) noise into the output values of a set of computation nodes of the network (e.g., the nodes of one or more layers of the network); the outputs of such nodes (which are passed to nodes in the next layer) are made to vary probabilistically around the actual computed output value during training; this noise enables the training system to identify nodes that are less important to the eventual output of the network (e.g., the classification decision, etc.) and remove these nodes; each layer of the network is treated as a bottleneck for the purpose of identifying the nodes, edges, and/or filters that can be removed from the network; column 21, lines 45 to 52 - the loss function represents the loss for a single layer, with the subscript c representing each channel output by the layer (e.g., the outputs for each filter of the layer); the complete loss function, then, is a sum over all of the layers, with a different y and σc for each layer; the σc represents the noise variance for the channel, while the coefficient y is a multiplicative variable that can be changed per layer (i.e., the loss function estimates information transmitted by each layer in order to determine whether to remove/keep the nodes/channel); column 22, lines 33 to 34 and lines 62 to 65 - a channel can be removed once its noise variance (σc) exceeds a threshold (e.g., 1); various sigmoid functions may be used for the loss function: these include the logistic function, an algebraic sigmoid function, and a Cauchy cumulative distribution function (CDF) - thus, determining to remove/keep channels based on the loss function/CDFs of the channels; column 7, lines 10 to 25 - before a multi-layer network can be used to solve a particular problem (e.g., image classification, face recognition, etc.), the network is put through a supervised training process that adjusts the network's configurable parameters; the training process uses different input value sets with known output value sets; for each selected input value set (i.e., batch), the training process typically (1) forward propagates the input value set through the network's nodes to produce a computed output value set and then (2) backpropagates a gradient (rate of change) of a loss function (output error) that quantifies in a particular way the difference between the input set's known output value set and the input set's computed output value set, in order to adjust the network's configurable parameters (e.g., the weight values); column 4, lines 44 to 49 - the neural networks are convolutional feed-forward neural networks. In this case, the intermediate layers (referred to as “hidden” layers) may include convolutional layers, pooling layers, fully-connected layers, and normalization layers - thus, in feed-forward neural networks with a normalization layer, the training process propagates a batch of inputs through the network's nodes and the loss function/CDF is determined using normalization parameters for the batch of inputs).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein the importance values are determined based on cumulative distribution functions (CDFs) of the channels determined using batch normalization parameters, as taught by Teig into Xu.
Doing so would be desirable because it would greatly reduce the resources required to achieve optimal network structure (Teig, column 34, lines 53 to 56) and removing the subset of computation nodes from the trained network increases sparsity of the network, thereby reducing memory and power resources required to execute the trained network (Teig, claim 6). However, Xu and Teig fail to expressly teach wherein performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters. In the same field of endeavor, Lin teaches wherein performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters ([0033] the utilization monitoring module 202 may detect dead neurons using the scale parameter γ from the following equations (which a batch normalization layer uses to normalize outputs of a convolutional layer before the batch normalization layer): $\hat{z} = \frac{z_{in} - \mu_\beta}{\sqrt{\sigma_\beta^2 + \epsilon}}$; $z_{out} = \gamma \hat{z} + \beta$ (i.e., performing normalization on the channel), where: $z_{in}$ is an input to the batch normalization layer; $z_{out}$ is an output of the batch normalization layer; $\mu_\beta$ and $\sigma_\beta$ are the respective mean and standard deviation values of input activations over $\beta$ (i.e., mean and standard deviation of the channel); and $\gamma$ and $\beta$ are trainable affine transformation parameters (scale and shift) (i.e., transformation using the batch normalization parameters)). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters, as taught by Lin into Xu and Teig. 
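Lin's two equations (normalization with the batch mean and standard deviation, followed by the affine transform with γ and β) can be sketched for a single channel in Python. This is an illustrative reconstruction of the standard batch-normalization computation, not code from Lin; it normalizes with the population (biased) batch variance, and the function name and sample values are hypothetical.

```python
import math
from typing import List


def batch_norm_channel(z_in: List[float], gamma: float, beta: float,
                       eps: float = 1e-5) -> List[float]:
    """Batch-normalize one channel's activations over a mini-batch.

    z_hat = (z_in - mu) / sqrt(sigma**2 + eps)   (normalization)
    z_out = gamma * z_hat + beta                 (trainable scale and shift)
    """
    n = len(z_in)
    mu = sum(z_in) / n                              # batch mean of the channel
    var = sum((z - mu) ** 2 for z in z_in) / n      # batch variance (biased)
    denom = math.sqrt(var + eps)
    return [gamma * (z - mu) / denom + beta for z in z_in]


out = batch_norm_channel([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5)
dead = batch_norm_channel([1.0, 2.0, 3.0], gamma=0.0, beta=0.7)
```

The output has mean β and, up to ε, standard deviation |γ|. With γ = 0 every output equals β regardless of the input, which illustrates why the scale parameter γ can serve as a signal for dead neurons.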
Doing so would be desirable because it would help identify neurons of the network which have little or no impact on the network's ability to reliably accomplish a task (Lin [0020]), thereby improving the performance of the neural network in real-time as resource needs are identified (Lin [0032]). As to dependent Claim 2, Xu, Teig, and Lin teach all the limitations of Claim 1. Xu further teaches wherein the number of channels in the determined portion varies based on a lightweight degree of the neural network ([0030] coarse-grained pruning logic block of network pruner tool identify the relative importance of various channels, kernels, and/or nodes of a neural network and iteratively prune the model to first remove those portions of the neural network determined to be less important; importance reflects the neural network's sensitivity to the removal of these portions affecting the pruned neural network's accuracy; [0046]-[0047] initial prune defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network); in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights; if it is determined during the test that the initial prune of a layer allows the neural network to still retain sufficient accuracy, the initial pruning percentage may be increased (e.g., incremented by 5%, 10%, etc.) by the pruner tool (i.e., the number of channels in the determined portion varies based on lightweight degree of the neural network)). As to dependent Claim 3, Xu, Teig, and Lin teach all the limitations of Claim 2. 
Xu further teaches wherein the lightweight degree is a proportion of the channels of the neural network to be used for calculation ([0046]-[0047] initial prune defined such that a particular starting percentage of channels is identified for pruning; in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights (i.e., the remaining 70% of channels/ proportion of unpruned channels used for calculation); if it is determined during the test that the initial prune of a layer allows the neural network to still retain sufficient accuracy, the initial pruning percentage may be increased (e.g., incremented by 5%, 10%, etc.) by the pruner tool), and the lightweight degree is determined based on any one or any combination of a memory usage, a processing speed, and a processing time of an apparatus ([0029] preprocessing system provide the network pruner tool as a service to resource constrained system; a query or request submitted to the preprocessing system identifying a particular neural network model and requesting that the model be pruned; [0038] pruned, thinned, or sparse, neural network model is dramatically smaller in size, making the model well-suited for use by and implementation on a resource-constrained device possessing significantly lower memory and processing power - thus, the lightweight degree is determined based on any one or any combination of memory usage, processing speed, and a processing time of apparatus/ resource-constrained device). As to dependent Claim 4, Xu, Teig, and Lin teach all the limitations of Claim 2. 
Xu further teaches wherein the determining comprises determining a portion of channels satisfying the lightweight degree based on an order of the importance values of the channels of the neural network ([0045]-[0046] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel (i.e., order of the importance values of the channels); initial prune defined such that a particular starting percentage of channels is identified for pruning; in an initial prune, 30% of the lowest ranked channels may be selected for pruning). As to dependent Claim 5, Xu, Teig, and Lin teach all the limitations of Claim 4. Xu further teaches wherein the order of the importance values is an order from greatest to least among the importance values ([0045]-[0046] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; initial prune defined such that a particular starting percentage of channels is identified for pruning; in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights (i.e., order of the importance values is an order from greatest to least)). As to dependent Claim 6, Xu, Teig, and Lin teach all the limitations of Claim 1. 
Xu further teaches wherein the determining comprises determining a current channel included in the neural network to be a channel to be used for the calculation, in response to an importance value of the current channel being greater than a threshold ([0045]-[0047] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; initial prune defined such that a particular starting percentage of channels is identified for pruning; in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights - thus, the channels are ranked based on their importance and those with importance higher than 30% (i.e., threshold) of the ranked channels are kept unpruned; [0030] provide test data as an input to the pruned neural network and perform a test on the pruned neural network to determine an accuracy value for the pruned neural network (i.e., using the unpruned channels for test/accuracy calculation)). As to dependent Claim 7, Xu, Teig, and Lin teach all the limitations of Claim 6. 
Xu further teaches wherein the determining comprises determining to deactivate the current channel such that the channel is not used for the calculation, in response to the importance value of the current channel being less than or equal to the threshold ([0045]-[0047] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; initial prune defined such that a particular starting percentage of channels is identified for pruning; in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights - thus, the channels are ranked based on their importance and those with importance less than 30% (i.e., threshold) of the ranked channels are selected for pruning; [0030] provide test data as an input to the pruned neural network and perform a test on the pruned neural network to determine an accuracy value for the pruned neural network - thus, the pruned neural network does not include channels with importance less than the threshold (i.e., those channels are deactivated), and such channels are not used for test/accuracy calculation). As to dependent Claim 8, Xu, Teig, and Lin teach all the limitations of Claim 6. Xu further teaches wherein the threshold is determined based on a lightweight degree of the neural network ([0046]-[0047] initial prune defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network); in an initial prune, 30% of the lowest ranked channels may be selected for pruning; if it is determined during the test that the initial prune of a layer allows the neural network to still retain sufficient accuracy, the initial pruning percentage may be increased (e.g., incremented by 5%, 10%, etc.) 
by the pruner tool - thus, the threshold is based on lightweight degree of the neural network). As to dependent Claim 9, Xu, Teig, and Lin teach all the limitations of Claim 6. Xu further teaches wherein the importance value of the current channel is a probability value corresponding to a degree of influence on the calculation for the input data in response to the current channel being deactivated ([0030] identify the relative importance of various channels, kernels, and/or nodes of a neural network and iteratively prune the model to first remove those portions of the neural network determined to be less important; importance reflects the neural network's sensitivity to the removal of these portions affecting the pruned neural network's accuracy; after each pruning iteration, the pruned neural network may be tested for accuracy to determine whether additional portions may be pruned while keeping the accuracy of the model within an acceptable threshold or range of values - thus, the importance value corresponds to the degree of influence on the network's accuracy in response to the channel being pruned/ deactivated). As to dependent Claim 10, Xu, Teig, and Lin teach all the limitations of Claim 1. 
Teig further teaches wherein the importance values respectively corresponding to the channels are determined based on cumulative distribution functions (CDFs) of the channels determined by a process of training the neural network (column 20, line 65 to column 21, line 10 and lines 21 to 24 - VIB moves layer-by-layer to identify portions of the network (e.g., nodes, edges, or even entire filters) that are not passing important information; VIB introduces probabilistic (e.g., Gaussian) noise into the output values of a set of computation nodes of the network (e.g., the nodes of one or more layers of the network); the outputs of such nodes (which are passed to nodes in the next layer) are made to vary probabilistically around the actual computed output value during training; this noise enables the training system to identify nodes that are less important to the eventual output of the network (e.g., the classification decision, etc.) and remove these nodes; each layer of the network is treated as a bottleneck for the purpose of identifying the nodes, edges, and/or filters that can be removed from the network; column 21, lines 45 to 52 - the loss function represents the loss for a single layer, with the subscript c representing each channel output by the layer (e.g., the outputs for each filter of the layer); the complete loss function, then is a sum over all of the layers, with a different y and σc for each layer; the σc represents the noise variance for the channel, while the coefficient y is a multiplicative variable that can be changed per layer (i.e., loss function (equivalent to importance value) estimates information transmitted by each layer in order to determine whether to remove/ keep the nodes/ channel); column 22, lines 33 to 34 and lines 62 to 65 - a channel can be removed once its noise variance (σc) exceeds a threshold (e.g., 1); various sigmoid functions may be used for the loss function: these include the logistic function, an algebraic sigmoid 
function, and a Cauchy cumulative distribution function (CDF) - thus, determining to remove/ keep channels based on the loss function/ CDFs of the channels). As to dependent Claim 11, Xu, Teig, and Lin teach all the limitations of Claim 10. Teig further teaches wherein the determining of the portion of channels is based on CDFs (column 20, line 65 to column 21, line 10 and lines 21 to 24 - VIB moves layer-by-layer to identify portions of the network (e.g., nodes, edges, or even entire filters) that are not passing important information; VIB introduces probabilistic (e.g., Gaussian) noise into the output values of a set of computation nodes of the network (e.g., the nodes of one or more layers of the network); the outputs of such nodes (which are passed to nodes in the next layer) are made to vary probabilistically around the actual computed output value during training; this noise enables the training system to identify nodes that are less important to the eventual output of the network (e.g., the classification decision, etc.) 
and remove these nodes; each layer of the network is treated as a bottleneck for the purpose of identifying the nodes, edges, and/or filters that can be removed from the network; column 21, lines 45 to 52 - the loss function represents the loss for a single layer, with the subscript c representing each channel output by the layer (e.g., the outputs for each filter of the layer); the complete loss function, then is a sum over all of the layers, with a different y and σc for each layer; the σc represents the noise variance for the channel, while the coefficient y is a multiplicative variable that can be changed per layer (i.e., loss function (equivalent to importance value) estimates information transmitted by each layer in order to determine whether to remove/ keep the nodes/ channel); column 22, lines 33 to 34 and lines 62 to 65 - a channel can be removed once its noise variance (σc) exceeds a threshold (e.g., 1); various sigmoid functions may be used for the loss function: these include the logistic function, an algebraic sigmoid function, and a Cauchy cumulative distribution function (CDF) - thus, determining to remove/ keep channels based on the loss function/ CDFs of the channels). 
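Teig's passage names three interchangeable sigmoid forms and a removal rule keyed to a per-channel noise threshold. The Python sketch below writes out those functions and the rule for illustration only. Two choices are assumptions rather than Teig's text: the algebraic sigmoid x/√(1+x²) is rescaled to the (0, 1) range so that all three share the same range, and each channel's noise term is supplied as a plain number compared against the example threshold of 1. The function names are hypothetical.

```python
import math
from typing import Dict, Set


def logistic(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def algebraic_sigmoid(x: float) -> float:
    """x / sqrt(1 + x^2), rescaled from (-1, 1) to (0, 1)."""
    return 0.5 * (1.0 + x / math.sqrt(1.0 + x * x))


def cauchy_cdf(x: float) -> float:
    """CDF of the standard Cauchy distribution: 1/2 + arctan(x) / pi."""
    return 0.5 + math.atan(x) / math.pi


def channels_to_remove(noise: Dict[str, float], threshold: float = 1.0) -> Set[str]:
    """Channels whose learned noise term exceeds the threshold (e.g., 1)."""
    return {ch for ch, sigma in noise.items() if sigma > threshold}
```

All three curves pass through 0.5 at zero and saturate toward 0 and 1; they differ mainly in tail weight, with the Cauchy CDF approaching its limits most slowly.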
Xu further teaches wherein the determining of the portion of channels comprises determining a binary mask based on the CDFs and a threshold ([0045]-[0046] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel to effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; in an initial prune, 30% of the lowest ranked channels (e.g., those with the lowest aggregate weights) selected for pruning and a mask may be generated based on this pruning percentage and the sorting (i.e., mask based on threshold to prune channels with importance below the threshold); [0053] a layer-wise weight threshold computed based on the statistical distribution (equivalent to CDF) of full dense weights in each channel-pruned layer and weight pruning may be performed to mask out those weights that are less than the corresponding layer-specific threshold; [0056] a layer-wise mask defined to represent the binary mask governing which weights to prune during any given training iteration - thus, the binary mask is based on the CDF/ statistical distribution and threshold); and determining the portion of channels to be used for the calculation based on the determined binary mask ([0045]-[0046] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel to effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; in an initial prune, 30% of the lowest ranked channels (e.g., those with the lowest aggregate weights) selected for pruning and a mask may be generated based on this pruning percentage and the sorting; the channels may then be pruned according to the mask to generate a pruned version of the layer; [0030] provide test data as an input to the pruned neural network (i.e., using the determined portion of channels/ unpruned channels of the neural network) and perform a 
test on the pruned neural network to determine an accuracy value for the pruned neural network based on one or more outputs generated from the test data input to the pruned neural network (i.e., performing calculation)). As to dependent Claim 14, Xu, Teig, and Lin teach all the limitations of Claim 1. Teig further teaches wherein the neural network is a convolutional neural network, and hidden layers of the convolutional neural network include a convolutional layer, a batch normalization layer, and a rectified linear unit (ReLU) layer (column 4, lines 44 to 49 - the neural networks are convolutional feed-forward neural networks; the intermediate layers (referred to as “hidden” layers) may include convolutional layers, pooling layers, fully-connected layers, and normalization layers; column 33, lines 65 to 66 - some embodiments use a first variable multiplied by a ReLU or leaky ReLU function). As to dependent Claim 15, Xu, Teig, and Lin teach all the limitations of Claim 1. Xu further teaches wherein the determining comprises determining channels to be used for calculation for each of hidden layers of the neural network ([0028] network pruner tool provide functionality to perform both coarse-grained neural network pruning to prune channels, kernels, or nodes from the neural network model as well as more fine-grained neural network pruning to prune individual weights from the model; [0030]-[0031] coarse-grained pruning logic block of network pruner tool identify the relative importance of various channels, kernels, and/or nodes of a neural network and iteratively prune the model to first remove those portions of the neural network determined to be less important; provide test data as an input to the pruned neural network and perform a test on the pruned neural network to determine an accuracy value for the pruned neural network; a threshold weight value may be used to determine whether a weight should be pruned; multiple different layer-specific threshold weight values 
may be determined for a neural network model, and fine-grained pruning at any given layer within the neural network model may be pruned based on the corresponding layer-specific value; [0037]-[0038] a neural network 305 may include a number of layers, including an input layer, output layer, and a number of different hidden layers interconnected between the input and output layers; the hidden layers may include one or more different types of layers such as dense layers, convolutional layers, pooling layers, and recurrent layers; pruning may involve pruning each of the layers of the model; as illustrated in the example of FIG. 3, the input and output layers 308, 325 may be left unpruned, with the pruning instead focused on hidden layers 310, 315, 320 - thus, determining the channels to be used (i.e., unpruned channels) for test/accuracy calculation for the hidden layers). Claim 16 is a medium claim that is corresponding to the method claim 1 above and therefore, rejected for the same reasons. Xu further teaches wherein a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, configure the processor to perform the method ([0104] the non-transitory, machine readable medium include instructions to direct the processor to perform a specific sequence or flow of actions). Claim 17 is an apparatus claim that is corresponding to the method claim 1 above and therefore, rejected for the same reasons. Xu further teaches wherein a neural network data processing apparatus, the apparatus comprising: a processor ([0024] FIG. 
1 illustrating a system that includes various devices (e.g., 120, 125, 130, 135) capable of utilizing machine learning models in the course of their operation; devices utilize neural network models in connection with detecting persons, animals, or objects within their respective environments and/or conditions, characteristics, and events within these environments based on sensor data generated at the devices (i.e., neural network data processing electronic device); [0027] system including a system 105 for use in performing preprocessing on existing neural network models to adapt and prepare the models for distribution to and use by resource-constrained devices (e.g., 125); the preprocessing system include processors). As to dependent Claim 18, Xu, Teig, and Lin teach all the limitations of Claim 17. Xu further teaches wherein determine the portion of channels to be used for the calculation based on a lightweight degree of the neural network ([0030] coarse-grained pruning logic block of network pruner tool identify the relative importance of various channels, kernels, and/or nodes of a neural network and iteratively prune the model to first remove those portions of the neural network determined to be less important; importance reflects the neural network's sensitivity to the removal of these portions affecting the pruned neural network's accuracy; [0046]-[0047] initial prune defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network); in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights; if it is determined during the test that the initial prune of a layer allows the neural network to still retain sufficient accuracy, the initial pruning percentage may be increased (e.g., incremented by 5%, 10%, etc.) by the pruner tool (i.e., the portion of channels determined based on lightweight degree of the neural network)). 
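The ranking, pruning-percentage, and threshold steps that the examiner maps to Claims 2 to 9, 18, and 19 can be illustrated with the following Python sketch. It is a hypothetical illustration, not code from Xu or from the application: channels are scored by the sum of absolute weights, the "lightweight degree" is modeled as a keep ratio (for example 0.7 for an initial 30% prune), the threshold is set to the largest score among the channels to be deactivated, and a channel is used only when its score is strictly greater than that threshold. All names are hypothetical.

```python
from typing import Dict, List


def l1_scores(weights: Dict[str, List[float]]) -> Dict[str, float]:
    """Importance proxy per channel: sum of absolute weight values."""
    return {ch: sum(abs(w) for w in ws) for ch, ws in weights.items()}


def threshold_for_keep_ratio(scores: Dict[str, float], keep_ratio: float) -> float:
    """Pick a threshold so that about keep_ratio of the channels score above it."""
    ranked = sorted(scores.values(), reverse=True)  # greatest to least
    n_keep = max(0, min(len(ranked), round(len(ranked) * keep_ratio)))
    if n_keep == len(ranked):
        return float("-inf")                        # nothing is deactivated
    return ranked[n_keep]                           # best-scoring channel to deactivate


def channel_mask(scores: Dict[str, float], threshold: float) -> Dict[str, int]:
    """1: use the channel (score > threshold); 0: deactivate it (score <= threshold)."""
    return {ch: 1 if s > threshold else 0 for ch, s in scores.items()}


# Ten hypothetical channels whose L1 norms are 0, 2, 4, ..., 18.
weights = {f"c{i}": [float(i), -float(i)] for i in range(10)}
scores = l1_scores(weights)
mask_70 = channel_mask(scores, threshold_for_keep_ratio(scores, 0.7))  # 30% pruned
mask_60 = channel_mask(scores, threshold_for_keep_ratio(scores, 0.6))  # 40% pruned
```

Raising the pruning percentage from 30% to 40% raises the threshold and deactivates one more channel. Because ties at the threshold are deactivated, duplicated scores can leave slightly fewer channels than the keep ratio requests; this sketch does not attempt a tie-breaking rule.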
As to dependent Claim 19, Xu, Teig, and Lin teach all the limitations of Claim 17. Xu further teaches wherein determine a current channel included in the neural network to be a channel to be used for the calculation, in response to an importance value of the current channel being greater than a threshold ([0045]-[0047] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; initial prune defined such that a particular starting percentage of channels is identified for pruning; in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights - thus, the channels are ranked based on their importance and those with importance higher than 30% (i.e., threshold) are kept unpruned; [0030] provide test data as an input to the pruned neural network and perform a test on the pruned neural network to determine an accuracy value for the pruned neural network (i.e., using the unpruned channels for test/accuracy calculation)), and the threshold is determined based on a lightweight degree required for the neural network ([0046]-[0047] initial prune defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network); in an initial prune, 30% of the lowest ranked channels may be selected for pruning; if it is determined during the test that the initial prune of a layer allows the neural network to still retain sufficient accuracy, the initial pruning percentage may be increased (e.g., incremented by 5%, 10%, etc.) by the pruner tool - thus, the threshold is based on lightweight degree of the neural network). Claim 20 is an apparatus claim that is corresponding to the method claim 10 above and therefore, rejected for the same reasons. 
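Xu's layer-wise weight threshold, which the examiner treats as derived from a statistical distribution equivalent to a CDF (Claim 11; Xu [0053], [0056]), can be illustrated with the Python sketch below. The specific rule is an assumption, not Xu's formula: it fits a normal distribution to a layer's absolute weight values with the standard library's `statistics.NormalDist.from_samples`, takes the inverse CDF at a chosen quantile as the threshold, and emits a binary mask that zeroes the weights below that threshold. The function names and the quantile values are hypothetical.

```python
from statistics import NormalDist
from typing import List


def layer_threshold(weights: List[float], prune_quantile: float) -> float:
    """Hypothetical layer-wise threshold: the prune_quantile point of a normal
    distribution fitted to the layer's absolute weight values."""
    fitted = NormalDist.from_samples([abs(w) for w in weights])
    return fitted.inv_cdf(prune_quantile)


def binary_mask(weights: List[float], threshold: float) -> List[int]:
    """1 keeps a weight; 0 masks it out because |w| is below the threshold."""
    return [0 if abs(w) < threshold else 1 for w in weights]


layer = [0.1, -0.2, 0.35, -0.4, 0.5]
t = layer_threshold(layer, prune_quantile=0.5)   # median of the fitted normal
mask = binary_mask(layer, t)
pruned = [w * m for w, m in zip(layer, mask)]
```

Raising `prune_quantile` raises the threshold and masks out more weights, mirroring how a larger pruning target yields a sparser layer.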
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Xu in view of Teig and Lin, further in view of Ishikawa et al. (US 2023/0004811 A1 hereinafter Ishikawa). As to dependent Claim 12, Xu, Teig, and Lin teach all the limitations of Claim 10. Xu further teaches wherein parameters of the CDFs are learned using a mask, in a process of training the neural network ([0030] coarse-grained pruning logic block of network pruner tool identify the relative importance of various channels, kernels, and/or nodes of a neural network and iteratively prune the model to first remove those portions of the neural network determined to be less important; [0045]-[0047] the channels in a layer sorted based on the respective sum of the absolute values of the weights in the channel to effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; in an initial prune, 30% of the lowest ranked channels (e.g., those with the lowest aggregate weights) selected for pruning and a mask may be generated based on this pruning percentage and the sorting; the channels then pruned according to the mask; a new mask created to prune an additional number of channels from the layer according to the incremented percentage; [0052] FIG. 
5B shows a pipeline for generating a pruned version of a neural network from coarse-grained pruning performed by a pruner tool; when the fine-tuning is completed, the pruned (or thinned or sparse) neural network model is ready for use and deployment on resource-constrained computing systems; [0053] fine-grained weight pruning performed in connection with the training/ fine-tuning of the pruned network; a layer-wise weight threshold computed based on the statistical distribution (equivalent to CDF) of full dense weights in each channel-pruned layer and weight pruning may be performed to mask out those weights that are less than the corresponding layer-specific threshold (i.e., statistical distribution of weights of the channels learned using masks having values based on thresholds to prune channels, in a process of training the network); [0056] a layer-wise mask defined to represent the binary mask governing which weights to prune during any given training iteration). However, Xu, Teig, and Lin fail to expressly teach wherein the mask has continuous values in the form of a logistic function. In the same field of endeavor, Ishikawa teaches wherein the mask has continuous values in the form of a logistic function ([0128] in order to delete the channels at a unit of feature channel of the convolution layer 1302, the mask layer 1304 learns the parameter ν; [0132] the parameter ν is further sampled from a relaxed Bernoulli distribution during learning; when the relaxed Bernoulli distribution is used, continuous values having values that fall within a range from 0 to 1 such as 0.1 and 0.5 are sampled as the parameter ν (i.e., mask having continuous values in the form of a logistic function within the range from 0 to 1); the mask layer 1304 calculates and outputs products of the sampled parameter ν and the entire channels corresponding to inputted feature maps). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein the mask has continuous values in the form of a logistic function, as taught by Ishikawa into Xu, Teig, and Lin. Doing so would be desirable because it would enable the learning of a lightweight model to be finished within a short period of time (Ishikawa [0008]). Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Xu in view of Teig, Lin, and Ishikawa, further in view of Kruglov (US 2020/0394520 A1). As to dependent Claim 13, Xu, Teig, Lin, and Ishikawa teach all the limitations of Claim 12. Ishikawa further teaches wherein in the process of training, a differentiable soft mask is determined using a Gumbel-softmax function ([0132] the parameter ν is further sampled from a relaxed Bernoulli distribution during learning; the relaxed Bernoulli distribution is also known as Gumbel-softmax; when the relaxed Bernoulli distribution is used, continuous values having values that fall within a range from 0 to 1 such as 0.1 and 0.5 are sampled as the parameter ν; the mask layer 1304 calculates and outputs products of the sampled parameter ν and the entire channels corresponding to inputted feature maps - thus, determining a differentiable soft mask with values between 0 and 1 using a Gumbel-softmax function). However, Xu, Teig, Lin, and Ishikawa fail to expressly teach wherein backward propagation training is performed based on the soft mask. 
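The relaxed Bernoulli (Gumbel-softmax) mask relied on from Ishikawa [0132] can be sketched in Python. This is the textbook two-class form of the relaxation (sometimes called the binary Concrete distribution), offered for illustration and not as Ishikawa's implementation: a logit plus Logistic(0, 1) noise, divided by a temperature, is passed through the logistic function, giving a continuous mask value between 0 and 1. The second function gives the mask's derivative with respect to the logit, which is what allows a training loss to be back-propagated through the soft mask. Function and parameter names are hypothetical.

```python
import math
import random


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relaxed_bernoulli_mask(logit: float, temperature: float, u: float) -> float:
    """Soft mask value from a Uniform(0, 1) draw u.

    log(u) - log(1 - u) is a Logistic(0, 1) sample (the difference of two
    Gumbel samples), so the result is a relaxed Bernoulli sample.
    """
    u = min(max(u, 1e-12), 1.0 - 1e-12)          # keep both logarithms finite
    noise = math.log(u) - math.log(1.0 - u)
    return _sigmoid((logit + noise) / temperature)


def mask_grad_wrt_logit(mask: float, temperature: float) -> float:
    """d(mask)/d(logit) = mask * (1 - mask) / temperature."""
    return mask * (1.0 - mask) / temperature


rng = random.Random(0)
samples = [relaxed_bernoulli_mask(2.0, 0.5, rng.random()) for _ in range(1000)]
```

With a logit of 2, about sigmoid(2) ≈ 88% of samples exceed 0.5. As the temperature is lowered the samples concentrate near 0 and 1, approaching a hard binary mask, while the derivative remains defined for any finite temperature.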
In the same field of endeavor, Kruglov teaches wherein backward propagation training is performed based on the soft mask ([0031] a loss of CNN may result from masking, wherein the backward pass of the iteration communicates or otherwise determines a relationship between such loss and one or more parameters of system; during or after the backward pass, some or all such parameters may be variously updated to determine a possible pruning of CNN 104 and/or to prepare for a next iteration of the evaluation process; [0053]-[0054] to facilitate processing of an iteration's backward pass, evaluation logic send to mask layer information which is determined based on the forward pass portion of that iteration; respective derivatives of the loss may be evaluated for a given mask layer; the derivative values variously evaluated using a respective backpropagation algorithm, which is implemented in a deep learning framework of device; [0056] the determining of mask values with a Lipschitz-continuous and differentiable mask function enable learnable mask layers; for a given mask layer, evaluations of the various derivatives is used to perform an iterative update which facilitate dynamic adaptation by the mask layer during gradient descent optimization - thus, backward propagation training is performed based on the soft mask). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein backward propagation training is performed based on the soft mask, as taught by Kruglov into Xu, Teig, Lin, and Ishikawa. Doing so would be desirable because it would provide improvements to efficient implementation of neural networks (Kruglov [0005]). Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US 2019/0362235 A1 hereinafter Xu) in view of Teig et al. (US 11,900,238 B1 hereinafter Teig), Ishikawa et al. 
(US 2023/0004811 A1 hereinafter Ishikawa), Steck (US 2017/0024391 A1), and Lin et al. (US 2020/0234128 A1 hereinafter Lin). Regarding Claim 21, Xu teaches a neural network data processing electronic device ([0024] FIG. 1 illustrating a system that includes various devices (e.g., 120, 125, 130, 135) capable of utilizing machine learning models in the course of their operation; devices utilize neural network models in connection with detecting persons, animals, or objects within their respective environments and/or conditions, characteristics, and events within these environments based on sensor data generated at the devices (i.e., neural network data processing electronic device)), the electronic device comprising: a processor ([0027] system including a system 105 for use in performing preprocessing on existing neural network models to adapt and prepare the models for distribution to and use by resource-constrained devices (e.g., 125); the preprocessing system include processors) configured to: receive input data including an object ([0024] devices utilize neural network models in connection with detecting persons, animals, or objects within their respective environments; devices include vehicles, drones, robots, and other devices, which possess autonomous navigation capabilities, allowing the devices to detect attributes and conditions within physical space, plan paths within the environment, avoid collisions, and interact with things within the environment utilizing one or more sensors; the data generated from these sensors may be provided as an input to a machine learning model, such as a neural network model (e.g., convolutional neural network (CNN), deep neural network (DNN), spiking neural network (SNN), etc.); [0029] a preprocessing system provide the network pruner tool as a service to resource constrained system 125; a query or request submitted to the preprocessing system identifying a particular neural network model and requesting that the model be 
pruned); determine a channel included in a neural network to be a channel to be used for performing an inference operation based on an important value of the channel determined using a distribution of the channel learned using a mask ([0024] devices, such as, 120, 125, 130, 135, utilize neural network models in connection with detecting persons, animals, or objects within their respective environments and/or conditions, characteristics, and events within these environments based on sensor data generated at the devices (i.e., inference operation); [0027]-[0028] performing preprocessing on existing neural network models to adapt and prepare the models for distribution to and use by resource-constrained devices (e.g., 125); [0030] coarse-grained pruning logic block of network pruner tool identify the relative importance of various channels, kernels, and/or nodes of a neural network (i.e., importance values of channels) and iteratively prune the model to first remove those portions of the neural network determined to be less important; [0045]-[0047] the channels in a layer sorted based on the respective sum of the absolute values of the weights in the channel to effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; in an initial prune, 30% of the lowest ranked channels (e.g., those with the lowest aggregate weights) selected for pruning and a mask may be generated based on this pruning percentage and the sorting; the channels then pruned according to the mask; a new mask created to prune an additional number of channels from the layer according to the incremented percentage; [0052] FIG. 
5B shows a pipeline for generating a pruned version of a neural network from coarse-grained pruning performed by a pruner tool; when the fine-tuning is completed, the pruned (or thinned or sparse) neural network model is ready for use and deployment on resource-constrained computing systems - thus, determining unpruned channel to be used to perform inference operation; [0053] fine-grained weight pruning performed in connection with the fine-tuning of the pruned network; a layer-wise weight threshold computed based on the statistical distribution of full dense weights in each channel-pruned layer and weight pruning may be performed to mask out those weights that are less than the corresponding layer-specific threshold (i.e., pruning/ determining unpruned channel to perform inference operation is based on statistical distribution of weights of the channels/ importance, learned using masks having values based on thresholds to prune channels); [0056] a layer-wise mask defined to represent the binary mask governing which weights to prune during any given training iteration); perform the inference operation including object verification or recognition using a result of performing a calculation based on the input data using the determined channel and authenticate the object associated with the input data based on a result of performing the inference operation including the object verification or recognition (based on the 112(a) discussion above, this limitation is interpreted as: performing object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network.
[0024] devices, such as, 120, 125, 130, 135, utilize neural network models in connection with detecting persons, animals, or objects within their respective environments and/or conditions, characteristics, and events within these environments based on sensor data generated at the devices (i.e., inference operation including object verification or recognition); the data generated from these sensors provided as an input to a machine learning model, such as a neural network model, from which one or more outputs may be generated that cause actuators of the device (e.g., 125, 130, 135) to autonomously direct movement of the device within the environment; [0030] provide test data as an input to the pruned neural network (i.e., using the determined portion of channels/ unpruned channels of the neural network) and perform a test on the pruned neural network to determine an accuracy value for the pruned neural network based on one or more outputs generated from the test data input to the pruned neural network (i.e., performing calculation); [0052] when the fine-tuning is completed, the pruned (or thinned or sparse) neural network model is ready for use and deployment on resource-constrained computing systems - thus, the deployed pruned network model is used to perform inference operation including object verification or recognition based on the output generated/ calculations performed based on the input); set a predetermined threshold according to a required lightweight degree ([0045]-[0047] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; initial prune defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network); in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., 
those with lowest aggregate weights - thus, the channels are ranked based on their importance and those with importance less than 30% (i.e., threshold according to lightweight degree/ percentage of channels) of the ranked channels are selected for pruning); and determine a channel to be deactivated by comparing an importance value of each channel to the threshold ([0045]-[0047] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; initial prune defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network); in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights - thus, the channels are ranked based on their importance and those with importance less than 30% (i.e., threshold) of the ranked channels, are pruned (i.e., determining channel to be deactivated)). However, Xu fails to expressly teach wherein determining importance value based on a cumulative distribution function (CDF) of the channel. 
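The ranking-and-threshold mapping above can be illustrated with a short Python sketch, under stated assumptions: each channel is scored by the sum of the absolute values of its weights (the criterion attributed to Xu [0045]-[0047]), a pruning percentage standing in for the "required lightweight degree" is converted into a threshold, and channels scoring below that threshold are marked for deactivation. The flat per-channel weight lists, the function names, and the rounding rule are hypothetical choices for illustration; this is not Xu's implementation or the Applicant's claimed method.

```python
import math


def channel_importance_l1(weights):
    """Score each channel by the sum of the absolute values of its weights.

    `weights` uses a hypothetical layout: one flat list of weights per channel.
    """
    return [sum(abs(w) for w in channel) for channel in weights]


def threshold_for_degree(importances, prune_ratio):
    """Turn a pruning percentage (the "lightweight degree") into a threshold.

    Channels scoring strictly below the returned value are deactivated, so
    roughly `prune_ratio` of the channels fall below it (ties keep more).
    """
    if not importances:
        raise ValueError("need at least one channel")
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    ranked = sorted(importances)
    k = min(round(prune_ratio * len(ranked)), len(ranked) - 1)
    return ranked[k]


def channels_to_deactivate(importances, threshold):
    """Indices of channels whose importance is below the threshold."""
    return [i for i, score in enumerate(importances) if score < threshold]


# Ten channels with two weights each; 30% pruning mirrors Xu's initial prune.
weights = [
    [1.0, -1.0], [0.5, 0.0], [2.0, 2.0], [-0.25, 0.0], [1.5, 0.0],
    [0.0, 0.125], [3.0, -0.5], [1.0, 0.0], [0.75, -0.5], [2.5, 0.5],
]
scores = channel_importance_l1(weights)
threshold = threshold_for_degree(scores, prune_ratio=0.3)
print(threshold, channels_to_deactivate(scores, threshold))  # 1.0 [1, 3, 5]
```

Comparing against a value taken from the sorted scores, rather than cutting a fixed count, keeps the "compare each importance value to the threshold" step explicit, which is how the claim language is phrased.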
In the same field of endeavor, Teig teaches wherein determining importance value based on a cumulative distribution function (CDF) of the channel (column 20, line 65 to column 21, line 10 and lines 21 to 24 - VIB moves layer-by-layer to identify portions of the network (e.g., nodes, edges, or even entire filters) that are not passing important information; VIB introduces probabilistic (e.g., Gaussian) noise into the output values of a set of computation nodes of the network (e.g., the nodes of one or more layers of the network); the outputs of such nodes (which are passed to nodes in the next layer) are made to vary probabilistically around the actual computed output value during training; this noise enables the training system to identify nodes that are less important to the eventual output of the network (e.g., the classification decision, etc.) and remove these nodes; each layer of the network is treated as a bottleneck for the purpose of identifying the nodes, edges, and/or filters that can be removed from the network; column 21, lines 45 to 52 - the loss function represents the loss for a single layer, with the subscript c representing each channel output by the layer (e.g., the outputs for each filter of the layer); the complete loss function, then, is a sum over all of the layers, with a different γ and σ_c for each layer; the σ_c represents the noise variance for the channel, while the coefficient γ is a multiplicative variable that can be changed per layer (i.e., loss function estimates information transmitted by each layer in order to determine whether to remove/ keep the nodes/ channel); column 22, lines 33 to 34 and lines 62 to 65 - a channel can be removed once its noise variance (σ_c) exceeds a threshold (e.g., 1); various sigmoid functions may be used for the loss function: these include the logistic function, an algebraic sigmoid function, and a Cauchy cumulative distribution function (CDF) - thus, the less important nodes/ channels are removed based on
the loss function/ CDFs of the channels). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein determining importance value based on a cumulative distribution function (CDF) of the channel, as taught by Teig into Xu. Doing so would be desirable because it would greatly reduce the resources required to achieve optimal network structure (Teig, column 34, lines 53 to 56) and removing the subset of computation nodes from the trained network increases sparsity of the network, thereby reducing memory and power resources required to execute the trained network (Teig, claim 6). However, Xu and Teig fail to expressly teach wherein mask having continuous values. In the same field of endeavor, Ishikawa teaches wherein mask having continuous values ([0128] in order to delete the channels at a unit of feature channel of the convolution layer 1302, the mask layer 1304 learns the parameter ν; [0132] the parameter ν is further sampled from a relaxed Bernoulli distribution during learning; when the relaxed Bernoulli distribution is used, continuous values having values that fall within a range from 0 to 1 such as 0.1 and 0.5 are sampled as the parameter ν (i.e., mask having continuous values in the form of a logistic function within the range from 0 to 1); the mask layer 1304 calculates and outputs products of the sampled parameter ν and the entire channels corresponding to inputted feature maps). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein mask having continuous values, as taught by Ishikawa into Xu and Teig. Doing so would be desirable because it would enable the learning of a lightweight model to be finished within a short period of time (Ishikawa [0008]).
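The continuous mask attributed to Ishikawa [0132] can be illustrated with the standard relaxed Bernoulli (binary Concrete) reparameterization: logistic noise is added to a learnable log-odds value, and the result is passed through a logistic function scaled by a temperature, giving a mask value between 0 and 1 that multiplies the whole channel. The per-channel logit, the temperature of 0.5, and all names below are assumptions for illustration; Ishikawa's exact formulation is not reproduced in this action.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_relaxed_bernoulli(logit, temperature, rng):
    """Draw one continuous mask value between 0 and 1.

    `logit` is a hypothetical learnable log-odds of keeping the channel;
    lower temperatures push samples toward 0 or 1.
    """
    u = rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep both logarithms finite
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((logit + logistic_noise) / temperature)


def apply_channel_mask(feature_maps, mask):
    """Multiply every activation in channel c by mask[c]."""
    return [[v * m for v in channel] for channel, m in zip(feature_maps, mask)]


rng = random.Random(0)
logits = [4.0, -4.0, 0.0]  # likely keep, likely drop, undecided
mask = [sample_relaxed_bernoulli(a, 0.5, rng) for a in logits]
masked = apply_channel_mask([[1.0, 2.0]] * 3, mask)
print([round(m, 3) for m in mask])
```

Because the sample is a smooth function of the logit, gradients can flow back to it during training, which is the usual reason for relaxing a hard 0/1 mask.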
However, Xu, Teig, and Ishikawa fail to expressly teach wherein CDF learned by applying a logistic function to the CDF. In the same field of endeavor, Steck teaches wherein CDF learned by applying a logistic function to the CDF ([0039] for a Gaussian distribution of scores, a cumulative distribution function (CDF) derived from the Gaussian distribution provides a smooth mapping from scores to corresponding ranks (i.e., CDF learned using a mask having continuous values); [0040] the training subsystem 152 is configured to implement the activation function 212 based on an approximation to a CDF derived from a Gaussian distribution of scores; [0063] training engine 216 selects the activation function 212 as a logistic function that approximates a cumulative distribution function (CDF) derived from an approximate Gaussian distribution (i.e., by applying logistic function to CDF)). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein CDF learned by applying a logistic function to the CDF, as taught by Steck into Xu, Teig, and Ishikawa. Doing so would be desirable because it would optimize the rank loss function via optimization techniques that are considered to be computationally efficient (Steck [0011]). However, Xu, Teig, Ishikawa, and Steck fail to expressly teach wherein performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters.
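Steck's teaching, as characterized above, rests on the fact that a logistic function can closely approximate a Gaussian CDF. The sketch below checks that numerically for the standard normal distribution. The scaling constant 1.702 is a commonly used choice and is an assumption here; Steck's specific activation function is not reproduced in this action.

```python
import math


def gaussian_cdf(x, mean=0.0, std=1.0):
    """Exact CDF of a normal distribution N(mean, std**2), via erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def logistic_cdf_approx(x, mean=0.0, std=1.0, scale=1.702):
    """Logistic approximation of the Gaussian CDF: sigmoid(scale * z)."""
    z = (x - mean) / std
    return 1.0 / (1.0 + math.exp(-scale * z))


def max_abs_error(scale, lo=-6.0, hi=6.0, steps=1201):
    """Largest gap between the two curves on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(gaussian_cdf(x) - logistic_cdf_approx(x, scale=scale)) for x in xs)


print(f"scale=1.702: {max_abs_error(1.702):.4f}")  # below 0.01
print(f"scale=1.0:   {max_abs_error(1.0):.4f}")    # noticeably larger
```

The unscaled logistic function (scale 1.0) has a visibly different slope from the standard normal CDF; rescaling its argument is what makes the two curves nearly coincide.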
In the same field of endeavor, Lin teaches wherein performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters ([0033] the utilization monitoring module 202 may detect dead neurons using the scale parameter γ from the following equations (which a batch normalization layer uses to normalize outputs of a convolutional layer before the batch normalization layer): $\hat{z} = \frac{z_{in} - \mu_\beta}{\sqrt{\sigma_\beta^2 + \epsilon}}; \quad z_{out} = \gamma \hat{z} + \beta$ (i.e., performing normalization on the channel), where: $z_{in}$ is an input to the batch normalization layer; $z_{out}$ is an output of the batch normalization layer; $\mu_\beta$ and $\sigma_\beta$ are the respective mean and standard deviation values of input activations over β (i.e., mean and standard deviation of the channel); and γ and β are trainable affine transformation parameters (scale and shift) (i.e., transformation using the batch normalization parameters)). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters, as taught by Lin into Xu, Teig, Ishikawa, and Steck. Doing so would be desirable because it would help identify neurons of the network which have little or no impact on the network's ability to reliably accomplish a task (Lin [0020]), thereby improving the performance of the neural network in real-time as resource needs are identified (Lin [0032]).

Response to Arguments

Claim Objection: Applicant’s amendments did not overcome all of the claim objections previously set forth.

35 U.S.C. §112: Applicant’s amendments did not overcome the 112(a) rejections previously set forth.

35 U.S.C.
§101: In the remarks, Applicant argues that: (a) Prong 1: (1) The present claims do not recite an abstract idea. (2) The Office Action has provided no rationale evidencing that claimed subject matter is similar to what the courts have identified as an abstract idea, and provided even no citation of any court-identified cases with respect to the claimed features. (b) Prong 2: The claimed features are integrated into a practical application. The claims are directed to a specific implementation for improvements to technology or technical field of performing recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network. Even if it could be considered that the claims are directed to an abstract idea, the claims also include an element, or a combination of elements, that are sufficient to ensure that the claims amount to significantly more than the judicial exception and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition. Examiner respectfully disagrees with Applicant’s arguments. 
As to point (a)(1), in the amended independent claims 1 and 17, the claim limitations “determining a portion of channels to be used for calculation among channels of a neural network based on importance values respectively corresponding to the channels of the neural network, wherein the importance values are determined based on cumulative distribution functions (CDFs) of the channels determined, using batch normalization parameters; performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters; performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition, wherein the determining the portion of channels comprises: setting a predetermined threshold according to a required lightweight degree; and determining a channel to be deactivated by comparing an importance value of each channel to the threshold” and in the amended independent claim 21, the claim limitations “determine a channel included in a neural network to be a channel to be used for performing an inference operation based on an important value of the channel determined using a cumulative distribution function (CDF) of the channel learned using a mask having continuous values by applying a logistic function to the CDF; perform normalization on the channels by determining a mean and a standard deviation of the channels and perform a transformation on the normalization using the batch normalization parameters; perform the inference operation including object verification or recognition using a result of performing a calculation based on
the input data using the determined channel; and authenticate the object associated with the input data based on a result of performing the inference operation including the object verification or recognition; set a predetermined threshold according to a required lightweight degree; and determine a channel to be deactivated by comparing an importance value of each channel to the threshold” do recite an abstract idea. These limitations, as drafted, recite a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind or with pen and paper using mathematical concepts, such as a user drawing a tree of a neural network, selecting a subset of labeled channels of the network to be used for making an evaluation and/or judgement based on some priority value using mathematical calculations and/or relationships, performing normalization using mathematical calculations and/or relationships, making an evaluation and/or judgement related to the input data (i.e., verification or recognition) using the results of calculations performed based on the decided-upon channels, making an evaluation and/or judgement to select a threshold according to a required proportion/lightweight degree, and deciding not to use a labeled channel by comparing the priority value to the threshold. Accordingly, these claims recite an abstract idea that falls under the “Mental Processes” and “Mathematical Concepts” groupings. As to point (a)(2), see the 101 rejection above for Examiner’s rationale that identifies the judicial exception recited in the claims. Further, Examiner notes: “When the examiner has determined the claim recites an abstract idea, the rejection should identify the abstract idea as it is recited (i.e., set forth or described) in the claim, and explain why it falls within one of the groupings of abstract ideas.
While not required, this explanation or justification may include citing to an appropriate court decision that supports the identification of the subject matter recited in the claim language as an abstract idea within one of the groupings”. (see MPEP 2106.07(a) I) and “When performing the analysis at Step 2A Prong One, it is sufficient for the examiner to provide a reasoned rationale that identifies the judicial exception recited in the claim and explains why it is considered a judicial exception (e.g., that the claim limitation(s) falls within one of the abstract idea groupings). Therefore, there is no requirement for the examiner to rely on evidence”. (see MPEP 2106.07(a) III). As to point (b), firstly, based on the 112(a) discussion above, the limitation “performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition” is interpreted as: performing object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network. 
Secondly, in the amended independent claims 1 and 17, the claim limitations “determining a portion of channels to be used for calculation among channels of a neural network based on importance values respectively corresponding to the channels of the neural network, wherein the importance values are determined based on cumulative distribution functions (CDFs) of the channels determined, using batch normalization parameters; performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters; performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition, wherein the determining the portion of channels comprises: setting a predetermined threshold according to a required lightweight degree; and determining a channel to be deactivated by comparing an importance value of each channel to the threshold” and in the amended independent claim 21, the claim limitations “determine a channel included in a neural network to be a channel to be used for performing an inference operation based on an important value of the channel determined using a cumulative distribution function (CDF) of the channel learned using a mask having continuous values by applying a logistic function to the CDF; perform normalization on the channels by determining a mean and a standard deviation of the channels and perform a transformation on the normalization using the batch normalization parameters; perform the inference operation including object verification or recognition using a result of performing a calculation based on the input
data using the determined channel; and authenticate the object associated with the input data based on a result of performing the inference operation including the object verification or recognition; set a predetermined threshold according to a required lightweight degree; and determine a channel to be deactivated by comparing an importance value of each channel to the threshold” are considered to recite an abstract idea as discussed above. The additional element “receiving input data including an object” is an insignificant extra-solution activity of mere data gathering in order to perform the recited judicial exception (see MPEP § 2106.05(g)), which is well-understood, routine, and conventional activity (see MPEP § 2106.05(d), “receiving/ transmitting data”); “processor-implemented neural network data processing”, “a neural network data processing apparatus, the apparatus comprising: a processor”, and “a neural network data processing electronic device, the electronic device comprising: a processor” are recited at a high level of generality such that they amount to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)). As such, these additional elements do not integrate the judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea and do not amount to significantly more than the judicial exception. Further, as to Applicant’s remarks regarding improvements to technology, the Examiner notes that “It is important to keep in mind that an improvement in the abstract idea itself (e.g. a recited method of organizing human activity) is not an improvement in technology” (see MPEP 2106.05(a)(II)). Additionally, “It is important to note, the judicial exception alone cannot provide the improvement” (see MPEP § 2106.05(a)). For the reasons above, Applicant’s arguments concerning the §101 rejections are not persuasive.

35 U.S.C.
§103: In the remarks, Applicant argues that Xu does not, and would/could not, set a predetermined threshold according to a required lightweight degree, and determine a channel to be deactivated by comparing an importance value of each channel to the threshold, as set forth in amended independent claim 1 and as similarly recited in amended independent claims 17 and 21. Teig, Ishikawa and Steck are silent on and provide no teaching, motivation or suggestion to modify teachings of Xu with respect to the above-noted claimed features. Xu, Teig, Ishikawa and Steck fail to disclose or suggest, taken individually or even in combination thereof, all respective features of amended independent claims 1, 17, and 21. Examiner respectfully disagrees with Applicant’s arguments. The combination of Xu, Teig, and Lin does teach the features recited in independent claims 1 and 17, and the combination of Xu, Teig, Ishikawa, Steck, and Lin does teach the features recited in independent claim 21. Firstly, in response to Applicant’s arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
Secondly, Xu teaches that a network pruner tool provides functionality to perform both coarse-grained neural network pruning to prune channels, kernels, or nodes from the neural network model as well as more fine-grained neural network pruning to prune individual weights from the model; a coarse-grained pruning logic block of the network pruner tool identifies the relative importance of various channels, kernels, and/or nodes of a neural network and iteratively prunes the model to first remove those portions of the neural network determined to be less important; the tool provides test data as an input to the pruned neural network and performs a test on the pruned neural network to determine an accuracy value for the pruned neural network; the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank-orders the channels of the layer based on the relative importance or sensitivity of that channel; an initial prune is defined such that a particular starting percentage of channels is identified for pruning; in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights (see [0028], [0030], [0045]-[0047]). Thus, the channels are ranked based on their importance, and the channels falling in the lowest 30% of that ranking (i.e., a threshold set according to the lightweight degree/ percentage of channels) are selected for pruning. See the §103 rejections above for details. According to MPEP 2111, the examiner is obliged to give claim terms and phrases their broadest reasonable interpretation as understood by one of ordinary skill in the art unless Applicant has provided some indication of the definition of the claimed terms or phrases. Accordingly, the combination of cited references is considered to teach the features recited in amended independent claims 1, 17, and 21.
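The batch-normalization equations quoted from Lin [0033] can be restated in code, normalizing one channel with its batch mean and standard deviation and then applying the scale γ and shift β. The sketch also adds one hypothetical way a channel CDF could be "determined using batch normalization parameters": if the normalized activation ẑ is treated as standard normal, the output γẑ + β is normal with mean β and standard deviation |γ|, so the probability that it is positive is Φ(β/|γ|). That scoring rule, the 0.4 threshold, the example parameters, and all names are assumptions made for illustration only; they are not taken from the Applicant's specification or from any cited reference.

```python
import math


def batch_norm_channel(values, gamma, beta, eps=1e-5):
    """Normalize one channel over the batch, then apply z_out = gamma * z_hat + beta."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased batch variance
    denom = math.sqrt(var + eps)
    return [gamma * (v - mean) / denom + beta for v in values]


def cdf_importance(gamma, beta):
    """HYPOTHETICAL score: P(z_out > 0) assuming z_hat ~ N(0, 1).

    Then z_out ~ N(beta, gamma**2), so P(z_out > 0) = Phi(beta / |gamma|).
    """
    if gamma == 0.0:
        return 1.0 if beta > 0.0 else 0.0  # z_out is the constant beta
    return 0.5 * (1.0 + math.erf(beta / (abs(gamma) * math.sqrt(2.0))))


out = batch_norm_channel([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5)

# (gamma, beta) per channel; the values are made up for illustration.
bn_params = [(1.0, 1.0), (1.0, -2.0), (0.5, 0.0), (2.0, -0.1)]
scores = [cdf_importance(g, b) for g, b in bn_params]
deactivated = [c for c, s in enumerate(scores) if s < 0.4]  # hypothetical threshold
print([round(s, 3) for s in scores], deactivated)
```

The normalized output has mean β and, up to the small ε term, standard deviation |γ|, which is what makes the Gaussian assumption in the hypothetical score convenient.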
Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Applicant is required under 37 CFR § 1.111(c) to consider these references fully when responding to this action. Zhang et al. (US 2021/0397965 A1) teaches: a certain percentage X of filters in each convolution layer is uniformly removed, where X may be determined based on the desired compression ratio or based on the desired performance of the compressed neural network; a non-uniform pruning approach is implemented in which each i-th layer are pruned with X_i percentage, such as, different pruning percentages may be used for different layers (see [0108]-[0109]). THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to REJI KARTHOLY whose telephone number is (571)272-3432. The examiner can normally be reached on Monday - Thursday 7:30 am - 3:30 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch, can be reached at telephone number (571)272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from Patent Center. Status information for published applications may be obtained from Patent Center. Status information for unpublished applications is available through Patent Center for authorized users only. Should you have questions about access to Patent Center, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) Form at https://www.uspto.gov/patents/uspto-automated-interview-request-air-form.

/REJI KARTHOLY/
Primary Examiner, Art Unit 2143

Prosecution Timeline

Nov 22, 2024
Response after Non-Final Action
Dec 16, 2024
Request for Continued Examination
Dec 30, 2024
Response after Non-Final Action
Apr 30, 2025
Non-Final Rejection mailed — §101, §103, §112
Jul 21, 2025
Response Filed
Aug 19, 2025
Applicant Interview (Telephonic)
Aug 19, 2025
Examiner Interview Summary
Oct 31, 2025
Final Rejection mailed — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12632163
CLOUD SYSTEM, AGGREGATED RESULT DISPLAY METHOD, AND INFORMATION STORAGE MEDIUM
3y 7m to grant Granted May 19, 2026
Patent 12585963
METHOD AND DEVICE FOR LEARNING A STRATEGY AND FOR IMPLEMENTING THE STRATEGY
4y 8m to grant Granted Mar 24, 2026
Patent 12585988
SYSTEMS AND METHODS FOR GENERATING AND APPLYING A SECURE STATISTICAL CLASSIFIER
4y 0m to grant Granted Mar 24, 2026
Patent 12572395
Method and Devices for Latency Compensation
4y 9m to grant Granted Mar 10, 2026
Patent 12572846
SYSTEM AND METHOD FOR DEVICE ATTRIBUTE IDENTIFICATION BASED ON HOST CONFIGURATION PROTOCOLS
3y 11m to grant Granted Mar 10, 2026
Based on the 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
65%
Grant Probability
99%
With Interview (+72.1%)
3y 1m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 153 resolved cases by this examiner. Grant probability derived from career allowance rate.
