Last updated: April 19, 2026
Application No. 17/148,619
METHOD AND APPARATUS WITH NEURAL NETWORK DATA PROCESSING

Final Rejection §101 §103 §112
Filed: Jan 14, 2021
Examiner: KARTHOLY, REJI P
Art Unit: 2143
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Snu R&Db Foundation
OA Round: 4 (Final)
Interview: Optional. +71.8% interview lift. This examiner has a relatively high allow rate; a written response may suffice.
Based on 151 resolved cases, 2023–2026
Examiner Intelligence

KARTHOLY, REJI P
Career Allow Rate: grants 64% of resolved cases (97 granted / 151 resolved), +9.2% vs TC avg
Interview Lift: +71.8% (resolved cases with interview)
Avg Prosecution (typical timeline): 3y 4m
Currently pending: 18
Total Applications (career history, across all art units): 169
Statute-Specific Performance

§101: 13.7% (-26.3% vs TC avg)
§103: 55.7% (+15.7% vs TC avg)
§102: 8.8% (-31.2% vs TC avg)
§112: 12.0% (-28.0% vs TC avg)
Based on career data from 151 resolved cases
Office Action

DETAILED ACTION
This Office Action is in response to Applicant's Response filed on 07/21/2025 for the above identified application.  

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed on 07/21/2025 has been entered.  
Claims 1, 17, and 21 have been amended.  Claims 1-21 are pending in the application.  

Claim Objection
Claim 21 is objected to because of the following informalities:  Claim 21 recites the terms “the channels” and “the batch normalization parameters”, which have no antecedent basis.  Appropriate correction is required.  


Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-21 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement.  The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA  the inventor(s), at the time the application was filed, had possession of the claimed invention.  
 
Independent Claims 1 and 17 recite “performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition”; independent Claim 21 recites “perform the inference operation including object verification or recognition using a result of performing a calculation based on the input data using the determined channel; and authenticate the object associated with the input data based on a result of performing the inference operation including the object verification or recognition”.  The background/related art section of the specification (see [0003]) describes that "Technology may perform user authentication.. may be based on neural network.. neural network may be used to output recognition result.." (underlining added for emphasis).  User authentication is briefly mentioned in the background section as noted above and nowhere else in the specification. That is not the same as "performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition."  Performing an inference operation using a result of a calculation using the determined portion of channels and authenticating an object associated with input data based on the performed inference operation is not disclosed in the specification.  At best, the specification suggests performing object verification or recognition in relation to the input data.
Therefore, the language - performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition - constitutes new matter.  (See also 37 C.F.R. 1.121(f), MPEP 608.04, 706.03(o)).  For the purposes of examination, the Examiner will interpret this limitation as: performing object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network. 
 
Claims 2-16 and 18-20 are also rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as being dependent on parent claims failing to comply with the written description requirement.
 
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Claims 1-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1
Claims 1-15 are directed to a method and claims 17-21 are directed to an apparatus/ device. Thus, the claims fall within one of the statutory categories (process and machine) and are eligible under Step 1.

Step 2A Prong 1
Independent Claims 
Claims 1 and 17 recite:
determining a portion of channels to be used for calculation among channels of a neural network based on importance values respectively corresponding to the channels of the neural network, wherein the importance values are determined based on cumulative distribution functions (CDFs) of the channels determined, using batch normalization parameters; performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters; performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition, wherein the determining the portion of channels comprises: setting a predetermined threshold according to a required lightweight degree; and determining a channel to be deactivated by comparing an importance value of each channel to the threshold- these limitations encompass mental processes and mathematical concepts, such as a user drawing a tree of a neural network, selecting a subset of labeled channels of the network to be used for making evaluation and/or judgement based on some priority value using mathematical calculation and/ or relationship, perform normalization using mathematical calculation and/ or relationship, making evaluation and/or judgement related to the input data (i.e., verification or recognition) using the results of calculations performed based on the decided upon channels, making evaluation and/or judgement to select a threshold according to a required proportion/ lightweight degree, and deciding to not use a labeled channel by comparing the priority value to the threshold.
Thus, the claims recite an abstract idea that falls under the “Mental Processes” and “Mathematical Concepts” grouping.
Claim 21 recites:
determine a channel included in a neural network to be a channel to be used for performing an inference operation based on an important value of the channel determined using a cumulative distribution function (CDF) of the channel learned using a mask having continuous values by applying a logistic function to the CDF; perform normalization on the channels by determining a mean and a standard deviation of the channels and perform a transformation on the normalization using the batch normalization parameters; perform the inference operation including object verification or recognition using a result of performing a calculation based on the input data using the determined channel; and authenticate the object associated with the input data based on a result of performing the inference operation including the object verification or recognition; set a predetermined threshold according to a required lightweight degree; and determine a channel to be deactivated by comparing an importance value of each channel to the threshold - these limitations encompass mental processes and mathematical concepts, such as a user drawing a tree of a neural network, selecting a subset of labeled channels of the network to be used for making evaluation and/or judgement based on some priority value using mathematical calculation and/ or relationship, perform normalization using mathematical calculation and/ or relationship, making evaluation and/or judgement related to the input data (i.e., verification or recognition) using the results of calculations performed based on the decided upon channels, making evaluation and/or judgement to select a threshold according to a required proportion/ lightweight degree, and deciding to not use a labeled channel by comparing the priority value to the threshold.
Thus, the claim recites an abstract idea that falls under the “Mental Processes” and “Mathematical Concepts” grouping.
Step 2A Prong 2
Independent Claims 
Additional elements
Claims 1 and 17:
receiving input data including an object – this limitation amounts to insignificant extra-solution activity of mere data gathering (see MPEP § 2106.05(g)).
processor-implemented neural network data processing - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).
Claim 17:
a neural network data processing apparatus, the apparatus comprising: a processor - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).
Claim 21:
receive input data including an object - this limitation amounts to insignificant extra-solution activity of mere data gathering (see MPEP § 2106.05(g)).
a neural network data processing electronic device, the electronic device comprising: a processor - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).
Accordingly, these additional elements do not integrate the judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to the abstract idea.
Step 2B
Independent Claims 
Additional elements
Claims 1 and 17:
receiving input data including an object – this limitation amounts to insignificant extra-solution activity of mere data gathering, which is well-understood, routine, and conventional activity (see MPEP § 2106.05(d), “receiving/ transmitting data”).
processor-implemented neural network data processing - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).
Claim 17:
a neural network data processing apparatus, the apparatus comprising: a processor - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).
Claim 21:
receive input data including an object - this limitation amounts to insignificant extra-solution activity of mere data gathering, which is well-understood, routine, and conventional activity (see MPEP § 2106.05(d), “receiving/ transmitting data”).
a neural network data processing electronic device, the electronic device comprising: a processor - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components to apply the judicial exception (see MPEP § 2106.05(f)).
Accordingly, these additional elements do not amount to significantly more than the judicial exception. As such, the claims are patent ineligible.

Step 2A Prong 1
Dependent Claims
Claims 2 and 18: 
the number of channels in the determined portion varies based on a lightweight degree of the neural network - this limitation encompasses mathematical concepts and/ or mental process.
Claim 3:
the lightweight degree is a proportion of the channels of the neural network to be used for calculation, and the lightweight degree is determined based on any one or any combination of a memory usage, a processing speed, and a processing time of an apparatus - this limitation encompasses mathematical concepts.
Claim 4:
the determining comprises determining a portion of channels satisfying the lightweight degree based on an order of the importance values of the channels of the neural network - this limitation encompasses mathematical concepts and/ or mental process.
Claim 5:
the order of the importance values is an order from greatest to least among the importance values - this limitation encompasses mental process.
Claim 6:
the determining comprises determining a current channel included in the neural network to be a channel to be used for the calculation, in response to an importance value of the current channel being greater than a threshold - this limitation encompasses mathematical concepts and/ or mental process.
Claim 7:
the determining comprises determining to deactivate the current channel such that the channel is not used for the calculation, in response to the importance value of the current channel being less than or equal to the threshold - this limitation encompasses mathematical concepts and/ or mental process.
Claim 8:
the threshold is determined based on a lightweight degree of the neural network - this limitation encompasses mathematical concepts and/ or mental process.
Claim 9:
the importance value of the current channel is a probability value corresponding to a degree of influence on the calculation for the input data in response to the current channel being deactivated - this limitation encompasses mathematical concepts and/ or mental process.
Claims 10 and 20:
the importance values respectively corresponding to the channels are determined based on cumulative distribution functions (CDFs) of the channels determined by a process of training the neural network - this limitation encompasses mathematical concepts and/ or mental process.
Claim 11:
the determining of the portion of channels comprises: determining a binary mask based on the CDFs and a threshold; and determining the portion of channels to be used for the calculation based on the determined binary mask - this limitation encompasses mathematical concepts and/ or mental process.
Claim 12:
parameters of the CDFs are learned using a mask having continuous values in the form of a logistic function, in a process of training the neural network - this limitation encompasses mathematical concepts and/ or mental process.
Claim 13:
in the process of training, a differentiable soft mask is determined using a Gumbel-softmax function, and backward propagation training is performed based on the soft mask - this limitation encompasses mathematical concepts and/ or mental process.
Claim 15:
determining comprises determining channels to be used for calculation for each of hidden layers of the neural network - this limitation encompasses mental process.
Claim 19:
for the determining determine a current channel included in the neural network to be a channel to be used for the calculation, in response to an importance value of the current channel being greater than a threshold, and the threshold is determined based on a lightweight degree required for the neural network - this limitation encompasses mathematical concepts and/ or mental process.
Thus, the claims recite the abstract idea.
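For orientation only, the threshold comparison recited in Claims 6, 7, and 11 can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: the function names `binary_mask` and `active_channels` and the sample importance values are hypothetical, and how the importance values are obtained (e.g., from the CDFs) is outside the sketch.

```python
from typing import List


def binary_mask(importance: List[float], threshold: float) -> List[int]:
    """Return 1 for a channel that is used and 0 for a deactivated channel.

    A channel is used when its importance value is greater than the
    threshold (Claim 6) and deactivated when the value is less than or
    equal to the threshold (Claim 7).
    """
    return [1 if value > threshold else 0 for value in importance]


def active_channels(importance: List[float], threshold: float) -> List[int]:
    """Indices of the channels kept by the binary mask (Claim 11)."""
    mask = binary_mask(importance, threshold)
    return [index for index, bit in enumerate(mask) if bit == 1]


# Hypothetical importance values for four channels.
importance = [0.91, 0.12, 0.50, 0.77]
mask = binary_mask(importance, threshold=0.5)
kept = active_channels(importance, threshold=0.5)
```

Note that a channel whose importance equals the threshold is deactivated, matching the "less than or equal to" wording of Claim 7.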
Step 2A Prong 2
Dependent Claims 
Additional elements
Claim 14:
the neural network is a convolutional neural network, and hidden layers of the convolutional neural network include a convolutional layer, a batch normalization layer, and a rectified linear unit (ReLU) layer - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components and using a generic class of algorithms to apply the judicial exception (see MPEP § 2106.05(f)).  This limitation can also be viewed as generally linking the judicial exception to the technological environment of neural networks (see MPEP § 2106.05(h)).
Claim 16:
non-transitory computer-readable storage medium storing instructions that, when executed by a processor, configure the processor to perform the method - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components and using a generic class of algorithms to apply the judicial exception (see MPEP § 2106.05(f)).
Accordingly, these additional elements do not integrate the judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to the abstract idea.
Step 2B
Dependent Claims 
Additional elements
Claim 14:
the neural network is a convolutional neural network, and hidden layers of the convolutional neural network include a convolutional layer, a batch normalization layer, and a rectified linear unit (ReLU) layer - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components and using a generic class of algorithms to apply the judicial exception (see MPEP § 2106.05(f)).  This limitation can also be viewed as generally linking the judicial exception to the technological environment of neural networks (see MPEP § 2106.05(h)).
Claim 16:
non-transitory computer-readable storage medium storing instructions that, when executed by a processor, configure the processor to perform the method - this limitation is recited at a high level of generality such that it amounts to no more than using generic computer components and using a generic class of algorithms to apply the judicial exception (see MPEP § 2106.05(f)).
Accordingly, these additional elements do not amount to significantly more than the judicial exception. As such, the claims are patent ineligible.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-11 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US 2019/0362235 A1 hereinafter Xu) in view of Teig et al. (US 11,900,238 B1 hereinafter Teig), further in view of Lin et al. (US 2020/0234128 A1 hereinafter Lin).

Regarding Claim 1, Xu teaches a processor-implemented neural network data processing method ([0024] FIG. 1 illustrating a system that includes various devices (e.g., 120, 125, 130, 135) capable of utilizing machine learning models in the course of their operation; devices utilize neural network models in connection with detecting persons, animals, or objects within their respective environments and/or conditions, characteristics, and events within these environments based on sensor data generated at the devices; [0027] system for use in performing preprocessing on existing neural network models to adapt and prepare the models for distribution to and use by resource-constrained devices; pre-processing system  implement a network pruner tool, implemented in hardware- and/or software-based logic on the preprocessing system; the preprocessing system include processors), the method comprising:  
receiving input data including an object ([0024] devices utilize neural network models in connection with detecting persons, animals, or objects within their respective environments; devices include vehicles, drones, robots, and other devices, which possess autonomous navigation capabilities, allowing the devices to detect attributes and conditions within physical space, plan paths within the environment, avoid collisions, and interact with things within the environment utilizing one or more sensors; the data generated from these sensors may be provided as an input to a machine learning model, such as a neural network model (e.g., convolutional neural network (CNN), deep neural network (DNN), spiking neural network (SNN), etc.); [0029] a preprocessing system provide the network pruner tool as a service to resource constrained system 125; a query or request submitted to the preprocessing system identifying a particular neural network model and requesting that the model be pruned);  
determining a portion of channels to be used for calculation among channels of a neural network based on importance values respectively corresponding to the channels of the neural network ([0024] devices, such as, 120, 125, 130, 135, utilize neural network models in connection with detecting persons, animals, or objects within their respective environments and/or conditions, characteristics, and events within these environments based on sensor data generated at the devices; [0028] network pruner tool provide functionality to perform both coarse-grained neural network pruning to prune channels, kernels, or nodes from the neural network model as well as more fine-grained neural network pruning to prune individual weights from the model; [0030] coarse-grained pruning logic block of network pruner tool identify the relative importance of various channels, kernels, and/or nodes of a neural network (i.e., importance values respectively corresponding to the channels of the neural network ) and iteratively prune the model to first remove those portions of the neural network determined to be less important; provide test data as an input to the pruned neural network and perform a test on the pruned neural network to determine an accuracy value for the pruned neural network - thus, determining a portion of channels/ unpruned channels to be used to perform test/ accuracy calculation);  
performing an inference operation including object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network; and authenticating the object associated with the input data based on a result of performing the inference operation including the object verification or recognition (based on the 112(a) discussion above, this limitation is interpreted as: performing object verification or recognition in relation to the input data, using a result of performing a calculation based on the input data using the determined portion of channels of the neural network. [0024] devices, such as, 120, 125, 130, 135, utilize neural network models in connection with detecting persons, animals, or objects within their respective environments and/or conditions, characteristics, and events within these environments based on sensor data generated at the devices (i.e., inference operation including object verification or recognition); the data generated from these sensors provided as an input to a machine learning model, such as a neural network model, from which one or more outputs may be generated that cause actuators of the device (e.g., 125, 130, 135) to autonomously direct movement of the device within the environment; [0030] provide test data as an input to the pruned neural network (i.e., using the determined portion of channels/ unpruned channels of the neural network) and perform a test on the pruned neural network to determine an accuracy value for the pruned neural network based on one or more outputs generated from the test data input to the pruned neural network (i.e., performing calculation); [0052] when the fine-tuning is completed, the pruned (or thinned or sparse) neural network model is ready for use and deployment on resource-constrained computing systems - thus, the deployed pruned network model is used to perform inference operation including 
object verification or recognition based on the output generated/ calculations performed based on the input), wherein the determining the portion of channels comprises: setting a predetermined threshold according to a required lightweight degree ([0045]-[0047] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; initial prune defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network);  in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights - thus, the channels are ranked based on their importance and those with importance less than 30% (i.e., threshold) of the ranked channels are selected for pruning); and determining a channel to be deactivated by comparing an importance value of each channel to the threshold ([0045]-[0047] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; initial prune defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network);  in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights - thus, the channels are ranked based on their importance and those with importance less than 30% (i.e., threshold according to lightweight degree/ percentage of channels) of the ranked channels, are pruned (i.e., determining channel to be deactivated)). 
However, Xu fails to expressly teach wherein the importance values are determined based on cumulative distribution functions (CDFs) of the channels determined using batch normalization parameters.
In the same field of endeavor, Teig teaches wherein the importance values are determined based on cumulative distribution functions (CDFs) of the channels determined using batch normalization parameters (column 20, line 65 to column 21, line 10 and lines 21 to 24 - VIB moves layer-by-layer to identify portions of the network (e.g., nodes, edges, or even entire filters) that are not passing important information; VIB introduces probabilistic (e.g., Gaussian) noise into the output values of a set of computation nodes of the network (e.g., the nodes of one or more layers of the network); the outputs of such nodes (which are passed to nodes in the next layer) are made to vary probabilistically around the actual computed output value during training; this noise enables the training system to identify nodes that are less important to the eventual output of the network (e.g., the classification decision, etc.) and remove these nodes; each layer of the network is treated as a bottleneck for the purpose of identifying the nodes, edges, and/or filters that can be removed from the network; column 21, lines 45 to 52 - the loss function represents the loss for a single layer, with the subscript c representing each channel output by the layer (e.g., the outputs for each filter of the layer); the complete loss function, then is a sum over all of the layers, with a different y and σc for each layer; the σc represents the noise variance for the channel, while the coefficient y is a multiplicative variable that can be changed per layer (i.e., loss function estimates information transmitted by each layer in order to determine whether to remove/ keep the nodes/ channel); column 22, lines 33 to 34 and lines 62 to 65 - a channel can be removed once its noise variance (σc) exceeds a threshold (e.g., 1); various sigmoid functions may be used for the loss function: these include the logistic function, an algebraic sigmoid function, and a Cauchy cumulative distribution function (CDF) - 
thus, determining to remove/ keep channels based on the loss function/ CDFs of the channels; column 7, lines 10 to 25 - before a multi-layer network can be used to solve a particular problem (e.g., image classification, face recognition, etc.), the network is put through a supervised training process that adjusts the network's configurable parameters; the training process uses different input value sets with known output value sets; for each selected input value set (i.e., batch), the training process typically (1) forward propagates the input value set through the network's nodes to produce a computed output value set and then (2) backpropagates a gradient (rate of change) of a loss function (output error) that quantifies in a particular way the difference between the input set's known output value set and the input set's computed output value set, in order to adjust the network's configurable parameters (e.g., the weight values); column 4, lines 44 to 49 - the neural networks are convolutional feed-forward neural networks. In this case, the intermediate layers (referred to as “hidden” layers) may include convolutional layers, pooling layers, fully-connected layers, and normalization layers - thus, in feed forward neural networks with normalization layer, the training process propagates batch of inputs through network's nodes and the loss function/ CDF is determined using normalization parameters for the batch of inputs). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein the importance values are determined based on cumulative distribution functions (CDFs) of the channels determined using batch normalization parameters, as taught by Teig into Xu.  Doing so would be desirable because it would greatly reduce the resources required to achieve optimal network structure (Teig, column 34, lines 53 to 56) and removing the subset of computation nodes from the trained network increases sparsity of the network, thereby reducing memory and power resources required to execute the trained network  (Teig, claim 6).    
However, Xu and Teig fail to expressly teach wherein performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters. 
In the same field of endeavor, Lin teaches wherein performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters ([0033] the utilization monitoring module 202 may detect dead neurons using the scale parameter γ from the following equations (which a batch normalization layer uses to normalize outputs of a convolutional layer before the batch normalization layer): $\hat{z} = \frac{z_{in} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}$; $z_{out} = \gamma \hat{z} + \beta$ (i.e., performing normalization on the channel), where: $z_{in}$ is an input to the batch normalization layer; $z_{out}$ is an output of the batch normalization layer; $\mu_{\mathcal{B}}$ and $\sigma_{\mathcal{B}}$ are the respective mean and standard deviation values of input activations over $\mathcal{B}$ (i.e., mean and standard deviation of the channel); and γ and β are trainable affine transformation parameters (scale and shift) (i.e., transformation using the batch normalization parameters)).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein performing normalization on the channels by determining a mean and a standard deviation of the channels and performing a transformation on the normalization using the batch normalization parameters, as taught by Lin, into Xu and Teig.  Doing so would be desirable because it would help identify neurons of the network which have little or no impact on the network's ability to reliably accomplish a task (Lin [0020]), thereby improving the performance of the neural network in real-time as resource needs are identified (Lin [0032]).
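
For readers less familiar with the batch normalization step quoted from Lin [0033], the following is a minimal sketch of that computation for a single channel. It is an illustration only, not code from Lin or from the application; the function name `batch_norm_channel`, the default `eps` value, and the sample inputs are assumptions.

```python
import math


def batch_norm_channel(z_in, gamma, beta, eps=1e-5):
    """Normalize one channel's activations over a batch, then scale and shift.

    z_in  : activations of a single channel across the batch (list of floats)
    gamma : scale parameter (trainable in a real network)
    beta  : shift parameter (trainable in a real network)
    eps   : small stability constant (assumed value, not taken from Lin)
    """
    n = len(z_in)
    mean = sum(z_in) / n                          # batch mean of the channel
    var = sum((z - mean) ** 2 for z in z_in) / n  # batch (population) variance
    z_hat = [(z - mean) / math.sqrt(var + eps) for z in z_in]  # normalization
    return [gamma * zh + beta for zh in z_hat]    # affine transformation


# Hypothetical activations for one channel over a batch of three inputs.
out = batch_norm_channel([1.0, 2.0, 3.0], gamma=2.0, beta=0.5)
collapsed = batch_norm_channel([1.0, 2.0, 3.0], gamma=0.0, beta=0.5)
```

With `gamma=0.0` every output equals `beta` regardless of the input, which is consistent with the quoted passage's use of the scale parameter γ as a signal for detecting dead neurons.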

As to dependent Claim 2, Xu, Teig, and Lin teach all the limitations of Claim 1.  Xu further teaches wherein the number of channels in the determined portion varies based on a lightweight degree of the neural network ([0030] coarse-grained pruning logic block of network pruner tool identify the relative importance of various channels, kernels, and/or nodes of a neural network and iteratively prune the model to first remove those portions of the neural network determined to be less important; importance reflects the neural network's sensitivity to the removal of these portions affecting the pruned neural network's accuracy; [0046]-[0047] initial prune defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network);  in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights; if it is determined during the test that the initial prune of a layer allows the neural network to still retain sufficient accuracy, the initial pruning percentage may be increased (e.g., incremented  by 5%, 10%, etc.) by the pruner tool (i.e., the number of channels in the determined portion varies based on lightweight degree of the neural network )).  
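
The iterative behavior the examiner reads onto Xu [0046]-[0047] (start at an initial pruning percentage and raise it while accuracy is retained) can be sketched as below. This is a hedged illustration, not Xu's implementation: `find_prune_percent`, the `evaluate` callback, `min_accuracy`, and `max_percent` are hypothetical names and values, and the toy accuracy function stands in for an actual test of a pruned network.

```python
def find_prune_percent(evaluate, start=30, step=5, min_accuracy=0.9, max_percent=95):
    """Return the largest tried pruning percentage that keeps accuracy acceptable.

    evaluate(percent) is a hypothetical callback that prunes `percent` of the
    channels and returns the measured accuracy of the pruned network.
    Returns None if even the starting percentage loses too much accuracy.
    """
    best = None
    percent = start
    while percent <= max_percent and evaluate(percent) >= min_accuracy:
        best = percent       # this percentage still retains sufficient accuracy
        percent += step      # try a more aggressive prune next
    return best


# Toy stand-in for a real accuracy test: accuracy drops once more than 40% is pruned.
def toy_accuracy(percent):
    return 0.99 if percent <= 40 else 0.80


chosen = find_prune_percent(toy_accuracy)
```

Under these assumptions the number of channels removed is not fixed in advance; it follows from how far the percentage can be raised before accuracy falls below the limit.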

As to dependent Claim 3, Xu, Teig, and Lin teach all the limitations of Claim 2.  Xu further teaches wherein the lightweight degree is a proportion of the channels of the neural network to be used for calculation ([0046]-[0047] initial prune defined such that a particular starting percentage of channels is identified for pruning;  in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights (i.e., the remaining 70% of channels/ proportion of unpruned channels used for calculation); if it is determined during the test that the initial prune of a layer allows the neural network to still retain sufficient accuracy, the initial pruning percentage may be increased (e.g., incremented  by 5%, 10%, etc.) by the pruner tool) , and the lightweight degree is determined based on any one or any combination of a memory usage, a processing speed, and a processing time of an apparatus ([0029] preprocessing system provide the network pruner tool as a service to resource constrained system; a query or request submitted to the preprocessing system identifying a particular neural network model and requesting that the model be pruned; [0038] pruned, thinned, or sparse, neural network model is dramatically smaller in size, making the model well-suited for use by and implementation on a resource-constrained device possessing significantly lower memory and processing power - thus, the lightweight degree is determined based on any one or any combination of memory usage, processing speed, and a processing time of apparatus/ resource-constrained device). 

As to dependent Claim 4, Xu, Teig, and Lin teach all the limitations of Claim 2.  Xu further teaches wherein the determining comprises determining a portion of channels satisfying the lightweight degree based on an order of the importance values of the channels of the neural network ([0045]-[0046] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel (i.e., order of the importance values of the channels); initial prune defined such that a particular starting percentage of channels is identified for pruning;  in an initial prune, 30% of the lowest ranked channels may be selected for pruning).  

As to dependent Claim 5, Xu, Teig, and Lin teach all the limitations of Claim 4.  Xu further teaches wherein the order of the importance values is an order from greatest to least among the importance values ([0045]-[0046] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; initial prune defined such that a particular starting percentage of channels is identified for pruning;  in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights (i.e., order of the importance values is an order from greatest to least)).  
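
The ranking described in Xu [0045]-[0046] (sort channels by the sum of the absolute values of their weights, then select a percentage of the lowest-ranked channels for pruning) can be sketched as follows. The function names and the sample layer are assumptions made for illustration; only the sum-of-absolute-weights criterion and the 30% figure come from the cited passages.

```python
def channel_importance(weights):
    """Importance of one channel: sum of the absolute values of its weights."""
    return sum(abs(w) for w in weights)


def rank_channels(layer):
    """Channel indices ordered from greatest to least importance."""
    scores = [channel_importance(ws) for ws in layer]
    return sorted(range(len(layer)), key=lambda i: scores[i], reverse=True)


def split_by_percent(layer, prune_percent=30):
    """Split ranked channels into (kept, pruned) for a given pruning percentage."""
    order = rank_channels(layer)
    n_prune = len(order) * prune_percent // 100  # integer math avoids float rounding
    n_keep = len(order) - n_prune
    return order[:n_keep], order[n_keep:]


# Hypothetical layer with 10 channels; channel i has importance 2 * (i + 1).
layer = [[i + 1, -(i + 1)] for i in range(10)]
kept, pruned = split_by_percent(layer)
```

Here `kept` holds the seven highest-ranked channel indices and `pruned` the three lowest, matching the 70%/30% split discussed for Claims 3 to 5.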

As to dependent Claim 6, Xu, Teig, and Lin teach all the limitations of Claim 1.  Xu further teaches wherein the determining comprises determining a current channel included in the neural network to be a channel to be used for the calculation, in response to an importance value of the current channel being greater than a threshold ([0045]-[0047] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; initial prune defined such that a particular starting percentage of channels is identified for pruning;  in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights - thus, the channels are ranked based on their importance and those with importance higher than 30% (i.e., threshold) of the ranked channels, are kept unpruned; [0030] provide test data as an input to the pruned neural network and perform a test on the pruned neural network to determine an accuracy value for the pruned neural network (i.e., using the unpruned channels for  test/accuracy calculation).  

As to dependent Claim 7, Xu, Teig, and Lin teach all the limitations of Claim 6.  Xu further teaches wherein the determining comprises determining to deactivate the current channel such that the channel is not used for the calculation, in response to the importance value of the current channel being less than or equal to the threshold ([0045]-[0047] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel; such a sorting effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; initial prune defined such that a particular starting percentage of channels is identified for pruning;  in an initial prune, 30% of the lowest ranked channels may be selected for pruning, e.g., those with lowest aggregate weights - thus, the channels are ranked based on their importance and those with importance less than 30% (i.e., threshold) of the ranked channels are selected for pruning; [0030] provide test data as an input to the pruned neural network and perform a test on the pruned neural network to determine an accuracy value for the pruned neural network - thus, the pruned neural network do not include channels with importance less than threshold (i.e., those channels deactivated) and not used for test/accuracy calculation).  

As to dependent Claim 8, Xu, Teig, and Lin teach all the limitations of Claim 6.  Xu further teaches wherein the threshold is determined based on a lightweight degree of the neural network ([0046]-[0047] initial prune defined such that a particular starting percentage of channels is identified for pruning (i.e., lightweight degree of neural network);  in an initial prune, 30% of the lowest ranked channels may be selected for pruning; if it is determined during the test that the initial prune of a layer allows the neural network to still retain sufficient accuracy, the initial pruning percentage may be increased (e.g., incremented  by 5%, 10%, etc.) by the pruner tool - thus, the threshold is based on lightweight degree of the neural network).  
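
The examiner's reading of Claims 6 to 8 treats the pruning percentage as defining a threshold on ranked importance. One way to make that reading concrete is sketched below; it is an assumption-laden illustration rather than anything disclosed in Xu. The helper names are hypothetical, and ties at the threshold would deactivate more channels than the nominal percentage.

```python
def threshold_for_percent(importances, prune_percent):
    """Importance of the highest-ranked channel that falls in the pruned share.

    Returns None when the percentage is too small to prune any channel.
    """
    n_prune = len(importances) * prune_percent // 100
    if n_prune == 0:
        return None
    return sorted(importances)[n_prune - 1]


def split_by_threshold(importances, threshold):
    """Channels above the threshold are used; the rest are deactivated."""
    if threshold is None:
        return list(range(len(importances))), []
    used = [i for i, v in enumerate(importances) if v > threshold]
    deactivated = [i for i, v in enumerate(importances) if v <= threshold]
    return used, deactivated


# Hypothetical importance values for 10 channels.
importances = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0]
threshold = threshold_for_percent(importances, prune_percent=30)
used, deactivated = split_by_threshold(importances, threshold)
```

Raising `prune_percent` raises the derived threshold, which is the dependency between lightweight degree and threshold that the rejection of Claim 8 relies on.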

As to dependent Claim 9, Xu, Teig, and Lin teach all the limitations of Claim 6.  Xu further teaches wherein the importance value of the current channel is a probability value corresponding to a degree of influence on the calculation for the input data in response to the current channel being deactivated ([0030] identify the relative importance of various channels, kernels, and/or nodes of a neural network and iteratively prune the model to first remove those portions of the neural network determined to be less important; importance reflects the neural network's sensitivity to the removal of these portions affecting the pruned neural network's accuracy; after each pruning iteration, the pruned neural network may be tested for accuracy to determine whether additional portions may be pruned while keeping the accuracy of the model within an acceptable threshold or range of values - thus, the importance value corresponds to the degree of influence on the network's accuracy in response to the channel being pruned/ deactivated).  

As to dependent Claim 10, Xu, Teig, and Lin teach all the limitations of Claim 1. Teig further teaches wherein the importance values respectively corresponding to the channels are determined based on cumulative distribution functions (CDFs) of the channels determined by a process of training the neural network (column 20, line 65 to column 21, line 10 and lines 21 to 24 - VIB moves layer-by-layer to identify portions of the network (e.g., nodes, edges, or even entire filters) that are not passing important information; VIB introduces probabilistic (e.g., Gaussian) noise into the output values of a set of computation nodes of the network (e.g., the nodes of one or more layers of the network); the outputs of such nodes (which are passed to nodes in the next layer) are made to vary probabilistically around the actual computed output value during training; this noise enables the training system to identify nodes that are less important to the eventual output of the network (e.g., the classification decision, etc.) 
and remove these nodes; each layer of the network is treated as a bottleneck for the purpose of identifying the nodes, edges, and/or filters that can be removed from the network; column 21, lines 45 to 52 - the loss function represents the loss for a single layer, with the subscript c representing each channel output by the layer (e.g., the outputs for each filter of the layer); the complete loss function, then is a sum over all of the layers, with a different y and σc for each layer; the σc represents the noise variance for the channel, while the coefficient y is a multiplicative variable that can be changed per layer (i.e., loss function  (equivalent to importance value) estimates information transmitted by each layer in order to determine whether to remove/ keep the nodes/ channel); column 22, lines 33 to 34 and lines 62 to 65 - a channel can be removed once its noise variance (σc) exceeds a threshold (e.g., 1); various sigmoid functions may be used for the loss function: these include the logistic function, an algebraic sigmoid function, and a Cauchy cumulative distribution function (CDF) - thus, determining to remove/ keep channels based on the loss function/ CDFs of the channels).  
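
For reference, the three sigmoid-shaped functions named in the Teig passage and the variance-threshold removal rule can be written out as below. This sketch only shows the functions themselves and the stated rule (remove a channel once its noise variance exceeds a threshold such as 1); how Teig combines them inside the loss function is not reproduced here. The rescaling of the algebraic sigmoid to the range (0, 1) and all function names are assumptions.

```python
import math


def logistic(x):
    """Logistic function, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def algebraic_sigmoid(x):
    """Algebraic sigmoid x / sqrt(1 + x^2), rescaled to (0, 1) (assumed rescaling)."""
    return 0.5 * (1.0 + x / math.sqrt(1.0 + x * x))


def cauchy_cdf(x, x0=0.0, scale=1.0):
    """Cumulative distribution function of a Cauchy distribution."""
    return 0.5 + math.atan((x - x0) / scale) / math.pi


def removable_channels(noise_variances, threshold=1.0):
    """Indices of channels whose noise variance exceeds the threshold."""
    return [c for c, v in enumerate(noise_variances) if v > threshold]


# Hypothetical per-channel noise variances for one layer.
to_remove = removable_channels([0.2, 1.5, 1.0, 3.0])
```

All three functions equal 0.5 at zero and increase monotonically, so they differ mainly in how quickly they saturate toward 0 and 1.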

As to dependent Claim 11, Xu, Teig, and Lin teach all the limitations of Claim 10. Teig further teaches wherein determining of the portion of channels based on CDFs (column 20, line 65 to column 21, line 10 and lines 21 to 24 - VIB moves layer-by-layer to identify portions of the network (e.g., nodes, edges, or even entire filters) that are not passing important information; VIB introduces probabilistic (e.g., Gaussian) noise into the output values of a set of computation nodes of the network (e.g., the nodes of one or more layers of the network); the outputs of such nodes (which are passed to nodes in the next layer) are made to vary probabilistically around the actual computed output value during training; this noise enables the training system to identify nodes that are less important to the eventual output of the network (e.g., the classification decision, etc.) and remove these nodes; each layer of the network is treated as a bottleneck for the purpose of identifying the nodes, edges, and/or filters that can be removed from the network; column 21, lines 45 to 52 - the loss function represents the loss for a single layer, with the subscript c representing each channel output by the layer (e.g., the outputs for each filter of the layer); the complete loss function, then is a sum over all of the layers, with a different y and σc for each layer; the σc represents the noise variance for the channel, while the coefficient y is a multiplicative variable that can be changed per layer (i.e., loss function  (equivalent to importance value) estimates information transmitted by each layer in order to determine whether to remove/ keep the nodes/ channel); column 22, lines 33 to 34 and lines 62 to 65 - a channel can be removed once its noise variance (σc) exceeds a threshold (e.g., 1); various sigmoid functions may be used for the loss function: these include the logistic function, an algebraic sigmoid function, and a Cauchy cumulative distribution function (CDF) - thus, 
determining to remove/ keep channels based on the loss function/ CDFs of the channels).  Xu further teaches wherein determining of the portion of channels comprises determining a binary mask based on the CDFs and a threshold ([0045]-[0046] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel to effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; in an initial prune, 30% of the lowest ranked channels (e.g., those with the lowest aggregate weights) selected for pruning and a mask may be generated based on this pruning percentage and the sorting (i.e., mask based on threshold to prune channels with importance below the threshold); [0053] a layer-wise weight threshold computed based on the statistical distribution (equivalent to CDF) of full dense weights in each channel-pruned layer and weight pruning may be performed to mask out those weights that are less than the corresponding layer-specific threshold; [0056] a layer-wise mask defined to represent the binary mask governing which weights to prune during any given training iteration - thus, the binary-mask based on the CDF/ statistical distribution and threshold); and determining the portion of channels to be used for the calculation based on the determined binary mask ([0045]-[0046] the channels in a layer may be sorted based on the respective sum of the absolute values of the weights in the channel to effectively rank order the channels of the layer based on the relative importance or sensitivity of that channel; in an initial prune, 30% of the lowest ranked channels (e.g., those with the lowest aggregate weights) selected for pruning and a mask may be generated based on this pruning percentage and the sorting; the channels may then be pruned according to the mask to generate a pruned version of the layer; ([0030] provide test data as an input to the pruned neural network (i.e., using the 
determined portion of channels/ unpruned channels of the neural network) and perform a test on the pruned neural network to determine an accuracy value for the pruned neural network based on one or more outputs generated from the test data input to the pruned neural network (i.e., performing calculation)).
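
The binary-mask limitation of Claim 11 can be illustrated with a short sketch: per-channel values are compared against a threshold to form a mask, and only channels whose mask entry is 1 take part in the subsequent calculation. This is not taken from Xu, Teig, or the application; the function names, the threshold, and the sample values are hypothetical.

```python
def binary_mask(values, threshold):
    """1 for channels whose value exceeds the threshold, 0 otherwise."""
    return [1 if v > threshold else 0 for v in values]


def apply_mask(channel_outputs, mask):
    """Keep only the outputs of channels that the mask marks as active."""
    return [out for out, m in zip(channel_outputs, mask) if m == 1]


# Hypothetical per-channel values (for example, CDF-derived importance values).
values = [0.95, 0.10, 0.60, 0.30]
mask = binary_mask(values, threshold=0.5)

# Hypothetical per-channel outputs; masked-out channels are skipped entirely.
outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
active = apply_mask(outputs, mask)
```

Skipping masked channels, rather than multiplying them by zero, is a design choice in this sketch that reflects the stated goal of reducing the amount of calculation.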

As to dependent Claim 14, Xu, Teig, and Lin teach all the limitations of Claim 1. Teig further teaches wherein the neural network is a convolutional neural network, and hidden layers of the convolutional neural network include a convolutional layer, a batch normalization layer, and a rectified linear unit (ReLU) layer (column 4, lines 44 to 49 - the neural networks are convolutional feed-forward neural networks; the intermediate layers (referred to as “hidden” layers) may include convolutional layers, pooling layers, fully-connected layers, and normalization layers; column 33, lines 65 to 66-some embodiments use a first variable multiplied by a ReLU or leaky ReLU function).

As to dependent Claim 15, Xu, Teig, and Lin teach all the limitations of Claim 1.  Xu further teaches wherein the determining comprises determining channels to be used for calculation for each of hidden layers of the neural network ([0028] network pruner tool provide functionality to perform both coarse-grained neural network pruning to prune channels, kernels, or nodes from the neural network model as well as more fine-grained neural network p
Prosecution Timeline

Jan 14, 2021: Application Filed
Mar 19, 2024: Non-Final Rejection (§101, §103, §112)
Jun 13, 2024: Response Filed
Jul 18, 2024: Applicant Interview (Telephonic)
Jul 18, 2024: Examiner Interview Summary
Sep 15, 2024: Final Rejection (§101, §103, §112)
Nov 07, 2024: Response after Non-Final Action
Nov 21, 2024: Examiner Interview (Telephonic)
Nov 22, 2024: Response after Non-Final Action
Dec 16, 2024: Request for Continued Examination
Dec 30, 2024: Response after Non-Final Action
Apr 25, 2025: Non-Final Rejection (§101, §103, §112)
Jul 21, 2025: Response Filed
Aug 19, 2025: Examiner Interview Summary
Aug 19, 2025: Applicant Interview (Telephonic)
Oct 29, 2025: Final Rejection (§101, §103, §112) (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/305,586 (Patent 12585963): METHOD AND DEVICE FOR LEARNING A STRATEGY AND FOR IMPLEMENTING THE STRATEGY. 2y 5m to grant; granted Mar 24, 2026.
17/683,395 (Patent 12585988): SYSTEMS AND METHODS FOR GENERATING AND APPLYING A SECURE STATISTICAL CLASSIFIER. 2y 5m to grant; granted Mar 24, 2026.
17/331,136 (Patent 12572395): Method and Devices for Latency Compensation. 2y 5m to grant; granted Mar 10, 2026.
17/655,845 (Patent 12572846): SYSTEM AND METHOD FOR DEVICE ATTRIBUTE IDENTIFICATION BASED ON HOST CONFIGURATION PROTOCOLS. 2y 5m to grant; granted Mar 10, 2026.
18/540,360 (Patent 12569702): RADIOTHERAPY METHODS, SYSTEMS, AND WORKFLOW-ORIENTED GRAPHICAL USER INTERFACES. 2y 5m to grant; granted Mar 10, 2026.
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 64%
With Interview (+71.8%): 99%
Median Time to Grant: 3y 4m
PTA Risk: High
Based on 151 resolved cases by this examiner. Grant probability derived from career allow rate.