Prosecution Insights
Last updated: April 19, 2026
Application No. 16/254,563

TECHNIQUES FOR REMOVING MASKS FROM PRUNED NEURAL NETWORKS

Final Rejection — §102
Filed: Jan 22, 2019
Examiner: GERMICK, JOHNATHAN R
Art Unit: 2122
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nvidia Corporation
OA Round: 6 (Final)

Grant Probability: 47% (Moderate)
Expected OA Rounds: 7-8
Time to Grant: 4y 2m
With Interview: 79%

Examiner Intelligence

Career Allow Rate: 47% (43 granted / 91 resolved; -7.7% vs TC avg)
Interview Lift: +32.1% (strong), comparing resolved cases with vs. without an interview
Typical Timeline: 4y 2m average prosecution; 28 applications currently pending
Career History: 119 total applications across all art units

Statute-Specific Performance

§101: 29.0% (-11.0% vs TC avg)
§103: 38.5% (-1.5% vs TC avg)
§102: 17.3% (-22.7% vs TC avg)
§112: 14.3% (-25.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 91 resolved cases.

Office Action

§102
DETAILED ACTION

This action is responsive to the Application filed on 12/03/2025. Claims 1-20 are pending in the case. Claims 1, 11, and 18 are independent claims. Claims 1, 5, 6, 9-13, 15, 18, and 19 are amended.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed 12/03/2025 have been fully considered but they are not persuasive.

With respect to 35 U.S.C. 101: Upon further consideration, the rejection has been withdrawn.

With respect to prior art: Applicant principally argues, first, that "the reference does not describe performing gather operation based on deactivated weights of a pruned neural network" and, second, that "Ren is silent with respect to performing scatter and gather operations in place of a mask operation". Applicant does not substantiate this position any further. Examiner disagrees. Several important features with respect to the limits of the claim are provided for clarity of the record.

The claim recites "one or more pruned neural networks"; the claim does not describe how the networks are pruned, so "pruned" is a label for "neural networks". Specification paragraph 0016 describes pruning as generating a mask that zeros out elements of the tensors of the neural network to reduce its size. Specification paragraph 0022 describes pruning a trained neural network by generating a masked neural network through identification of redundant elements within the tensors. Specification paragraph 0033 describes first an evaluation using a tensor W and a mask M to evaluate a node output in the neural network graph. Specification paragraph 0035 describes a denser version of that input tensor created by demasking: to create the dense tensor w, the demasking engine identifies portions of tensor W that are zeroed out via mask M. Figure 8 goes on to describe removal of the first mask to create the densified version for the scatter operation.

That is to say, the claims' use of "gather and scatter" operations in place of a mask describes densifying a tensor utilizing a mask. The resulting densified tensor allows the system to avoid using the deactivated weights indicated by the mask. In this way a mask is not needed in a tensor multiplication operation; however, the disclosure clearly describes using the mask in a different way (as an identifier for zero or redundant elements in a tensor) to enable the gather and scatter operations via a dense tensor.

The very same process of using a densified tensor for gather and scatter operations is described in the cited art, Ren. As shown in Figure 2 of the cited art, a densified "Tile indices" tensor is generated by a demasking engine which identifies portions of the original tensor that are zeroed out because, as noted by Ren, "Sparse inference is beneficial to accuracy as the network focuses more of its computational attention on useful activation patterns and ignore…". As described on page 4 of the art, the gather and scatter kernels use the list of indices to perform their operations. For these reasons Ren is thought to "describe performing gather operation based on deactivated weights of a pruned neural network", because the downsampled mask represents the non-deactivated weights used to extract patches for the gather operation ("Knowing the block sizes and overlap sizes, we can perform a simple pooling operation, such as maximum or average pooling followed by a threshold to downsample the input mask. The resulting non-zero locations are the spatial block locations that we extract the patches from." Section 3.1, pg 3). Because only activated activations are used rather than the full activation, the weights are of a "pruned neural network".

Finally, Ren describes "performing scatter and gather operations in place of a mask operation" precisely because the system replaces the binary mask via conversion to a list of indices for the neural network gather/scatter operations. The art does not multiply the activation matrix with the mask itself; instead, the mask is used to densify the tensor, just as the instant application describes using a densified tensor created by a demasking engine which uses the mask to identify non-redundant elements of the original tensor (Section 3: "converts a binary mask to a list of indices, where each index references the location of the corresponding n-dimensional block … Sparse gather/scatter: For gathering, we extract a block from the input tensor, given the start location and the size of the n-d block. Scatter is the inverse operation"). Here the blocks which are extracted are given by the derived list of indices. Examiner notes this process is thought to be equivalent to the demasking engine described in the cited portions of the specification, which densifies a tensor in place of a mask.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
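The equivalence at issue, using the mask only to derive indices for gather and scatter rather than as a multiplicand, can be sketched in a few lines of NumPy. This is a minimal illustration only; the per-location function f, the shapes, and all names are illustrative assumptions, not taken from the application or from Ren.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))        # input feature map
mask = rng.random((8, 8)) > 0.6        # binary computation mask (True = active)

def f(v):
    # placeholder per-location computation (illustrative)
    return np.maximum(v, 0.0) * 2.0

# Mask operation: compute everywhere, then zero out inactive locations.
y_mask = f(x) * mask

# Gather/scatter in place of the mask operation: reduce the mask to indices,
# compute only at active locations, and scatter the results back into a
# tensor of the original dimensionality.
idx = np.nonzero(mask)                 # "reduce mask to indices"
gathered = x[idx]                      # gather active elements into a dense vector
y_gs = np.zeros_like(x)
y_gs[idx] = f(gathered)                # scatter computed results

assert np.allclose(y_mask, y_gs)       # both formulations agree
```

The mask participates only in the index derivation; the arithmetic itself never multiplies by the mask, which is the distinction the Response to Arguments draws.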
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ren, "SBNet: Sparse Blocks Network for Fast Inference".

Claims 1/11/18

Ren teaches a computer-implemented method comprising (claim 1)… a system comprising one or more processors (claim 11)… one or more processors comprising circuitry (claim 18) (pg 1, Introduction: "We implemented our proposed sparse convolution kernels (fragments of parallel code) on graphics processing unit (GPU)").

Ren teaches generating output of one or more pruned neural networks by at least performing one or more scatter and gather operations generated based, at least in part, on one or more deactivated weights of the one or more pruned neural networks (pg 1: "In these examples, spatial sparsity can be represented as binary computation masks where ones indicate active locations that need more computation and zeros inactive"; pg 2: "we gather block-wise slices from tensors and maintain the tensor shape instead of lowering them to vectors. Within each active block, we perform a regular dense convolution"; Figure 1, pg 1). As shown and described in the art, a gather operation collects a subset of a feature map, which represents the pruned neural network. The binary mask defines the active and inactive areas which are and are not used for computations. Thus, the resulting gather is based on deactivated portions of the input tensor maps, i.e., deactivated weights.

wherein the one or more scatter and gather operations are performed in place of a mask operation of the one or more pruned neural networks (pg 3: "The input to our sparse convolution module is a dense binary mask. Just like other standard sparse operations, we first need to extract a list of active location indices, which is named the reduce mask operation. Then, we would like to extract data from the sparse inputs at specified locations and paste the computed results back to the original tensor. To summarize, there are two major building blocks in our approach to sparse block-wise convolution… Reduce mask to indices: converts a binary mask to a list of indices, where each index references the location of the corresponding n-dimensional block in the input tensor and in our current implementation this is a 3-d tuple… For gathering we extract a block from the input tensor… Scatter is the inverse operation"). The mask is replaced by (converted to) a list of active location indices, and this reduced input block is used in the subsequent gather and scatter operations. Examiner notes that the mask matrix itself is not used in a multiplication step like that described in instant Figure 4. Instead, it is the use of the gather and scatter operations on a list of indices in place of the binary mask that corresponds to the claim.
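The "reduce mask to indices" step quoted from Ren (pool the dense binary mask over block-sized tiles, threshold, and keep the non-zero tile locations as the blocks to gather) might be sketched as follows. The block size, threshold, and function name are illustrative assumptions, not Ren's implementation.

```python
import numpy as np

def reduce_mask_to_indices(mask, block=4, threshold=0.0):
    """Downsample a dense binary mask to a list of active block start locations."""
    H, W = mask.shape
    indices = []
    for i in range(0, H, block):
        for j in range(0, W, block):
            # average pooling over the tile, followed by a threshold
            if mask[i:i + block, j:j + block].mean() > threshold:
                indices.append((i, j))   # block start location to gather from
    return indices

mask = np.zeros((8, 8))
mask[0:2, 0:2] = 1                       # one small active region
print(reduce_mask_to_indices(mask))      # [(0, 0)]
```

Only the tile containing active locations survives the pooling-and-threshold step, so downstream gather kernels touch one block instead of four.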
Claims 2/12

Ren teaches claims 1/11. Ren teaches performing the one or more scatter operations to match original dimensionality of one or more representations comprising the one or more deactivated weights (Figure 1; pg 2: "we gather block-wise slices from tensors and maintain the tensor shape instead of lowering them to vectors"). As shown in the figure, scattering remaps the computation to the original dimension.

Claims 3/16

Ren teaches claims 1/11. Ren teaches wherein the one or more deactivated weights correspond to a zero value (pg 1: "In these examples, spatial sparsity can be represented as binary computation masks where ones indicate active locations that need more computation and zeros inactive." The binary masks have zero values for the deactivated tensor elements. pg 3: "The resulting non-zero locations are the spatial block locations that we extract the patches from." The elements used for computation are non-zero values; conversely, zero values are prevented from being used to infer, as they are not extracted for computation.)

Claim 4

Ren teaches claim 1. Ren teaches the one or more deactivated weights correspond to one or more functions, wherein the one or more functions comprise at least one of: a concatenation function, a matrix multiply function, a convolution function, or a rectifier linear unit (ReLU) function (pg 1: "In this work, we leverage structured sparsity patterns of computation masks and propose Sparse Blocks Networks (SBNet), which computes convolution on a blockwise decomposition of the mask").

Claims 5/14/20

Ren teaches claims 1/11/20. Ren teaches the one or more deactivated weights are prevented from being used to infer information by replacing a first tensor that comprises the one or more deactivated weights with a second tensor that is generated based, at least in part, on the first tensor (pg 1: "we gather block-wise slices from tensors and maintain the tensor shape instead of lowering them to vectors"; pg 4: "we then slice the blocks out of the 4-d N × H × W × C input tensor using h×w×C slices, where h and w are the blocks' height and width, and stack the B slices into a new tensor along the batch dimension, yielding a B×h×w×C tensor". The new tensor is created by replacing the original tensor, thereby preventing certain elements of the original tensor from being used to infer.)

Claim 6

Ren teaches claim 1. Ren teaches the one or more deactivated weights correspond to a masked portion of the one or more pruned neural networks (pg 1: "we propose to use the masks to guide the convolutional filters. Computation masks can also be considered as a form of attention mechanism where the attention weights are binary."; Figure 1. The mask selectively attends to a portion of the tensor which is not deactivated, as shown in the figure.)

Claim 7

Ren teaches claim 1. Ren teaches the one or more deactivated weights are identified based, at least in part, on whether the one or more deactivated weights contribute to generation of an output tensor (pg 3: "We observe that many input sources have structured sparsity that meshes well with block sparsity - background pixels are likely to be surrounded by other background pixels. It stands to reason that computations for entire spatial clumps or 'blocks' of activations can be skipped."; pg 6: "Using the same activation size of our detector network, we test the speedup on three types of masks: … Predicted masks obtained from the outputs of PSPNet." The masks which identify the deactivated weights are based on skipping background pixels which have minimal impact on the activation of the block; further, the masks are obtained based on the generation of the output tensor from PSPNet.)
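The tensor replacement quoted from Ren (pg 4), slicing h×w×C blocks out of an N×H×W×C input and stacking the B slices along the batch dimension into a B×h×w×C tensor, can be illustrated with small arbitrary shapes. All shapes and block locations here are illustrative assumptions.

```python
import numpy as np

N, H, W, C = 1, 8, 8, 3
h = w = 4                                       # block height and width
x = np.random.default_rng(1).standard_normal((N, H, W, C))
block_starts = [(0, 0), (0, 4), (4, 4)]         # B = 3 active block locations

# Gather: slice h×w×C blocks out of the input and stack them along the batch
# dimension, yielding the B×h×w×C tensor that replaces the original sparse
# tensor in subsequent dense computation.
blocks = np.stack([x[0, i:i + h, j:j + w, :] for (i, j) in block_starts])
print(blocks.shape)                             # (3, 4, 4, 3)
```

Elements of x outside the listed blocks never appear in the new tensor, which is the sense in which the deactivated weights are "prevented from being used to infer".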
Claim 8

Ren teaches claim 1. Ren teaches determining whether to perform a scatter operation on a first node that is associated with one of the one or more deactivated weights based, at least in part, on a function associated with a second node that is subsequent to the first node (Figure 1 and pg 3: "For gathering, we extract a block from the input tensor, given the start location and the size of the n-d block. Scatter is the inverse operation where we update the output tensor using previously gathered and transformed data"). As shown in Figure 1, each scatter operation, associated with the mask or deactivated weights, is based on subsequent convolutions and gather operations.

Claims 9/13/19

Ren teaches claims 1/11/18. Ren teaches combining the one or more scatter operations with one or more additional scatter operations associated with one or more layers of the one or more pruned neural networks (claim 9); cause coalescing of the one or more scatter operations with an additional scatter operation associated with a second layer of the one or more pruned neural networks that resides after a first layer of the one or more pruned neural networks in a sequence of layers of the one or more pruned neural networks (claim 13); combine one or more scatter operations with additional scatter operations to align dimensionality of a first portion of the one or more pruned neural networks to dimensionality of a second portion of the one or more pruned neural networks (claim 19). (Section 3: "For speed-up purposes, the same sparsity mask is reused for every layer in our experiments, but it can also be computed from a different source per layer…"; pg 3: "In this section, we first go over details of the above two building blocks, and then we introduce a sparse blocks residual unit which groups several layers of computation into sparse blocks"; Figure 1.) Multiple scatter operations are shown in combination for the convolution module of the neural network over several layers of computation. The dimensionality is aligned by maintaining the dimension of the pre-gathered tensor.

Claims 10/15

Ren teaches claims 2/11/18. Ren teaches propagating one or more scatter operations to one or more portions of the one or more pruned neural networks to match an original dimensionality of one or more layers of the one or more pruned neural networks (claim 10)… propagate the one or more scatter operations to match an original dimensionality of one or more layers of the one or more pruned neural networks that comprise the one or more deactivated weights (claim 15). (Figure 1.) Multiple scatter operations are shown in combination for the convolution module of the neural network over several layers of computation. The dimensionality is aligned by maintaining the dimension of the pre-gathered tensor, i.e., the original dimensionality.

Claim 17

Ren teaches claim 16. Ren teaches combine two or more scatter operations to be performed in response to the one or more gather operations (Figure 1; pg 1: "For gathering, we extract a block from the input tensor, given the start location and the size of the n-d block. Scatter is the inverse operation where we update the output tensor using previously gathered and transformed data"). Multiple scatter operations are shown in combination for the convolution module of the neural network over several layers of computation. Scattering is the inverse operation, performed to undo a gather operation after a convolution operation and re-map back to the original dense, full-dimensionality representation.

Conclusion

Prior art not relied upon: Anwar et al., "Structured Pruning of Deep Convolutional Neural Networks", describes pruning via removal of weight connections in a neural network.

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK, whose telephone number is (571) 272-8363. The examiner can normally be reached M-F 9:30-4:30. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kakali Chaki, can be reached at 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
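The layer-grouping idea the action cites for claims 9/13/19 (a residual unit that groups several layers of computation between one gather and one scatter when the layers share sparsity indices) can be sketched as follows. The layer functions and names are placeholders, not Ren's kernels.

```python
import numpy as np

x = np.random.default_rng(2).standard_normal((8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:4, 2:4] = True                  # shared sparsity pattern for both layers
idx = np.nonzero(mask)

f1 = lambda v: v * 2.0                 # layer 1 computation (placeholder)
f2 = lambda v: v + 1.0                 # layer 2 computation (placeholder)

# Unfused: gather -> f1 -> scatter, then gather -> f2 -> scatter.
y = np.zeros_like(x); y[idx] = f1(x[idx])
z = np.zeros_like(x); z[idx] = f2(y[idx])

# Coalesced: one gather, both layers on the gathered blocks, one scatter.
z2 = np.zeros_like(x); z2[idx] = f2(f1(x[idx]))

assert np.allclose(z, z2)              # intermediate scatter/gather pair elided
```

Because the intermediate scatter writes exactly the locations the next gather reads, the pair cancels, which is what makes grouping several layers into one sparse block profitable.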
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.R.G./
Examiner, Art Unit 2122

/KAKALI CHAKI/
Supervisory Patent Examiner, Art Unit 2122

Prosecution Timeline

Jan 22, 2019: Application Filed
Nov 04, 2021: Non-Final Rejection — §102
May 02, 2022: Applicant Interview (Telephonic)
May 02, 2022: Examiner Interview Summary
May 10, 2022: Response Filed
Jul 28, 2022: Final Rejection — §102
Jan 21, 2023: Interview Requested
Jan 31, 2023: Applicant Interview (Telephonic)
Jan 31, 2023: Examiner Interview Summary
Feb 06, 2023: Notice of Allowance
May 10, 2023: Response after Non-Final Action
May 24, 2023: Response after Non-Final Action
Aug 17, 2023: Non-Final Rejection — §102
Oct 27, 2023: Interview Requested
Nov 02, 2023: Examiner Interview Summary
Nov 02, 2023: Applicant Interview (Telephonic)
Feb 29, 2024: Response Filed
May 13, 2024: Final Rejection — §102
Nov 18, 2024: Notice of Allowance
May 19, 2025: Request for Continued Examination
May 21, 2025: Response after Non-Final Action
Aug 29, 2025: Non-Final Rejection — §102
Nov 25, 2025: Examiner Interview Summary
Nov 25, 2025: Applicant Interview (Telephonic)
Dec 03, 2025: Response Filed
Jan 20, 2026: Final Rejection — §102
Apr 01, 2026: Request for Continued Examination
Apr 06, 2026: Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12566962: DITHERED QUANTIZATION OF PARAMETERS DURING TRAINING WITH A MACHINE LEARNING TOOL — granted Mar 03, 2026 (2y 5m to grant)
Patent 12566983: MACHINE LEARNING CLASSIFIERS PREDICTION CONFIDENCE AND EXPLANATION — granted Mar 03, 2026 (2y 5m to grant)
Patent 12554977: DEEP NEURAL NETWORK FOR MATCHING ENTITIES IN SEMI-STRUCTURED DATA — granted Feb 17, 2026 (2y 5m to grant)
Patent 12443829: NEURAL NETWORK PROCESSING METHOD AND APPARATUS BASED ON NESTED BIT REPRESENTATION — granted Oct 14, 2025 (2y 5m to grant)
Patent 12443868: QUANTUM ERROR MITIGATION USING HARDWARE-FRIENDLY PROBABILISTIC ERROR CORRECTION — granted Oct 14, 2025 (2y 5m to grant)
Based on this examiner's 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 7-8
Grant Probability: 47%
With Interview: 79% (+32.1%)
Median Time to Grant: 4y 2m
PTA Risk: High

Based on 91 resolved cases by this examiner. Grant probability derived from career allow rate.
