Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Remarks
This Office Action is responsive to Applicant's Amendment filed on January 29, 2026, in which claims 1, 7, 13, and 25 are currently amended. Claims 1-18 and 25-30 are currently pending.
Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1-18 and 25-30 under 35 U.S.C. 103 have been fully considered and are persuasive in view of the amendment. However, the amendment necessitates a new ground of rejection, set forth below, which renders the arguments moot.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-18 and 25-30 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claims 2, 3, 6, 8, 9, 12, 14, 15, 18, 26, 27, and 30, the limitation "the one or more portions" lacks antecedent basis. Amending the limitation to recite "one or more portions" is recommended.
Claims 4, 10, 16, and 28 are rejected based on their dependence from claims 3, 9, 15, and 27, respectively.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-7, 9-13, 15-18, 25, and 27-30 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Turner ("Distilling with Performance Enhanced Students", 2019) and Hemmat ("CAP'NN: Class-Aware Personalized Neural Network Inference", 2020).
FIG. 1 of Turner (reproduced)
FIG. 2 of Turner (reproduced)
Regarding claim 1, Turner teaches A processor, comprising: one or more circuits to: ([p. 5] "We evaluated our approach on the exemplar embedded CPU and GPU present on the Nvidia Jetson TX2")
access pre-generated data including different latency step sizes for different layers of one or more neural networks, ([p. 3] "We begin by developing the latency profile for each prunable layer in the network. These prunable layers are the first convolutional layers of each block" [p. 4] "Once we have extracted the Fisher-pruned architecture, we adapt each layer-width choice to the nearest optimal step. This algorithm is also shown in Algorithm 1. Formally, a step on the staircase is defined as any difference in inference time between contiguous channel counts is greater than three standard deviations from the mean difference. An optimal point on such a step is the rightmost channel count" Turner explicitly pre-generates timing measurements across channel counts for each prunable layer (Algorithm 1) and then uses that profiling data for the subsequent adaptation step.)
the different latency step sizes corresponding to size granularities of the different layers at which inference time on a specified hardware target changes according to a step function ([p. 4] "Once we have extracted the Fisher-pruned architecture, we adapt each layer-width choice to the nearest optimal step. This algorithm is also shown in Algorithm 1. Formally, a step on the staircase is defined as any difference in inference time between contiguous channel counts is greater than three standard deviations from the mean difference. An optimal point on such a step is the rightmost channel count" FIG. 1 and FIG. 4 explicitly show steps corresponding to size granularities of the different layers at which inference time on a specified hardware target changes according to a step function)
and determine individual neuron accuracy importance scores for individual neurons of the different layers ([p. 2 §2] "Channel pruning focuses on the removal of neuron structures" [p. 2] "channel saliency metrics [...] The detection of unimportant weights and channels relies on a saliency estimation metric" [p. 3 §3.1] "Fisher pruning [Theis et al., 2018; Molchanov et al., 2017] is a principled channel pruning technique, whereby the saliency metric is an approximation of the change in error that would occur on the removal of [...] the channel with the lowest ∆c value is pruned" [p. 4] "we start with the original number of channels from the teacher model and benchmark latency for a single inference, removing a single channel at a time as illustrated in Figure 4" [p. 6] "We perform Fisher pruning in a similar manner. We fine tune, and every 100 steps, a single channel is pruned" Turner's "channel" is interpreted as the claimed "neuron" in view of the instant specification ([¶0051] "there may be Nl neurons, or output channels"), which explicitly refers to the neurons as "output channels," and as would be known to one of ordinary skill in the art. Turner also explicitly refers to a channel as a "neuron structure".)
and cause groups of individual neurons of the one or more neural networks having respective size granularities determined from the pre-generated data to be removed based, at least in part, on a combination of the individual neuron accuracy importance scores and the different latency step sizes such that different numbers of neurons are removed from different layers ([p. 2] "channel saliency metrics [...] The detection of unimportant weights and channels relies on a saliency estimation metric" [p. 3 §3.1] "Fisher pruning [Theis et al., 2018; Molchanov et al., 2017] is a principled channel pruning technique, whereby the saliency metric is an approximation of the change in error that would occur on the removal of [...] the channel with the lowest ∆c value is pruned" [p. 4] "we start with the original number of channels from the teacher model and benchmark latency for a single inference, removing a single channel at a time as illustrated in Figure 4" [p. 6] "We perform Fisher pruning in a similar manner. We fine tune, and every 100 steps, a single channel is pruned" Turner explicitly combines channel saliency with inference latency steps in its student-discovery mechanism: Fisher pruning effectively descends the staircase by removing low-saliency channels, and each layer is then adapted to the nearest optimal step (see Algorithm 1)).
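For illustration of the cited passages only, the following is a minimal, examiner-provided sketch; the function and variable names are hypothetical and are not drawn from Turner's released code. The sketch reduces a per-layer latency profile measured on the target hardware to latency steps under the quoted rule (a step boundary wherever the difference in inference time between contiguous channel counts exceeds three standard deviations of the mean difference, with the rightmost channel count of each step taken as the optimal point), removes the lowest-saliency channels first, and then adapts each layer's surviving width to the nearest optimal step so that different numbers of channels are removed from different layers.

import numpy as np

def optimal_channel_steps(latency_ms):
    # latency_ms[c] is the measured inference time with c + 1 channels kept.
    diffs = np.diff(latency_ms)                    # change between contiguous channel counts
    threshold = diffs.mean() + 3.0 * diffs.std()   # "three standard deviations from the mean difference"
    boundaries = np.where(diffs > threshold)[0]    # indices where one step ends and the next begins
    # The optimal point on each step is its rightmost channel count.
    return [int(i) + 1 for i in boundaries] + [len(latency_ms)]

def greedy_saliency_prune(saliency_per_layer, num_to_remove):
    # saliency_per_layer[l] lists per-channel saliency scores for layer l (e.g. an
    # approximation of the change in error on removing that channel). Layers with
    # many low-saliency channels lose more channels than others.
    widths = [len(scores) for scores in saliency_per_layer]
    ranked = sorted((score, layer)
                    for layer, scores in enumerate(saliency_per_layer)
                    for score in scores)
    for _, layer in ranked[:num_to_remove]:
        if widths[layer] > 1:                      # never remove a layer entirely
            widths[layer] -= 1
    return widths

def snap_to_steps(pruned_widths, step_ends_per_layer):
    # Adapt each layer's pruned width to the nearest optimal step that is at least
    # as wide, so the extra channels come at no additional latency cost.
    adapted = []
    for width, step_ends in zip(pruned_widths, step_ends_per_layer):
        candidates = [s for s in step_ends if s >= width]
        adapted.append(min(candidates) if candidates else step_ends[-1])
    return adapted

# Example with synthetic staircase-shaped latency profiles for two prunable layers:
profiles = [np.repeat([1.0, 1.6, 2.3, 3.1], 16), np.repeat([0.5, 0.9, 1.4], 32)]
steps = [optimal_channel_steps(p) for p in profiles]   # e.g. [16, 32, 48, 64] and [32, 64, 96]
# widths = greedy_saliency_prune(saliency_per_layer, num_to_remove=40)
# final_widths = snap_to_steps(widths, steps)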
While Turner explicitly teaches that channels are neuron structures, and while the instant specification explicitly relates neurons to output channels, Turner does not explicitly teach that channels are neurons.
Hemmat, in the same field of endeavor, teaches that channels are neurons ([p. 2 §III] "Our class-aware pruning techniques require to first calculate class specific firing rates for each neuron (or channel in case of convolutional layers" Hemmat explicitly treats neurons and CNN channels as analogous).
Turner and Hemmat are both directed towards pruning convolutional neural networks; therefore, Turner and Hemmat are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Turner with the teachings of Hemmat by treating the channels as neurons. While one of ordinary skill in the art would have recognized this relationship before the effective filing date of the claimed invention, it is explicitly reinforced by Hemmat, which provides additional motivation for the combination ([p. 2] "it is straightforward to adapt the discussions to pruning channels"). This motivation for combination also applies to the remaining claims which depend on this combination.
Regarding claim 3, the combination of Turner and Hemmat teaches The processor of claim 1, wherein the one or more portions each include one or more neurons, (Turner [p. 2 §2] "Channel pruning focuses on the removal of neuron structures (or filters) while fine-tuning")
and wherein the one or more neurons to be included in the one or more portions to be removed are selected based at least in part upon respective importance scores calculated for the one or more neurons after pre-training of the one or more neural networks. (Turner [p. 4] "Baseline pretrained model" [p. 2] "Step 1: Using channel saliency and empirical latency [...] Starting with a large, trained network, we first perform pruning—specifically, Fisher pruning [Theis et al., 2018]— and then use a latency-aware optimiser to adapt the profile of the pruned network for deployment on a particular device" [p. 2 §2] "Channel pruning focuses on the removal of neuron structures" [p. 2] "channel saliency metrics [...] The detection of unimportant weights and channels relies on a saliency estimation metric" [p. 3 §3.1] "Fisher pruning [Theis et al., 2018; Molchanov et al., 2017] is a principled channel pruning technique, whereby the saliency metric is an approximation of the change in error that would occur on the removal of [...] the channel with the lowest ∆c value is pruned" [p. 4] "we start with the original number of channels from the teacher model and benchmark latency for a single inference, removing a single channel at a time as illustrated in Figure 4" [p. 6] "We perform Fisher pruning in a similar manner. We fine tune, and every 100 steps, a single channel is pruned" Turner explicitly begins after pretraining a large model. Turner then determines importance via a saliency metric grounded in loss impact.)
Regarding claim 4, the combination of Turner and Hemmat teaches The processor of claim 3, wherein the one or more circuits are further to group sets of neurons based at least in part upon a similarity of the one or more performance metrics. (Turner [p. 4] "a step on the staircase is defined as any difference in inference time between contiguous channel counts is greater than three standard deviations from the mean difference").
Regarding claim 5, the combination of Turner and Hemmat teaches The processor of claim 1, wherein the performance metrics are determined based at least in part upon a target type of hardware to be used to perform inferencing using the one or more neural networks. (Turner [p. 2] "to design a student network that fits a specific deployment platform [...] use a latency-aware optimiser to adapt the profile of the pruned network for deployment on a particular device" [p. 5] "Each point on the dotted curve relates to an architecture from the Fisher curve that has been adapted for the specific hardware platform").
Regarding claim 6, the combination of Turner and Hemmat teaches The processor of claim 1, wherein the one or more circuits are further to utilize an optimization solver to determine the one or more portions to be removed. (Turner [p. 5] "We evaluated our approach on the exemplar embedded CPU and GPU present on the Nvidia Jetson TX2" [p. 2] "channel saliency metrics [...] The detection of unimportant weights and channels relies on a saliency estimation metric" [p. 3 §3.1] "Fisher pruning [Theis et al., 2018; Molchanov et al., 2017] is a principled channel pruning technique, whereby the saliency metric is an approximation of the change in error that would occur on the removal of [...] the channel with the lowest ∆c value is pruned" [p. 4] "we start with the original number of channels from the teacher model and benchmark latency for a single inference, removing a single channel at a time as illustrated in Figure 4" [p. 6] "We perform Fisher pruning in a similar manner. We fine tune, and every 100 steps, a single channel is pruned" See Algorithm 1).
Regarding claims 7 and 9-12, claims 7 and 9-12 are substantially similar to claims 1 and 3-6. Therefore, the rejections applied to claims 1 and 3-6 also apply to claims 7 and 9-12.
Regarding claims 13 and 15-18, claims 13 and 15-18 are directed towards the method performed by the processor of claims 1 and 3-6. Therefore, the rejections applied to claims 1 and 3-6 also apply to claims 13 and 15-18.
Regarding claims 25 and 27-30, claims 25 and 27-30 are substantially similar to claims 1 and 3-6. Therefore, the rejections applied to claims 1 and 3-6 also apply to claims 25 and 27-30. Claims 25 and 27-30 also recite the additional element of a memory for storing network parameters for the one or more neural networks. (Turner [p. 5] "We evaluated our approach on the exemplar embedded CPU and GPU present on the Nvidia Jetson TX2").
Claims 2, 8, 14, and 26 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Turner, Hemmat, and Elkerdawy ("To Filter Prune, or to Layer Prune, That Is The Question", 2019).
Regarding claim 2, the combination of Turner and Hemmat teaches the processor of claim 1.
However, the combination of Turner and Hemmat does not explicitly teach wherein to obtain the one or more pre-generated performance impact values the one or more circuits are further to calculate an impact on performance for each of the one or more portions before determining the one or more portions to be removed according to a look-up table of pre-measured performance values for the specified hardware target.
Elkerdawy, in the same field of endeavor, teaches the processor of claim 1, wherein to obtain the one or more pre-generated performance impact values the one or more circuits are further to calculate an impact on performance for each of the one or more portions before determining the one or more portions to be removed according to a look-up table of pre-measured performance values for the specified hardware target. ([p. 5 §2] "A lookup table is built for latency prediction and then multiple candidates are generated at each pruning iteration by pruning a ratio of filters from each layer independently. The candidate with the highest accuracy is then selected").
The combination of Turner and Hemmat, as well as Elkerdawy, is directed towards pruning neural networks. Therefore, the combination of Turner and Hemmat and Elkerdawy are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Turner and Hemmat with the teachings of Elkerdawy by performing Elkerdawy's lookup-table-based latency prediction and accuracy-based candidate selection in addition to the step-wise pruning of the combination of Turner and Hemmat (for example, in the iterative hardware- and accuracy-aware pruning step of the combination of Turner and Hemmat). Elkerdawy provides additional motivation for the combination ([p. 13 §4.3] "LayerPrune outperforms SSS on the same latency budget even when SSS supports block pruning for ResNet50, which shows the effectiveness of accuracy approximation as layer importance"). This motivation for combination also applies to the remaining claims which depend on this combination.
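For illustration of the cited Elkerdawy passage only, the following is a minimal sketch (hypothetical names; not Elkerdawy's code) of a look-up table of pre-measured latencies for the specified hardware target, consulted to evaluate candidate pruning configurations before the portions to be removed are selected.

# Keyed by (layer index, channel count); populated once with latencies measured
# offline on the specified hardware target.
latency_lut = {}   # latency_lut[(layer, channels)] = milliseconds on the device

def predicted_latency(widths):
    # Predict whole-network latency for a candidate configuration of per-layer widths.
    return sum(latency_lut[(layer, channels)] for layer, channels in enumerate(widths))

def select_candidate(candidates, accuracies, latency_budget_ms):
    # Among candidate configurations that meet the latency budget according to the
    # look-up table, keep the one with the highest measured accuracy.
    feasible = [(acc, cfg) for cfg, acc in zip(candidates, accuracies)
                if predicted_latency(cfg) <= latency_budget_ms]
    return max(feasible)[1] if feasible else None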
Regarding claim 8, claim 8 is substantially similar to claim 2. Therefore, the rejection applied to claim 2 also applies to claim 8.
Regarding claim 14, claim 14 is directed towards the method performed by the processor of claim 2. Therefore, the rejection applied to claim 2 also applies to claim 14.
Regarding claim 26, claim 26 is substantially similar to claim 2. Therefore, the rejection applied to claim 2 also applies to claim 26. Claim 26 also recites the additional element of a memory for storing network parameters for the one or more neural networks. (Turner [p. 5] "We evaluated our approach on the exemplar embedded CPU and GPU present on the Nvidia Jetson TX2").
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SIDNEY VINCENT BOSTWICK/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124