Last updated: May 29, 2026

Application No. 18/354,093

METHOD AND SYSTEM FOR JOINTLY PRUNING AND HARDWARE ACCELERATION OF PRE-TRAINED DEEP LEARNING MODELS

Non-Final OA §103 §112

Filed

Jul 18, 2023

Priority

Jul 29, 2022 — IN 202221043520

Examiner

HOOVER, BRENT JOHNSTON

Art Unit

2127

Tech Center

2100 — Computer Architecture & Software

Assignee

Tata Consultancy Services Limited

OA Round

1 (Non-Final)

Interview Optional

Examiner has a relatively high allowance rate (83%) and a +23.4% interview lift. A written response may suffice.

Based on 363 resolved cases, 2023–2026

Examiner Intelligence

HOOVER, BRENT JOHNSTON

Grants 83% — above average

Career Allowance Rate

300 granted / 363 resolved

+27.6% vs TC avg

Strong +23% interview lift

+23.4%

Interview Lift

resolved cases with interview

Typical timeline

3y 5m

Avg Prosecution

21 currently pending

Career history

387

Total Applications

across all art units

Statute-Specific Performance

§101: 22.0% (-18.0% vs TC avg)

§103: 64.9% (+24.9% vs TC avg)

§102: 6.6% (-33.4% vs TC avg)

§112: 3.7% (-36.3% vs TC avg)

Based on career data from 363 resolved cases

Office Action

§103 §112

DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to the original application filed on 7/18/2023.  Acknowledgment is made with respect to a claim of priority to Indian Application IN202221043520 filed on 7/29/2022.

Specification

The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter.  See 37 CFR 1.75(d)(1) and MPEP § 608.01(o).  Correction of the following is required: Claim 9 and its dependents recite “[o]ne or more non-transitory machine-readable information storage mediums” (emphasis added). The originally filed specification fails to provide antecedent support for an “information storage medium”. It is suggested to amend claim 9 and its dependents to recite “A non-transitory computer readable medium” to reflect antecedent support in the originally filed specification. Appropriate correction is required.

Claim Objections

Claims 1-12 are objected to because of the following informalities: Claim 1 recites the limitation “receiving from a user via one or more hardware processor” (emphasis added) which should read as “receiving from a user via one or more hardware processors” (emphasis added) so that there is proper pluralization for the limitation. Claim 1 further repeatedly uses the phrasing “comprising of” which should read as “comprising” for better grammatical clarity. Dependent claims 2-4 depend on objected claim 1, and are also objected to by virtue of this dependency. Independent claims 5 and 9 contain the same grammatical issues and are objected to for the same reasons as claim 1. Dependent claims 6-8 and 10-12 depend on objected claims 5 and 9, and are also objected to by virtue of these dependencies. Appropriate correction is required.

Claims 4, 8, and 12 further recite the limitation “obtaining from a layer splitter each layer of the DNN model associated with the pruned accelerated DNN model based on at least one of the user option” (emphasis added) which should read as “obtaining from a layer splitter each layer of the DNN model associated with the pruned accelerated DNN model based on at least one of the user options” (emphasis added) so that there is proper pluralization for the limitation.  Appropriate correction is required.

Claims 4, 8, and 12 further recite the limitation “a data transfer latency on the first layer of all the participating processor” (emphasis added) which should read as “a data transfer latency on the first layer of all the participating processors” (emphasis added) so that there is proper pluralization for the limitation.  Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

	Claim 1 recites the limitation “receiving from a user via one or more hardware processor, a pruning request comprising of (i) a plurality of deep neural network (DNN) models, (ii) a plurality of hardware accelerators comprising of one or more processors, a plurality of target performance indicators comprising of a target accuracy, a target inference latency, a target model size, a target network sparsity, and a target energy, and (iii) a plurality of user options comprising of a first pruning search, and a secondary pruning search” (emphasis added). It is not clear how a received user request could plausibly comprise models and hardware accelerators. Please explain. For examination purposes, the limitation will be interpreted to mean that the request contains a selection of models, hardware accelerator arrangements, and user options. Appropriate correction is required.

Claim 1 further recites the limitation “wherein the first pruning search option executes a hardware pruning search technique, to perform search on each DNN model and each processor based on at least one of a performance indicator and an optimal pruning ratio” (emphasis added). First, the term “each processor” is not clear. It is not clear as to which processor this element refers to. It seems that the intended element is “each hardware accelerator”. Second, the term “optimal” in claim 1 is a relative term which renders the claim indefinite. The term “optimal” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. There is no ascertainable standard by which one determines an “optimal” pruning ratio versus some other pruning ratio. Please explain what is an optimal pruning ratio. For examination purposes, this limitation will be interpreted to mean “wherein the first pruning search option executes a hardware pruning search technique, to perform search on each DNN model and each [[processor]] hardware accelerator based on at least one of a performance indicator and [[an optimal]] a pruning ratio” (emphasis added). Appropriate correction is required.

Claim 1 further recites the limitation “wherein the second pruning search option executes an optimal pruning search technique, to perform search on each layer with corresponding pruning ratio” (emphasis added). The term “each layer” lacks antecedent basis. For examination purposes, this limitation will be interpreted to mean “wherein the second pruning search option executes an optimal pruning search technique, to perform search on [[each]] a layer corresponding to the plurality of DNN models with a corresponding pruning ratio” (emphasis added). Appropriate correction is required.

Claim 1 further recites the limitation “identifying via the one or more hardware processors, an optimal layer associated with the pruned hardware accelerated DNN model based on the user option” (emphasis added). First, the term “the pruned hardware accelerated DNN model” is not clear because the claim previously refers to a plurality of DNN models and hardware accelerators, so it would not logically follow that the claim only identifies an “optimal layer” for one of these pruned hardware accelerated DNN models. Second, it is not clear as to what “the user option” refers to in this limitation. The claim previously recites a plurality of user options comprising a first pruning search and a second pruning search, so what does it mean to refer to the singular “user option”? Please explain. For examination purposes, this limitation will be interpreted to mean “identifying via the one or more hardware processors, an optimal layer associated with one of the plurality of the pruned hardware accelerated DNN models based on a [[the]] selected user option of the plurality of user options” (emphasis added). Appropriate correction is required.

Dependent claims 2-4 depend on indefinite claim 1, and are also rejected under 35 USC § 112(b) by virtue of this dependency. Independent claims 5 and 9 contain the same indefiniteness issues and are rejected under 35 USC § 112(b) for the same reasons as claim 1. Dependent claims 6-8 and 10-12 depend on indefinite claims 5 and 9, and are also rejected under 35 USC § 112(b) by virtue of these dependencies. Appropriate correction is required.

Claim 2 recites the limitation “recording the revised pruning ratio when at least one of the first performance indicator value is nearest to the target performance indicator value and modifying the step size” (emphasis added). The term “nearest” in claim 2 is a relative term which renders the claim indefinite. The term “nearest” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. There is no ascertainable standard by which one determines a “nearest” value compared to the indicator value. Please explain. For examination purposes, the limitation will be interpreted to mean that a nearest value is based on a threshold. Appropriate correction is required.

Dependent claims 6 and 10 contain the same indefiniteness issues and are rejected under 35 USC § 112(b) for the same reasons as claim 2.
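
As a rough illustration of the threshold-based reading adopted above for “nearest”, the following Python sketch steps a pruning ratio upward, records each ratio whose measured indicator still meets the target, stops once the indicator is within a tolerance of the target, and halves the step size whenever a step overshoots. The function name, the `tolerance` and `step` parameters, and the direction of the search are assumptions made for illustration; none of them is taken from the claims or the specification.

```python
from typing import Callable, Optional


def search_pruning_ratio(
    evaluate: Callable[[float], float],  # hypothetical: pruning ratio -> indicator value
    target: float,                       # target performance indicator value
    tolerance: float = 0.01,             # assumed threshold standing in for "nearest"
    step: float = 0.2,                   # assumed initial step size
    min_step: float = 1e-3,
) -> Optional[float]:
    """Return the last recorded pruning ratio, or None if no ratio met the target."""
    ratio = 0.0
    recorded = None
    while step >= min_step:
        candidate = ratio + step
        if candidate >= 1.0:
            step /= 2.0          # cannot prune everything; try a smaller step
            continue
        value = evaluate(candidate)
        if value >= target:
            ratio = candidate
            recorded = candidate  # record the revised pruning ratio
            if value - target <= tolerance:
                break             # indicator is within the threshold of the target
        else:
            step /= 2.0           # overshot the target; modify the step size
    return recorded
```

With a toy accuracy curve such as `0.9 - 0.5 * r * r` and a target of 0.85, the loop accepts 0.2, rejects 0.4, halves the step, and stops at approximately 0.3.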

Claim 3 recites the limitation “(vi) the one or more accelerating elements” (emphasis added). The term “the one or more accelerating elements” lacks antecedent basis. For examination purposes, this limitation will be interpreted to mean “(vi) [[the]] one or more accelerating elements” (emphasis added). Appropriate correction is required.

Claim 3 further recites the limitation “creating, an individual element for each layer based on the pruning ratio associated with each layer of the DNN model,” (emphasis added). The term “the DNN model” is not clear because the claim previously refers to a plurality of DNN models and hardware accelerators, so it would not logically follow that the claim only identifies an “individual element” for one of these pruned DNN models. The same issue appears later in the claim with the limitations “selecting the layers of the DNN model” and “the hardware accelerated layers of the DNN model”. For examination purposes, these limitations will be interpreted to mean one of a plurality of DNN models. Appropriate correction is required.

Claim 3 further recites the limitation “recording each individual entity with corresponding pruning ratios into a population batch” (emphasis added). The term “each individual entity” lacks antecedent basis. For examination purposes, this limitation will be interpreted to mean “recording [[each]] individual [[entity]] entities with corresponding pruning ratios into a population batch” (emphasis added). Appropriate correction is required.

Claim 3 further recites the limitation “selecting a fittest individual element from the population batch using the fitness score function” (emphasis added). The term “fittest” in claim 3 is a relative term which renders the claim indefinite. The term “fittest” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. There is no ascertainable standard by which one determines a “fittest” element. Please explain. For examination purposes, the limitation will be interpreted to mean selecting an individual element based on a fitness score function. Appropriate correction is required.

Claim 3 further recites the limitation “updating the new individual element into the population batch and removing the least fit individual element from the population batch” (emphasis added). The term “least fit” in claim 3 is a relative term which renders the claim indefinite. The term “least fit” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. There is no ascertainable standard by which one determines a “least fit” element. Please explain. For examination purposes, the limitation will be interpreted to mean removing an individual element based on a threshold. Appropriate correction is required.

Dependent claims 7 and 11 contain the same indefiniteness issues and are rejected under 35 USC § 112(b) for the same reasons as claim 3.
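
For orientation, the population-based steps recited in claim 3 (individual elements holding per-layer pruning ratios, a population batch, a fitness score function, selection, insertion of a new individual, and removal of another) resemble a steady-state evolutionary search. The Python sketch below is one possible reading under stated assumptions: the mutation scheme, the ratio bounds, the parameter defaults, and the use of plain highest-score and lowest-score selection are all hypothetical and are not drawn from the application.

```python
import random
from typing import Callable, List

Individual = List[float]  # one pruning ratio per layer


def evolve(
    fitness: Callable[[Individual], float],  # hypothetical fitness score function
    num_layers: int,
    population_size: int = 8,
    generations: int = 200,
    mutation_scale: float = 0.1,
    seed: int = 0,
) -> Individual:
    rng = random.Random(seed)
    # Population batch: individuals with randomly initialised per-layer ratios.
    population = [
        [rng.uniform(0.0, 0.9) for _ in range(num_layers)]
        for _ in range(population_size)
    ]
    for _ in range(generations):
        # Select the individual with the highest fitness score.
        parent = max(population, key=fitness)
        # Create a new individual by perturbing one layer's ratio (assumed mutation).
        child = list(parent)
        i = rng.randrange(num_layers)
        child[i] = min(0.9, max(0.0, child[i] + rng.gauss(0.0, mutation_scale)))
        # Add the new individual, then drop the lowest-scoring one.
        population.append(child)
        population.remove(min(population, key=fitness))
    return max(population, key=fitness)
```

Because the highest-scoring individual is never the one removed, the best fitness in the batch cannot decrease from one generation to the next. A threshold-based removal rule, as in the examiner's interpretation, would replace the `min(...)` call.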

Claim 4 recites the limitation “obtaining from a layer splitter each layer of the DNN model associated with the pruned accelerated DNN model based on at least one of the user option” (emphasis added). The term “the pruned accelerated DNN model” is not clear because the claim previously refers to a plurality of DNN models and hardware accelerators, so it would not logically follow that the claim only refers to one of these pruned DNN models. For examination purposes, these limitations will be interpreted to mean one of a plurality of DNN models. Appropriate correction is required.

Claim 4 further recites the claim elements “the execution”, “the participating processor”, “the complete layer table”, indexing all “the layers”, and “the optimal schedule of each layer”.  These claim elements in claim 4 do not have proper antecedent basis. For examination purposes, these claim elements will be interpreted to mean “an execution”, “a participating processor”, “a complete layer table”, indexing all “layers”, and “an optimal schedule of each layer”.   Appropriate correction is required.

Dependent claims 8 and 12 contain the same indefiniteness issues and are rejected under 35 USC § 112(b) for the same reasons as claim 4.
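
The claim 4 elements discussed above (a layer table, per-processor execution latency, data transfer latency, and a schedule for each layer) can be pictured with a small dynamic-programming sketch in Python. It assumes a table of per-layer execution latencies for each participating processor and a single transfer cost whenever consecutive layers run on different processors. The cost model, the function name, and the example numbers are assumptions for illustration and do not reproduce the method of the application.

```python
from typing import List, Tuple


def schedule_layers(
    exec_latency: List[List[float]],  # exec_latency[layer][proc], hypothetical layer table
    transfer_latency: List[float],    # cost of moving layer i's output to another processor
) -> Tuple[float, List[int]]:
    """Return (total latency, processor index chosen for each layer)."""
    num_layers = len(exec_latency)
    num_procs = len(exec_latency[0])
    # best[p] = minimal latency of layers 0..i with layer i placed on processor p
    best = list(exec_latency[0])
    back: List[List[int]] = []
    for i in range(1, num_layers):
        new_best, choices = [], []
        for p in range(num_procs):
            # Staying on the same processor incurs no transfer cost.
            costs = [
                best[q] + (0.0 if q == p else transfer_latency[i - 1])
                for q in range(num_procs)
            ]
            q_min = min(range(num_procs), key=costs.__getitem__)
            new_best.append(costs[q_min] + exec_latency[i][p])
            choices.append(q_min)
        best = new_best
        back.append(choices)
    # Trace back the processor chosen for each layer.
    p = min(range(num_procs), key=best.__getitem__)
    total = best[p]
    assignment = [p]
    for choices in reversed(back):
        p = choices[p]
        assignment.append(p)
    assignment.reverse()
    return total, assignment
```

For example, with `exec_latency = [[1, 5], [4, 1], [4, 1]]` and `transfer_latency = [0.5, 0.5]`, the first layer stays on processor 0 and the remaining two move to processor 1.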

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, and 9 are rejected under 35 U.S.C. § 103 as being obvious over He et al. (He et al., “AMC: AutoML for Model Compression and Acceleration on Mobile Devices”, Jan. 16, 2019, arXiv:1802.03494v4, pp. 1-17, hereinafter “He”) in view of Yang et al. (Yang et al., “NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications”, Sep. 28, 2018, arXiv:1804.03230v2, pp. 1-16, hereinafter “Yang”) and Kang et al. (Kang et al., “Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge”, Apr. 12, 2017, ASPLOS ’17, pp. 615-629, hereinafter “Kang”).

Regarding claim 1, He discloses [a] processor implemented method for jointly pruning and hardware acceleration of pre-trained deep learning models, the method comprising: (Abstract; “we propose AutoML for Model Compression (AMC) which leverage reinforcement learning to provide the model compression policy. This learning-based compression policy outperforms conventional rule-based compression policy by having higher compression ratio, better preserving the accuracy and freeing human labor”, which discloses a method for jointly pruning or compressing and accelerating the hardware of pre-trained ML models; and §3.1; “The pruned weights are regular and can be accelerated directly with off-the-shelf hardware and libraries. Here we study structured pruning that shrink the input channel of each convolutional and fully connected layer”; and §4; the experiments section is inherently implemented using a processor)
receiving from a user via one or more hardware processor, a pruning request comprising of (i) a plurality of deep neural network (DNN) models, (ii) a plurality of hardware accelerators comprising of one or more processors, a plurality of target performance indicators comprising of a target accuracy, a target inference latency, a target model size, a target network sparsity, and a [[target energy,]] and (iii) a plurality of user options comprising of a first pruning search, and a secondary pruning search; (§4; the section discloses receiving requests or selections from the experimenter or user which include the DNN models, hardware accelerators, and performance indicators including accuracy, latency, model size and sparsity, as well as different pruning search strategies based on the selected pruning policy; and Algorithm 1; the algorithm discloses receiving performance indicators such as model size and sparsity, as well as receiving a pruning request by initializing the DNN models and accelerator; and Page 4, ¶1; “AMC engine optimizes for both accuracy and latency”)
transforming the plurality of DNN models and the plurality of hardware accelerators, via the one or more hardware processors, into a plurality of pruned hardware accelerated DNN models based on at least one of the user options, (Algorithm 1; the algorithm discloses the fine-grained pruning technique that results in a transformed DNN model and accelerator based on user options and constraints; and §3.2 and 3.3; and §4.2; “Fine-grained pruning method prunes neural networks based on individual connections to achieve sparsity in both weights and activations, which is able to achieve higher compression ratio and can be accelerated with specialized hardware”)
wherein the first pruning search option executes a hardware pruning search technique, to perform search on each DNN model and each processor based on at least one of a performance indicator and an optimal pruning ratio, and (Algorithm 1; the algorithm discloses the fine-grained pruning technique that results in a transformed DNN model and accelerator based on user options and constraints such as a performance indicator like accuracy and a sparsity or optimal pruning ratio; and §3.2 and 3.3; and §4.2; “Fine-grained pruning method prunes neural networks based on individual connections to achieve sparsity in both weights and activations, which is able to achieve higher compression ratio and can be accelerated with specialized hardware”)
wherein the second pruning search option executes an optimal pruning search technique, to perform search on each layer with corresponding pruning ratio; (Algorithm 1; the algorithm discloses the fine-grained pruning technique that searches layers with a pruning or sparsity ratio; and §3.2 and 3.3; and §4.2; “Fine-grained pruning method prunes neural networks based on individual connections to achieve sparsity in both weights and activations, which is able to achieve higher compression ratio and can be accelerated with specialized hardware”)
identifying via the one or more hardware processors, an optimal layer associated with the pruned hardware accelerated DNN model based on the user option; and (Algorithm 1; the algorithm discloses identifying an optimal layer associated with a pruned DNN model based on user-defined constraints or parameters; and §3.2 and 3.3; and §4.2; “Fine-grained pruning method prunes neural networks based on individual connections to achieve sparsity in both weights and activations, which is able to achieve higher compression ratio and can be accelerated with specialized hardware”; and Figures 2 and 4).
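
To make the structured pruning quoted from He §3.1 concrete (“shrink the input channel of each convolutional and fully connected layer”), the Python sketch below removes a fraction of the input channels of one fully connected layer. It ranks channels by the L1 norm of their weights, which is a simple stand-in chosen for illustration; it is not the reinforcement-learning policy that He describes, and the function name and data layout are assumptions.

```python
from typing import List, Tuple


def prune_input_channels(
    weights: List[List[float]],  # weights[out][in] of a fully connected layer
    ratio: float,                # fraction of input channels to remove, 0 <= ratio < 1
) -> Tuple[List[List[float]], List[int]]:
    """Return the pruned weight matrix and the indices of the kept input channels."""
    num_in = len(weights[0])
    keep = max(1, num_in - int(ratio * num_in))
    # Score each input channel by the L1 norm of its column.
    scores = [sum(abs(row[c]) for row in weights) for c in range(num_in)]
    # Keep the highest-scoring channels, preserving their original order.
    ranked = sorted(range(num_in), key=lambda c: scores[c], reverse=True)
    kept = sorted(ranked[:keep])
    pruned = [[row[c] for c in kept] for row in weights]
    return pruned, kept
```

The result is a smaller dense matrix rather than a sparse one, which is why structured pruning of this kind maps directly onto off-the-shelf hardware and libraries.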
He fails to explicitly disclose but Yang discloses target performance indicators comprising of … target energy (§3.1; “Res_j(·) evaluates the direct metric for resource consumption of the jth resource, and Bud_j is the budget of the jth resource and the constraint on the optimization. The resource can be latency, energy, memory footprint, etc., or a combination of these metrics”, which discloses a performance indicator that considers target energy for DNN pruning; and §5; “NetAdapt can incorporate direct metrics, such as latency and energy, into the optimization to maximize the adaptation performance based on the characteristics of the platform”).
He and Yang are analogous art because both are concerned with DNN pruning based on performance indicators. Before the effective filing date of the claimed invention, it would have been obvious to one skilled in DNN pruning to combine the target energy performance indicator of Yang and the method of He to yield the predictable result of a plurality of target performance indicators comprising of a target accuracy, a target inference latency, a target model size, a target network sparsity, and a target energy. The motivation for doing so would be to adapt a pretrained network to a mobile platform given a real resource budget (Yang; §5).
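
The budget constraint quoted from Yang §3.1 (each measured resource must not exceed its budget) can be illustrated with a short Python sketch that filters candidate pruned networks by latency and energy budgets and then picks the most accurate survivor. The dictionary keys, the function name, and the idea of choosing from a fixed candidate list are hypothetical; Yang's actual procedure is iterative and is not reproduced here.

```python
from typing import Dict, List, Optional

Candidate = Dict[str, float]  # e.g. {"accuracy": ..., "latency_ms": ..., "energy_mj": ...}


def pick_within_budget(
    candidates: List[Candidate],
    budgets: Dict[str, float],  # resource name -> maximum allowed value
) -> Optional[Candidate]:
    """Return the most accurate candidate that meets every resource budget."""
    feasible = [
        c for c in candidates
        if all(c[name] <= limit for name, limit in budgets.items())
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c["accuracy"])
```

Adding a target energy to the indicators of He then amounts to adding one more key to `budgets`.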
He fails to explicitly disclose but Kang discloses creating by using a layer assignment sequence technique, via the one or more hardware processors, a static load distributor by partitioning the optimal layer of the DNN model into a plurality of layer sequences and assigning each layer sequence to corresponding processing element of hardware accelerators (Abstract; “We find that given the characteristics of DNN algorithms, a fine-grained, layer-level computation partitioning strategy based on the data and computation variations of each layer within a DNN has significant latency and energy advantages over the status quo approach. Using this insight, we design Neurosurgeon, a lightweight scheduler to automatically partition DNN computation between mobile devices and datacenters at the granularity of neural network layers. … It adapts to various DNN architectures, hardware platforms, wireless networks, and server load levels, intelligently partitioning computation for best latency or best mobile energy”, which discloses creating a load distributor or scheduler by partitioning layers in a DNN into layer sequences and assigning the layer processing to different processing elements of hardware accelerators such as mobile devices and datacenters and hardware platforms and servers; and §4.3; and Figure 10).
He, Yang, and Kang are analogous art because all are concerned with DNN optimization based on performance indicators. Before the effective filing date of the claimed invention, it would have been obvious to one skilled in DNN optimization to combine the static load distributor of Kang and the method of He and Yang to yield the predictable result of creating by using a layer assignment sequence technique, via the one or more hardware processors, a static load distributor by partitioning the optimal layer of the DNN model into a plurality of layer sequences and assigning each layer sequence to corresponding processing element of hardware accelerators. The motivation for doing so would be to adapt to various DNN architectures, hardware platforms, wireless connections, and server load levels, and choose the partition point for best latency and best mobile energy consumption (Kang; §8. Conclusion).
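
The layer-level partitioning attributed to Kang above can be sketched as a search over a single split point: layers before the split run on the mobile device, the intermediate data is uploaded, and the remaining layers run in the datacenter. The Python below is a minimal illustration assuming per-layer latencies are already known; the function name, the argument layout, and the convention that `upload_ms` has one entry per possible split are assumptions, not details taken from Kang.

```python
from typing import List, Tuple


def best_partition_point(
    mobile_ms: List[float],  # per-layer latency on the mobile device
    cloud_ms: List[float],   # per-layer latency in the datacenter
    upload_ms: List[float],  # upload_ms[k]: time to send the data crossing split k
) -> Tuple[int, float]:
    """Layers [0, k) run on the mobile device and layers [k, n) in the cloud."""
    n = len(mobile_ms)
    best_k, best_total = 0, float("inf")
    for k in range(n + 1):
        total = sum(mobile_ms[:k]) + upload_ms[k] + sum(cloud_ms[k:])
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total
```

A split of 0 means everything runs in the cloud and a split of `n` means everything runs locally; the same loop can minimise energy instead of latency by swapping the cost lists.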
Regarding claim 5, it is a system claim corresponding to the steps of claim 1, and is rejected for the same reasons as claim 1.

Regarding claim 9, it is a non-transitory machine-readable information storage medium claim corresponding to the steps of claim 1, and is rejected for the same reasons as claim 1.

Conclusion

Claims 2-4, 6-8, and 10-12 have been searched but no prior art was uncovered.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403. The examiner can normally be reached Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar can be reached at 571-270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRENT JOHNSTON HOOVER/Primary Examiner, Art Unit 2127

Prosecution Timeline

Jul 18, 2023

Application Filed

Apr 09, 2026

Non-Final Rejection mailed — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/962,729

Patent 12639637

SYSTEM AND METHOD OF TRAINING MACHINE-LEARNING-BASED MODEL

3y 7m to grant Granted May 26, 2026

16/949,359

Patent 12632772

METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR IMPROVING INTERPRETABILITY OF SOFTWARE BLACK-BOX MACHINE LEARNING MODEL OUTPUTS

5y 6m to grant Granted May 19, 2026

18/659,042

Patent 12632732

METHOD AND APPARATUS FOR MULTI-LABEL CLASS CLASSIFICATION BASED ON COARSE-TO-FINE CONVOLUTIONAL NEURAL NETWORK

2y 0m to grant Granted May 19, 2026

17/090,071

Patent 12626135

DYNAMICALLY DIVIDING ACTIVATIONS AND KERNELS FOR IMPROVING MEMORY EFFICIENCY

5y 6m to grant Granted May 12, 2026

17/492,460

Patent 12626125

SYNTHESIZING A SINGULAR ENSEMBLE MACHINE LEARNING MODEL FROM AN ENSEMBLE OF MODELS

4y 7m to grant Granted May 12, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

83%

Grant Probability

99%

With Interview (+23.4%)

3y 5m (~6m remaining)

Median Time to Grant

Low

PTA Risk

Based on 363 resolved cases by this examiner. Grant probability derived from career allowance rate.
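
The 83% figure follows from the counts reported above (300 granted out of 363 resolved). A minimal Python check of that arithmetic is shown here; the function name is hypothetical, and how the page derives the with-interview figure is not stated, so it is not reproduced.

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a fraction of resolved cases."""
    return granted / resolved


rate = allowance_rate(300, 363)
summary = f"{rate:.1%}"  # one decimal place, before rounding to a whole percent
```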