Last updated: May 29, 2026

Application No. 18/325,379

COMPUTER-IMPLEMENTED TECHNOLOGIES FOR TRAINING AND COMPRESSING A DEEP NEURAL NETWORK

Non-Final OA §103

Filed: May 30, 2023
Examiner: ROY, SANCHITA
Art Unit: 2146
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Microsoft Technology Licensing, LLC
OA Round: 1 (Non-Final)

Interview: optional. The examiner has a relatively high allowance rate (72%) and a +46.3% interview lift; a written response may suffice.

Based on 318 resolved cases, 2023–2026.

Examiner Intelligence: ROY, SANCHITA

Career allowance rate: 72% (229 granted / 318 resolved), above average; +17.0% vs TC avg
Interview lift: +46.3% (strong), comparing resolved cases with and without an interview
Average prosecution: 3y 2m (typical timeline); 16 currently pending
Career history: 340 total applications across all art units

Statute-specific performance (based on career data from 318 resolved cases; TC avg is an estimated Tech Center average):
§101: 1.1% (-38.9% vs TC avg)
§103: 82.5% (+42.5% vs TC avg)
§102: 3.1% (-36.9% vs TC avg)
§112: 7.1% (-32.9% vs TC avg)

Office Action

§103

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xie (US 20190370658 A1) in view of Fan (US 20240232686 A1) and Fusi (US 20220108168 A1).

Regarding claim 1, Xie teaches a computing system comprising: a processor; and memory storing instructions that, when executed by the processor, cause the processor to perform acts comprising (Xie [6] device processor executes code stored in memory to perform operations): 
obtaining ... computer-implemented model that is to be ... compressed, wherein the ... computer-implemented model includes an operator that comprises a structure (Xie [12, 14, 18] input components may include deep neural network, deep neural network may have batch normalization layers for functions such as scaling and adding bias to features (operator with structure)); 
obtaining training data that is to be employed to train the ... computer-implemented model (Xie [14] input can include training and validation dataset); and 
based upon ... request ... compressing the ... computer-implemented model to generate a trained and compressed computer-implemented model, wherein the trained and compressed computer-implemented model fails to include the structure (Xie [14, 18, 21] end-user customized performance metric (request) may be received as input, batch normalization layer(s) may be removed (not included) from model to generate compressed model that is trained).  

Xie does not specifically teach an untrained computer-implemented model that is to be trained, or receiving a request from a user to train and compress the untrained computer-implemented model.
However, Fan teaches receiving a request from a user to train and compress the ... computer-implemented model; based upon the request and the training data, and without further input from the user, training and compressing the ... computer-implemented model to generate a trained and compressed computer-implemented model (Fan [6, 41, 73, 75, 77] a user may provide data descriptive of a selection of one or more candidate compression schemes (request) for compressing the model; the model may be compressed based on the request and trained based on the training data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the concept taught by Fan of receiving a request from a user to train and compress the ... computer-implemented model; based upon the request and the training data, and without further input from the user, training and compressing the ... computer-implemented model to generate a trained and compressed computer-implemented model, into the invention suggested by Xie, since both inventions are directed towards generating compressed computer-implemented models, and incorporating the teaching of Fan into the invention suggested by Xie would provide the added advantage of allowing a user to provide data descriptive of a selection of one or more candidate compression schemes to be used in compressing a model, and the combination would perform with a reasonable expectation of success (Fan [6, 41, 73, 75, 77]).

Xie and Fan do not specifically teach an untrained computer-implemented model that is to be trained.
However, Fusi teaches an untrained computer-implemented model that is to be trained (Fusi [28, 29] the model to be compressed may be untrained; Fusi [16, 17, 37] compressing a model before training is useful when the model may be in a form that renders it difficult or impossible to train, as the computational requirements associated with the model may be greater than the computational resources that are available);
obtaining training data that is to be employed to train the untrained computer-implemented model (Fusi [32] training data may be received);
based upon ...a... request and the training data, ... training and compressing the untrained computer-implemented model to generate a trained and compressed computer-implemented model (Fusi [41, 44] based on a compression indication (request) and the training data, the untrained model is compressed and the compressed model is trained).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the concept taught by Fusi of an untrained computer-implemented model that is to be trained; obtaining training data that is to be employed to train the untrained computer-implemented model; based upon ...a... request and the training data, ... training and compressing the untrained computer-implemented model to generate a trained and compressed computer-implemented model, into the invention suggested by Xie and Fan, since the inventions are all directed towards generating compressed computer-implemented models, and incorporating the teaching of Fusi into the invention suggested by Xie and Fan would provide the added advantage of allowing a model to be compressed when it is in a form that renders it difficult or impossible to train because the computational requirements associated with the model may be greater than the computational resources that are available, and the combination would perform with a reasonable expectation of success (Fusi [28, 29, 32, 41, 44]).

 Regarding claim 2, Xie, Fan and Fusi teach the invention as claimed in claim 1 above. 
Xie further teaches wherein the operator is ... a batch normalization function ... (Xie [17] the operator whose structure is removed for compression may be a batch normalization function).

 Regarding claim 3, Xie, Fan and Fusi teach the invention as claimed in claim 1 above. 
Xie further teaches wherein training and compressing the computer-implemented model comprises: identifying the structure as a removable structure, the removable structure being removable from the computer-implemented model such that the computer-implemented model generates valid output when the removable structure is removed from the computer-implemented model (Xie [21, 24] the resulting trained and compressed model is validated after the structure is excluded).
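As an illustration of claims 2 and 3 only, the sketch below shows one well-known way a batch normalization structure can be removed while the model still produces the same output: folding an inference-mode batch normalization (fixed running mean and variance) into the preceding dense layer. It assumes plain Python lists and toy values; it is not asserted to be the technique of Xie, Fan, Fusi, or the application, and it does not apply during training, when batch statistics change.

```python
import math


def linear(weight, bias, x):
    """Dense layer: y[o] = sum_i weight[o][i] * x[i] + bias[o]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization with fixed running statistics, per channel."""
    return [g * (xi - m) / math.sqrt(v + eps) + bt
            for xi, g, bt, m, v in zip(x, gamma, beta, mean, var)]


def fold_batch_norm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch normalization into the preceding dense layer.

    Returns (weight', bias') such that linear(weight', bias', x) equals
    batch_norm(linear(weight, bias, x), ...), so the batch normalization
    operator no longer needs to appear in the model.
    """
    folded_weight, folded_bias = [], []
    for row, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_weight.append([w * scale for w in row])
        folded_bias.append((b - m) * scale + bt)
    return folded_weight, folded_bias


# Toy values chosen for illustration only.
W = [[1.0, 2.0], [0.5, -1.0]]
b = [0.1, -0.2]
gamma, beta = [2.0, 0.5], [0.3, -0.1]
mean, var = [0.4, -0.3], [1.5, 0.25]
x = [0.7, -1.2]

with_bn = batch_norm(linear(W, b, x), gamma, beta, mean, var)
W_folded, b_folded = fold_batch_norm(W, b, gamma, beta, mean, var)
without_bn = linear(W_folded, b_folded, x)

# The model without the batch normalization operator gives the same output.
assert all(abs(p - q) < 1e-9 for p, q in zip(with_bn, without_bn))
```

The folding works because inference-mode batch normalization is an affine map per channel, and the composition of two affine maps is again affine.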
 
Claim 18 is directed towards a medium storing instructions similar in scope to the instructions executed by the system of claim 1, and is rejected under the same rationale. Xie further teaches a computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to perform acts (Xie [6] device processor executes code stored in memory to perform operations for a model).
Claims 19 and 20 depend on claim 18, are directed towards a medium storing instructions similar in scope to the instructions executed by the system of claims 2 and 3, respectively, and are rejected under the same rationale.


Claims 4-17 are rejected under 35 U.S.C. 103 as being unpatentable over Xie (US 20190370658 A1) in view of Fan (US 20240232686 A1) and Fusi (US 20220108168 A1), and further in view of Miret (US 20220092425 A1).

 Regarding claim 4, Xie, Fan and Fusi teach the invention as claimed in claim 1 above. 
Xie does not specifically teach wherein identifying the structure as the removable structure comprises: constructing a trace graph of the computer-implemented model, where the trace graph comprises: vertices that represent operators in the computer-implemented model, where the vertices comprise a first vertex that represents the operator and a second vertex that represents a second operator; and edges that represent connections between the operators, where the edges comprise an edge between the operator and the second operator; assigning a category from amongst several potential categories to the first vertex, where the category is assigned to the first vertex based upon a parameter of the operator, and further wherein the structure is identified as the removable structure based upon the category assigned to the first vertex.
However, Miret teaches wherein identifying the structure as the removable structure comprises: constructing a trace graph of the computer-implemented model, where the trace graph comprises: vertices that represent operators in the computer-implemented model, where the vertices comprise a first vertex that represents the operator and a second vertex that represents a second operator; and edges that represent connections between the operators, where the edges comprise an edge between the operator and the second operator (Miret [17, 54, 65] filters to be pruned (structures to be removed) from a neural network for compression are determined from a graph generated from the model (construction); the generated graph may have nodes (vertices) for activations or layers (operators), and edges between nodes may be connections or dependencies);
assigning a category from amongst several potential categories to the first vertex, where the category is assigned to the first vertex based upon a parameter of the operator, and further wherein the structure is identified as the removable structure based upon the category assigned to the first vertex (Miret [63, 65, 66, 76] nodes may be assigned to groups and/or features (category); based on group characteristics, nodes may be identified for pruning (removal) to compress models; Miret [17, 18] using a graph and group-wise pruning improves the efficiency of pruning).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the concept taught by Miret of wherein identifying the structure as the removable structure comprises: constructing a trace graph of the computer-implemented model, where the trace graph comprises: vertices that represent operators in the computer-implemented model, where the vertices comprise a first vertex that represents the operator and a second vertex that represents a second operator; and edges that represent connections between the operators, where the edges comprise an edge between the operator and the second operator; assigning a category from amongst several potential categories to the first vertex, where the category is assigned to the first vertex based upon a parameter of the operator, and further wherein the structure is identified as the removable structure based upon the category assigned to the first vertex, into the invention suggested by Xie, Fan and Fusi, since the inventions are all directed towards generating compressed computer-implemented models, and incorporating the teaching of Miret into the invention suggested by Xie, Fan and Fusi would provide the added advantage of improving the efficiency of pruning for compressing models, and the combination would perform with a reasonable expectation of success (Miret [17, 18, 54, 63, 65, 66, 76]).
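The trace-graph limitation of claim 4 can be pictured with a small data structure. This is a hypothetical sketch: the Operator and TraceGraph classes, the three category names, and the rule that assigns a category from an operator's parameter shapes are illustrative assumptions, not Miret's graph or the applicant's categories.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """Hypothetical operator record: a name plus parameter shapes."""
    name: str
    params: dict = field(default_factory=dict)  # parameter name -> shape tuple


@dataclass
class TraceGraph:
    """Vertices are operators; edges are (producer, consumer) connections."""
    vertices: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_operator(self, op, inputs=()):
        self.vertices[op.name] = op
        for src in inputs:
            self.edges.append((src, op.name))


def categorize(op):
    """Assign one of three hypothetical categories from the operator's parameters."""
    weight_shape = op.params.get("weight", ())
    if len(weight_shape) >= 2:
        return "channel_producer"   # e.g. convolution or dense layer
    if op.params:
        return "channel_follower"   # e.g. batch normalization (per-channel params)
    return "parameter_free"         # e.g. activation; nothing to remove


REMOVABLE_CATEGORIES = {"channel_producer", "channel_follower"}


def find_removable(graph):
    """Return operators whose category marks them as holding a removable structure."""
    categories = {name: categorize(op) for name, op in graph.vertices.items()}
    removable = sorted(n for n, c in categories.items() if c in REMOVABLE_CATEGORIES)
    return removable, categories


g = TraceGraph()
g.add_operator(Operator("conv1", {"weight": (16, 3, 3, 3), "bias": (16,)}))
g.add_operator(Operator("bn1", {"gamma": (16,), "beta": (16,)}), inputs=["conv1"])
g.add_operator(Operator("relu1"), inputs=["bn1"])
g.add_operator(Operator("conv2", {"weight": (32, 16, 3, 3)}), inputs=["relu1"])

removable, categories = find_removable(g)
print(removable)  # ['bn1', 'conv1', 'conv2']
```

Keying the category on parameter shapes, rather than on operator type names, keeps the rule independent of any particular framework's naming; a real tracer would supply richer metadata.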

 Regarding claim 5, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 4 above. Xie does not specifically teach wherein identifying the structure as the removable structure comprises identifying the structure as being the removable structure based upon the category assigned to the first vertex, where the removable structure belongs to a class of minimal structures that are able to be removed from the computer-implemented model without impacting output of the computer-implemented model when parameters of the minimal structures are zero.
However, Miret teaches wherein identifying the structure as the removable structure comprises identifying the structure as being the removable structure based upon the category assigned to the first vertex, where the removable structure belongs to a class of minimal structures that are able to be removed from the computer-implemented model without impacting output of the computer-implemented model when parameters of the minimal structures are zero (Miret [63, 65, 66, 76] nodes may be assigned to groups and/or features (category); based on group characteristics, nodes may be identified for pruning (removal) to compress models; Miret [17, 18] using a graph and group-wise pruning improves the efficiency of pruning; Miret [55, 57, 58] nodes to be pruned are selected so that they do not impact model output (negligible or acceptable change in accuracy, precision and/or recall)).
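Claim 5's notion of a minimal structure that can be removed without affecting output when its parameters are zero can be checked on a toy two-layer network. In this hypothetical sketch the "minimal structure" is taken to be one hidden unit (its incoming weights, its bias, and its outgoing weights); that choice, the network, and the values are assumptions for illustration only.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(weight, bias, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def forward(w1, b1, w2, b2, x):
    """Two-layer network: dense -> ReLU -> dense."""
    return linear(w2, b2, relu(linear(w1, b1, x)))


def remove_hidden_unit(w1, b1, w2, unit):
    """Delete one hidden unit: its row of w1, its entry of b1, and its column of w2."""
    new_w1 = [row for i, row in enumerate(w1) if i != unit]
    new_b1 = [b for i, b in enumerate(b1) if i != unit]
    new_w2 = [[w for i, w in enumerate(row) if i != unit] for row in w2]
    return new_w1, new_b1, new_w2


# Hidden unit 1 has all-zero incoming weights and a zero bias.
w1 = [[0.5, -1.0], [0.0, 0.0], [1.5, 2.0]]
b1 = [0.1, 0.0, -0.3]
w2 = [[1.0, 4.0, -2.0]]   # its outgoing weight (4.0) may be anything
b2 = [0.05]
x = [0.8, 0.2]

full = forward(w1, b1, w2, b2, x)
p_w1, p_b1, p_w2 = remove_hidden_unit(w1, b1, w2, unit=1)
pruned = forward(p_w1, p_b1, p_w2, b2, x)

# ReLU(0) == 0, so the zeroed unit contributed nothing and removing it is exact.
assert full == pruned
```

The check depends on the activation mapping zero to zero; with an activation such as a sigmoid (which maps 0 to 0.5), a zeroed unit would still contribute to the output and could not be dropped this way.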

 Regarding claim 6, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 4 above. Xie does not specifically teach where training and compressing the computer-implemented model to generate the trained and compressed computer-implemented model further comprises: identifying the removable structure as being redundant with another removable structure in the computer-implemented model.
However, Miret teaches where training and compressing the computer-implemented model to generate the trained and compressed computer-implemented model further comprises: identifying the removable structure as being redundant with another removable structure in the computer-implemented model (Miret [66] based on determining that a node is dependent on a second node (and therefore redundant if the second node is removed), the nodes may be grouped together so that removing the second node also removes the first node).

 Regarding claim 7, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 4 above. Xie does not specifically teach removing the removable structure from the computer-implemented model based upon the removable structure being identified as being redundant with the another removable structure.
However, Miret teaches removing the removable structure from the computer-implemented model based upon the removable structure being identified as being redundant with the another removable structure (Miret [66] based on determining that a node is dependent on a second node (and therefore redundant if the second node is removed), the nodes may be grouped together so that removing the second node also removes the first node).
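One way to model the dependency-based grouping that the rejection attributes to Miret [66] is a union-find over structures. In this hypothetical sketch each dependency pair (for example, a convolution output channel and the matching batch normalization channel) is merged into one group, and removing any member removes the whole group; the structure names and the pairing rule are assumptions, not taken from Miret or the application.

```python
def group_dependent(structures, dependencies):
    """Union-find: each (a, b) dependency places a and b in the same group."""
    parent = {s: s for s in structures}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving keeps trees shallow
            s = parent[s]
        return s

    for a, b in dependencies:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for s in structures:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())


def structures_removed_with(structure, groups):
    """Removing one structure also removes every structure grouped with it."""
    for group in groups:
        if structure in group:
            return group
    return {structure}


# Hypothetical channel-level structures in a conv -> batch norm -> conv chain.
structures = ["conv1.out3", "bn1.ch3", "conv2.in3", "conv1.out5"]
dependencies = [("conv1.out3", "bn1.ch3"), ("bn1.ch3", "conv2.in3")]

groups = group_dependent(structures, dependencies)
removed = structures_removed_with("conv2.in3", groups)
print(sorted(removed))  # ['bn1.ch3', 'conv1.out3', 'conv2.in3']
```

Union-find makes grouping transitive, so a chain of pairwise dependencies collapses into a single removable unit without any explicit graph search.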


 Claim 8 is directed towards a method comprising acts similar in scope to the acts performed by the system of claim 4, and is rejected under the same rationale.
Xie further teaches a method performed by a computing system, ... wherein the computer-implemented model is a ... computer-implemented deep neural network (DNN) (Xie [3, 6, 12] a device processor executes code stored in memory to perform operations for a model, which can be a deep neural network).

 Regarding claim 9, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 8 above. Xie does not specifically teach using the DNN to generate output without fine-tuning the DNN.
However, Miret teaches using the DNN to generate output without fine-tuning the DNN (Miret [56, 57] fine-tuning of the DNN does not necessarily need to be performed; output of the compressed DNN may be generated).

 Regarding claim 10, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 8 above. 
Xie further teaches wherein the structure corresponds to at least one of an activation operator, a convolution operator, a batch normalization operator (Xie [17] the operator whose structure is removed for compression may be a batch normalization function).

 Regarding claim 11, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 8 above. Xie does not specifically teach wherein the graph comprises: vertices that represent operators in the computer-implemented model, where the vertices comprise a first vertex that represents a first operator and a second vertex that represents a second operator, and further where the first operator comprises the structure; and edges that represent connections between the operators, where the edges comprise an edge between the first operator and the second operator; wherein training and compressing the untrained computer-implemented DNN to generate a trained and compressed DNN comprises assigning a category from amongst several potential categories to the first vertex, where the category is assigned to the first vertex based upon a parameter of the first operator, and further where the structure is identified as being removable from the computer-implemented DNN based upon the category assigned to the first vertex.
However, Miret teaches wherein the graph comprises: vertices that represent operators in the computer-implemented model, where the vertices comprise a first vertex that represents a first operator and a second vertex that represents a second operator, and further where the first operator comprises the structure; and edges that represent connections between the operators, where the edges comprise an edge between the first operator and the second operator (Miret [17, 54, 65] filters to be pruned (structures to be removed) from a neural network for compression are determined from a graph generated from the model (construction); the generated graph may have nodes (vertices) for activations or layers (operators), and edges between nodes may be connections or dependencies);
wherein training and compressing the untrained computer-implemented DNN to generate a trained and compressed DNN comprises assigning a category from amongst several potential categories to the first vertex, where the category is assigned to the first vertex based upon a parameter of the first operator, and further where the structure is identified as being removable from the computer-implemented DNN based upon the category assigned to the first vertex (Miret [56, 63, 65, 66, 76] nodes may be assigned to groups and/or features (category); based on group characteristics, nodes may be identified for pruning (removal) to compress models, which may be trained; Miret [17, 18] using a graph and group-wise pruning improves the efficiency of pruning).

 Regarding claim 12, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 11 above. Xie does not specifically teach identifying the structure as removable from the computer-implemented DNN based upon the category assigned to the first vertex, where the structure belongs to a class of minimal structures that are removable from the computer-implemented DNN without impacting output of the computer-implemented DNN when parameters of the minimal structures are zero.
However, Miret teaches identifying the structure as removable from the computer-implemented DNN based upon the category assigned to the first vertex, where the structure belongs to a class of minimal structures that are removable from the computer-implemented DNN without impacting output of the computer-implemented DNN when parameters of the minimal structures are zero (Miret [63, 65, 66, 76] nodes may be assigned to groups and/or features (category); based on group characteristics, nodes may be identified for pruning (removal) to compress models; Miret [17, 18] using a graph and group-wise pruning improves the efficiency of pruning; Miret [55, 57, 58] nodes to be pruned are selected so that they do not impact model output (negligible or acceptable change in accuracy, precision and/or recall)).

 Regarding claim 13, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 12 above. Xie does not specifically teach subsequent to identifying the structure as being removable from the computer-implemented DNN, identifying the structure as being redundant with another structure in the computer-implemented DNN.
However, Miret teaches subsequent to identifying the structure as being removable from the computer-implemented DNN, identifying the structure as being redundant with another structure in the computer-implemented DNN (Miret [66] based on determining that a node is dependent on a second node (and therefore redundant if the second node is removed), the nodes may be grouped together so that removing the second node also removes the first node).

 Regarding claim 14, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 13 above. Xie does not specifically teach removing the structure from the computer-implemented DNN based upon the structure being identified as being redundant with the another structure in the computer-implemented DNN.
However, Miret teaches removing the structure from the computer-implemented DNN based upon the structure being identified as being redundant with the another structure in the computer-implemented DNN (Miret [66] based on determining that a node is dependent on a second node (and therefore redundant if the second node is removed), the nodes may be grouped together so that removing the second node also removes the first node).

 Regarding claim 15, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 8 above. Xie does not specifically teach where the DNN is a convolutional neural network.
However, Miret teaches where the DNN is a convolutional neural network (Miret [28] the DNN can be a convolutional neural network).

 Regarding claim 16, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 8 above. Xie does not specifically teach wherein the untrained computer-implemented DNN, when trained, consumes a first amount of computer-readable memory, the trained and compressed computer-implemented DNN consumes a second amount of computer-readable memory, and further where the second amount of computer-readable memory is less than the first amount of computer-readable memory.
However, Miret teaches wherein the untrained computer-implemented DNN, when trained, consumes a first amount of computer-readable memory, the trained and compressed computer-implemented DNN consumes a second amount of computer-readable memory, and further where the second amount of computer-readable memory is less than the first amount of computer-readable memory (Miret [40] compression reduces the resources (including memory) needed for training and storing a trained model, and the reduction may be based on the pruning ratio, so the trained model after compression uses less memory than the trained model before compression).

 Regarding claim 17, Xie, Fan, Fusi and Miret teach the invention as claimed in claim 16 above. Xie does not specifically teach where the second amount of computer-readable memory is between 20% and 50% less than the first amount of computer-readable memory.
However, Miret teaches where the second amount of computer-readable memory is between 20% and 50% less than the first amount of computer-readable memory (Miret [40] compression reduces the resources (including memory) needed for training and storing a trained model; the reduction may be based on the pruning ratio, which can be 20%).
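The relationship between a channel pruning ratio and weight memory in claims 16 and 17 can be worked through with simple arithmetic. This sketch rests on stated assumptions that do not come from Miret [40] or the application: memory dominated by weights, a plain chain of bias-free 3×3 convolutions with hypothetical widths 3, 64, 128, 256, and a uniform 20% channel pruning ratio. Because each interior weight tensor loses both output channels and matching input channels, it shrinks by roughly 1 - 0.8² = 36%.

```python
def conv_weight_count(c_out, c_in, k):
    """Number of weights in a bias-free k x k convolution."""
    return c_out * c_in * k * k


def chain_weight_count(channels, k):
    """Weights in a plain chain of convolutions; channels[0] is the input width."""
    return sum(conv_weight_count(c_out, c_in, k)
               for c_in, c_out in zip(channels, channels[1:]))


channels = [3, 64, 128, 256]            # hypothetical layer widths
pruning_ratio = 0.20
# The input channels (e.g. RGB) are kept; every other width loses 20% of its channels.
pruned = [channels[0]] + [round(c * (1 - pruning_ratio)) for c in channels[1:]]

before = chain_weight_count(channels, k=3)
after = chain_weight_count(pruned, k=3)
reduction = 1 - after / before
print(before, after, f"{reduction:.1%}")  # 370368 236385 36.2%
```

Under these assumptions a 20% channel pruning ratio yields a weight-memory reduction inside the claimed 20% to 50% range; different architectures, biases, or activation memory would change the figure.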

 Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SANCHITA ROY whose telephone number is (571)272-5310. The examiner can normally be reached Monday-Friday 12-8.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SANCHITA ROY
Primary Examiner
Art Unit 2146




Prosecution Timeline
May 30, 2023: Application filed
Apr 13, 2026: Non-Final Rejection mailed, §103 (current)

Precedent Cases
Applications granted by this same examiner with similar technology (based on the 5 most recent grants):

17/584,161 (Patent 12608644): GENERATING A CONFIGURATION PORTFOLIO INCLUDING A SET OF MODEL CONFIGURATIONS. 4y 2m to grant; granted Apr 21, 2026.
17/659,794 (Patent 12599476): AI-BASED VIDEO ANALYSIS OF CATARACT SURGERY FOR DYNAMIC ANOMALY RECOGNITION AND CORRECTION. 3y 12m to grant; granted Apr 14, 2026.
17/745,617 (Patent 12585966): INTELLIGENT DEVICE SELECTION USING HISTORICAL INTERACTIONS. 3y 10m to grant; granted Mar 24, 2026.
18/630,507 (Patent 12585870): READER MODE-OPTIMIZED ATTENTION APPLICATION. 1y 11m to grant; granted Mar 24, 2026.
17/671,406 (Patent 12579656): MACHINE LEARNING DENTAL SEGMENTATION SYSTEM AND METHODS USING GRAPH-BASED APPROACHES. 4y 1m to grant; granted Mar 17, 2026.


Prosecution Projections
Expected OA rounds: 1-2
Grant probability: 72%
With interview (+46.3%): 99%
Median time to grant: 3y 2m (~2m remaining)
PTA risk: Low

Based on 318 resolved cases by this examiner. Grant probability derived from career allowance rate.