Prosecution Insights
Last updated: April 19, 2026
Application No. 17/680,630

NEURAL NETWORK MODEL COMPRESSION METHOD AND APPARATUS, STORAGE MEDIUM, AND CHIP

Non-Final OA — §103, §112
Filed: Feb 25, 2022
Examiner: SAX, STEVEN PAUL
Art Unit: 2146
Tech Center: 2100 — Computer Architecture & Software
Assignee: Huawei Technologies Co., Ltd.
OA Round: 3 (Non-Final)

Grant Probability: 70% (Favorable)
OA Rounds: 3-4
To Grant: 4y 0m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 70% — above average (320 granted / 460 resolved; +14.6% vs TC avg)
Interview Lift: strong, +44.8% across resolved cases with an interview (with/without comparison chart omitted)
Typical Timeline: 4y 0m average prosecution; 20 applications currently pending
Career History: 480 total applications across all art units

Statute-Specific Performance

§101: 10.4% (-29.6% vs TC avg)
§103: 62.5% (+22.5% vs TC avg)
§102: 6.7% (-33.3% vs TC avg)
§112: 5.5% (-34.5% vs TC avg)
TC-average comparisons are estimates • Based on career data from 460 resolved cases

Office Action

§103, §112
Detailed Action

Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

2. The RCE (Request for Continued Examination) filed 2/10/26 has been entered. Accordingly, the amendment filed 1/26/26 has been entered.

3. Claims 1-20 are pending.

Claim Objections

4. Applicant is advised that should claim 19 be found allowable, claim 20 will be objected to under 37 CFR 1.75 as being a substantial duplicate thereof. Claim 20 with dependency on claim 17 covers the same exact limitations as claim 19 with dependency on claim 17. When two claims in an application are duplicates or else are so close in content that they both cover the same thing, despite a slight difference in wording, it is proper after allowing one claim to object to the other as being a substantial duplicate of the allowed claim. See MPEP § 608.01(m).

Claim Rejections - 35 USC § 112

5. The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1, 4, 9, 12, 17, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention:

Claim 1 recites the limitation "the proportion information" in line 15. There is insufficient antecedent basis for this limitation in the claim.

Claim 4 recites the limitation "the proportion information" in line 3. There is insufficient antecedent basis for this limitation in the claim. Note the dependency for claim 4 has changed.

Claim 4 recites the limitation "the second feature" in line 4. There is insufficient antecedent basis for this limitation in the claim. Note the dependency for claim 4 has changed.

Claim 9 recites the limitation "the proportion information" in line 18. There is insufficient antecedent basis for this limitation in the claim.

Claim 12 recites the limitation "the proportion information" in line 3. There is insufficient antecedent basis for this limitation in the claim. Note the dependency for claim 12 has changed.

Claim 12 recites the limitation "the second feature" in line 4. There is insufficient antecedent basis for this limitation in the claim. Note the dependency for claim 12 has changed.

Claim 17 recites the limitation "the proportion information" in line 17. There is insufficient antecedent basis for this limitation in the claim.

Claim 20 recites the limitation "the proportion information" in line 3. There is insufficient antecedent basis for this limitation in the claim. Note the dependency for claim 20 has changed.

Claim Rejections - 35 USC § 103

6. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

7. Claim(s) 1-6, 9-14, and 17-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Oh et al "Oh" (KR 20200052444 A) and Widerhorn et al "Widerhorn" (US 10552735 B1) and Wshah et al "Wshah" (US 20170140253 A1). (Please see the previously attached copy of Oh and the attached copy of Wshah that number paragraphs in the same format as that used in this Action.)

8. Regarding claim 1, Oh shows a neural network model compression (para 156 shows neural network model compression) method, comprising: obtaining a first neural network model and training data of the first neural network model that are uploaded by user equipment (para 120, 142, 150, 152, 161 show obtaining the first/teacher neural network model as well as data derived and used for training it); obtaining positive unlabeled data based on the training data of the first neural network model and unlabeled data stored in the server (para 151 shows that the teacher model has the obtained classifier in that it classifies objects included in the input data into preset classes, and para 145, 152, 158 show that it classifies positive/certainty unlabeled data); and training a second neural network model by using a knowledge distillation (KD) method based on the training data, wherein the first neural network model is used as a teacher network model of the KD method and the second neural network model is used as a student network model of the KD method (para 137, 143-145 show training a second neural network model using a knowledge distillation method based on the training data; indeed the first neural network model is used as a teacher network model of the KD method and the second neural network model is used as a student network model of the KD method).

Oh further shows the training data used to train the second neural network model is from the unlabeled data and has a property and distribution similar to a property and distribution of the training data of the first neural network model (para 142, 155, 158, 160 show data taken from the inputted unlabeled data is used to train the second/student neural network model; it has the same kind of data and distribution based on the prediction result when training the first/teacher neural network model), but Oh does not explicitly show obtaining a positive-unlabeled (PU) classifier by using a PU learning algorithm based on the training data of the first neural network model and unlabeled data stored in the server, and that the training data used to train the second model is extended data being positive sample data determined by the PU classifier and selected from the unlabeled data by using the (PU) classifier per se.
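For orientation on the technology at issue: the teacher-student arrangement Oh is cited for is standard knowledge distillation. Below is a minimal sketch of a generic KD training loss, assuming the usual softmax-temperature formulation; it is illustrative only and is not taken from Oh, Widerhorn, or the application's specification (the hyperparameters T and alpha are invented for the example).

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Generic teacher-student distillation loss (illustrative only).

    Soft term: KL divergence between the temperature-softened teacher
    and student distributions. Hard term: ordinary cross-entropy on the
    ground-truth labels. T and alpha are assumed hyperparameters.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 rescaling keeps gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In the claimed method, the batch fed to such a loss would be the "extended data" selected from the unlabeled pool, which is where the PU classifier discussed next comes in.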
Widerhorn however does show obtaining a positive-unlabeled (PU) classifier by using a PU learning algorithm based on the training data of the first neural network model and unlabeled data stored in the server (para 145-147 and 149 show building the classifier using classification techniques that classify the unlabeled data into positive and negative data, and extracting the negative data to obtain a positive data set), and that the training data used to train the second model is extended data being positive sample data determined by the PU classifier and selected from the unlabeled data by using the (PU) classifier (para 147 shows using the classifier to extract the negative data to obtain a positive data set from the unlabeled data, and para 148, 151-153, 172 show using the positive sample data set as further data to train another model).

It would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention to obtain a positive-unlabeled classifier by using a PU learning algorithm based on the training data of the first neural network model and unlabeled data stored in the server, and use it to select from the unlabeled data extended data which is positive sample data determined by the PU classifier, to train the second model in Oh, because it would provide an efficient way to obtain data from the unlabeled data that would be useful to train a student neural network model from a teacher neural network model using knowledge distillation.

Oh para 148 shows that layers of the neural network may be based on features, and Oh para 120, 155, 200, 245 show the classifying (and thus the PU classifier when combined with Widerhorn) is obtained based on proportion information of new/processed labeled positive training data to unlabeled data, and Widerhorn para 63, 74, 76 furthermore show the PU classifier is obtained based on a first feature, but Oh and Widerhorn do not explicitly show the first feature is obtained based on a fusion of plurality of third features and the plurality of third features are in a one-to-one correspondence with a plurality of layers of the first neural network.

Wshah however does show a first feature is obtained based on a fusion of plurality of third features and the plurality of third features are in a one-to-one correspondence with a plurality of layers of the first neural network (para 42-43, 45, 53 show multi-layer fusion in a convolutional neural network where features extracted each from different multiple depths/layers are concatenated/fused to form a combined representation used for classification). It would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention to obtain the first feature this way as shown in Wshah, in the method of Oh, especially as modified by Widerhorn, because it would provide an efficient and useful way to construct a feature used to obtain the classifier.

Oh para 163, 166, 169 show standard ways of storing, retrieving, and processing data in a computer system and network, but does not explicitly say the various computer implemented steps (such as the obtaining, selecting, and training steps) are performed by a server per se. Widerhorn para 166 however shows a server performing various computer implemented steps, such as obtaining and selecting data, and training a neural network, for efficient use of standard computer technology to perform computer implemented steps.
It would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention to use the server in Oh for the obtaining, selecting, and training and other steps, because it would provide a convenient and efficient way to perform those computer implemented steps in a computer system that stores, retrieves, and processes data. Given this combination, the unlabeled data would thus have been stored in the server and then retrieved accordingly.

9. Regarding claim 2, in addition to that mentioned for claim 1, Oh as modified by Widerhorn shows obtaining the PU classifier by using the PU learning algorithm based on the training data of the first neural network, and Oh further shows using the unlabeled data and proportion information used to indicate a proportion of processed training data/extended data to the unlabeled data (Oh para 140, 151, 162 show the classifying is obtained based on the training data of the first neural network and the unlabeled data, and Oh para 120, 155, 200, 245 show it is further based on the unlabeled data and proportion information of new/processed labeled positive training data to unlabeled data – this newly processed training data would be the extended data in view of Widerhorn para 147-148, 151-153, as explained for claim 1 from which claim 2 depends).

Oh shows the loss function of the PU learning algorithm is an expectation of a training loss of the training data of the first neural network and the unlabeled data (Oh para 137-142 show the loss calculation is a prediction/expectation of a training loss/uncertainty/instability of the training data of the first neural network and the unlabeled data), and the proportion information is used to calculate the expectation (Oh para 138, 142-143 show the prediction/expectation is based on comparing amounts of processed labeled positive/certain training data, uncertain data, to unlabeled data – the processed labeled positive training data from the unlabeled data would be the extended data in view of Widerhorn para 147-148, 151-153, as explained for claim 1, from which claim 2 depends).

Oh para 163, 166, 169 show standard ways of storing, retrieving, and processing data in a computer system and network, but does not explicitly say the various computer implemented steps (such as the obtaining step) are performed by a server per se. Widerhorn para 166 however shows a server performing various computer implemented steps, such as obtaining data and software such as a classifier, for efficient use of standard computer technology to perform computer implemented steps. It would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention to use the server in Oh for the obtaining and other steps, because it would provide a convenient and efficient way to perform those computer implemented steps in a computer system that stores, retrieves, and processes data. Given this combination, the unlabeled data would thus have been stored in the server and then retrieved accordingly.
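Claim 2's "proportion information" limitation tracks the class prior in standard PU learning: the PU risk is an expectation over the labeled-positive and unlabeled samples, with the prior weighting the two parts. A sketch of the common non-negative PU risk estimator under that reading follows; it illustrates the general technique (the `prior` argument standing in for the claimed proportion information), not the method of Oh, Widerhorn, or the application.

```python
import torch
import torch.nn.functional as F

def nnpu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk estimate (illustrative sketch).

    pos_scores / unl_scores: raw classifier scores on a labeled-positive
    batch and an unlabeled batch. `prior` is the assumed fraction of
    positives in the unlabeled data -- the "proportion information"
    used to form the expectation.
    """
    # Expected loss of labeled positives treated as positive.
    risk_pos = prior * F.softplus(-pos_scores).mean()
    # Expected negative-class loss over the unlabeled data, with the
    # positive contribution subtracted out via the prior ...
    risk_neg = F.softplus(unl_scores).mean() - prior * F.softplus(pos_scores).mean()
    # ... clamped at zero so the empirical estimate cannot go negative.
    return risk_pos + torch.clamp(risk_neg, min=0.0)
```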
10. Regarding claim 3, in addition to that mentioned for claim 1, the plurality of third features are obtained by performing feature extraction by using the first neural network model on the training data of the first neural network and the unlabeled data stored in the server (Widerhorn para 32-33, 60, 65, 88, 90 show feature extraction by the neural network on training data and unlabeled data [stored in the server as shown in para 163, 166, 169]); and further comprising performing, by the server by using the first neural network model, feature extraction on the unlabeled data stored in the server, to obtain a second feature (Widerhorn para 147 shows using the classifier to extract the negative data from the unlabeled data, and the feature extraction in para 33, 60, 65, 88, 90 on this would produce a second feature); and inputting, by the server, the second feature into the PU classifier, to determine the extended data (Widerhorn para 147 shows using the classifier to extract the negative data to obtain a positive data set from the unlabeled data, and the feature extraction in para 33, 60, 65, 88, 90 would determine the extended data). The motivation to have this in Oh is the same as that mentioned for claim 1, namely because it would provide an efficient way to obtain data from the unlabeled data that would be useful to train the neural network model using knowledge distillation.

11. Regarding claim 4, in addition to that mentioned for claim 1, the first feature is obtained by fusing the plurality of third features (see claim 1) that undergo a first weight adjustment, and the first weight adjustment is performed based on proportion information (Oh para 120, 155, 200, 245 show the data undergoes a weight adjustment based on proportion information, and Wshah shows fusing the third features to obtain the first feature as explained for claim 1). Oh and Widerhorn do not explicitly show the second feature is obtained by fusing a plurality of fourth features by using a first weight, and the plurality of fourth features are in a one-to-one correspondence with the plurality of layers of the first neural network. Wshah however does show a (second) feature is obtained based on a fusion of plurality of (fourth) features and the plurality of those (fourth) features are in a one-to-one correspondence with a plurality of layers of the first neural network (para 42-43, 45, 53 show multi-layer fusion in a convolutional neural network where features extracted each from different multiple depths/layers are concatenated/fused to form a combined representation used for classification). It would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention to obtain a second feature this way as shown in Wshah, in the method of Oh, especially as modified by Widerhorn, because it would provide an efficient and useful way to construct a feature used to obtain the classifier. Given the combination, the data making up the fourth features would also have the weight adjustment as shown in Oh para 120, 155, 200, 245.
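What Wshah is cited for (multi-depth fusion) is a generic pattern: one feature per backbone layer, each given a weight (claim 4's "first weight adjustment"), concatenated into a single fused representation that feeds the classifier. A hedged sketch of that pattern; the layer structure and weights here are invented for illustration and are not the application's architecture.

```python
import torch
import torch.nn as nn

class FusedFeatureExtractor(nn.Module):
    """Weighted multi-layer feature fusion (illustrative sketch).

    Applies backbone layers in sequence, collects one flattened feature
    per layer (the claimed one-to-one correspondence with layers), and
    concatenates the weighted features into a single fused vector.
    """

    def __init__(self, layers, weights=None):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.weights = weights or [1.0] * len(layers)  # uniform if unspecified

    def forward(self, x):
        per_layer = []  # one entry per layer ("third features")
        for layer, w in zip(self.layers, self.weights):
            x = layer(x)
            per_layer.append(w * x.flatten(1))
        # The fused vector ("first feature") a PU classifier could consume.
        return torch.cat(per_layer, dim=1)
```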
12. Regarding claim 5, the training data of the first neural network model is a part of training data used to train the first neural network model (Oh para 120, 142, 150, 152 show the training data used for the first and second neural network model is part of a set of training data used to train the first neural network).

13. Regarding claim 6, Oh shows the part of training data comprises data of each of a plurality of classes output by the first neural network (Oh para 142, 151, 162 show the training data includes data from a plurality of preset classes output by the first neural network).

14. Claims 9-14 show the same features as claims 1-6 respectively, and are rejected for the same reasons. In addition, note that Oh para 163, 166, and 169 show the memory storing the computer program and processor to invoke and run the computer program.

15. Claims 17, 18, 19, and 20 show the same features as claims 1, 2, 3, and 3 respectively and are rejected for the same reasons. Note that claim 20 repeats a couple limitations already recited in claim 17 from which it depends and otherwise repeats the same limitations recited in claim 19, and that in total claim 20 thus covers the same exact limitations which are covered by claim 19 as dependent on claim 17. In addition, note that Oh para 163, 166, and 169 show the non-transitory computer readable medium storing program code executed by a device such as a processor.

16. Claim(s) 7 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Oh and Widerhorn and Wshah and Guo et al "Guo" (CN 109685743 B). (Please also see the previously attached copy of Guo that numbers paragraphs in the same format as that used in this Action.)

17. Regarding claim 7, in addition to that explained for claim 1, Oh as modified by Widerhorn shows inputting the processed/extended data into the first neural network model, to classify the processed/extended data and obtain processed/extended data of a plurality of classes and a second weight of processed/extended data of each of the plurality of classes (Oh para 142, 151, 162 show inputting the processed data into the first neural network which then classifies it to obtain processed data of a plurality of classes. Oh para 150, 155, 160 further show assigning the weight to the processed data. The processed data would be the extended data in view of Widerhorn para 147-148, 151-153, as explained for claim 1 from which claim 7 depends).

Oh para 163, 166, 169 show standard ways of storing, retrieving, and processing data in a computer system and network, but does not explicitly say the various computer implemented steps (such as the inputting and minimizing steps) are performed by a server per se. Widerhorn para 166 however shows a server performing various computer implemented steps, such as obtaining, selecting, and processing data, for efficient use of standard computer technology to perform computer implemented steps. It would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention to use the server in Oh for the inputting and minimizing and other steps, because it would provide a convenient and efficient way to perform those computer implemented steps in a computer system that stores, retrieves, and processes data.

Oh shows the loss function of the KD method is a sum of products of training errors of extended data of all of the plurality of classes and second weights of the extended data of all the classes (Oh para 137-138, 150-151 show the loss calculation of the knowledge distillation method is a variance, or sum of squares [which is thus a sum of products] of the training error of the processed/extended data. Note this is for the extended data of all of the preset classes and assigned weights of the extended data for all the preset classes).
Oh para 137 shows the second neural network may be trained based on the loss, but Oh and Widerhorn and Wshah do not explicitly state minimizing a loss function per se. Guo para 77, 119, and claim 1 however show minimizing a loss function to train a second neural network model. It would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention to have this in Oh, especially as modified by Widerhorn and Wshah, because it would provide an efficient way to train the second neural network model using a loss calculation.

18. Claim 15 shows the same features as claim 7 and is rejected for the same reasons.

Allowable Subject Matter

19. Claims 8 and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. These claims bring out at least that in addition to that mentioned for claim 7, the second weights of the extended data of all the classes comprise a plurality of perturbed weights obtained after random perturbation is performed on initial weights of the extended data of all the classes, and the loss function of the KD method comprises a plurality of loss functions in a one-to-one correspondence with the plurality of perturbed weights, wherein an initial weight of the extended data of each class is in negative correlation with an amount of the extended data of each class; and further comprising: minimizing, by the server, maximum values of the plurality of loss functions, to obtain the trained second neural network model. The limitations combined are not set forth in the prior art of record.
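For reference, the allowable-subject-matter limitations describe a distributionally-robust-style objective: class weights set inversely to class frequency, randomly perturbed into a family of weighted KD losses, with the maximum over the family minimized. Below is a sketch under the examiner's characterization (the perturbation count and noise scale are invented for the example); the specification's exact formulation may differ.

```python
import torch

def worst_case_weighted_loss(per_class_errors, class_counts, n_perturb=8, noise=0.1):
    """Minimize-the-maximum over randomly perturbed class weights (sketch).

    per_class_errors: KD training error per class. Initial weights are
    inversely proportional to class counts (negative correlation with
    class size), each random perturbation yields one weighted-sum loss
    (a "sum of products" as in claim 7), and the maximum over the family
    is returned so that an optimizer minimizes the worst case.
    """
    init_w = 1.0 / class_counts.float()
    init_w = init_w / init_w.sum()  # normalize the initial weights
    losses = []
    for _ in range(n_perturb):
        w = init_w * (1.0 + noise * torch.randn_like(init_w))  # random perturbation
        losses.append((w * per_class_errors).sum())
    return torch.stack(losses).max()
```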
Response to Arguments

20. Applicant's arguments filed 1/26/26 have been fully considered but they are not persuasive.

Applicant argues that Oh and Widerhorn do not show the newly amended limitations to the independent claims. However, the Action explains how Oh and Widerhorn do show the amended limitation of "obtaining the PU classifier based on a first feature and some sort of proportion information." Note the 112 rejection: the recitation of "the proportion information" in the independent claims does not have antecedent basis and therefore is not explained any further in the recitation of the independent claims. Applicant does recite further detail about proportion information in dependent claim 2, but this is not mentioned for claim 1 (nor is it referred to in claims 3 and 4 as they are now amended to depend directly from claim 1). Furthermore, regarding the amended limitation of "the first feature is obtained based on a fusion of plurality of third features and the plurality of third features are in a one-to-one correspondence with a plurality of layers of the first neural network," please note that Wshah is brought in to show this.

Applicant also argues that Oh and Widerhorn do not show the extended data is positive sample data determined by the PU classifier. However, Widerhorn para 145-147 and 149 show building the classifier using classification techniques that classify the unlabeled data into positive and negative data, and extracting the negative data to obtain a positive data set; Widerhorn para 147 shows using the classifier to extract the negative data to obtain a positive data set from the unlabeled data; and this would be the extended data used to train another model. Applicant does not argue these citations in Widerhorn nor this reasoning.

Conclusion

21. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

a) Gao (CA 3066775 A1) shows techniques for training a neural network using knowledge based systems.
b) Majumdar (US 20180260695 A1) shows a system for compressing a neural network with an unlabeled data set.
c) Forman (WO 2015094281 A1) uses a labeling module to determine positive and negative data and uses this to train a second classifier.
d) Deng (CN 111340053 A) uses a PU classifier to obtain a positive sample set.

22. Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEVEN PAUL SAX, whose telephone number is (571) 272-4072. The examiner can normally be reached Monday-Friday, 9:30-6:00 ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Usmaan Saeed, can be reached at 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/STEVEN P SAX/
Primary Examiner, Art Unit 2146

Prosecution Timeline

Feb 25, 2022
Application Filed
Jun 28, 2022
Response after Non-Final Action
Aug 09, 2025
Non-Final Rejection — §103, §112
Sep 24, 2025
Response Filed
Nov 23, 2025
Final Rejection — §103, §112
Jan 26, 2026
Response after Non-Final Action
Feb 10, 2026
Request for Continued Examination
Feb 12, 2026
Examiner Interview (Telephonic)
Feb 15, 2026
Response after Non-Final Action
Feb 15, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner in similar technology

Patent 12602537
METHODS FOR SERVING INTERACTIVE CONTENT TO A USER
2y 5m to grant • Granted Apr 14, 2026
Patent 12596343
GRAPHICAL ELEMENT SEARCH TECHNIQUE SELECTION, FUZZY LOGIC SELECTION OF ANCHORS AND TARGETS, AND/OR HIERARCHICAL GRAPHICAL ELEMENT IDENTIFICATION FOR ROBOTIC PROCESS AUTOMATION
2y 5m to grant • Granted Apr 07, 2026
Patent 12547922
BENCHMARK-DRIVEN AUTOMATION FOR TUNING QUANTUM COMPUTERS
2y 5m to grant • Granted Feb 10, 2026
Patent 12541708
TRUSTED AND DECENTRALIZED AGGREGATION FOR FEDERATED LEARNING
2y 5m to grant • Granted Feb 03, 2026
Patent 12524691
CENTRAL CONTROLLER FOR A QUANTUM SYSTEM
2y 5m to grant • Granted Jan 13, 2026
Study what changed in these files to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 70%
With Interview: 99% (+44.8%)
Median Time to Grant: 4y 0m
PTA Risk: High
Based on 460 resolved cases by this examiner. Grant probability derived from career allow rate.
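Worked out: 320 granted of 460 resolved is 69.6%, displayed as 70%. The 99% with-interview figure is consistent with the stated +44.8% lift, since 99% is roughly 1.448 times a base rate in the high 60s (99 / 1.448 ≈ 68.4%), though the exact methodology is not stated.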
