Prosecution Insights
Last updated: April 19, 2026
Application No. 18/123,768

MODEL COMPRESSION METHOD AND APPARATUS

Non-Final OA (§101, §103)
Filed: Mar 20, 2023
Examiner: RODEN, DONALD THOMAS
Art Unit: 2128
Tech Center: 2100 — Computer Architecture & Software
Assignee: Huawei Technologies Co., Ltd.
OA Round: 1 (Non-Final)
Grant Probability: 0% (At Risk)
OA Rounds: 1-2
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Grants only 0% of cases
Career Allow Rate: 0% (0 granted / 2 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal +0% lift; resolved cases with interview)
Typical timeline: 3y 3m avg prosecution; 25 currently pending
Career history: 27 total applications across all art units

Statute-Specific Performance

§101: 36.5% (-3.5% vs TC avg)
§103: 44.1% (+4.1% vs TC avg)
§102: 6.5% (-33.5% vs TC avg)
§112: 7.7% (-32.3% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 2 resolved cases

Office Action

§101 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is made non-final. Claims 1-16 are pending in the case.

Priority

Acknowledgment is made of applicant's claim for foreign priority based on an application filed in China on September 14, 2020. It is noted, however, that applicant has not filed a certified copy of the CN202010997722.4 application as required by 37 CFR 1.55.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

To determine if a claim is directed to patent-ineligible subject matter, the Court has guided the Office to apply the Alice/Mayo test, which requires:

Step 1: Determining if the claim falls within a statutory category.

Step 2A: Determining if the claim is directed to a patent-ineligible judicial exception consisting of a law of nature, a natural phenomenon, or an abstract idea. Step 2A is a two-prong inquiry. MPEP 2106.04(II)(A). Under the first prong, examiners evaluate whether a law of nature, natural phenomenon, or abstract idea is set forth or described in the claim. Abstract ideas include mathematical concepts, certain methods of organizing human activity, and mental processes. MPEP 2106.04(a)(2). The second prong is an inquiry into whether the claim integrates a judicial exception into a practical application. MPEP 2106.04(d).

Step 2B: If the claim is directed to a judicial exception, determining if the claim recites limitations or elements that amount to significantly more than the judicial exception. (See MPEP 2106.)

Claims 1-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Claims 1-8 are directed to a method (a process), and Claims 9-16 are directed to an apparatus comprising one or more processors (a machine). Therefore, Claims 1-16 are directed to a process, machine, manufacture, or composition of matter.

Regarding claim 1

Step 2A Prong 1

Claim 1 recites the following mathematical concepts, which in each case under the broadest reasonable interpretation cover performance of mathematical relationships, mathematical formulas or equations, and mathematical calculations but for the recitation of generic computer components (e.g., “neural network model”, “machine learning model”, etc.) [see MPEP 2106.04(a)(2)(I)]:

“determining a first target loss based on the first output and the second output, and updating the second neural network model based on the first target loss, to obtain an updated second neural network model” (e.g., calculating a difference between outputs and using that value to update parameters)

Step 2A Prong 2

The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of a “neural network model”, which is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components (see MPEP 2106.05(f)). The Examiner notes that this element is used throughout the claim limitations, and each claim reciting the same language is rejected on the same basis.
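For orientation, the mathematical concept the examiner identifies above — a target loss computed from two model outputs driving a parameter update — can be sketched in a few lines of plain Python. This is a toy illustration only; the loss form, learning rate, and update rule are assumptions for exposition, not the applicant's disclosed method:

```python
# Illustrative sketch (NOT the application's method): a loss computed from
# two model outputs is used to update parameters, the mathematical concept
# identified under Step 2A Prong 1. All names and values are hypothetical.

def target_loss(first_output, second_output):
    """Mean squared difference between the two models' outputs."""
    return sum((a - b) ** 2 for a, b in zip(first_output, second_output)) / len(first_output)

def update_parameters(params, loss, learning_rate=0.1):
    """Toy update rule: shrink each parameter in proportion to the loss."""
    return [p - learning_rate * loss * p for p in params]

first_output = [0.9, 0.1]   # e.g., produced by the first (uncompressed) model
second_output = [0.6, 0.4]  # e.g., produced by the third (compressed) model
loss = target_loss(first_output, second_output)  # ≈ 0.09
updated = update_parameters([0.5, -0.3], loss)
```

Under the examiner's reading, each step here is a mathematical calculation; the "neural network model" wrapper is what the rejection treats as a generic computer component.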
Regarding “obtaining a first neural network model, a second neural network model, and a third neural network model, wherein the first neural network model comprises a transformer layer, the second neural network model comprises the first neural network model or a neural network model obtained by performing parameter update on the first neural network model, and the third neural network model is obtained by compressing the second neural network model”, this additional element is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of data gathering for use in the claimed process (see MPEP 2106.05(g)).

Regarding “processing first to-be-processed data using the first neural network model, to obtain a first output” and “processing the first to-be-processed data using the third neural network model, to obtain a second output”, these additional elements are recited at a high level of generality and amount to extra-solution activity of outputting data to generate the target of the training process, i.e., pre-solution activity of outputting data for use in the claimed process (see MPEP 2106.05(g)). The examiner notes that these could also be identified under 2106.05(f), as no more than mere instructions to apply the exception using a generic computer component, because they describe that the model processes data to obtain outputs to determine a target loss.

Regarding “compressing the updated second neural network model to obtain a target neural network model”, this additional element is recited at a high level of generality and amounts to extra-solution activity, i.e., post-solution activity, for use in the claimed process (see MPEP 2106.05(g)).

Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B

In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional element of a “neural network model” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).

Regarding the “obtaining a first neural network model, a second neural network model, and a third neural network model, wherein the first neural network model comprises a transformer layer, the second neural network model comprises the first neural network model or a neural network model obtained by performing parameter update on the first neural network model, and the third neural network model is obtained by compressing the second neural network model” limitations, these additional elements are recited at a high level of generality and amount to extra-solution activity of obtaining data to input to a model, i.e., pre-solution activity of data gathering. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").

Regarding “processing first to-be-processed data using the first neural network model, to obtain a first output” and “processing the first to-be-processed data using the third neural network model, to obtain a second output”, these additional elements are recited at a high level of generality and amount to extra-solution activity of outputting data, i.e., pre-solution activity of outputting data.
The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").

Regarding “compressing the updated second neural network model to obtain a target neural network model”, this additional element is recited at a high level of generality and amounts to extra-solution activity, i.e., post-solution activity. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").

Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.

Regarding claim 2

Step 2A Prong 1

Claim 2 does not recite an abstract idea, but is directed to the abstract idea identified in its parent claim(s). Accordingly, at Step 2A, prong one, the claim recites an abstract idea.

Step 2A Prong 2

The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of “wherein a difference between processing results obtained by processing same data using the second neural network model and the first neural network model falls within a preset range”, which is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components (see MPEP 2106.05(f)). Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B

In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional element of “wherein a difference between processing results obtained by processing same data using the second neural network model and the first neural network model falls within a preset range” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)). Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Regarding claim 3

Step 2A Prong 1

Claim 3 does not recite an abstract idea, but is directed to the abstract idea identified in its parent claim(s). Accordingly, at Step 2A, prong one, the claim recites an abstract idea.

Step 2A Prong 2

The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of “wherein a difference between processing results obtained by processing same data using the updated second neural network model and the first neural network model falls within the preset range”, which is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components (see MPEP 2106.05(f)). Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.

Step 2B

In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional element of “wherein a difference between processing results obtained by processing same data using the updated second neural network model and the first neural network model falls within the preset range” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)). Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Regarding claim 4

Step 2A Prong 1

Claim 4 does not recite an abstract idea, but is directed to the abstract idea identified in its parent claim(s). Accordingly, at Step 2A, prong one, the claim recites an abstract idea.

Step 2A Prong 2

The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of “quantizing the updated second neural network model to obtain the target neural network model”, which is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components (see MPEP 2106.05(f)). Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.

Step 2B

In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional element of “quantizing the updated second neural network model to obtain the target neural network model” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).
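The quantization step recited in claim 4 can be illustrated with a minimal sketch: mapping floating-point weights onto a small integer grid and back, which is how uniform quantization shrinks a model's representation. This is a generic textbook-style example under assumed parameters, not the applicant's disclosed implementation:

```python
# Hypothetical sketch of uniform affine weight quantization (not taken
# from the application): floats are mapped to integer levels and back.

def quantize(weights, num_levels=256):
    """Map each float weight to an integer level in [0, num_levels - 1]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (num_levels - 1) if hi != lo else 1.0
    return [round((w - lo) / scale) for w in weights], lo, scale

def dequantize(levels, lo, scale):
    """Map integer levels back to approximate float weights."""
    return [lo + level * scale for level in levels]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, lo, scale = quantize(weights, num_levels=16)  # 4-bit grid
approx = dequantize(q, lo, scale)                # lossy reconstruction
```

Storing the integer levels (plus `lo` and `scale`) instead of full-precision floats is what reduces model size; the reconstruction error is bounded by the grid spacing `scale`.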
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Regarding claim 5

Step 2A Prong 1

Claim 5 does not recite an abstract idea, but is directed to the abstract idea identified in its parent claim(s). Accordingly, at Step 2A, prong one, the claim recites an abstract idea.

Step 2A Prong 2

The judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of “the second neural network model and the third neural network model each comprises an embedding layer, a transformer layer, and an output layer”, “the first output is an output of a target layer in the second neural network model”, “the second output is an output of a target layer in the third neural network model”, “the target layer in the second neural network model comprises at least one of the embedding layer of the second neural network model, the transformer layer of the second neural network model, or the output layer of the second neural network model”, and “the target layer in the third neural network model comprises at least one of the embedding layer of the third neural network model, the transformer layer of the third neural network model, or the output layer of the third neural network model”, which are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components (see MPEP 2106.05(f)). Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.

Step 2B

In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional elements of “the second neural network model and the third neural network model each comprises an embedding layer, a transformer layer, and an output layer”, “the first output is an output of a target layer in the second neural network model”, “the second output is an output of a target layer in the third neural network model”, “the target layer in the second neural network model comprises at least one of the embedding layer of the second neural network model, the transformer layer of the second neural network model, or the output layer of the second neural network model”, and “the target layer in the third neural network model comprises at least one of the embedding layer of the third neural network model, the transformer layer of the third neural network model, or the output layer of the third neural network model” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components (see MPEP 2106.05(f)). Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.

Regarding claim 6

Step 2A Prong 1

Claim 6 recites the following mathematical concepts, which in each case under the broadest reasonable interpretation cover performance of mathematical relationships, mathematical formulas or equations, and mathematical calculations but for the recitation of generic computer components (e.g., “neural network model”, “machine learning model”, etc.) [see MPEP 2106.04(a)(2)(I)]:
“determining a second target loss based on the third output and the fourth output, and updating the updated second neural network model based on the second target loss, to obtain a fourth neural network model” (e.g., calculating a second difference between outputs and using that value to update parameters)

Step 2A Prong 2

The judicial exception is not integrated into a practical application. In particular, regarding the additional elements “processing second to-be-processed data using the first neural network model, to obtain a third output” and “processing the second to-be-processed data using the target neural network model, to obtain a fourth output”, these additional elements are recited at a high level of generality and amount to extra-solution activity of outputting data to generate the target of the training process, i.e., pre-solution activity of outputting data for use in the claimed process (see MPEP 2106.05(g)). The examiner notes that these could also be identified under 2106.05(f), as no more than mere instructions to apply the exception using a generic computer component, because they describe that the model processes data to obtain outputs to determine a target loss.

Regarding “compressing the fourth neural network model to obtain an updated target neural network model”, this additional element is recited at a high level of generality and amounts to extra-solution activity, i.e., post-solution activity, for use in the claimed process (see MPEP 2106.05(g)).

Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.

Step 2B

In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, regarding the additional elements “processing second to-be-processed data using the first neural network model, to obtain a third output” and “processing the second to-be-processed data using the target neural network model, to obtain a fourth output”, these additional elements are recited at a high level of generality and amount to extra-solution activity of outputting data, i.e., pre-solution activity of outputting data. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").

Regarding “compressing the fourth neural network model to obtain an updated target neural network model”, this additional element is recited at a high level of generality and amounts to extra-solution activity, i.e., post-solution activity. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").

Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Regarding claim 7

Step 2A Prong 1

Claim 7 does not recite an abstract idea, but is directed to the abstract idea identified in its parent claim(s). Accordingly, at Step 2A, prong one, the claim recites an abstract idea.

Step 2A Prong 2

The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “wherein the first to-be-processed data comprises one of audio data, text data, or image data”, which is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components (see MPEP 2106.05(f)). Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.

Step 2B

In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional element of “wherein the first to-be-processed data comprises one of audio data, text data, or image data” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)). Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Regarding claim 8

Step 2A Prong 1

Claim 8 does not recite an abstract idea, but is directed to the abstract idea identified in its parent claim(s). Accordingly, at Step 2A, prong one, the claim recites an abstract idea.

Step 2A Prong 2

The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of “performing parameter fine-tuning or knowledge distillation on a pre-trained language model, to obtain the first neural network model, wherein processing precision of the first neural network model during a target task processing is higher than a preset value”, which is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components (see MPEP 2106.05(f)).
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.

Step 2B

In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional element of “performing parameter fine-tuning or knowledge distillation on a pre-trained language model, to obtain the first neural network model, wherein processing precision of the first neural network model during a target task processing is higher than a preset value” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)). Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Regarding claims 9-16

Claims 9-16 recite an apparatus, which corresponds directly to the method steps of claims 1-8. The addition of generic computer components executing instructions is insufficient to render the claimed subject matter eligible, for the same reasons described above.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-9, and 12-16 are rejected under 35 U.S.C.
103 as being unpatentable over Lu et al. (US 11461415 B2, referred to as Lu), in view of Kim et al. (US 20200364574 A1, referred to as Kim).

Regarding claim 1, Lu teaches a method for model compression, comprising: obtaining a first neural network model, a second neural network model, and a third neural network model, wherein the first neural network model comprises a transformer layer, the second neural network model comprises the first neural network model or a neural network model obtained by performing parameter update on the first neural network model (FIG. 5, Col. 12, lines 44-67 cont. Col. 13, lines 1-13: Describes obtaining multiple neural network models, including a pre-trained model, and performing a training operation that fine-tunes the pre-trained model by modifying its parameter values to produce a fine-tuned model. The fine-tuned model is used as a teacher model to train a student model; these correspond to first, second, and third neural network models in a training framework. Col. 13, lines 64-67 cont. Col. 14, lines 1-13: Describes that the encoders transform inputs using at least one transformation unit, and that each transformation unit includes a self-attention mechanism configured to determine relevance among parts of an input.)

Although Lu teaches obtaining a first neural network model, a second neural network model, and a third neural network model, wherein the first neural network model comprises a transformer layer and the second neural network model comprises the first neural network model or a neural network model obtained by performing parameter update on the first neural network model, it does not teach that the third neural network model is obtained by compressing the second neural network model.
Kim teaches the third neural network model is obtained by compressing the second neural network model ([0083]-[0086]: Describes a model compression module that applies a compression algorithm to portions of the model and selects a compression range/algorithm to achieve a target compression rate.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the model network of Lu with the compression algorithm of Kim. Doing so would have reduced memory usage while maintaining model performance during deployment.

Lu further teaches processing first to-be-processed data using the first neural network model, to obtain a first output (FIG. 5, Col. 12, lines 44-67 cont. Col. 13, lines 1-53: Describes using the fine-tuned model 510 as a teacher model and feeding training examples to the teacher model, which generates a soft label for a training example. This creates output produced by processing the data with the teacher model, corresponding to a first output.); processing the first to-be-processed data using the third neural network model, to obtain a second output (FIG. 6, Col. 12, lines 44-67 cont. Col. 13, lines 1-53: Describes that the third training system 602 trains a student model 604 and that the same training examples are fed to both the teacher model and the student model. The student model produces probabilities for the training example, which correspond to an output produced by processing the data with the student model to obtain a second output.); determining a first target loss based on the first output and the second output, and updating the second neural network model based on the first target loss, to obtain an updated second neural network model (FIG. 5, FIG. 6, Col. 12, lines 44-67 cont. Col. 13, lines 1-53: Describes that the teacher model (fine-tuned model 510) generates a soft label for a training example, and that the student model 604 generates corresponding probability outputs.
The student model 604 is trained based on a cross-entropy loss function that compares the teacher-provided soft label with the student model's output probabilities, and is updated through training (parameter updates) to obtain an updated trained student model.);

Kim further teaches compressing the updated second neural network model to obtain a target neural network model ([0083]: Describes using a model compression module to compress an original neural network model to obtain a first compression model, corresponding to obtaining a model by compressing another model. [0224]: Describes compressing a trained neural network model (updated model) to reduce the size of the model, which obtains a compressed model. These correspond to outputting a final compressed model from an updated model.).

Regarding claim 4, Lu in view of Kim teaches the method according to claim 1, wherein compressing the second neural network model comprises quantizing the second neural network model, and compressing the updated second neural network model to obtain the target neural network model comprises: quantizing the updated second neural network model to obtain the target neural network model (Kim, [0009] and [0086]: Describes compressing a neural network model using compression techniques including quantization. [0201]-[0202]: Describes compressing a trained neural network model to reduce model size, using quantization to obtain a target neural network model.).

Regarding claim 5, Lu in view of Kim teaches the method according to claim 1, wherein the second neural network model and the third neural network model each comprises an embedding layer, a transformer layer, and an output layer (Lu, Col. 8, lines 50-67 cont. Col. 9, lines 1-8: Describes a linguistic embedding mechanism that transforms tokens of an input expression into input embeddings for use in the model's input layer (embedding layer). Col. 13, lines 14-67 cont. Col. 14, lines 1-19: Describes a transformation unit including a self-attention mechanism (transformer layer). This generates output probabilities/labels from the model (output layer).); the first output is an output of a target layer in the second neural network model; the second output is an output of a target layer in the third neural network model (Lu, Col. 13, lines 14-53: Describes generating a soft label as an output of the teacher model, and generating probability outputs from the student model. These outputs correspond to outputs of respective target layers in the second and third neural network models.); the target layer in the second neural network model comprises at least one of the embedding layer of the second neural network model, the transformer layer of the second neural network model, or the output layer of the second neural network model; and the target layer in the third neural network model comprises at least one of the embedding layer of the third neural network model, the transformer layer of the third neural network model, or the output layer of the third neural network model (Lu, Col. 8, lines 50-67 cont. Col. 9, lines 1-8 and Col. 13, lines 14-67 cont. Col. 14, lines 1-19: These describe generating outputs at different stages of the neural network, including input embeddings produced by a linguistic embedding mechanism, intermediate representations produced by transformation units including self-attention mechanisms, and final output probabilities/labels. The second and third neural network models may comprise one of these.).

Regarding claim 6, Lu in view of Kim teaches the method according to claim 1, wherein the method further comprises: processing second to-be-processed data using the first neural network model, to obtain a third output (Lu, Col. 13, lines 1-53: Describes processing training examples using the teacher model to generate soft labels.
The teacher model corresponds to the first neural network model, which processes the second data to obtain outputs that are subsequently used to train additional models.); processing the second to-be-processed data using the target neural network model, to obtain a fourth output (Lu, Col. 13, lines 1-53: Describes processing the same training examples using the student model to generate probability outputs, which corresponds to processing the second data using the target neural network model to obtain a fourth output.); determining a second target loss based on the third output and the fourth output, and updating the updated second neural network model based on the second target loss, to obtain a fourth neural network model (Lu, Col. 13, lines 1-53: Describes determining a loss function based on the teacher model output and the student model output, and updating the student model through training to modify its parameter values. This corresponds to determining a second target loss based on the third and fourth outputs and updating the second model based on that loss to obtain an updated neural network.). Although Lu teaches processing second to-be-processed data using the first neural network model to obtain a third output, processing the second to-be-processed data using the target neural network model to obtain a fourth output, and determining a second target loss based on the third output and the fourth output and updating the updated second neural network model based on the second target loss to obtain a fourth neural network model, Lu does not teach compressing the fourth neural network model to obtain an updated target neural network model. Kim teaches compressing the fourth neural network model to obtain an updated target neural network model ([0083-0086]: Describes a model compression module that applies a compression algorithm to portions of the model and selects the compression range/algorithm to achieve a target compression rate.).
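The distillation-then-compression flow mapped above (teacher soft labels, a cross-entropy loss comparing them with the student's output probabilities, then quantization of the updated student model) can be sketched in a few lines. This is an illustrative sketch of the general technique the references describe, not code from Lu or Kim; the temperature value, the exact loss form, and the int8 scheme are assumptions for the example.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the teacher's "soft labels".
    z = np.asarray(logits, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # Cross-entropy between the teacher's soft labels and the student's
    # output probabilities (the loss the claim mapping attributes to Lu).
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return float(-np.sum(p_teacher * np.log(p_student + 1e-12)))

def quantize_int8(weights):
    # Uniform symmetric int8 quantization: one way to compress the updated
    # model, per the quantization teaching attributed to Kim.
    w = np.asarray(weights, dtype=np.float64)
    scale = max(np.abs(w).max() / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q * scale

# A student whose logits match the teacher's incurs a lower loss than one
# whose logits disagree, so training drives the student toward the teacher.
teacher = [2.0, 0.5, -1.0]
loss_matched = distillation_loss(teacher, teacher)
loss_off = distillation_loss(teacher, [0.0, 2.0, 0.0])
assert loss_matched < loss_off

# Quantizing weights shrinks storage (float64 -> int8) with small error.
q, scale = quantize_int8([0.5, -1.27, 0.0])
assert q.dtype == np.int8 and abs(float(q[1]) * scale + 1.27) < 0.02
```

The two halves mirror the two claim elements: the loss/update step produces the "updated second neural network model", and the quantization step produces the "target neural network model".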
Regarding claim 7, Lu in view of Kim teaches the method according to claim 1, wherein the first to-be-processed data comprises one of audio data, text data, or image data (Lu, Col. 9 lines 3-29: Describes processing query expressions represented as tokens (text data) and transforming the tokens into input embeddings for neural network processing.). Regarding claim 8, Lu in view of Kim teaches the method according to claim 1, wherein the obtaining the first neural network model comprises: performing parameter fine-tuning or knowledge distillation on a pre-trained language model, to obtain the first neural network model, wherein processing precision of the first neural network model during a target task processing is higher than a preset value (Lu, FIG. 5, Col. 12, lines 44-67 cont. Col. 13, lines 1-13: Describes producing a pre-trained language model and performing fine-tuning by further modifying its parameter values to obtain a fine-tuned model 510; this obtains the first neural network model by parameter fine-tuning of a pre-trained language model. The fine-tuned model is a teacher model used to train student models for knowledge distillation. "The second training system 506 fine-tunes the pre-trained model 504 such that the resultant model can successfully determine the semantic relation between any given query expression and target expression." This shows that the processing precision during target task processing exceeds a preset value.). Regarding claims 9 and 12-16, which recite substantially the same limitations as claims 1 and 4-8 and further recite an apparatus comprising one or more processors (Lu, Col. 14 lines 33-41: Describes a computing device that executes the methods using computer hardware executing software.) to perform the method steps of claims 1 and 4-8, respectively, these claims are rejected for the same reasons as described above. Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al.
(US 11461415 B2, referred to as Lu), in view of Kim et al. (US 20200364574 A1, referred to as Kim), further in view of Lin (US 20200320392 A1, referred to as Lin). Regarding claim 2, Lu in view of Kim teaches the method according to claim 1, but does not teach wherein a difference between processing results obtained by processing same data using the second neural network model and the first neural network model falls within a preset range. Lin teaches wherein a difference between processing results obtained by processing same data using the second neural network model and the first neural network model falls within a preset range ([0011]: Describes separately inputting the same data into a non-optimized neural network model and an optimized neural network model to obtain respective processing results, and comparing absolute values of differences between the prediction results, which corresponds to a difference between processing results obtained by the first and second neural network models falling within a preset range.). It would have been obvious to one of ordinary skill in the art at the time of the claimed invention to have combined the model network of Lu in view of Kim with the output difference evaluation of Lin. Doing so would have ensured that the second model's results remain close to the first model's results, maintaining performance with smaller models and improving the reliability of the models. Regarding claim 3, Lu in view of Kim in view of Lin teaches wherein a difference between processing results obtained by processing same data using the updated second neural network model and the first neural network model falls within the preset range (Lin, [0011]: Describes processing the same data using a baseline model and an optimized model and comparing differences between their prediction results.).
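The Lin-style evaluation mapped to claims 2 and 3 (run the same data through both models and require the absolute differences between their prediction results to stay within a preset range) reduces to a simple comparison. The function name, threshold value, and example outputs below are illustrative assumptions, not details taken from Lin.

```python
import numpy as np

def within_preset_range(first_model_out, second_model_out, preset_range=0.05):
    # Feed the same data to both models, then compare the absolute values of
    # the differences between their prediction results against a threshold
    # (the preset range), as Lin [0011] is characterized above.
    diff = np.abs(np.asarray(first_model_out, dtype=np.float64)
                  - np.asarray(second_model_out, dtype=np.float64))
    return bool(diff.max() <= preset_range)

# Compressed-model outputs that track the original model pass the check;
# outputs that drift beyond the preset range fail it.
assert within_preset_range([0.90, 0.08, 0.02], [0.88, 0.09, 0.03])
assert not within_preset_range([0.90, 0.08, 0.02], [0.60, 0.30, 0.10])
```

Under this reading, the same check applies whether the second operand comes from the second neural network model (claim 2) or the updated second neural network model (claim 3).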
Regarding claims 10 and 11, which recite substantially the same limitations as claims 2 and 3 and further recite an apparatus comprising one or more processors (Lu, Col. 14 lines 33-41: Describes a computing device that executes the methods using computer hardware executing software.) to perform the method steps of claims 2 and 3, respectively, these claims are rejected for the same reasons as described above. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See attached PTO-892 for additional references, including: US 11604965 B2 (hidden layer selection); US 11635936 B2 (neural networks for audio processing); US 20230222353 A1 (adversarial training and knowledge distillation). Any inquiry concerning this communication or earlier communications from the examiner should be directed to DONALD T RODEN, whose telephone number is (571) 272-6441. The examiner can normally be reached Mon-Thur 8:00-5:00 EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Omar Fernandez Rivas, can be reached at (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /D.T.R./Examiner, Art Unit 2128 /OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128

Prosecution Timeline

Mar 20, 2023
Application Filed
Jan 05, 2026
Non-Final Rejection — §101, §103 (current)


Prosecution Projections

1-2
Expected OA Rounds
0%
Grant Probability
0%
With Interview (+0.0%)
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 2 resolved cases by this examiner. Grant probability derived from career allow rate.
