Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Amendments
This action is in response to the amendments filed November 13, 2025, in which Claims 1, 8, 15, and 18 were amended. No claims have been added or canceled. The amendments have been entered, and Claims 1-21 are currently pending.
Response to Arguments
Regarding the applicant’s traversal of the 35 U.S.C. 112 rejections of the previous office action, the applicant’s amendments filed November 13, 2025 have overcome all of the 112(b) rejections, which are accordingly withdrawn.
Regarding the applicant’s traversal of the 35 U.S.C. 101 rejections of the previous office action, the applicant’s arguments filed November 13, 2025 have been fully considered but are not persuasive.
The applicant asserts that claim 1 does not recite an abstract idea, submitting that “generating a second neural network by fine-tuning, based on training data for a predetermined purpose, a first neural network which is pre-trained; determining delta weights by determining differences between weights of the first neural network and weights of the second neural network; compressing the delta weights; retraining the second neural network updated based on the compressed delta weights and the weights of the first neural network; and encoding and storing the delta weights updated by the retraining of the second neural network”, as recited in claim 1, cannot practically be performed in the human mind.
The examiner respectfully submits that only “determining delta weights by determining differences between weights of the first neural network and weights of the second neural network”, “compressing the delta weights”, and “encoding… the delta weights updated by the retraining of the second neural network” were found to recite mental processes, and the examiner agrees that the remaining limitations cannot practically be performed in the human mind and therefore do not recite abstract ideas. The presence of limitations that cannot be performed mentally does not preclude those that can from reciting abstract ideas. Such limitations may, however, serve as a basis for integrating the abstract ideas into a practical application.
Further, the applicant asserts that the claimed features are integrated into a practical application, relying on paragraphs [0003], [0054], [0058], and [0079] of the specification as evidence.
The examiner respectfully submits that the primary focus of these paragraphs is the encoding and storage of the delta weights, which allows the second neural network, or task-specific model, to be restored from the small compressed delta weights instead of requiring the entire model to be stored. However, the compression and encoding of the delta weights were themselves found to be abstract limitations and therefore cannot be relied upon as “additional elements” that integrate the abstract ideas into a practical application. When examining the claim as a whole, it is the additional elements, not the abstract limitations themselves, that can integrate the abstract limitations into a practical application, as explained in MPEP 2106.04 at Prong Two:
“Prong Two asks does the claim recite additional elements that integrate the judicial exception into a practical application? In Prong Two, examiners evaluate whether the claim as a whole integrates the exception into a practical application of that exception. If the additional elements in the claim integrate the recited exception into a practical application of the exception, then the claim is not directed to the judicial exception (Step 2A: NO) and thus is eligible at Pathway B. This concludes the eligibility analysis. If, however, the additional elements do not integrate the exception into a practical application, then the claim is directed to the recited judicial exception (Step 2A: YES), and requires further analysis under Step 2B (where it may still be eligible if it amounts to an “inventive concept”). For more information on how to evaluate whether a judicial exception is integrated into a practical application, see MPEP § 2106.04(d).”
Further, the storage of the delta weights merely recites insignificant extra-solution activity (mere data storage) (MPEP 2106.05(g)), which does not provide evidence of integration into a practical application. Moreover, the re-formation of the second model, or task-specific model, from the delta weights does not appear to be positively recited in the claim.
Therefore, the 35 U.S.C. 101 rejections of the previous action are maintained.
Regarding the applicant’s traversal of the 35 U.S.C. 102/103 rejections of the previous office action, the applicant’s arguments filed November 13, 2025 have been fully considered but are not persuasive.
The applicant asserts that YAO does not teach “determining delta weights by determining differences between weights of the first neural network and weights of the second neural network” because YAO discloses using the difference between the first loss and the second loss, as opposed to a difference between the weights themselves.
The examiner respectfully submits that determining the delta weights based on a difference between the losses of the weights is functionally equivalent to determining them based on a difference of the weights themselves. For example, if two runners (e.g., Alice and Bob) run a race, the time each takes to finish (tA and tB) could simply be subtracted, and the difference used to determine the winner. Alternatively, a loss could be defined as how much slower each runner is than a reference time tref, so that the losses are tA - tref and tB - tref. Comparing Alice and Bob using the difference of their losses:
LA - LB = (tA - tref) - (tB - tref)
The reference time cancels out:
LA - LB = tA - tB
So even though we went through the loss functions, the comparison ends up being exactly the same as comparing their finish times directly.
Therefore, the examiner respectfully submits that, just as comparing runners by the difference in how much slower each is than the same reference time reduces to comparing their actual finish times, comparing weights via the difference of their losses is equivalent to comparing the weights directly.
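This cancellation can also be illustrated numerically; the finish times and reference value below are hypothetical and chosen purely for illustration:

```python
# Illustration of the runner analogy: the difference of losses
# (time minus a shared reference) equals the difference of times.

def loss(t, t_ref):
    # "Loss" = how much slower than the reference time.
    return t - t_ref

t_alice, t_bob, t_ref = 61.2, 63.5, 60.0

diff_of_losses = loss(t_alice, t_ref) - loss(t_bob, t_ref)
diff_of_times = t_alice - t_bob

# The shared reference cancels out of the comparison
# (up to floating-point rounding).
assert abs(diff_of_losses - diff_of_times) < 1e-9
```

Because the reference term appears in both losses, it cancels regardless of its value; the comparison is unchanged.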
Further, the applicant asserts that YAO merely discloses a neural network model that is a compressed version of another neural network model, meaning that it fails to disclose a neural network model that is generated by fine-tuning another neural network model.
The examiner respectfully submits that all of the support in the specification and claims for “fine-tuning” the neural network is directed to pruning, quantization, and encoding of weights, which, as shown in [0007]-[0010], are methods of compression. The invention, as claimed, appears to be directed specifically toward a method of compressing neural networks. “A neural network model that is a compressed version of another model” is, in fact, a neural network generated by fine-tuning (i.e., compressing) another neural network model.
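For illustration only, the compression operations described at [0007]-[0010] (pruning small delta weights to zero, quantizing the remainder, and encoding only the non-zero values with position metadata) might be sketched as follows; the threshold, quantization step, and data are hypothetical, not taken from the claims or the specification:

```python
# Hypothetical sketch: prune delta weights at or below a threshold to 0,
# uniformly quantize the survivors, and encode only the non-zero values
# together with their positions.

def compress_delta_weights(deltas, threshold=0.05, step=0.1):
    pruned = [0.0 if abs(d) <= threshold else d for d in deltas]
    # Uniform quantization to multiples of `step` (rounded to avoid
    # floating-point noise in the illustration).
    quantized = [round(round(d / step) * step, 6) for d in pruned]
    # Encode as (position, value) pairs for non-zero entries only.
    return [(i, v) for i, v in enumerate(quantized) if v != 0.0]

deltas = [0.02, -0.31, 0.0, 0.18, -0.04]
print(compress_delta_weights(deltas))  # [(1, -0.3), (3, 0.2)]
```

Restoring the task-specific model would then amount to decoding these pairs and adding the delta weights back to the base model’s weights.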
Therefore, the 35 U.S.C. 102 and 35 U.S.C. 103 rejections of the previous action are maintained.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea (mental process) without significantly more.
Regarding claim 1, in Step 1 of the 101-analysis set forth in MPEP 2106, the claim recites “A method with neural network compression”. A method is one of the four statutory categories of invention.
In Step 2a Prong 1 of the 101-analysis set forth in the MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for recitation of generic computer components:
“determining delta weights by determining differences between weights of the first neural network and weights of the second neural network” (A person can mentally evaluate a difference between one set of weights and another set of weights and make a judgement to determine delta weights corresponding to the difference (MPEP 2106).)
“compressing the delta weights” (A person can mentally evaluate the delta weights and make a judgement to compress them using the mathematical processes of pruning and quantization, as cited in the specification [0009-0010] (MPEP 2106).)
“encoding… the delta weights updated by the retraining of the second neural network” (A person can mentally evaluate the delta weights and make a judgement to ignore “0” weights with mental tables organizing them by location, as is cited in the specification at [0008] (MPEP 2106).)
If claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental process grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
In Step 2a Prong 2 of the 101-analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
“generating a second neural network by fine-tuning, based on training data for a predetermined purpose, a first neural network which is pre-trained” (Generally linking the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h)).)
“retraining the second neural network updated based on the compressed delta weights and the weights of the first neural network” (Mere instructions to apply the judicial exception (MPEP 2106.05(f)).)
“storing the delta weights updated by the retraining of the second neural network” (Adding insignificant extra-solution activity (mere data storage) to the judicial exception (MPEP 2106.05(g)).)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is “directed” to an abstract idea.
In Step 2b of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, additional element (iv) recites generally linking the use of a judicial exception to a particular technological environment or field of use, which is not indicative of significantly more. Additional element (v) recites mere instructions to apply the judicial exception, which is not indicative of significantly more. Additional element (vi) recites an insignificant extra-solution activity. Further, element (vi) recites steps that store and retrieve information in memory, which the courts have determined to be a well-understood, routine, and conventional activity that is not indicative of significantly more (Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015)). Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claim 2, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 2 recites the following additional mental processes:
“wherein the encoding and storing of the delta weights comprises: determining whether to terminate the retraining of the second neural network based on a preset accuracy standard with respect to the second neural network” (A person can mentally evaluate a preset standard with respect to a second neural network and make a judgement to determine whether to terminate the retraining of the second neural network (MPEP 2106).)
“encoding… the delta weights updated by retraining of the second neural network based on a determination to terminate the retraining of the second neural network” (A person can mentally evaluate the delta weights based on a decision to terminate retraining and make a judgement to encode them using the methods cited in the specification [0008] (MPEP 2106).)
Further, claim 2 recites “storing the delta weights updated by retraining of the second neural network based on a determination to terminate the retraining of the second neural network” (In step 2A, prong 2, this recites insignificant extra-solution activity (mere data storage) to the judicial exception (MPEP 2106.05(g)). In step 2B, the courts have found steps that store and retrieve information in memory to be a well-understood, routine, and conventional activity, which is not indicative of significantly more (Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015)).)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 3, it is dependent upon claim 2, and thereby incorporates the limitations of, and corresponding analysis applied to claim 2. Further, claim 3 recites the following additional mental process:
“in response to a determination not to terminate the retraining of the second neural network, iteratively performing the compressing of the delta weights… based on the compressed delta weights and the weights of the first neural network” (A person can mentally evaluate the delta weights in light of a previous determination to not terminate retraining, and make a judgement to iteratively compress the delta weights using the mathematical processes of pruning and quantization, as cited in the specification at [0009-0010] (MPEP 2106).)
Further, claim 3 recites “in response to a determination not to terminate the retraining of the second neural network, iteratively performing… retraining of the second neural network updated based on the compressed delta weights and the weights of the first neural network” (In step 2A, prong 2, this recites mere instructions to apply the judicial exception (MPEP 2106.05(f)). In step 2B, mere instructions to apply the judicial exception are not indicative of significantly more.)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 4, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 4 recites the following additional mental process:
“wherein the encoding and storing of the delta weights comprises: encoding the delta weights by metadata comprising position information of non-zero delta weights of the delta weights” (A person can mentally evaluate the delta weights and make a judgement to ignore “0” weights with mental tables organizing them by location (MPEP 2106).)
Further, claim 4 recites “storing the metadata corresponding to the second neural network” (In step 2A, prong 2, this recites insignificant extra-solution activity (mere data storage) to the judicial exception (MPEP 2106.05(g)). In step 2B, the courts have found steps that store and retrieve information in memory to be a well-understood, routine, and conventional activity, which is not indicative of significantly more (Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015)).)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 5, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 5 recites the following additional mental process:
“wherein the compressing of the delta weights comprises performing pruning to modify a weight, which is less than or equal to a predetermined threshold, of the delta weights to be 0” (A person can mentally evaluate the delta weights and make a judgement to “prune” the weights less than or equal to a threshold, to be 0 (MPEP 2106).)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 6, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 6 recites the following additional mental process:
“wherein the compressing of the delta weights comprises performing quantization to reduce the delta weights to a predetermined bit-width” (A person can mentally evaluate the delta weights and make a judgement to apply the mathematical process of quantization to reduce them to a specific “bit-width” (MPEP 2106).)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 7, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 7 recites the following additional mental process:
“…which are encoded…” (A person can mentally evaluate the delta weights and make a judgement to ignore “0” weights with mental tables organizing them by location, as is cited in the specification at [0008] (MPEP 2106).)
Further, claim 7 recites “generating the second neural network, which is trained to perform the predetermined purpose, based on the delta weights… and the weight of the first neural network” (In step 2A, prong 2, this recites mere instructions to apply the judicial exception (MPEP 2106.05(f)). In step 2B, mere instructions to apply the judicial exception are not indicative of significantly more.)
Further, claim 7 recites “…which are…stored…” (In step 2A, prong 2, this recites insignificant extra-solution activity (mere data storage) to the judicial exception (MPEP 2106.05(g)). In step 2B, the courts have found steps that store and retrieve information in memory to be a well-understood, routine, and conventional activity, which is not indicative of significantly more (Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015)).)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 8, in Step 1 of the 101-analysis set forth in MPEP 2106, the claim recites “A method with neural network compression”. A method is one of the four statutory categories of invention.
In Step 2a Prong 1 of the 101-analysis set forth in the MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for recitation of generic computer components:
“for each of the plurality of task-specific models, determining delta weights by determining differences between weights of the base model and weights of the task-specific model” (A person can mentally evaluate a difference between one set of weights and another set of weights and make a judgement to determine delta weights corresponding to the difference (MPEP 2106).)
“for each the plurality of task-specific models, compressing the determined delta weights based on a preset standard corresponding to the task-specific model” (A person can mentally evaluate the delta weights and make a judgement to compress them using the mathematical processes of pruning and quantization, as cited in the specification [0009-0010] (MPEP 2106).)
If claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental process grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
In Step 2a Prong 2 of the 101-analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
“generating a plurality of task-specific models by fine-tuning, based on a plurality of training data sets for a plurality of purposes, a base model which is pre-trained” (Generally linking the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h)).)
“compressing and storing the plurality of task-specific models based on the compressed delta weights corresponding to the plurality of task-specific models” (Adding insignificant extra-solution activity (mere data storage) to the judicial exception (MPEP 2106.05(g)).)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is “directed” to an abstract idea.
In Step 2b of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, additional element (iii) recites generally linking the use of a judicial exception to a particular technological environment or field of use, which is not indicative of significantly more. Additional element (iv) recites an insignificant extra-solution activity. Further, element (iv) recites steps that store and retrieve information in memory, which the courts have determined to be a well-understood, routine, and conventional activity that is not indicative of significantly more (Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015)). Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claims 9-10, they are dependent upon claim 8, and thereby incorporate the limitations of, and corresponding analysis applied to claim 8. Further, claims 9-10 recite similar additional limitations as claims 5-6, respectively, and are rejected under the same rationale.
Regarding claim 11, it is dependent upon claim 8, and thereby incorporates the limitations of, and corresponding analysis applied to claim 8. Further, claim 11 recites the following additional mental process:
“for each of the plurality of task-specific models, encoding… delta weights corresponding to the task-specific model updated by the retraining” (A person can mentally evaluate the delta weights based on a decision to terminate retraining and make a judgement to encode them using the methods cited in the specification [0008] (MPEP 2106).)
Further, claim 11 recites “wherein the compressing and storing of the plurality of task-specific models comprises: for each of the plurality of task-specific models, retraining the task-specific model updated based on the weights of the base model and the compressed delta weights corresponding to the task-specific model” (In step 2A, prong 2, this recites mere instructions to apply the judicial exception (MPEP 2106.05(f)). In step 2B, mere instructions to apply the judicial exception are not indicative of significantly more.)
Further, claim 11 recites “for each of the plurality of task-specific models… storing delta weights corresponding to the task-specific model updated by the retraining.” (In step 2A, prong 2, this recites insignificant extra-solution activity (mere data storage) to the judicial exception (MPEP 2106.05(g)). In step 2B, the courts have found steps that store and retrieve information in memory to be a well-understood, routine, and conventional activity, which is not indicative of significantly more (Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015)).)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 12, it is dependent upon claim 11, and thereby incorporates the limitations of, and corresponding analysis applied to claim 11. Further, claim 12 recites similar additional limitations as claim 4, and is rejected under the same rationale.
Regarding claim 13, it is dependent upon claim 8, and thereby incorporates the limitations of, and corresponding analysis applied to claim 8. Further, claim 13 recites “wherein the preset standard comprises either one or both of a standard on a pruning ratio and a standard on a quantization bit-width” (In step 2A, prong 2, this recites generally linking the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h)). In step 2B, generally linking the use of the judicial exception to a particular technological environment or field of use is not indicative of significantly more.)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 14, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 14 recites “A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, configure the one or more processors to perform the method” (In step 2A, prong 2, this recites using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). In step 2B, using a computer as a tool to perform an abstract idea is not indicative of significantly more.)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 15, in Step 1 of the 101-analysis set forth in MPEP 2106, the claim recites “An apparatus with neural network compression”. An apparatus is one of the four statutory categories of invention.
In Step 2a Prong 1 of the 101-analysis set forth in the MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for recitation of generic computer components:
“determine delta weights by determining differences between weights of the first neural network and weights of the second neural network” (A person can mentally evaluate a difference between one set of weights and another set of weights and make a judgement to determine delta weights corresponding to the difference (MPEP 2106).)
“compress the delta weights” (A person can mentally evaluate the delta weights and make a judgement to compress them using the mathematical processes of pruning and quantization, as cited in the specification [0009-0010] (MPEP 2106).)
“encode… the delta weights updated by the retraining of the second neural network” (A person can mentally evaluate the delta weights and make a judgement to ignore “0” weights with mental tables organizing them by location, as is cited in the specification at [0008] (MPEP 2106).)
If claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental process grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
In Step 2a Prong 2 of the 101-analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
“An apparatus with neural network compression, the apparatus comprising: one or more processors configured to…” (Uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).)
“generate a second neural network by fine-tuning, based on training data for a predetermined purpose, a first neural network which is pre-trained” (Generally linking the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h)).)
“retrain the second neural network updated based on the compressed delta weights and the weights of the first neural network” (Mere instructions to apply the judicial exception (MPEP 2106.05(f)).)
“store the delta weights updated by retraining of the second neural network” (Adding insignificant extra-solution activity (mere data storage) to the judicial exception (MPEP 2106.05(g)).)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is “directed” to an abstract idea.
In Step 2b of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, additional element (iv) recites use of a computer as a tool to perform an abstract idea, which is not indicative of significantly more. Additional element (v) recites generally linking the use of a judicial exception to a particular technological environment or field of use, which is not indicative of significantly more. Additional element (vi) recites mere instructions to apply the judicial exception, which is not indicative of significantly more. Additional element (vii) recites an insignificant extra-solution activity. Further, element (vii) recites steps that store and retrieve information in memory, which the courts have determined to be a well-understood, routine, and conventional activity that is not indicative of significantly more (Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015)). Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claims 16-17, they are dependent upon claim 15, and thereby incorporate the limitations of, and corresponding analysis applied to claim 15. Further, claims 16-17 recite similar additional limitations as claims 2-3, respectively, and are rejected under the same rationale.
Regarding claim 18, in Step 1 of the 101-analysis set forth in MPEP 2106, the claim recites “A method with neural network compression”. A method is one of the four statutory categories of invention.
In Step 2a Prong 1 of the 101-analysis set forth in the MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for recitation of generic computer components:
“determining delta weights by determining differences between weights of a pre-trained base neural network and weights of a task-specific neural network generated by retraining the pre-trained base neural network for a predetermined task” (A person can mentally evaluate a difference between one set of weights and another set of weights and make a judgement to determine delta weights corresponding to the difference (MPEP 2106).)
“updating the task-specific neural network by compressing the delta weights” (A person can mentally evaluate the delta weights and make a judgement to compress them using the mathematical processes of pruning and quantization, as cited in the specification [0009-0010] (MPEP 2106).)
“encoding… the updated delta weights” (A person can mentally evaluate the delta weights and make a judgement to ignore “0” weights with mental tables organizing them by location, as is cited in the specification at [0008] (MPEP 2106).)
If claim limitations, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components, then they fall within the mental processes grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
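Purely for illustration of the three limitations identified above as mental processes, the sequence of determining delta weights, compressing them by pruning and quantization, and encoding only the non-zero entries can be sketched as follows; every name, value, and threshold is hypothetical and appears in neither the claims nor the cited art:

```python
# Illustrative sketch only; names, thresholds, and values are
# hypothetical and are not taken from the claims or the cited art.

def determine_delta_weights(base_weights, tuned_weights):
    """Delta weights: element-wise differences between the weights of
    the first (pre-trained) and second (fine-tuned) networks."""
    return [t - b for b, t in zip(base_weights, tuned_weights)]

def compress_delta_weights(deltas, prune_threshold=0.05, step=0.1):
    """Compression via pruning (small deltas set to zero) followed by
    simple uniform quantization (rounding to a fixed step size)."""
    pruned = [0.0 if abs(d) < prune_threshold else d for d in deltas]
    return [round(d / step) * step for d in pruned]

def encode_delta_weights(deltas):
    """Encoding that ignores zero-valued deltas, keeping
    (position, value) pairs for the non-zero entries."""
    return [(i, d) for i, d in enumerate(deltas) if d != 0.0]

base  = [0.50, -0.20, 0.10, 0.80]
tuned = [0.52, -0.20, 0.40, 0.78]
deltas = determine_delta_weights(base, tuned)
compressed = compress_delta_weights(deltas)   # small deltas pruned to 0.0
encoded = encode_delta_weights(compressed)    # only non-zero positions kept
```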
In Step 2a Prong 2 of the 101-analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
“updating the compressed delta weights by retraining the updated task-specific neural network” (Mere instructions to apply the judicial exception (MPEP 2106.05(f)).)
“storing the updated delta weights” (Adding insignificant extra-solution activity (mere data storage) to the judicial exception (MPEP 2106.05(g)).)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is “directed” to an abstract idea.
In Step 2b of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, additional element (iv) recites mere instructions to apply the judicial exception, which is not indicative of significantly more. Additional element (v) recites an insignificant extra-solution activity. Further, element (v) recites steps that store and retrieve information in memory which has been determined by the courts to recite a well-understood, routine, and conventional activity which is not indicative of significantly more (Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015)). Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claim 19, it is dependent upon claim 18, and thereby incorporates the limitations of, and corresponding analysis applied to claim 18. Further, claim 19 recites the following additional abstract idea (mathematical concept):
“wherein the updating of the task-specific neural network comprises summing the weights of the base neural network and the compressed delta weights” (summing values together is a mathematical calculation, which is an abstract idea (MPEP 2106).)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
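Purely as an illustration of the summing recited in claim 19, a minimal sketch follows; the values are hypothetical, drawn from neither the claims nor the references, and chosen to be exact in binary floating point:

```python
# Illustrative only: reconstructing the task-specific weights by
# summing base weights and compressed delta weights; all values are
# hypothetical.
base_weights  = [0.500, -0.250, 0.125, 0.750]
delta_weights = [0.000,  0.000, 0.250, 0.000]
updated = [b + d for b, d in zip(base_weights, delta_weights)]
# updated -> [0.5, -0.25, 0.375, 0.75]
```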
Regarding claim 20, it is dependent upon claim 18, and thereby incorporates the limitations of, and corresponding analysis applied to claim 18. Further, claim 20 recites “updating the pre-trained base neural network based on the stored delta weights” (In step 2A, prong 2, this recites mere instructions to apply the judicial exception (MPEP 2106.05(f)). In step 2B, mere instructions to apply the judicial exception are not indicative of significantly more.)
Further, claim 20 recites “performing the predetermined task by implementing the updated base neural network” (In step 2A, prong 2, this recites mere instructions to apply the judicial exception (MPEP 2106.05(f)). In step 2B, mere instructions to apply the judicial exception are not indicative of significantly more.)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 21, it is dependent upon claim 20, and thereby incorporates the limitations of, and corresponding analysis applied to claim 20. Further, claim 21 recites “wherein the stored delta weights are stored in an external device” (In step 2A, prong 2, this adds insignificant extra-solution activity (mere data storage) to the judicial exception (MPEP 2106.05(g)). In step 2B, the courts have found steps of transmitting/receiving data over a network to be a well-understood, routine, and conventional activity, which is not indicative of significantly more (Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362).)
Further, claim 21 recites “the implementing of the updated base neural network comprises loading the stored delta weights by a user device” (In step 2A, prong 2, this adds insignificant extra-solution activity (mere data gathering) to the judicial exception (MPEP 2106.05(g)). In step 2B, the courts have found steps of transmitting/receiving data over a network to be a well-understood, routine, and conventional activity, which is not indicative of significantly more (Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362).)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 6, 14-15 & 18 are rejected under 35 U.S.C. 102(a)(1) as being clearly anticipated by Yao, A. et al., U.S. PG Pub. No. US 2021/0019630 A1 (hereafter, YAO).
Regarding claim 1, YAO teaches “A method with neural network compression, the method comprising: generating a second neural network by fine-tuning, based on training data for a predetermined purpose, a first neural network which is pre-trained”:
([0051] “In certain examples, an incremental network quantization strategy is provided to convert a pre-trained full precision deep neural network model (a first neural network model, pretrained based on training data, for a predetermined purpose) into a lossless low precision version of that model (a second model generated by fine-tuning a first neural network). This strategy is further improved through a different weight partition strategy, quantization goals, and optimization formulations provided through explicit-loss-error-aware quantization. By determining a loss error as the low-precision network model is formed from the full-precision network model (generating the second model by fine-tuning the first model), the composition, quality, and effectiveness of the low-bit, low-precision network model can be improved.”)
Further, YAO teaches “determining delta weights by determining differences between weights of the first neural network and weights of the second neural network”:
([Abstract] “…The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights. In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights) ...”)
Further, YAO teaches “compressing the delta weights”:
([Abstract] “...The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights) ...”)
Further, YAO teaches “retraining the second neural network updated based on the compressed delta weights and the weights of the first neural network”:
([Abstract] “…The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights)...”) Applying the “weight updater” to “update the second group of network weights based on the difference” and “deploy a low-bit network model” that includes “the low-bit second network weights” correlates directly to retraining the second neural network updated based on the compressed delta weights, which are directly influenced by “the weights of the first neural network.”
Further, YAO teaches “encoding and storing the delta weights updated by the retraining of the second neural network”:
([Abstract] “…The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). The example apparatus includes a network model deployer to deploy a low-bit network model including the low-bit second network weights (delta weights are stored).”)
And further:
([0057] “…The weights in the first group are quantized to be either powers of two or zero by a variable-length encoding method, forming a low-precision base for an original model (here, the weights are encoded). ...”)
Regarding claim 6, YAO teaches the limitations of claim 1. Further, YAO teaches “wherein the compressing of the delta weights comprises performing quantization to reduce the delta weights to a predetermined bit-width”:
([0059] “INQ (Incremental Network Quantization) techniques described herein adopt a variable-length encoding. For example, INQ techniques can use 5-bit quantization (Quantization with a predetermined bit-width): ...”) Therefore, 5-bit or other low-bit settings may be used to set a predetermined bit-width for the quantization performed.
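As a hedged illustration of quantization to a predetermined bit-width of the kind YAO's INQ describes (weights constrained to zero or signed powers of two under a variable-length code), the following sketch uses a hypothetical exponent range and is not taken from either the claims or the reference:

```python
import math

def quantize_power_of_two(w, n_min=-8, n_max=-1):
    """Quantize a weight to zero or a signed power of two whose
    exponent lies in a fixed range, so each value fits a predetermined
    bit-width; the exponent range here is hypothetical."""
    if w == 0.0:
        return 0.0
    # Weights too small for the exponent range are pruned to zero.
    if abs(w) < 2.0 ** (n_min - 1):
        return 0.0
    exp = round(math.log2(abs(w)))
    exp = max(n_min, min(n_max, exp))   # clamp to the allowed range
    return math.copysign(2.0 ** exp, w)

weights = [0.30, -0.02, 0.001, -0.6]
quantized = [quantize_power_of_two(w) for w in weights]
# quantized -> [0.25, -0.015625, 0.0, -0.5]
```

Each quantized value is zero or ±2^n with n in the clamped range, so a small fixed number of bits suffices to index it.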
Regarding claim 14, YAO teaches the limitations of claim 1. Further, YAO teaches “A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, configure the one or more processors to perform the method”:
([0098] “Flowcharts representative of example machine readable instructions for implementing the example network training optimizer 700 of FIG. 7 are shown in FIGS. 9-10. In this example, the machine-readable instructions include a program for execution by a processor such as a processor 1112 shown in the example processor platform 1100 discussed below in connection with FIG. 11. The program can be embodied in software stored on a non-transitory computer readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a DVD, a Blu-ray disk, or a memory associated with the processor 1112”)
Regarding claim 15, YAO teaches “An apparatus with neural network compression, the apparatus comprising: one or more processors configured to…”:
([Abstract] “Methods, apparatus, systems and articles of manufacture for loss-error-aware quantization of a low-bit neural network are disclosed. …The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). ...”)
And further:
([0097] “…When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example neural network processor 710…”)
Further, YAO teaches “generate a second neural network by fine-tuning, based on training data for a predetermined purpose, a first neural network which is pre-trained”:
([0051] “In certain examples, an incremental network quantization strategy is provided to convert a pre-trained full precision deep neural network model (a first neural network model, pretrained based on training data, for a predetermined purpose) into a lossless low precision version of that model (a second model generated by fine-tuning a first neural network). This strategy is further improved through a different weight partition strategy, quantization goals, and optimization formulations provided through explicit-loss-error-aware quantization. By determining a loss error as the low-precision network model is formed from the full-precision network model (generating the second model by fine-tuning the first model), the composition, quality, and effectiveness of the low-bit, low-precision network model can be improved.”)
Further, YAO teaches “determine delta weights by determining differences between weights of the first neural network and weights of the second neural network”:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights. In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). ...”)
Further, YAO teaches “compress the delta weights”:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). ...”)
Further, YAO teaches “retrain the second neural network updated based on the compressed delta weights and the weights of the first neural network”:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). ...”) Applying the “weight updater” to “update the second group of network weights based on the difference” and “deploy a low-bit network model” that includes “the low-bit second network weights” correlates directly to retraining the second neural network updated based on the compressed delta weights, which are directly influenced by “the weights of the first neural network.”
Further, YAO teaches “encode and store the delta weights updated by retraining of the second neural network”:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). The example apparatus includes a network model deployer to deploy a low-bit network model including the low-bit second network weights (delta weights are stored).”)
And further:
([0057] “... The weights in the first group are quantized to be either powers of two or zero by a variable-length encoding method, forming a low-precision base for an original model (here, the weights are encoded). ...”)
Regarding claim 18, YAO teaches “A method with neural network compression, the method comprising: determining delta weights by determining differences between weights of a pre-trained base neural network and weights of a task-specific neural network generated by retraining the pre-trained base neural network for a predetermined task”:
([0051] “In certain examples, an incremental network quantization strategy is provided to convert a pre-trained full precision deep neural network model (a first neural network model, pretrained based on training data, for a predetermined task) into a lossless low precision version of that model (a second model generated by fine-tuning a first neural network). This strategy is further improved through a different weight partition strategy, quantization goals, and optimization formulations provided through explicit-loss-error-aware quantization. By determining a loss error as the low-precision network model is formed from the full-precision network model (generating the second model by fine-tuning the first model), the composition, quality, and effectiveness of the low-bit, low-precision network model can be improved.”)
And further:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights. In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). ...”)
Further, YAO teaches “updating the task-specific neural network by compressing the delta weights”:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). ...”)
Further, YAO teaches “updating the compressed delta weights by retraining the updated task-specific neural network”:
([Abstract] “… An example apparatus includes a network weight partitioner to partition unquantized network weights of a first network model into a first group to be quantized and a second group to be retrained. The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights being updated through retraining). ...”)
Further, YAO teaches “encoding and storing the updated delta weights”:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). The example apparatus includes a network model deployer to deploy a low-bit network model including the low-bit second network weights (delta weights are stored).”)
And further:
([0057] “… The weights in the first group are quantized to be either powers of two or zero by a variable-length encoding method, forming a low-precision base for an original model (here, the weights are encoded). ...”)
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 2-3 & 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over YAO, as applied to claims above, and further in view of Brownlee, J., “A Gentle Introduction to Early Stopping to Avoid Overtraining Neural Networks,” available at https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/ on August 6, 2019 (hereafter, BROWNLEE).
Regarding claim 2, YAO teaches the limitations of claim 1. Further, YAO fails to explicitly teach “wherein the encoding and storing of the delta weights comprises: determining whether to terminate the retraining of the second neural network based on a preset accuracy standard with respect to the second neural network; and encoding and storing the delta weights updated by retraining of the second neural network based on a determination to terminate the retraining of the second neural network.”
However, analogous art, BROWNLEE, does teach “determining whether to terminate the retraining of the… neural network based on a preset accuracy standard with respect to the… neural network”:
([Stop Training When Generalization Error Increases, Paragraphs 1-2] “An alternative approach is to train the model once for a large number of training epochs.
During training, the model is evaluated on a holdout validation dataset after each epoch (after each training cycle, the accuracy of the network is validated against a preset accuracy standard (holdout validation dataset)). If the performance of the model on the validation dataset starts to degrade (e.g. loss begins to increase or accuracy begins to decrease), then the training process is stopped. (The decision of whether training is stopped is based on the preset accuracy standard with respect to the neural network model.)”)
Further, BROWNLEE teaches “storing the… weights updated by retraining of the… neural network based on a determination to terminate the retraining of the… neural network”:
([Stop Training When Generalization Error Increases, Paragraphs 3-4] “The model at the time that training is stopped is then used and is known to have good generalization performance.
This procedure is called “early stopping” and is perhaps one of the oldest and most widely used forms of neural network regularization.”)
And further:
([Model Choice, Paragraphs 1-3] “At the time that training is halted, the model is known to have slightly worse generalization error than a model at a prior epoch.
As such, some consideration may need to be given as to exactly which model is saved. Specifically, the training epoch from which weights in the model that are saved to file (weights are stored).
This will depend on the trigger chosen to stop the training process. For example, if the trigger is a simple decrease in performance from one epoch to the next, then the weights for the model at the prior epoch will be preferred.”)
Further, when BROWNLEE is combined with YAO, the “delta weights” for the “second neural network” will be affected. In addition, weights are encoded as is described by YAO:
([0057] “The weights in the first group are quantized to be either powers of two or zero by a variable-length encoding method, forming a low-precision base for an original model (here, the weights are encoded). ...”)
It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to combine the base reference of YAO with the teachings of BROWNLEE because YAO teaches compression methods for optimizing performance of neural networks, while BROWNLEE teaches optimal methods for training neural networks.
One of ordinary skill in the art would be motivated to do so because, as BROWNLEE points out in its first few paragraphs, “Too little training will mean that the model will underfit the train and the test sets. Too much training will mean that the model will overfit the training dataset and have poor performance on the test set.
A compromise is to train on the training dataset but to stop training at the point when performance on a validation dataset starts to degrade. This simple, effective, and widely used approach to training neural networks is called early stopping.”
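The early-stopping behavior BROWNLEE describes can be sketched, for illustration only, as follows; the loss values and patience setting are hypothetical and do not appear in either reference:

```python
def train_with_early_stopping(validation_losses, patience=2):
    """Stop retraining once validation loss fails to improve for
    `patience` consecutive epochs; keep the best epoch observed (the
    epoch index stands in for the weights that would be stored)."""
    best_loss, best_epoch, bad_epochs = float("inf"), 0, 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break   # determination to terminate the retraining
    return best_epoch, best_loss

# Loss improves, then degrades: training halts, and the weights from
# the best epoch would be the ones encoded and stored.
losses = [0.90, 0.70, 0.55, 0.50, 0.58, 0.64]
best_epoch, best_loss = train_with_early_stopping(losses)
# best_epoch -> 3, best_loss -> 0.5
```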
Regarding claim 3, YAO in view of BROWNLEE teaches the limitations of claim 2. Further, BROWNLEE teaches “in response to a determination not to terminate the retraining of the… neural network, iteratively performing the… retraining of the… neural network”:
([Stop Training When Generalization Error Increases, Paragraphs 1-2] “An alternative approach is to train the model once for a large number of training epochs.
During training, the model is evaluated on a holdout validation dataset after each epoch (after each training cycle, the accuracy of the network is validated against a preset accuracy standard (holdout validation dataset)). If the performance of the model on the validation dataset starts to degrade (e.g. loss begins to increase or accuracy begins to decrease), then the training process is stopped. (The decision of whether training is stopped is based on the preset accuracy standard with respect to the neural network model, meaning that if the accuracy is still not degrading, retraining continues in an iterative fashion.)”)
When BROWNLEE is combined with YAO, this will affect the “delta weights” of the “second neural network” which was “updated based on the compressed delta weights and the weights of the first neural network” as taught by YAO:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). ...”) Applying the “weight updater” to “update the second group of network weights based on the difference” and “deploy a low-bit network model” that includes “the low-bit second network weights” correlates directly to retraining the second neural network updated based on the compressed delta weights, which are directly influenced by “the weights of the first neural network.”
Regarding claims 16-17, YAO teaches the limitations of claim 15. Further, claims 16-17 comprise similar additional limitations as claims 2-3, respectively, and are rejected under the same rationale.
Claims 4, 5, & 7 are rejected under 35 U.S.C. 103 as being unpatentable over YAO, as applied to the claims above, and further in view of Li, X. et al., U.S. Patent No. 10,984,308 B2 (hereafter, LI).
Regarding claim 4, YAO teaches the limitations of claim 1. YAO fails to explicitly teach “encoding the delta weights by metadata comprising position information of non-zero delta weights of the delta weights; and storing the metadata corresponding to the second neural network.” However, analogous art, LI, does teach “encoding the… weights by metadata comprising position information of non-zero… weights; and storing the metadata corresponding to the… neural network”:
([Col. 2, Lines 13-67] “FIG. 2 shows a compression method which was proposed by one of the inventors in earlier works.
As shown in FIG. 2, the compression method comprises learning, pruning, and training the neural network. In the first step, it learns which connection is important by training connectivity… studies show that in the matrix of a trained neural network model, elements with larger weights represent important connections, while other elements with smaller weights have relatively small impact and can be removed (e.g., set to zero). Thus, low-weight connections are pruned, converting a dense network into a sparse network.
…
CRS and CCS
As mentioned above, for a sparse matrix, it is desired to compress the matrix in order to reduce the memory requirements. It has been proposed to store sparse matrix by Compressed Row Storage (CRS) or Compressed Column Storage (CCS) (metadata comprising position information).
In order to exploit the sparsity of activations, encoded sparse weight matrix W can be stored in a variation of compressed column storage (CCS) format.
For each column W1 of matrix W, it stores a vector v that contains the non-zero weights (encoded weights with metadata comprising position information of non-zero weights)…
Storing the sparse matrix by columns in CCS format makes it easy to exploit activation sparsity. It simply multiplies each non-zero activation by all of the non-zero elements in its corresponding column.”) When combined with YAO, this would naturally apply to the “delta weights” of the “second neural network”.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the base reference of YAO with the teachings of LI because both references disclose methods of compressing neural networks.
One of ordinary skill in the art would be motivated to do so because, as LI points out in Col. 2, Lines 65 onward, “Storing the sparse matrix by columns in CCS format makes it easy to exploit activation sparsity. It simply multiplies each non-zero activation by all of the non-zero elements in its corresponding column.”
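For illustration only (not code from LI), the CCS-style encoding quoted above — a vector v of non-zero weights per column, together with a vector z of zeros preceding each entry as position metadata — can be sketched as follows on a single column:

```python
# Illustrative sketch of the CCS-style column encoding LI describes:
# store only non-zero weights (v) plus position metadata (z = zeros
# preceding each stored value). Not actual code from any reference.

def encode_ccs_column(column):
    """Encode one column of a sparse matrix as (values, zeros_before)."""
    v, z = [], []
    zeros = 0
    for w in column:
        if w == 0:
            zeros += 1
        else:
            v.append(w)
            z.append(zeros)   # position metadata: zeros before this entry
            zeros = 0
    return v, z

def decode_ccs_column(v, z, length):
    """Reconstruct the dense column from its CCS-style encoding."""
    col = []
    for value, zeros in zip(v, z):
        col.extend([0] * zeros)
        col.append(value)
    col.extend([0] * (length - len(col)))   # pad trailing zeros
    return col
```

When combined with YAO as proposed, the encoded values would be the non-zero delta weights rather than raw network weights.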
Regarding claim 5, YAO teaches the limitations of claim 1. YAO fails to explicitly teach “wherein the compressing of the delta weights comprises performing pruning to modify a weight, which is less than or equal to a predetermined threshold, of the delta weights to be 0.” However, analogous art, LI, does teach “wherein the compressing of the… weights comprises performing pruning to modify a weight, which is less than or equal to a predetermined threshold, of the… weights to be 0”:
([Col. 2, Lines 13-33] “FIG. 2 shows a compression method (compression of weights) which was proposed by one of the inventors in earlier works.
As shown in FIG. 2, the compression method comprises learning, pruning, and training the neural network. In the first step, it learns which connection is important by training connectivity. The second step is to prune the low-weight connections (performing pruning to modify weights less than or equal to a predetermined threshold). In the third step, it retrains the neural networks by fine-tuning the weights of neural network. In recent years, studies show that in the matrix of a trained neural network model, elements with larger weights represent important connections, while other elements with smaller weights have relatively small impact and can be removed (e.g., set to zero) (pruned weights are set to zero). Thus, low-weight (a predetermined threshold) connections are pruned, converting a dense network into a sparse network.
FIG. 3 shows synapses and neurons before and after pruning according to the method proposed in FIG. 2.
The final step of FIG. 2 involves retraining the sparse network to learn the final weights for the remaining sparse connections. By retraining the sparse network, the remaining weights in the matrix can be adjusted, ensuring that the accuracy of the network will not be compromised.”)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the base reference of YAO with the teachings of LI because both references disclose methods of compressing neural networks.
One of ordinary skill in the art would be motivated to do so because, as LI points out in Col. 1, Line 64 onward, “Some of the advanced neural network models might have hundreds of layers and 65 billions of connections, and the implementation thereof is both calculation-centric and memory-centric. Since neural networks are becoming larger, it is critical to compress neural network models into smaller scale” and in Col. 2, Lines 29-34, “The final step of FIG. 2 involves retraining the sparse network to learn the final weights for the remaining sparse connections. By retraining the sparse network, the remaining weights in the matrix can be adjusted, ensuring that the accuracy of the network will not be compromised.”
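For illustration only (not code from LI), the magnitude pruning quoted above — setting every weight at or below a preset threshold to zero, converting a dense network into a sparse one — reduces to a one-line filter over the delta weights; the threshold value is an arbitrary example:

```python
# Illustrative sketch of magnitude pruning per LI's description:
# delta weights with magnitude <= threshold are set to zero.
# Not actual code from any cited reference.

def prune_delta_weights(delta_weights, threshold):
    """Zero out every delta weight with |w| <= threshold."""
    return [0.0 if abs(w) <= threshold else w for w in delta_weights]
```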
Regarding claim 7, YAO teaches the limitations of claim 1. Further, YAO teaches “generating the second neural network, which is trained to perform the predetermined purpose, based on the delta weights… and the weight of the first neural network”
([0051] “In certain examples, an incremental network quantization strategy is provided to convert a pre-trained full precision deep neural network model (a base model, pre-trained for a specific predetermined purpose) into a lossless low precision version of that model (a second model generated by fine-tuning a first neural network). This strategy is further improved through a different weight partition strategy, quantization goals, and optimization formulations provided through explicit-loss-error-aware quantization. By determining a loss error as the low-precision network model is formed from the full-precision network model (generating the second model by fine-tuning the first model), the composition, quality, and effectiveness of the low-bit, low-precision network model can be improved.”)
And further:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). ...”) Applying the “weight updater” to “update the second group of network weights based on the difference” and “deploy a low-bit network model” that includes “the low-bit second network weights” correlates directly to retraining the second neural network updated based on the compressed delta weights, which are directly influenced by “the weights of the first neural network.”
YAO fails to explicitly teach “the delta weights…which are encoded and stored…”. However, analogous art, LI, does teach this:
([Col. 2, Lines 13-67] “FIG. 2 shows a compression method which was proposed by one of the inventors in earlier works.
As shown in FIG. 2, the compression method comprises learning, pruning, and training the neural network… studies show that in the matrix of a trained neural network model, elements with larger weights represent important connections, while other elements with smaller weights have relatively small impact and can be removed (e.g., set to zero)…
FIG. 3 shows synapses and neurons before and after pruning according to the method proposed in FIG. 2.
The final step of FIG. 2 involves retraining the sparse network to learn the final weights for the remaining sparse connections. By retraining the sparse network, the remaining weights in the matrix can be adjusted, ensuring that the accuracy of the network will not be compromised.
…As mentioned above, for a sparse matrix, it is desired to compress the matrix in order to reduce the memory requirements. It has been proposed to store sparse matrix by Compressed Row Storage (CRS) or Compressed Column Storage (CCS) (metadata comprising position information).
In order to exploit the sparsity of activations, encoded sparse weight matrix W can be stored in a variation of compressed column storage (CCS) format.
For each column W1 of matrix W, it stores a vector v that contains the non-zero weights (encoded weights with metadata comprising position information of non-zero weights), …
Storing the sparse matrix by columns in CCS format makes it easy to exploit activation sparsity. It simply multiplies each non-zero activation by all of the non-zero elements in its corresponding column.”) When combined with YAO, this would naturally apply to the “delta weights”.
Claims 8, 10, 11, & 13 are rejected under 35 U.S.C. 103 as being unpatentable over YAO, as applied to the claims above, and further in view of Illés, T., “Disjoint Datasets in Multi-task Learning with Deep Neural Networks for Autonomous Driving,” available at https://smartlabai.medium.com/disjoint-datasets-in-multi-task-learning-with-deep-neural-networks-for-autonomous-driving-f6b081f6a36f (February 25, 2021) (hereafter, ILLES).
Regarding claim 8, YAO teaches “A method with neural network compression, the method comprising: generating a plurality of task-specific models by fine-tuning, based on a plurality of training data sets for a plurality of purposes, a base model which is pre-trained…”:
([0051] “In certain examples, an incremental network quantization strategy is provided to convert a pre-trained full precision deep neural network model (a base model, pre-trained) into a lossless low precision version of that model (a second model generated by fine-tuning a first neural network). This strategy is further improved through a different weight partition strategy, quantization goals, and optimization formulations provided through explicit-loss-error-aware quantization. By determining a loss error as the low-precision network model is formed from the full-precision network model (generating the second model by fine-tuning the first model), the composition, quality, and effectiveness of the low-bit, low-precision network model can be improved.”) Simply applying the method more than once will satisfy the condition of doing so for each of a plurality of task-specific models.
Further, YAO teaches “for each of the plurality of task-specific models, determining delta weights by determining differences between weights of the base model and weights of the task-specific model”:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights. In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). ...”) Simply applying the method more than once will satisfy the condition of doing so for each of a plurality of task-specific models.
Further, YAO teaches “for each of the plurality of task-specific models, compressing the determined delta weights based on a preset standard corresponding to the task-specific model”:
([Abstract] “…The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). ...”) Simply applying the method more than once will satisfy the condition of doing so for each of a plurality of task-specific models.
Further, YAO teaches “compressing and storing the plurality of task-specific models based on the compressed delta weights corresponding to the plurality of task-specific models”:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). The example apparatus includes a network model deployer to deploy a low-bit network model including the low-bit second network weights (output model and delta weights are stored).”) Simply applying the method more than once will satisfy the condition of doing so for each of a plurality of task-specific models.
Further, YAO fails to explicitly teach “a base model… pre-trained corresponding to a plurality of training data sets for a plurality of purposes.” However, analogous art, ILLES, does teach this:
([Introduction & My Goal Sections] “In Machine Learning (ML), we typically care about optimizing for a particular metric (a pre-determined purpose), whether this is a score on a certain benchmark or a business Key Performance Indicator (KPI). In order to do this, we generally train a single model or an ensemble of models to perform our desired task. We then fine-tune and tweak these models until their performance no longer increases. While we can generally achieve acceptable performance this way, by being laser-focused on our single task, we ignore information that might help us do even better on the metric we care about. Specifically, this information comes from the training signals of related tasks. By sharing representations between related tasks, we can enable our model to generalize better on our original task. This approach is called Multi-Task Learning (MTL).
My goal
…
My goal was to develop a multi-task learning method (a plurality of purposes), which is trained on disjoint datasets (a plurality of training data sets). These datasets are disjunct subsets for different tasks. In this case, it’s obvious that some performance loss will occur. The question is that, how much is this loss, is it possible to minimalize and is there any solution, which can achieve the performance of the standard multi-task way.”)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the base reference of YAO with the teachings of ILLES because YAO discusses methods for compressing a neural network to make it lighter, while ILLES discusses training a single large neural network on a variety of training sets to achieve multiple purposes, where compression may be beneficial.
One of ordinary skill in the art would be motivated to do so because, as ILLES points out in its final paragraph, “With perfecting the algorithms and fine-tuning the ideas I got better and better solutions and with knowledge distillation, I could approach the baseline. So the answer to the question, that loss can be minimalized, is absolutely YES!”
Regarding claim 10, YAO in view of ILLES teaches the limitations of claim 8. Further, YAO teaches “wherein the compressing of the determined delta weights comprises performing quantization to reduce the delta weights to a predetermined bit-width”:
([0059] “INQ (Incremental Network Quantization) techniques described herein adopt a variable-length encoding. For example, INQ techniques can use 5-bit quantization (Quantization with a predetermined bit-width): ...”) Therefore, 5-bit or other low-bit settings may be used to set a predetermined bit-width for the quantization performed.
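For illustration only (not code from YAO), an INQ-style quantization in the spirit of the passages quoted above — mapping each weight to the nearest power of two or to zero, with the allowed exponent range standing in for a chosen bit-width budget — can be sketched as follows; the exponent bounds are arbitrary assumed values:

```python
# Illustrative sketch loosely following the INQ-style scheme YAO quotes:
# each weight is mapped to the closest value in {0} U {±2^e} for a fixed
# exponent range (a stand-in for a predetermined bit-width).
# Not actual code from any cited reference.

def quantize_power_of_two(w, min_exp=-4, max_exp=0):
    """Map w to the closest value in {0} U {±2^e : min_exp <= e <= max_exp}."""
    candidates = [0.0]
    for e in range(min_exp, max_exp + 1):
        candidates.extend([2.0 ** e, -(2.0 ** e)])
    return min(candidates, key=lambda c: abs(c - w))
```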
Regarding claim 11, YAO in view of ILLES teaches the limitations of claim 8. Further, YAO teaches “wherein the compressing and storing of the plurality of task- specific models comprises: for each of the plurality of task-specific models, retraining the task-specific model updated based on the weights of the base model and the compressed delta weights corresponding to the task-specific model”:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). ...”) Applying the “weight updater” to “update the second group of network weights based on the difference” and “deploy a low-bit network model” that includes “the low-bit second network weights” correlates directly to retraining the second neural network updated based on the compressed delta weights, which are directly influenced by “the weights of the first neural network.” Simply applying the method more than once will satisfy the condition of doing so for each of a plurality of task-specific models.
Further, YAO teaches “for each of the plurality of task-specific models, encoding and storing delta weights corresponding to the task-specific model updated by the retraining”:
([Abstract] “… The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights (quantization is performed to compress the weights). In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss (determining a difference between weights of the first neural network and the second neural network). The example apparatus includes a weight updater to update the second group of network weights based on the difference (delta weights). The example apparatus includes a network model deployer to deploy a low-bit network model including the low-bit second network weights (delta weights are stored).”) Simply applying the method more than once will satisfy the condition of doing so for each of a plurality of task-specific models.
And further:
([0057] “… The weights in the first group are quantized to be either powers of two or zero by a variable-length encoding method, forming a low-precision base for an original model (here, the weights are encoded). …”) Simply applying the method more than once will satisfy the condition of doing so for each of a plurality of task-specific models.
Regarding claim 13, YAO in view of ILLES teaches the limitations of claim 8. Further, YAO teaches “wherein the preset standard comprises either one or both of a standard on a pruning ratio and a standard on a quantization bit-width”:
([0059] “INQ (Incremental Network Quantization) techniques described herein adopt a variable-length encoding. For example, INQ techniques can use 5-bit quantization (Quantization with a predetermined bit-width/a pre-set standard on a quantization bit-width): ...”) Therefore, 5-bit or other low-bit settings may be used to set a predetermined bit-width for the quantization performed, which qualifies as a preset standard on a quantization bit-width.
Claims 9 & 12 are rejected under 35 U.S.C. 103 as being unpatentable over YAO in view of ILLES, as applied to the claims above, and further in view of LI.
Regarding claim 9, YAO in view of ILLES teaches the limitations of claim 8. YAO in view of ILLES fails to explicitly teach “wherein the compressing of the delta weights comprises performing pruning to modify a weight, which is less than or equal to a predetermined threshold, of the delta weights to be 0.” However, analogous art, LI, does teach “wherein the compressing of the… weights comprises performing pruning to modify a weight, which is less than or equal to a predetermined threshold, of the… weights to be 0”:
([Col. 2, Lines 13-33] “FIG. 2 shows a compression method (compression of weights) which was proposed by one of the inventors in earlier works.
As shown in FIG. 2, the compression method comprises learning, pruning, and training the neural network. In the first step, it learns which connection is important by training connectivity. The second step is to prune the low-weight connections (performing pruning to modify weights less than or equal to a predetermined threshold). In the third step, it retrains the neural networks by fine-tuning the weights of neural network. In recent years, studies show that in the matrix of a trained neural network model, elements with larger weights represent important connections, while other elements with smaller weights have relatively small impact and can be removed (e.g., set to zero) (pruned weights are set to zero). Thus, low-weight (a predetermined threshold) connections are pruned, converting a dense network into a sparse network.
FIG. 3 shows synapses and neurons before and after pruning according to the method proposed in FIG. 2.
The final step of FIG. 2 involves retraining the sparse network to learn the final weights for the remaining sparse connections. By retraining the sparse network, the remaining weights in the matrix can be adjusted, ensuring that the accuracy of the network will not be compromised.”)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the base reference of YAO in view of ILLES with the teachings of LI because both references disclose methods of compressing neural networks.
One of ordinary skill in the art would be motivated to do so because, as LI points out in Col. 1, Line 64 onward, “Some of the advanced neural network models might have hundreds of layers and 65 billions of connections, and the implementation thereof is both calculation-centric and memory-centric. Since neural networks are becoming larger, it is critical to compress neural network models into smaller scale” and in Col. 2, Lines 29-34, “The final step of FIG. 2 involves retraining the sparse network to learn the final weights for the remaining sparse connections. By retraining the sparse network, the remaining weights in the matrix can be adjusted, ensuring that the accuracy of the network will not be compromised.”
Regarding claim 12, YAO in view of ILLES teaches the limitations of claim 11. YAO in view of ILLES fails to explicitly teach “encoding the delta weights by metadata comprising position information of non-zero delta weights of the delta weights; and storing the metadata corresponding to the task-specific models.” However, analogous art, LI, does teach “encoding the… weights by metadata comprising position information of non-zero… weights; and storing the metadata corresponding to the… models”:
([Col. 2, Lines 13-67] “FIG. 2 shows a compression method which was proposed by one of the inventors in earlier works.
As shown in FIG. 2, the compression method comprises learning, pruning, and training the neural network… studies show that in the matrix of a trained neural network model, elements with larger weights represent important connections, while other elements with smaller weights have relatively small impact and can be removed (e.g., set to zero). Thus, low-weight connections are pruned, converting a dense network into a sparse network.
FIG. 3 shows synapses and neurons before and after pruning according to the method proposed in FIG. 2.
The final step of FIG. 2 involves retraining the sparse network to learn the final weights for the remaining sparse connections. By retraining the sparse network, the remaining weights in the matrix can be adjusted, ensuring that the accuracy of the network will not be compromised.
…
CRS and CCS
As mentioned above, for a sparse matrix, it is desired to compress the matrix in order to reduce the memory requirements. It has been proposed to store sparse matrix by Compressed Row Storage (CRS) or Compressed Column Storage (CCS) (metadata comprising position information).
In order to exploit the sparsity of activations, encoded sparse weight matrix W can be stored in a variation of compressed column storage (CCS) format.
For each column W1 of matrix W, it stores a vector v that contains the non-zero weights (encoded weights with metadata comprising position information of non-zero weights), and a second, equal-length vector z that encodes the number of zeros before the corresponding entry in v. …
Storing the sparse matrix by columns in CCS format makes it easy to exploit activation sparsity. It simply multiplies each non-zero activation by all of the non-zero elements in its corresponding column.”) When combined with YAO in view of ILLES, this would naturally apply to the “delta weights” of the “task-specific models”.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the base reference of YAO in view of ILLES with the teachings of LI because both references disclose methods of compressing neural networks.
One of ordinary skill in the art would be motivated to do so because, as LI points out in Col. 2, Lines 65 onward, “Storing the sparse matrix by columns in CCS format makes it easy to exploit activation sparsity. It simply multiplies each non-zero activation by all of the non-zero elements in its corresponding column.”
Claims 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over YAO, as applied to the claims above, and further in view of Hu, E. et al., “LoRA: Low-Rank Adaptation of Large Language Models,” available at https://www.semanticscholar.org/reader/a8ca46b171467ceb2d7652fbfb67fe701ad86092 (October 16, 2021) (hereafter, HU).
Regarding claim 19, YAO teaches the limitations of claim 18. YAO fails to explicitly teach “wherein the updating of the task-specific neural network comprises summing the weights of the base neural network and the compressed delta weights.” However, analogous art, HU, does teach this:
([Page 4, 4.1 Low-Rank-Parameterized Update Matrices] “A neural network contains many dense layers which perform matrix multiplication. The weight matrices in these layers typically have full-rank. When adapting to a specific task, Aghajanyan et al. (2020) shows that the pre-trained language models have a low “intrinsic dimension” and can still learn efficiently despite a random projection to a smaller subspace. Inspired by this, we hypothesize the updates to the weights also have a low “intrinsic rank” during adaptation. For a pre-trained weight matrix W0 ∈ R^(d×k), we constrain its update by representing the latter with a low-rank decomposition W0 + ∆W = W0 + BA, where B ∈ R^(d×r), A ∈ R^(r×k), and the rank r << min(d, k). (Here, W0 represents the base model's weights, and B and A are delta weights generated from a compressed version of the base.) During training, W0 is frozen (the base is not updated) and does not receive gradient updates, while A and B contain trainable parameters. Note both W0 and ∆W = BA are multiplied with the same input, and their respective output vectors are summed coordinate-wise. (The outputs of each are summed.) For h = W0x, our modified forward pass yields:
h = W0x + ∆W x = W0x + BAx (3)
We illustrate our reparameterization in Figure 1…
A Generalization of Full Fine-tuning. … as we increase the number of trainable parameters, training LoRA roughly converges to training the original model (The original model is trained based on the stored delta weights), while adapter-based methods converges to an MLP and prefix-based methods to a model that cannot take long input sequences...”)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the base reference of YAO with the teachings of HU because both references teach optimization methods with regard to neural networks.
One of ordinary skill in the art would be motivated to do so because, as HU points out in the conclusion, “it allows for quick task-switching when deployed as a service by sharing the vast majority of the model parameters. While we focused on Transformer language models, the proposed principles are generally applicable to any neural networks with dense layers.”
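For illustration only (not code from HU), the coordinate-wise summation quoted above — merging frozen base weights W0 with the low-rank delta BA, per h = W0x + BAx — can be sketched with plain Python matrices; the shapes and values in the usage are arbitrary examples:

```python
# Illustrative sketch of HU's low-rank update: the base weights W0 are
# summed with the delta BA formed from the low-rank factors B and A.
# Matrices are plain lists of rows; not actual code from any reference.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def merged_weights(W0, B, A):
    """Return W0 + B @ A: base weights summed with the low-rank delta."""
    BA = matmul(B, A)
    return [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W0, BA)]
```

With rank r much smaller than the dimensions of W0, only the small factors B and A need to be stored per task, which is the storage saving the combination relies on.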
Regarding claim 20, YAO teaches the limitations of claim 18. YAO fails to explicitly teach “updating the pre-trained base neural network based on the stored delta weights; and performing the predetermined task by implementing the updated base neural network.” However, analogous art, HU, does teach “updating the pre-trained base neural network based on the stored delta weights”:
([Page 4, 4.1 Low-Rank-Parameterized Update Matrices] “… as we increase the number of trainable parameters, training LoRA roughly converges to training the original model (The original model is trained based on the stored delta weights), while adapter-based methods converges to an MLP and prefix-based methods to a model that cannot take long input sequences.”)
Further, HU teaches “performing the predetermined task by implementing the updated base neural network”:
([Page 5, 5. Empirical Experiments] “We evaluate the downstream task performance of LoRA (implementing the updated neural network) on RoBERTa (Liu et al., 2019), DeBERTa (He et al., 2021), and GPT-2 (Radford et al., b), before scaling up to GPT-3 175B (Brown et al., 2020). Our experiments cover a wide range of tasks, from natural language understanding (NLU) to generation (NLG) (to perform pre-determined tasks). Specifically, we evaluate on the GLUE (Wang et al., 2019) benchmark for RoBERTa and DeBERTa. We follow the setup of Li & Liang (2021) on GPT-2 for a direct comparison and add WikiSQL (Zhong et al., 2017) (NL to SQL queries) and SAMSum (Gliwa et al., 2019) (conversation summarization) for large-scale experiments on GPT-3. See Appendix C for more details on the datasets we use. We use NVIDIA Tesla V100 for all experiments”)
Regarding claim 21, YAO in view of HU teaches the limitations of claim 20. Further, YAO teaches “wherein the stored delta weights are stored in an external device”:
([0118] “The processor platform 1100 of the illustrated example also includes one or more mass storage devices 1128 for storing software and/or data (including the delta weights). Examples of such mass storage devices 1128 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, RAID systems, and digital versatile disk (DVD) drives (external devices).”)
Further, YAO teaches “the implementing of the updated base neural network comprises loading the stored delta weights by a user device”:
([0117] “The interface circuit 1120 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem and/or network interface card to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1126 (loading stored data such as the delta weights onto any device in the network) (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, coaxial cable, a cellular telephone system (a user device), etc.).”)
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW LEE LEWIS whose telephone number is (571)272-1906. The examiner can normally be reached Monday: 12:00PM - 4:00PM and Tuesday - Friday: 12:00PM - 9:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle, can be reached at (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Matthew Lee Lewis/Examiner, Art Unit 2144
/TAMARA T KYLE/Supervisory Patent Examiner, Art Unit 2144