Prosecution Insights
Last updated: April 19, 2026
Application No. 17/887,021

ELECTRONIC DEVICE AND METHOD WITH SENSITIVITY-BASED QUANTIZED TRAINING AND OPERATION

Final Rejection: §101, §103
Filed: Aug 12, 2022
Examiner: NAULT, VICTOR ADELARD
Art Unit: 2124
Tech Center: 2100 — Computer Architecture & Software
Assignee: Samsung Electronics Co., Ltd.
OA Round: 2 (Final)

Grant Probability: 62% (Moderate)
OA Rounds: 3-4
To Grant: 3y 11m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 62% of resolved cases (8 granted / 13 resolved; +6.5% vs TC avg)
Interview Lift: +83.3% (strong; resolved cases with vs without interview)
Avg Prosecution: 3y 11m (typical timeline)
Currently Pending: 30
Total Applications: 43 (career history, across all art units)

Statute-Specific Performance

§101: 29.1% (-10.9% vs TC avg)
§103: 40.4% (+0.4% vs TC avg)
§102: 7.5% (-32.5% vs TC avg)
§112: 21.4% (-18.6% vs TC avg)

Black line = Tech Center average estimate • Based on career data from 13 resolved cases

Office Action

§101, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks

This Office Action is responsive to Applicants' Amendment filed on 10/15/2025, in which claims 1-3, 5, 7-9, 11, and 13 are amended. No claims are newly cancelled. Claim 21 is newly added. Claims 1-21 are currently pending.

Response to Arguments

With regards to the rejections of claims 1-20 under 35 U.S.C. 101 as directed towards abstract ideas, Applicant's arguments that the claims are eligible have been considered but are not found persuasive. Applicant first argues that at least claim 1 is eligible under Step 2A, Prong One, asserting that the claim does not recite any abstract idea. Applicant states on pages 8 and 9 of the Remarks that the claim limitations "describe high-level operations that cannot practically be performed in the human mind, excluding them from the mental processes grouping". Examiner respectfully disagrees that no mental processes are recited. The limitations of "generate, based on a determination of sensitivity of layers in a model to be trained, sensitivity results;" and "and…apply…to a layer of the layers with a low sensitivity of the sensitivity results lower than a predetermined threshold" recite an evaluation of the sensitivity of portions of a machine learning model and a judgment of whether a sensitivity is below a threshold, respectively, both of which are well within the capabilities of a human mind. The limitations of "(d) detecting one or more anomalies in a data set using the trained ANN;" and "(e) analyzing the one or more detected anomalies using the trained ANN to generate anomaly data;" in Example 47, Claim 2 of the July 2024 Subject Matter Eligibility Examples, which was found to be ineligible under 35 U.S.C. 101, are analogous.
Further, Applicant argues that claim 1 "do[es] not explicitly set forth mathematical equations, relationships, or calculations (e.g., no recitation of backpropagation formulas or gradient descent algorithms, as in ineligible Claim 1 of Example 47 in the 2024 AI Guidance Examples)…The distributed training operations (backward propagation, gradient-dependent weight updates) further involve computational processes, not abstract math or mental steps". Examiner respectfully disagrees. Claim 1 as amended recites "perform one or more operations of a distributed training on the model…where the one or more operations include at least one of an operation of backward propagation of the layer or an operation of updating a weight of the model dependent on a calculated gradient". This is analogous to the limitation "(c) training, by the computer, the ANN based on the input data and a selected training algorithm to generate a trained ANN, wherein the selected training algorithm includes a backpropagation algorithm and a gradient descent algorithm;" in Example 47, Claim 2 of the July 2024 Subject Matter Eligibility Examples, which states on page 7: "Step (c) requires specific mathematical calculations (a backpropagation algorithm and a gradient descent algorithm) to perform the training of the ANN and therefore encompasses mathematical concepts". Operations of distributed training, backward propagation, and updating a model weight using a calculated gradient are mathematical concepts under analogous reasoning, even if performed using a generic computer.

Applicant further argues that at least claim 1 is eligible under Step 2A, Prong Two, asserting that the claim integrates any recited abstract ideas into a practical application. Applicant lists, on pages 9 and 10 of the Remarks, several technical benefits of the invention stated within the specification.
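The operations the Examiner characterizes as mathematical concepts can be made concrete with a minimal sketch. This is purely illustrative: the one-weight model, the squared-error loss, and all names are assumptions of this example, not language from the claims or the cited references.

```python
# Hypothetical one-weight model y_hat = w * x with squared-error loss.
# One backward pass plus a gradient-dependent weight update, i.e. the
# kind of calculation the Office Action treats as a mathematical concept.

def backward_and_update(w, x, y, lr=0.1):
    y_hat = w * x                # forward propagation
    grad = 2 * (y_hat - y) * x  # backward propagation: dL/dw for (y_hat - y)^2
    return w - lr * grad        # update the weight dependent on the calculated gradient

w = backward_and_update(w=1.0, x=2.0, y=6.0)  # moves w toward y/x = 3.0
```

Each line is a closed-form arithmetic step, which is why the analysis treats backward propagation and gradient-dependent updates as calculations rather than as additional elements.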
Examiner respectfully disagrees that claim 1 is eligible at Step 2A, Prong Two of the Subject Matter Eligibility Test. Although Examiner does not dispute that the invention provides the described improvements, Examiner notes that MPEP 2106.04(d).III states: "Because a judicial exception alone is not eligible subject matter, if there are no additional claim elements besides the judicial exception, or if the additional claim elements merely recite another judicial exception, that is insufficient to integrate the judicial exception into a practical application", i.e., an improvement provided solely by improving an abstract idea itself is ineligible. Examiner further notes that most limitations within claim 1 have been identified as reflecting abstract ideas, as determining sensitivity can be done mentally and training using backpropagation or calculated gradients amounts to a mathematical concept. The only further limitations within claim 1 that are not identified as abstract ideas relate to applying generic computer components or applying generic training or quantization, and do not themselves integrate any recited judicial exceptions into a practical application, as shown in the 101 rejection below.

Applicant further argues that at least claim 1 is eligible at Step 2B of the Subject Matter Eligibility Test for reciting significantly more than any recited judicial exceptions. Applicant recites on page 11 of the Remarks: "the claim amounts to significantly more than any exception, as the combination of sensitivity-threshold-based quantization in distributed operations is unconventional… providing an inventive concept beyond well-understood, routine activities. The 2025 Memo (at 7) reminds examiners that evidence of conventionality is required, bolstering this argument". Examiner respectfully disagrees.
First, although Examiner acknowledges that MPEP 2106.05(d) states "If the additional element (or combination of elements) is a specific limitation other than what is well-understood, routine and conventional in the field, for instance because it is an unconventional step that confines the claim to a particular useful application of the judicial exception, then this consideration favors eligibility", neither the MPEP nor the December 4, 2025 "Memorandum on Subject Matter Eligibility Declarations" outright states that evidence of conventionality is required to refute an assertion that a claim element amounts to significantly more than an abstract idea. However, the point is moot, because MPEP 2106.05.I states: "An inventive concept 'cannot be furnished by the unpatentable law of nature (or natural phenomenon or abstract idea) itself.'…Instead, an 'inventive concept' is furnished by an element or combination of elements that is recited in the claim in addition to (beyond) the judicial exception, and is sufficient to ensure that the claim as a whole amounts to significantly more than the judicial exception itself". Determination of the sensitivity of portions of a machine learning model is identified as an abstract idea. The remaining limitations relate to applying generic computer components, or applying generic training or quantization, and are thus not significantly more according to MPEP 2106.05(f), as shown in the 101 rejection below. However, Examiner notes that claim 5 as amended and new claim 21 have limitations with sufficient specificity that integrate any recited abstract ideas into a practical application.

With regards to the rejections of claims 1, 2, 6, 11, 12, 13, 17, 19, and 20 under 35 U.S.C. 103 for being unpatentable over Bijalwan et al. (U.S. Patent Application Publication No. 2023/0281423) in view of Shen et al. (U.S. Patent Application Publication No.
2022/0129736), Applicant argues that one of ordinary skill in the art would not have motivation to combine Bijalwan and Shen to teach the features of independent claim 1, alleging that Bijalwan teaches away from the use of mixed precision quantization. Examiner has considered Applicant's arguments but respectfully disagrees.

Applicant first argues that, despite reciting the word "training", Bijalwan actually performs its quantization after training, during a validation step. Applicant states on page 13 of the Remarks: "While Bijalwan uses the term 'training,' it appears only due to its using validation data that may also be used after a training loop (i.e., forward prop, back prop, gradient calc, weight adjust) to determine if trained model is overfitting (e.g., if so, then prune and train more), and Bijalwan appears to analogizing its process to sensitivity-based training, but is doing after real training". Examiner respectfully disagrees. According to "Train, Test, & Validation Sets explained" by Deeplizard, hereinafter Deeplizard: (Deeplizard Pg. 2) "The validation set is a set of data, separate from the training set, that is used to validate our model during training. This validation process helps give information that may assist us with adjusting our hyperparameters. Recall how we just mentioned that with each epoch during training, the model will be trained on the data in the training set. Well, it will also simultaneously be validated on the data in the validation set". Although Bijalwan might be performing quantization using a validation dataset, which validates the quantization, this is still intertwined and simultaneous with training on a training dataset, and part of the overall training process; it does not teach away from mixed precision quantization during training, nor does the quantization of Bijalwan take place "after real training".
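The distinction drawn from Deeplizard, that validation is interleaved with training rather than performed after it, can be sketched with a toy loop. The one-weight model, data, and function names here are illustrative assumptions, not drawn from Bijalwan, Shen, or Deeplizard.

```python
# Hedged sketch: the validation set is consulted once per epoch, inside
# the training process, not after it. The "model" is a single weight w.

def train_step(w, x, y, lr=0.05):
    return w - lr * 2 * (w * x - y) * x        # gradient step on training data

def val_loss(w, val):
    return sum((w * x - y) ** 2 for x, y in val) / len(val)

def fit(w, train, val, epochs=3):
    history = []
    for _ in range(epochs):
        for x, y in train:                     # trained on the training set...
            w = train_step(w, x, y)
        history.append(val_loss(w, val))       # ...and simultaneously validated
    return w, history                          # per-epoch validation losses

w, hist = fit(0.0, train=[(1.0, 2.0), (2.0, 4.0)], val=[(3.0, 6.0)])
```

The per-epoch validation result is available to drive choices (hyperparameters, or a quantization decision) while training is still ongoing, which is the sense in which validation-driven quantization remains part of training.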
Applicant also states on page 13 of the Remarks that in Bijalwan "'Quantization aware training' is explicitly claimed as the technique, positioning the invention as QAT. However, the process appears to lack QAT's core feature: parallel training with a quantized version to adjust weights. Instead, it's PTQ with sensitivity awareness". Examiner respectfully disagrees. Bijalwan states ((Bijalwan [0038]) "Quantization of the neural network model may be performed using two techniques - Post-training quantization and Quantization-aware training. Post-training quantization is a technique in which the neural network is trained using floating-point computation and then quantized after the training. Quantization-aware training generates a quantized version of the neural network in a forward pass and parallelly trains the neural network using the quantized version. Methods in the present disclosure preferably employ Quantization-aware training technique"). As stated in Deeplizard earlier, validation on validation data is simultaneous with training on training data, and thus Bijalwan in no way teaches quantization after training.

Examiner does acknowledge that claim 1 as amended recites further specifics on training, including distributed training, which are not taught explicitly by Bijalwan or Shen; however, claim 1 is rejected under a new combination of references, as detailed below. Independent claim 12 is not amended and its rejection under 103 is maintained.

Claim Objections

Claim 13 is objected to because of the following informality: "processing, without quantization, a layer with a high sensitivity of the sensitivity results higher than or equal to the predetermined threshold with a second precision, higher than the first precision, without quantization" should read "processing, without quantization, a layer with a high sensitivity of the sensitivity results higher than or equal to the predetermined threshold with a second precision, higher than the first precision".
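The mixed-precision behavior at issue in the claim 13 limitation, quantizing low-sensitivity layers at a first precision while leaving high-sensitivity layers at a higher second precision, can be sketched as a simple thresholding rule. The threshold value and bit-widths below are illustrative assumptions, not values from the claims or the references.

```python
# Hypothetical sketch: assign a precision per layer from its sensitivity.
# Layers below the threshold are quantized at the first (lower) precision;
# layers at or above it keep the second (higher) precision, unquantized.

def assign_precisions(sensitivities, threshold=0.5, first_bits=8, second_bits=32):
    return [first_bits if s < threshold else second_bits
            for s in sensitivities]

assign_precisions([0.2, 0.8, 0.4])  # -> [8, 32, 8]
```

Only the comparison against the threshold decides the precision, which is why the Office Action characterizes the sensitivity judgment itself as a mental process and the application of a precision as a mere instruction to apply quantization.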
Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-4 and 6-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to abstract ideas without significantly more. Claims 5 and 21 are determined to integrate any recited abstract ideas into a practical application.

Regarding claim 1,

Step 1 - "Is the claim to a process, machine, manufacture or composition of matter?" Yes, the claim is directed towards a machine.

Step 2A, Prong 1 - "Is the claim directed to a law of nature, a natural phenomenon (product of nature) or an abstract idea?":

The limitation of "generate, based on a determination of sensitivity of layers in a model to be trained, sensitivity results;" recites an evaluation of the sensitivity of layers in a model, which is a mental process, which is an abstract idea, regardless of whether it is performed on a generic computer.

The limitation of "and…applying…to a layer of the layers with a low sensitivity of the sensitivity results lower than a predetermined threshold" recites a judgment of the sensitivity and a threshold, which is a mental process, which is an abstract idea, regardless of whether it is performed on a generic computer.

The limitation of "where the one or more operations include at least one of an operation of backward propagation of the layer or an operation of updating a weight of the model dependent on a calculated gradient" recites the mathematical calculations of backward propagation operations and gradient calculation, which are mathematical concepts, which are abstract ideas.
Step 2A, Prong 2 - "Does the claim recite additional elements that integrate the judicial exception into a practical application?":

The limitation of "a processor; and a memory configured to store instructions executable by the processor, wherein the processor is configured to, in response to the instructions being executed by the processor:" recites mere instructions to apply judicial exceptions with generic computer components, MPEP 2106.05(d) and 2106.05(f).

The limitation of "and perform one or more operations of a distributed training on the model by applying quantization" recites mere instructions to apply distributed training and quantization, MPEP 2106.05(d) and 2106.05(f).

Step 2B - "Does the claim recite additional elements that amount to significantly more than the judicial exception?":

The limitation of "a processor; and a memory configured to store instructions executable by the processor, wherein the processor is configured to, in response to the instructions being executed by the processor:" recites mere instructions to apply judicial exceptions with generic computer components, MPEP 2106.05(f).

The limitation of "and perform one or more operations of a distributed training on the model by applying quantization" recites mere instructions to apply distributed training and quantization, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 1 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 2,

Claim 2 adds the additional limitations to claim 1:

The limitation of "process the layer with the low sensitivity lower than the predetermined threshold with a first precision by quantizing the layer;" recites mere instructions to apply processing and quantization to a layer, MPEP 2106.05(d) and 2106.05(f).
The limitation of "and process, without quantization, a layer with a high sensitivity of the sensitivity results higher than or equal to the predetermined threshold with a second precision, higher than the first precision" recites mere instructions to apply processing to a layer, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 2 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 3,

Claim 3 adds the additional limitations to claim 1:

The limitation of "performing forward propagation moving from a first layer to a last layer of the model;" recites the mathematical formula of forward propagation, which is a mathematical concept, which is an abstract idea.

The limitation of "performing the backward propagation moving from the last layer to the first layer of the model;" recites the mathematical formula of backward propagation, which is a mathematical concept, which is an abstract idea.

The limitation of "determining a mean value of gradients calculated in each of a plurality of nodes used for the distributed training of the model;" recites a mathematical calculation of a mean value of gradients, which is a mathematical concept, which is an abstract idea.

The limitation of "and performing the updating of the weight of the model based on the mean value" recites a mathematical calculation of updating a model weight, which is a mathematical concept, which is an abstract idea.

Therefore, claim 3 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 4,

Claim 4 adds the additional limitations to claim 1:

The limitation of "wherein the processor is further configured to periodically determine training sensitivity of the layers for each training of the model, or for each epoch or each of one or more iterations performed during the training of the model" recites an evaluation of layer sensitivity at periodic intervals, which is a mental process, which is an abstract idea, regardless of whether it is performed on a generic computer.
Therefore, claim 4 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 6,

Claim 6 adds the additional limitations to claim 1:

The limitation of "classify the sensitivity results of the layers into a plurality of levels;" recites an evaluation of the sensitivity of layers, which is a mental process, which is an abstract idea, regardless of whether it is performed on a generic computer.

The limitation of "and train the model by applying quantization to each of the layers with a precision at a level corresponding to each of the plurality of levels" recites mere instructions to apply training and quantization to a model, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 6 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 7,

Claim 7 adds the additional limitations to claim 3:

The limitation of "wherein the processor is further configured to train the model by applying quantization to the layer with the low sensitivity lower than the predetermined threshold in any one or any combination of the operations of the distributed training" recites mere instructions to apply training and quantization to a model during distributed training operations, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 7 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 8,

Claim 8 adds the additional limitations to claim 7:

The limitation of "wherein the processor is further configured to compress data used in any one or any combination of the operations of the distributed training" recites mere instructions to apply data compression, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 8 is found to be ineligible subject matter under 35 U.S.C. 101.
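The distributed-training operations recited in claim 3 (forward propagation, backward propagation, a mean of the gradients calculated at each node, and a weight update from that mean) can be sketched in a few lines. The one-weight model and the per-node data shards are illustrative assumptions of this example only.

```python
# Hypothetical sketch of claim 3's operations on a one-weight model
# y_hat = w * x with squared-error loss; each shard stands in for one node.

def node_gradient(w, x, y):
    y_hat = w * x                 # forward propagation (first layer to last)
    return 2 * (y_hat - y) * x   # backward propagation (last layer to first)

def distributed_step(w, shards, lr=0.1):
    grads = [node_gradient(w, x, y) for x, y in shards]  # one gradient per node
    mean_grad = sum(grads) / len(grads)                  # mean value of the gradients
    return w - lr * mean_grad                            # update the weight from the mean

w = distributed_step(1.0, shards=[(1.0, 3.0), (2.0, 6.0)])
```

Each of the four recited operations maps to one arithmetic step here, which is the sense in which the analysis treats them as mathematical calculations.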
Regarding claim 9,

Claim 9 adds the additional limitations to claim 1:

The limitation of "wherein the processor is further configured to train the model by scaling the calculated gradient" recites a mathematical calculation of scaling a gradient, which is a mathematical concept, which is an abstract idea.

Therefore, claim 9 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 10,

Claim 10 adds the additional limitations to claim 3:

The limitation of "wherein the processor is further configured to determine the mean value using 'k' largest gradients of the gradients calculated in each of the plurality of nodes," recites a judgment of which gradients to use to determine a mean value, which is a mental process, which is an abstract idea, regardless of whether it is performed on a generic computer.

The limitation of "or by applying a genetic algorithm to the gradients," recites mere instructions to apply a genetic algorithm, MPEP 2106.05(d) and 2106.05(f).

The limitation of "where k is an integer" recites a mere additional detail on the value of k, without changing that determin[ing] the mean value using "k" largest gradients is a judgment, which is a mental process, which is an abstract idea.

Therefore, claim 10 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 11,

Claim 11 adds the additional limitations to claim 1:

The limitation of "wherein the model to be trained in the distributed training is pretrained, before the distributed training, with a precision without quantization" recites mere instructions to apply pretraining to a model, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 11 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 12,

Step 1 - "Is the claim to a process, machine, manufacture or composition of matter?" Yes, the claim is directed towards a process.
Step 2A, Prong 1 - "Is the claim directed to a law of nature, a natural phenomenon (product of nature) or an abstract idea?":

The limitation of "generating, based on a determination of sensitivity of layers in a model to be trained, sensitivity results;" recites an evaluation of the sensitivity of layers in a model, which is a mental process, which is an abstract idea, regardless of whether it is performed on a generic computer.

The limitation of "and…applying…to a layer of the layers with a low sensitivity of the sensitivity results lower than a predetermined threshold" recites a judgment of the sensitivity and a threshold, which is a mental process, which is an abstract idea, regardless of whether it is performed on a generic computer.

Step 2A, Prong 2 - "Does the claim recite additional elements that integrate the judicial exception into a practical application?":

The limitation of "and training the model by applying quantization" recites mere instructions to apply training and quantization, MPEP 2106.05(d) and 2106.05(f).

Step 2B - "Does the claim recite additional elements that amount to significantly more than the judicial exception?":

The limitation of "and training the model by applying quantization" recites mere instructions to apply training and quantization, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 12 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 13,

Claim 13 adds the additional limitations to claim 12:

The limitation of "processing the layer with the low sensitivity lower than the predetermined threshold with a first precision by quantizing the layer;" recites mere instructions to apply processing and quantization to a layer, MPEP 2106.05(d) and 2106.05(f).
The limitation of "and processing, without quantization, a layer with a high sensitivity of the sensitivity results higher than or equal to the predetermined threshold with a second precision, higher than the first precision, without quantization" recites mere instructions to apply processing to a layer, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 13 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 14,

Claim 14 adds the additional limitations to claim 12:

The limitation of "wherein the training of the model comprises performing, on the model, distributed training comprising operations of:" recites mere instructions to apply distributed training to a model, MPEP 2106.05(d) and 2106.05(f).

The limitation of "performing forward propagation moving from a first layer to a last layer of the model;" recites the mathematical formula of forward propagation, which is a mathematical concept, which is an abstract idea.

The limitation of "performing backward propagation moving from the last layer to the first layer of the model;" recites the mathematical formula of backward propagation, which is a mathematical concept, which is an abstract idea.

The limitation of "determining a mean value of gradients calculated in each of a plurality of nodes used for the distributed training of the model;" recites a mathematical calculation of a mean value of gradients, which is a mathematical concept, which is an abstract idea.

The limitation of "and updating a weight of the model based on the mean value" recites a mathematical calculation of updating a model weight, which is a mathematical concept, which is an abstract idea.

Therefore, claim 14 is found to be ineligible subject matter under 35 U.S.C. 101.
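The claim 10 variant of the mean determination discussed above, taking the mean over only the "k" largest of the node gradients, can be sketched as follows. Interpreting "largest" as largest by magnitude is an assumption of this example (the genetic-algorithm alternative is omitted); the sample gradients are illustrative.

```python
# Hypothetical sketch: mean of the k largest gradients (by magnitude)
# from among the gradients calculated at the plurality of nodes.

def top_k_mean(gradients, k):
    top = sorted(gradients, key=abs, reverse=True)[:k]  # select k largest
    return sum(top) / k                                 # mean over the selection

top_k_mean([-0.9, 0.1, 0.5, -0.05], k=2)  # averages -0.9 and 0.5 -> -0.2
```

The selection step is the "judgment of which gradients to use" that the analysis places in the mental-processes grouping; the averaging itself is the same mean calculation as in claims 3 and 14.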
Regarding claim 15,

Claim 15 adds the additional limitations to claim 12:

The limitation of "wherein the determining of the sensitivity comprises periodically determining training sensitivity of the layers for each training of the model, or for each epoch or each of one or more iterations performed during the training of the model" recites an evaluation of layer sensitivity at periodic intervals, which is a mental process, which is an abstract idea, regardless of whether it is performed on a generic computer.

Therefore, claim 15 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 16,

Claim 16 adds the additional limitations to claim 12:

The limitation of "the determining of the sensitivity comprises generating, based on a determination of channel-wise sensitivity of a tensor used for the model, channel-wise sensitivity results," recites an evaluation of the sensitivity of channels in a tensor, which is a mental process, which is an abstract idea, regardless of whether it is performed on a generic computer.

The limitation of "and the training of the model comprises training the model by applying quantization to a channel with a low channel-wise sensitivity of the channel-wise sensitivity results lower than a second predetermined threshold" recites mere instructions to apply training and quantization to a channel, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 16 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 17,

Claim 17 adds the additional limitations to claim 12:

The limitation of "the determining of the sensitivity comprises classifying the sensitivity results of the layers into a plurality of levels;" recites an evaluation of the sensitivity of layers, which is a mental process, which is an abstract idea, regardless of whether it is performed on a generic computer.
The limitation of "and the training of the model comprises training the model by applying quantization to each of the layers with a precision at a level corresponding to each of the plurality of levels" recites mere instructions to apply training and quantization to a model, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 17 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 18,

Claim 18 adds the additional limitations to claim 14:

The limitation of "wherein the training of the model comprises training the model by applying quantization to the layer with the low sensitivity lower than the predetermined threshold in any one or any combination of the operations the distributed training comprises" recites mere instructions to apply training and quantization to a model during distributed training operations, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 18 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 19,

Claim 19 adds the additional limitations to claim 12:

The limitation of "wherein the model to be trained is pretrained with a precision without quantization" recites mere instructions to apply pretraining to a model, MPEP 2106.05(d) and 2106.05(f).

Therefore, claim 19 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 20,

Claim 20 discloses a computer readable medium with instructions to perform the method of claim 12, with substantially the same limitations. Therefore the same analysis and rejection applied to claim 12 applies to claim 20. Therefore, claim 20 is found to be ineligible subject matter under 35 U.S.C. 101.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 12, 13, 17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bijalwan et al. (U.S. Patent Application Publication No. 2023/0281423), hereinafter Bijalwan, in view of Shen et al. (U.S. Patent Application Publication No. 2022/0129736), hereinafter Shen.

Regarding claim 12,

Bijalwan teaches An operating method, comprising: generating, based on a determination of sensitivity of layers in a model to be trained, sensitivity results; ((Bijalwan [0055]) "At block 408, the sensitivity evaluation module 114 normalizes the features sensitivity values and the weight sensitivity values of the plurality of layers independently and combines the sensitivity values into a union sensitivity list…The first row of the union sensitivity list illustrates that a weight sensitivity value corresponding to layer 1 indicated by 'Layer_1_w' is 0.77")

Shen teaches the following further limitation more explicitly than Bijalwan: and training the model by applying quantization to a layer of the layers with a low sensitivity of the sensitivity results lower than a predetermined threshold ((Shen [0028]) "when the value of the objective function corresponding to the first layer L1 is greater than the threshold, this indicates that the loss is small, and the processing unit 120 decides to quantize the first layer L1 with the second precision", a small loss corresponds to a low
sensitivity)

At the time of filing, one of ordinary skill in the art would have motivation to combine Bijalwan and Shen by taking the method for determining the sensitivity of layers in a model taught by Bijalwan and applying a threshold to layers based on sensitivity to determine whether to apply quantization, as taught by Shen, as doing so imparts the benefit of optimizing the memory efficiency and speed of the model, increased by quantizing some layers, with respect to the accuracy of the model, increased by refraining from quantizing some layers, by quantizing the layers that are not sensitive enough to significantly decrease accuracy. Such a combination would be obvious.

Regarding claim 13,

Bijalwan and Shen jointly teach The operating method of claim 12, wherein the training of the model comprises:

Shen further teaches: processing the layer with the low sensitivity lower than the predetermined threshold with a first precision by quantizing the layer; ((Shen [0028]) "when the value of the objective function corresponding to the first layer L1 is greater than the threshold, this indicates that the loss is small, and the processing unit 120 decides to quantize the first layer L1 with the second precision", a small loss corresponds to a low sensitivity) and processing, without quantization, a layer with a high sensitivity of the sensitivity results higher than or equal to the predetermined threshold with a second precision, higher than the first precision ((Shen [0028]) "when the values of the objective function corresponding to the second layer L2 and the third layer L3 is not greater than the threshold, this indicates that the loss is large, and the processing unit 120…does not quantize the second layer L2 and the third layer L3 (that is, the second layer L2 and the third layer L3 remain at the first precision)", a large loss for a layer corresponds to a high sensitivity for the layer)

At the time of filing, one of ordinary skill in the art
would have motivation to combine the operating method jointly taught by Bijalwan and Shen for the parent claim of claim 13, claim 12. No new embodiments are introduced, so the reason to combine is the same as for the parent claim. Regarding claim 17, Bijalwan and Shen jointly teach The operating method of claim 12, Bijalwan further teaches: the determining of the sensitivity comprises classifying the sensitivity results of the layers into a plurality of levels; ((Bijalwan [0037]) “The grouping module 116 may cluster the plurality of layers within the input neural network model into a plurality of groups based on the union sensitivity list”) and the training of the model comprises training the model by applying quantization to each of the layers with a precision at a level corresponding to each of the plurality of levels ((Bijalwan [0037]) “In one embodiment, the grouping module 116 clusters the plurality of layers into a plurality of groups to quantize each group into a high precision format. In another embodiment, the grouping module 116 clusters the plurality of layers into another plurality of groups to quantize each group into a lower precision format”) At the time of filing, one of ordinary skill in the art would have motivation to combine the operating method jointly taught by Bijalwan and Shen for the parent claim of claim 17, claim 12. No new embodiments are introduced, so the reason to combine is the same as for the parent claim. Regarding claim 19, Bijalwan and Shen jointly teach The operating method of claim 12, Bijalwan further teaches: wherein the model to be trained is pretrained with a precision without quantization ((Bijalwan [0036]) “The sensitivity evaluation module 114 receives data from the data acquisition module 230 and generates a union sensitivity list. 
In one embodiment, the sensitivity evaluation module 114 generates a base model from the input neural network model by representing the parameters of the input neural network model in high precision format and stores as base model 210”, a model in a high precision format corresponds to a model pretrained with a precision without quantization) At the time of filing, one of ordinary skill in the art would have motivation to combine the operating method jointly taught by Bijalwan and Shen for the parent claim of claim 19, claim 12. No new embodiments are introduced, so the reason to combine is the same as for the parent claim. Regarding claim 20, the claim discloses a computer readable medium with instructions to perform the method of claim 12. All other limitations in claim 20 are substantially the same as those in claim 12, therefore the same rationale for rejection applies. Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Bijalwan in view of Shen, further in view of Dong et al. “HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks”, hereinafter Dong. Regarding claim 15, Bijalwan and Shen jointly teach The operating method of claim 12, Dong teaches the following further limitation that neither Bijalwan nor Shen explicitly teaches: wherein the determining of the sensitivity comprises periodically determining training sensitivity of the layers for each training of the model, or for each epoch or each of one or more iterations performed during the training of the model ((Dong Pgs. 1-2) “these searching methods can require a large amount of computational resources, are time-consuming, and, worst of all, the quality of quantization is very sensitive to the initialization of their search parameters and therefore unpredictable. This makes deployment of these methods in online learning scenarios especially challenging, as in these applications a new model is trained every few hours and needs to be quantized for efficient inference. 
To address these issues, recent work introduced HAWQ [7], a Hessian AWare Quantization framework. The main idea is to assign higher bit-precision to layers that are more sensitive, and lower bit-precision to less sensitive layers”, application of the method of determining training sensitivity of layers to online learning scenarios where a new model is trained and quantized every few hours corresponds to periodically determining training sensitivity of the layers for each training of the model) At the time of filing, one of ordinary skill in the art would have motivation to combine Bijalwan, Shen, and Dong by taking the method for determining the sensitivity of layers in a model and applying quantization to layers in the model that are insensitive to quantization relative to a threshold, taught jointly by Bijalwan and Shen, and adding quantizing the layers, including determining the sensitivity of the layers to quantization, every time the model is trained, as taught by Dong, as doing so increases the effectiveness of the model by optimizing its accuracy relative to its size for the newly trained parameters. Such a combination would be obvious. Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Bijalwan in view of Shen, further in view of Liu et al. “Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision”, hereinafter Liu. Regarding claim 16, Bijalwan and Shen jointly teach The operating method of claim 12, wherein Liu teaches the following further limitations that neither Bijalwan nor Shen explicitly teaches: the determining of the sensitivity comprises generating, based on a determination of channel-wise sensitivity of a tensor used for the model ((Liu Pg. 2) “The b-bit linear quantization amounts to approximate real numbers using the following quantization set Q…[.]Q denotes the nearest rounding operator w.r.t. 
Q…[.]Q can be generalized to higher dimensional tensors by first stretching them to one-dimensional vectors then applying Eq. 3”), channel-wise sensitivity results, ((Liu Pg. 3) “For a layer L with d-dimensional input, we adopt a simple criterion, output error, to determine the target channels. Output error is the difference of the output of a channel before and after quantization”, output error of a channel in response to quantization corresponds to a determination of channel-wise sensitivity) and the training of the model comprises training the model by applying quantization to a channel with a low channel-wise sensitivity of the channel-wise sensitivity results lower than a second predetermined threshold ((Liu Pg. 1) “we propose multipoint quantization for post-training quantization, which can achieve the flexibility similar to mixed precision, but uses only a single precision level. The idea is to approximate a full-precision weight vector by a linear combination of multiple low-bit vectors. This allows us to use a larger number of low-bit vectors to approximate the weights of more important channels, while use less points to approximate the insensitive channels”, (Liu Pg. 
3) “If e(w, ŵ; D_L) is larger than a predefined threshold ϵ, we apply multipoint quantization to this channel”, applying multipoint quantization, which is higher precision, to a channel with error higher than a predefined threshold, while using normal, lower-precision quantization for insensitive channels, corresponds to applying quantization to a channel with sensitivity lower than a threshold) At the time of filing, one of ordinary skill in the art would have motivation to combine Bijalwan, Shen, and Liu by taking the method for determining the sensitivity of layers in a model and applying quantization to layers in the model that are insensitive to quantization relative to a threshold, taught jointly by Bijalwan and Shen, and adding quantizing tensors, including quantizing channels with low sensitivity relative to a threshold, as taught by Liu, as Liu teaches (Liu Pg. 2) “There are two common configurations for post-training quantization, per-layer quantization and per-channel quantization. Per-layer quantization assigns the same K and B for all the weights in the same layer. Per-channel quantization is more fine-grained, and it uses different K and B for different channels. The latter can achieve higher precision, but it also requires more complicated hardware design…We propose multipoint quantization, which can be implemented with common operands on commodity hardware”. Such a combination would be obvious. Claims 1-3, 6-8, 11, 14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Bijalwan, in view of Shen, further in view of Xu et al. (U.S. Patent Application Publication No. 2021/0295168), hereinafter Xu. 
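Across the rejections above, the asserted combination reduces to one pattern: score each layer for sensitivity (per Bijalwan), then quantize only the layers scoring below a threshold while leaving sensitive layers at full precision (per Shen). A minimal sketch of that pattern, with hypothetical function and layer names not drawn from either reference:

```python
# Sketch of the sensitivity-threshold pattern the rejection maps to
# Bijalwan (sensitivity scoring) and Shen (threshold-gated quantization).
# Names and the uniform quantizer are illustrative assumptions only.

def quantize(weights, bits=8):
    """Uniform quantization of a weight list to the given bit width."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    return [round((w - lo) / scale) * scale + lo for w in weights]

def mixed_precision(layers, sensitivity, threshold):
    """Quantize only layers whose sensitivity is below the threshold;
    leave sensitive layers at full precision (cf. Shen [0028])."""
    out = {}
    for name, weights in layers.items():
        if sensitivity[name] < threshold:
            out[name] = quantize(weights)   # low sensitivity: quantize
        else:
            out[name] = list(weights)       # high sensitivity: keep precision
    return out
```

On values like Bijalwan's example, a layer with sensitivity 0.77 would stay at full precision under a 0.5 threshold, while a 0.10-sensitivity layer would be quantized.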
Regarding claim 1, Bijalwan teaches An electronic device comprising: ((Bijalwan [0028]) “In one embodiment, the system 100 comprises…at least one device such as a computing device 104”) a processor; ((Bijalwan [0009]) “The system comprises a memory and a processor that is coupled to the memory”) and a memory configured to store instructions executable by the processor, ((Bijalwan [0009]) “The system comprises a memory and a processor that is coupled to the memory”) wherein the processor is configured to, in response to the instructions being executed by the processor: generate, based on a determination of sensitivity of layers in a model to be trained, sensitivity results; ((Bijalwan [0055]) At block 408, the sensitivity evaluation module 114 normalizes the features sensitivity values and the weight sensitivity values of the plurality of layers independently and combines the sensitivity values into a union sensitivity list…The first row of the union sensitivity list illustrates that a weight sensitivity value corresponding to layer 1 indicated by "Layer_l_w" is 0.77) Shen teaches the following further limitation more explicitly than Bijalwan: and perform one or more operations [of a distributed training] on the model by applying quantization to a layer of the layers with a low sensitivity of the sensitivity results lower than a predetermined threshold, ((Shen [0028]) “when the value of the objective function corresponding to the first layer L1 is greater than the threshold, this indicates that the loss is small, and the processing unit 120 decides to quantize the first layer L1 with the second precision”, a small loss corresponds to a low sensitivity, Shen does not teach distributed training) At the time of filing, one of ordinary skill in the art would have motivation to combine Bijalwan and Shen by taking the electronic device for determining the sensitivity of layers in a model taught by Bijalwan and applying a threshold to layers based on sensitivity to 
determine whether to apply quantization, taught by Shen, as doing so imparts the benefit of optimizing the memory efficiency and speed of the model, increased by quantizing some layers, with respect to the accuracy of the model, increased by refraining from quantizing some layers, by quantizing the layers that are not sensitive enough to significantly decrease accuracy. Such a combination would be obvious. Xu teaches the following further limitations that neither Bijalwan nor Shen teaches: and perform one or more operations of a distributed training… ((Xu [0025]) “A distributed system can accelerate a training process by distributing the training process across multiple computing systems, which can be referred to as worker nodes. Training data can be split into multiple portions, with each portion to be processed by a worker node. Each worker node can perform the forward and backward propagation operations independently”) where the one or more operations include at least one of an operation of backward propagation of the layer or an operation of updating a weight of the model dependent on a calculated gradient ((Xu [0023]) “As part of the training process, each neural network layer can then perform a backward propagation process to adjust the set of weights at each neural network layer. Specifically, the highest neural network layer can receive the set of gradients and compute, in a backward propagation operation, a set of first data gradients and a set of first weight gradients based on applying the set of weights to the input data gradients in similar mathematical operations as the forward propagation operation. 
The highest neural network layer can adjust the set of weights of the layer based on the set of first weight gradients”) At the time of filing, one of ordinary skill in the art would have motivation to combine Bijalwan, Shen, and Xu by taking the electronic device for determining the sensitivity of layers in a model and applying quantization to layers in the model that are insensitive to quantization relative to a threshold, taught jointly by Bijalwan and Shen, and adding the method of distributed training taught by Xu, as Xu teaches: (Xu [0025]) “A distributed system can accelerate a training process by distributing the training process across multiple computing systems”. Such a combination would be obvious. Regarding claim 2, Bijalwan, Shen, and Xu jointly teach The electronic device of claim 1, wherein the processor is further configured to: Shen further teaches: process the layer with the low sensitivity lower than the predetermined threshold with a first precision by quantizing the layer; ((Shen [0028]) “when the value of the objective function corresponding to the first layer L1 is greater than the threshold, this indicates that the loss is small, and the processing unit 120 decides to quantize the first layer L1 with the second precision”, a small loss corresponds to a low sensitivity) and process, without quantization, a layer with a high sensitivity of the sensitivity results higher than or equal to the predetermined threshold with a second precision, higher than the first precision ((Shen [0028]) “when the values of the objective function corresponding to the second layer L2 and the third layer L3 is not greater than the threshold, this indicates that the loss is large, and the processing unit 120…does not quantize the second layer L2 and the third layer L3 (that is, the second layer L2 and the third layer L3 remain at the first precision)”, a large loss for a layer corresponds to a high sensitivity for the layer) At the time of filing, one of ordinary 
skill in the art would have motivation to combine the electronic device jointly taught by Bijalwan, Shen, and Xu for the parent claim of claim 2, claim 1. No new embodiments are introduced, so the reason to combine is the same as for the parent claim. Regarding claim 3, Bijalwan, Shen, and Xu jointly teach The electronic device of claim 1, wherein the one or more operations of the distributed training include: Xu further teaches: performing forward propagation moving from a first layer to a last layer of the model; ((Xu [0021]) “During training of a neural network, a first neural network layer can receive training input data, combine the training input data with the weights (e.g., by multiplying the training input data with the weights and then summing the products) to generate first output data for the neural network layer, and propagate the output data to a second neural network layer, in a forward propagation operation…The forward propagation operations can start at the first neural network layer and end at the highest neural network layer”) performing the backward propagation moving from the last layer to the first layer of the model; ((Xu [0023]) “As part of the training process, each neural network layer can then perform a backward propagation process to adjust the set of weights at each neural network layer…The backward propagation operations can start from the highest neural network layer and end at the first neural network layer”) determining a mean value of gradients calculated in each of a plurality of nodes used for the distributed training of the model; ((Xu [0025]) “Each worker node can exchange its set of weight gradients with other worker nodes, and average its set of weight gradients and the sets of weight gradients received from other worker nodes”) performing the updating of the weight of the model based on the mean value ((Xu [0025]) “Each computing node can have the same set of averaged weight gradients, and can then update a set of weights for each neural 
network layer based on the averaged weight gradients”) At the time of filing, one of ordinary skill in the art would have motivation to combine the electronic device jointly taught by Bijalwan, Shen, and Xu for the parent claim of claim 3, claim 1. No new embodiments are introduced, so the reason to combine is the same as for the parent claim. Regarding claim 6, Bijalwan, Shen, and Xu jointly teach The electronic device of claim 1, wherein the processor is further configured to: Bijalwan further teaches: classify the sensitivity results of the layers into a plurality of levels; ((Bijalwan [0037]) “The grouping module 116 may cluster the plurality of layers within the input neural network model into a plurality of groups based on the union sensitivity list”) and train the model by applying quantization to each of the layers with a precision at a level corresponding to each of the plurality of levels ((Bijalwan [0037]) “In one embodiment, the grouping module 116 clusters the plurality of layers into a plurality of groups to quantize each group into a high precision format. In another embodiment, the grouping module 116 clusters the plurality of layers into another plurality of groups to quantize each group into a lower precision format”) At the time of filing, one of ordinary skill in the art would have motivation to combine the electronic device jointly taught by Bijalwan, Shen, and Xu for the parent claim of claim 6, claim 1. No new embodiments are introduced, so the reason to combine is the same as for the parent claim. 
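The level-based grouping the rejection attributes to Bijalwan [0037] for claims 6 and 17 can be sketched as follows; the level cutoffs and per-level bit widths are illustrative assumptions, not values from the reference:

```python
# Sketch of sensitivity-level grouping in the spirit of Bijalwan [0037]:
# classify layers into levels by sensitivity, then assign each level a
# quantization precision. Cutoffs and bit widths are hypothetical.

def classify_levels(sensitivity, cutoffs=(0.33, 0.66)):
    """Map each layer to a level 0..len(cutoffs); 0 = least sensitive."""
    levels = {}
    for name, s in sensitivity.items():
        levels[name] = sum(s >= c for c in cutoffs)
    return levels

def precision_plan(levels, bits_per_level=(4, 8, 16)):
    """Assign a bit width per layer: less sensitive layers get coarser
    quantization, more sensitive layers keep higher precision."""
    return {name: bits_per_level[lvl] for name, lvl in levels.items()}
```

With sensitivities {0.77, 0.10, 0.45} and the cutoffs above, the layers fall into levels 2, 0, and 1, receiving 16, 4, and 8 bits respectively.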
Regarding claim 7, Bijalwan, Shen, and Xu jointly teach The electronic device of claim 3, Xu further teaches: wherein the processor is further configured to train the model by applying quantization [to the layer with the low sensitivity lower than the predetermined threshold] in any one or any combination of the operations of the distributed training ((Xu [0065]) “In some embodiments, a DMA controller 1002 at the worker node 120-1 may perform one or more of the compression tasks. For example, the DMA controller 1002 may utilize a gradient compression engine (GCE) to perform sparsity analysis, gradient clipping, quantization, and/or compression”, Bijalwan teaches quantization of a layer with low sensitivity below a threshold) At the time of filing, one of ordinary skill in the art would have motivation to combine the electronic device jointly taught by Bijalwan, Shen, and Xu for the parent claim of claim 7, claim 3. No new embodiments are introduced, so the reason to combine is the same as for the parent claim. Regarding claim 8, Bijalwan, Shen, and Xu jointly teach The electronic device of claim 7, wherein the processor is further configured to compress data used in any one or any combination of the operations of the distributed training ((Xu [0059]) “After uncompressed gradients 806 are computed by the transmitting worker node 802, but prior to transmission, the uncompressed gradients 806 are compressed by a compression module 812 at the transmitting worker node 802 to generate compressed gradients 808. The compressed gradients 808 are then transmitted from the transmitting worker node 802 to the receiving worker node 804”) At the time of filing, one of ordinary skill in the art would have motivation to combine the electronic device jointly taught by Bijalwan, Shen, and Xu for the parent claim of claim 8, claim 7. No new embodiments are introduced, so the reason to combine is the same as for the parent claim. 
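Claims 7 and 8 read gradient quantization and compression onto Xu's gradient compression engine ([0065]) and compression module ([0059]). What quantizing gradients before a worker-to-worker exchange might look like can be sketched under an assumed simple symmetric 8-bit scheme; this is not Xu's actual implementation, and all names are hypothetical:

```python
# Sketch of gradient quantization during a distributed-training exchange,
# loosely in the spirit of Xu's gradient compression engine. The 8-bit
# symmetric scheme and function names are illustrative assumptions.

def compress_gradients(grads, bits=8):
    """Scale gradients into signed integers plus a shared scale factor,
    as a transmitting worker node might do before the exchange."""
    peak = max((abs(g) for g in grads), default=0.0)
    if peak == 0.0:
        return [0] * len(grads), 1.0
    qmax = (1 << (bits - 1)) - 1          # 127 for 8 bits
    scale = peak / qmax
    return [round(g / scale) for g in grads], scale

def decompress_gradients(qgrads, scale):
    """Recover approximate gradients on the receiving worker node."""
    return [q * scale for q in qgrads]
```

Each transmitted gradient is then one byte plus a shared scale, with reconstruction error bounded by half a quantization step.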
Regarding claim 11, Bijalwan, Shen, and Xu jointly teach The electronic device of claim 1, Bijalwan further teaches: wherein the model to be trained [in the distributed training] is pretrained, [before the distributed training,] with a precision without quantization ((Bijalwan [0036]) “The sensitivity evaluation module 114 receives data from the data acquisition module 230 and generates a union sensitivity list. In one embodiment, the sensitivity evaluation module 114 generates a base model from the input neural network model by representing the parameters of the input neural network model in high precision format and stores as base model 210”, a model in a high precision format corresponds to a model pretrained with a precision without quantization, Xu but not Bijalwan teaches distributed training) At the time of filing, one of ordinary skill in the art would have motivation to combine the electronic device jointly taught by Bijalwan, Shen, and Xu for the parent claim of claim 11, claim 1. No new embodiments are introduced, so the reason to combine is the same as for the parent claim. Regarding claim 14, Bijalwan and Shen jointly teach The operating method of claim 12, Xu teaches the following further limitations that neither Bijalwan nor Shen teaches: wherein the training of the model comprises performing, on the model, distributed training comprising operations of: ((Xu [0025]) “A distributed system can accelerate a training process by distributing the training process across multiple computing systems, which can be referred to as worker nodes. Training data can be split into multiple portions, with each portion to be processed by a worker node. 
Each worker node can perform the forward and backward propagation operations independently”) performing forward propagation moving from a first layer to a last layer of the model; ((Xu [0021]) “During training of a neural network, a first neural network layer can receive training input data, combine the training input data with the weights (e.g., by multiplying the training input data with the weights and then summing the products) to generate first output data for the neural network layer, and propagate the output data to a second neural network layer, in a forward propagation operation…The forward propagation operations can start at the first neural network layer and end at the highest neural network layer”) performing backward propagation moving from the last layer to the first layer of the model; ((Xu [0023]) “As part of the training process, each neural network layer can then perform a backward propagation process to adjust the set of weights at each neural network layer…The backward propagation operations can start from the highest neural network layer and end at the first neural network layer”) determining a mean value of gradients calculated in each of a plurality of nodes used for the distributed training of the model; ((Xu [0025]) “Each worker node can exchange its set of weight gradients with other worker nodes, and average its set of weight gradients and the sets of weight gradients received from other worker nodes”) and updating a weight of the model based on the mean value ((Xu [0025]) “Each computing node can have the same set of averaged weight gradients, and can then update a set of weights for each neural network layer based on the averaged weight gradients”) At the time of filing, one of ordinary skill in the art would have motivation to combine Bijalwan, Shen, and Xu by taking the method for determining the sensitivity of layers in a model and applying quantization to layers in the model that are insensitive to quantization relative to a 
threshold, taught jointly by Bijalwan and Shen, and adding the method of distributed training taught by Xu, as Xu teaches: (Xu [0025]) “A distributed system can accelerate a training process by distributing the training process across multiple computing systems”. Such a combination would be obvious. Regarding claim 18, Bijalwan, Shen, and Xu jointly teach The operating method of claim 14, Xu further teaches: wherein the training of the model comprises training the model by applying quantization [to the layer with the low sensitivity lower than the predetermined threshold] in any one or any combination of the operations the distributed training comprises ((Xu [0065]) “In some embodiments, a DMA controller 1002 at the worker node 120-1 may perform one or more of the compression tasks. For example, the DMA controller 1002 may utilize a gradient compression engine (GCE) to perform sparsity analysis, gradient clipping, quantization, and/or compression”, Bijalwan teaches quantization of a layer with low sensitivity below a threshold) At the time of filing, one of ordinary skill in the art would have motivation to combine the operating method jointly taught by Bijalwan, Shen, and Xu for the parent claim of claim 18, claim 14. No new embodiments are introduced, so the reason to combine is the same as for the parent claim. Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Bijalwan in view of Shen, further in view of Xu, further in view of Dong. Regarding claim 4, Bijalwan, Shen, and Xu jointly teach The electronic device of claim 1, Dong teaches the following further limitation that neither Bijalwan, nor Shen, nor Xu explicitly teaches: wherein the processor is further configured to periodically determine training sensitivity of the layers for each training of the model, or for each epoch or each of one or more iterations performed during the training of the model ((Dong Pgs. 
1-2) “these searching methods can require a large amount of computational resources, are time-consuming, and, worst of all, the quality of quantization is very sensitive to the initialization of their search parameters and therefore unpredictable. This makes deployment of these methods in online learning scenarios especially challenging, as in these applications a new model is trained every few hours and needs to be quantized for efficient inference. To address these issues, recent work introduced HAWQ [7], a Hessian AWare Quantization framework. The main idea is to assign higher bit-precision to layers that are more sensitive, and lower bit-precision to less sensitive layers”, application of the method of determining training sensitivity of layers to online learning scenarios where a new model is trained and quantized every few hours corresponds to periodically determining training sensitivity of the layers for each training of the model) At the time of filing, one of ordinary skill in the art would have motivation to combine Bijalwan, Shen, Xu, and Dong by taking the electronic device for determining the sensitivity of layers in a model and applying quantization to layers in the model that are insensitive to quantization relative to a threshold, taught jointly by Bijalwan, Shen, and Xu, and adding quantizing the layers, including determining the sensitivity of the layers to quantization, every time the model is trained, as taught by Dong, as doing so increases the effectiveness of the model by optimizing its accuracy relative to its size for the newly trained parameters. Such a combination would be obvious. Claims 5 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Bijalwan in view of Shen, further in view of Xu, further in view of Liu. 
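The periodic re-determination the rejections of claims 4 and 15 draw from Dong can be sketched as re-scoring layer sensitivity at every epoch and refreshing the set of layers to quantize. The sensitivity proxy used here, mean absolute weight movement over the epoch, is a hypothetical stand-in for Dong's Hessian-based measure, and all names are illustrative:

```python
# Sketch of periodic sensitivity re-determination: after each epoch,
# re-score every layer and re-pick which layers to quantize. The
# weight-movement proxy is an assumption, not Dong's actual metric.

def epoch_sensitivity(before, after):
    """Per-layer sensitivity proxy: mean absolute parameter movement."""
    return {
        name: sum(abs(b - a) for b, a in zip(before[name], after[name]))
        / len(before[name])
        for name in before
    }

def train_with_periodic_sensitivity(model, epochs, train_epoch, threshold):
    """Each epoch: train, re-determine sensitivity, and re-select the
    low-sensitivity layers to quantize for that epoch."""
    plans = []
    for _ in range(epochs):
        before = {k: list(v) for k, v in model.items()}
        train_epoch(model)                  # mutates weights in place
        sens = epoch_sensitivity(before, model)
        plans.append({k for k, s in sens.items() if s < threshold})
    return plans
```

The returned plan can change from epoch to epoch, which is the behavior the rejection reads onto Dong's online-learning scenario.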
Regarding claim 5, Bijalwan teaches An electronic device comprising: ((Bijalwan [0028]) “In one embodiment, the system 100 comprises…at least one device such as a computing device 104”) a processor; ((Bijalwan [0009]) “The system comprises a memory and a processor that is coupled to the memory”) and a memory configured to store instructions executable by the processor, ((Bijalwan [0009]) “The system comprises a memory and a processor that is coupled to the memory”) wherein the processor is configured to, in response to the instructions being executed by the processor: generate, based on a determination of sensitivity of layers in a model to be trained, sensitivity results; ((Bijalwan [0055]) At block 408, the sensitivity evaluation module 114 normalizes the features sensitivity values and the weight sensitivity values of the plurality of layers independently and combines the sensitivity values into a union sensitivity list…The first row of the union sensitivity list illustrates that a weight sensitivity value corresponding to layer 1 indicated by "Layer_l_w" is 0.77) Shen teaches the following further limitation more explicitly than Bijalwan: train the model by applying quantization to a layer of the layers with a low sensitivity of the sensitivity results lower than a predetermined threshold, ((Shen [0028]) “when the value of the objective function corresponding to the first layer L1 is greater than the threshold, this indicates that the loss is small, and the processing unit 120 decides to quantize the first layer L1 with the second precision”, a small loss corresponds to a low sensitivity) At the time of filing, one of ordinary skill in the art would have motivation to combine Bijalwan and Shen by taking the electronic device for determining the sensitivity of layers in a model taught by Bijalwan and applying a threshold to layers based on sensitivity to determine whether to apply quantization, taught by Shen, as doing so imparts the benefit of optimizing the 
memory efficiency and speed of the model, increased by quantizing some layers, with respect to the accuracy of the model, increased by refraining from quantizing some layers, by quantizing the layers that are not sensitive enough to significantly decrease accuracy. Such a combination would be obvious. Xu teaches the following further limitations that neither Bijalwan nor Shen teaches: where the training includes at least one of an operation of backward propagation of the layer or an operation of updating a weight of the model dependent on a calculated gradient ((Xu [0023]) “As part of the training process, each neural network layer can then perform a backward propagation process to adjust the set of weights at each neural network layer. Specifically, the highest neural network layer can receive the set of gradients and compute, in a backward propagation operation, a set of first data gradients and a set of first weight gradients based on applying the set of weights to the input data gradients in similar mathematical operations as the forward propagation operation. The highest neural network layer can adjust the set of weights of the layer based on the set of first weight gradients”) At the time of filing, one of ordinary skill in the art would have motivation to combine Bijalwan, Shen, and Xu by taking the electronic device for determining the sensitivity of layers in a model and applying quantization to layers in the model that are insensitive to quantization relative to a threshold, taught jointly by Bijalwan and Shen, and adding the method of training including backwards propagation and updating model weights based on gradients, taught by Xu, as backpropagation to update model weights with gradients is a very well-known and commonly used method for training machine learning models, with substantial benefits to training efficiency over its alternatives in a large majority of cases. Such a combination would be obvious. 
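The distributed-training operations the rejection takes from Xu [0023] and [0025] (backward propagation producing per-worker gradients, averaging those gradients across worker nodes, and applying the same weight update on every node) can be sketched as follows; the function names are illustrative, not Xu's:

```python
# Sketch of the distributed update step described in Xu [0025]: workers
# exchange weight gradients, average them, and each applies the same
# gradient-descent update. Names and learning rate are assumptions.

def average_gradients(per_worker_grads):
    """Element-wise mean of the gradient vectors exchanged by workers."""
    n = len(per_worker_grads)
    return [sum(gs) / n for gs in zip(*per_worker_grads)]

def apply_update(weights, mean_grads, lr=0.1):
    """Weight update from the mean gradient, identical on every node."""
    return [w - lr * g for w, g in zip(weights, mean_grads)]
```

Because every node computes the same mean, all replicas stay synchronized after the update, which is what lets each worker train its data shard independently between exchanges.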
Liu teaches the following further limitations that neither Bijalwan, nor Shen, nor Xu explicitly teaches: wherein, to perform the application of the quantization, the processor is further configured to: generate, based on a determination of channel-wise sensitivity of a tensor ((Liu Pg. 2) “The b-bit linear quantization amounts to approximate real numbers using the following quantization set Q…[.]Q denotes the nearest rounding operator w.r.t. Q…[.]Q can be generalized to higher dimensional tensors by first stretching them to one-dimensional vectors then applying Eq. 3”) used for the model, channel-wise sensitivity results; ((Liu Pg. 3) “For a layer L with d-dimensional input, we adopt a simple criterion, output error, to determine the target channels. Output error is the difference of the output of a channel before and after quantization”, output error of a channel in response to quantization corresponds to a determination of channel-wise sensitivity) process a channel with a low channel-wise sensitivity of the channel-wise sensitivity results lower than a second predetermined threshold with a first precision by applying quantization to the channel; ((Liu Pg. 1) “we propose multipoint quantization for post-training quantization, which can achieve the flexibility similar to mixed precision, but uses only a single precision level. The idea is to approximate a full-precision weight vector by a linear combination of multiple low-bit vectors. This allows us to use a larger number of low-bit vectors to approximate the weights of more important channels, while use less points to approximate the insensitive channels”, (Liu Pg. 
3) “If e(w;w-hat;DL) is larger than a predefined threshold ϵ, we apply multipoint quantization to this channel”, applying multipoint quantization, which is higher precision, to a channel with error higher than a predefined threshold, while using normal, lower-precision quantization for insensitive channels, corresponds to applying quantization to a channel with sensitivity lower than a threshold) and process a channel with a high channel-wise sensitivity of the channel-wise sensitivity results higher than or equal to the second predetermined threshold with a second precision, higher than the first precision, without quantization ((Liu Pg. 3) “If e(w;w-hat;DL) is larger than a predefined threshold ϵ, we apply multipoint quantization to this channel”, therefore if the error e is above the threshold multi-point quantization, which approximates full-precision, is applied, which corresponds to processing the channel without quantization) At the time of filing, one of ordinary skill in the art would have motivation to combine Bijalwan, Shen, Xu, and Liu by taking the electronic device for determining the sensitivity of layers in a model and applying quantization to layers in the model that are insensitive to quantization relative to a threshold, taught jointly by Bijalwan, Shen, and Xu, and adding quantizing tensors, including quantizing channels with low sensitivity relative to a threshold but approximating full precision for channels with high error above a threshold, as taught by Liu, as Liu teaches (Liu Pg. 2) “There are two common configurations for post-training quantization, per-layer quantization and per-channel quantization. Per-layer quantization assigns the same K and B for all the weights in the same layer. Per-channel quantization is more fine-grained, and it uses different K and B for different channels. 
The latter can achieve higher precision, but it also requires more complicated hardware design…We propose multipoint quantization, which can be implemented with common operands on commodity hardware”. Such a combination would have been obvious.

Regarding claim 21, Bijalwan, Shen, and Xu jointly teach the electronic device of claim 1. Liu teaches the following further limitations that none of Bijalwan, Shen, or Xu explicitly teaches: wherein the processor is further configured to: generate, based on a determination of channel-wise sensitivity of a tensor ((Liu Pg. 2) “The b-bit linear quantization amounts to approximate real numbers using the following quantization set Q…[.]Q denotes the nearest rounding operator w.r.t. Q…[.]Q can be generalized to higher dimensional tensors by first stretching them to one-dimensional vectors then applying Eq. 3”) used for the model, channel-wise sensitivity results; ((Liu Pg. 3) “For a layer L with d-dimensional input, we adopt a simple criterion, output error, to determine the target channels. Output error is the difference of the output of a channel before and after quantization”, output error of a channel in response to quantization corresponds to a determination of channel-wise sensitivity) process a channel with a low channel-wise sensitivity of the channel-wise sensitivity results lower than a second predetermined threshold with a first precision by applying quantization to the channel; ((Liu Pg. 1) “we propose multipoint quantization for post-training quantization, which can achieve the flexibility similar to mixed precision, but uses only a single precision level. The idea is to approximate a full-precision weight vector by a linear combination of multiple low-bit vectors. This allows us to use a larger number of low-bit vectors to approximate the weights of more important channels, while use less points to approximate the insensitive channels”, (Liu Pg. 3) “If e(w;w-hat;DL) is larger than a predefined threshold ϵ, we apply multipoint quantization to this channel”, applying multipoint quantization, which is higher precision, to a channel with error higher than a predefined threshold, while using normal, lower-precision quantization for insensitive channels, corresponds to applying quantization to a channel with sensitivity lower than a threshold) and process a channel with a high channel-wise sensitivity of the channel-wise sensitivity results higher than or equal to the second predetermined threshold with a second precision, higher than the first precision, without quantization ((Liu Pg. 3) “If e(w;w-hat;DL) is larger than a predefined threshold ϵ, we apply multipoint quantization to this channel”, therefore if the error e is above the threshold, multipoint quantization, which approximates full precision, is applied, which corresponds to processing the channel without quantization).

At the time of filing, one of ordinary skill in the art would have been motivated to combine Bijalwan, Shen, Xu, and Liu by taking the electronic device for determining the sensitivity of layers in a model and applying quantization to layers in the model that are insensitive to quantization relative to a threshold, taught jointly by Bijalwan, Shen, and Xu, and adding quantizing tensors, including quantizing channels with low sensitivity relative to a threshold but approximating full precision for channels with high error above a threshold, as taught by Liu, as Liu teaches (Liu Pg. 2) “There are two common configurations for post-training quantization, per-layer quantization and per-channel quantization. Per-layer quantization assigns the same K and B for all the weights in the same layer. Per-channel quantization is more fine-grained, and it uses different K and B for different channels. The latter can achieve higher precision, but it also requires more complicated hardware design…We propose multipoint quantization, which can be implemented with common operands on commodity hardware”. Such a combination would have been obvious.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Bijalwan in view of Shen, further in view of Xu, further in view of Micikevicius et al., “Mixed Precision Training”, hereinafter Micikevicius.

Regarding claim 9, Bijalwan, Shen, and Xu jointly teach the electronic device of claim 1. Micikevicius teaches the following further limitation that none of Bijalwan, Shen, or Xu teaches: wherein the processor is further configured to train the model by scaling a gradient calculated in training the model ((Micikevicius Pg. 4) “One efficient way to shift the gradient values into FP16-representable range is to scale the loss value computed in the forward pass, prior to starting back-propagation. By chain rule back-propagation ensures that all the gradient values are scaled by the same amount…We trained a variety of networks with scaling factors ranging from 8 to 32K”).

At the time of filing, one of ordinary skill in the art would have been motivated to combine Bijalwan, Shen, Xu, and Micikevicius by taking the electronic device for determining the sensitivity of layers in a model and applying quantization to layers in the model that are insensitive to quantization relative to a threshold, taught jointly by Bijalwan, Shen, and Xu, and adding scaling a gradient calculated during model training, as taught by Micikevicius, as Micikevicius teaches (Micikevicius Pg. 3) “Scaling up the gradients will shift them to occupy more of the representable range and preserve values that are otherwise lost to zeros”. Such a combination would have been obvious.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Bijalwan in view of Shen, further in view of Xu, further in view of Alistarh et al.
“The Convergence of Sparsified Gradient Methods”, hereinafter Alistarh.

Regarding claim 10, Bijalwan, Shen, and Xu jointly teach the electronic device of claim 3. Alistarh teaches the following further limitation that none of Bijalwan, Shen, or Xu teaches: wherein the processor is further configured to determine the mean value using "k" largest gradients of the gradients calculated in each of the plurality of nodes, or by applying a genetic algorithm to the gradients ((Alistarh Pg. 3) “Strom [23], Dryden et al. [8] and Aji and Heafield [2] considered sparsifying the gradient updates by only applying the top K components, taken at every node, in every iteration, for K corresponding to < 1% of the dimension, and accumulating the error”; Alistarh Pg. 5, Algorithm 1 [reproduced as an image in the original action] shows that the Top-K gradients are averaged), where k is an integer ((Alistarh Pg. 6) “To further illustrate necessity, consider a dummy instance with two nodes, dimension 2, and K = 1”, 1 is an integer).

At the time of filing, one of ordinary skill in the art would have been motivated to combine Bijalwan, Shen, Xu, and Alistarh by taking the electronic device of claim 3, taught jointly by Bijalwan, Shen, and Xu, and adding determining the average gradient using the k largest gradients, taught by Alistarh, as doing so increases the efficiency of the distributed system by not spending network bandwidth on the transmission of gradients so small that they would provide only marginal additions to accuracy. Such a combination would have been obvious.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Krishnamoorthi, “Quantizing deep convolutional networks for efficient inference: A whitepaper”, teaches a variety of quantization techniques for neural networks.
Liang, “Post Training Mixed-Precision Quantization Based on Key Layers Selection”, teaches a method of mixed-precision quantization where layers are ranked based on entropy and the most sensitive layers are chosen to have higher precision. Bleiweiss et al. (U.S. Patent Application Publication No. 2019/0205736) teaches a variety of techniques for accelerating the computation of neural networks, including quantization, compression, and distributed training.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VICTOR A NAULT, whose telephone number is (703) 756-5745. The examiner can normally be reached M - F, 12 - 8. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/V.A.N./
Examiner, Art Unit 2124

/MIRANDA M HUANG/
Supervisory Patent Examiner, Art Unit 2124
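For reference, the channel-wise, threshold-gated scheme that the rejection maps onto Liu can be sketched the same way: estimate each channel's output error under quantization, quantize channels below the second predetermined threshold at low precision, and leave high-error channels at full precision. The NumPy toy below is a hedged illustration only; the 8-bit scheme, the error criterion, and all names are assumptions, and Liu's multipoint approximation of sensitive channels is not reproduced (those channels are simply kept at full precision here).

```python
# Hedged sketch of channel-wise sensitivity gating in the spirit of the
# Liu mapping; high-error channels are kept at full precision here
# rather than multipoint-quantized. All names are illustrative.
import numpy as np

def quantize(v, bits=8):
    """Uniform symmetric quantization of one channel's weight vector."""
    scale = np.max(np.abs(v)) / (2 ** (bits - 1) - 1)
    return np.round(v / scale) * scale

def channel_error(v, x):
    """Output error of a channel before vs. after quantization."""
    return np.linalg.norm(x @ v - x @ quantize(v))

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 8))                 # calibration inputs
W = rng.normal(size=(8, 4))                  # 4 output channels
W[:, 2] *= 100.0                             # one highly sensitive channel

eps = 1.0                                    # second predetermined threshold
errors = [channel_error(W[:, c], x) for c in range(W.shape[1])]
W_mixed = np.stack(
    [quantize(W[:, c]) if errors[c] < eps else W[:, c]   # low error: quantize
     for c in range(W.shape[1])], axis=1)
```

Under these assumed magnitudes, only the scaled-up channel exceeds the threshold and survives unquantized, mirroring the "with a second precision, higher than the first precision, without quantization" limitation.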

Prosecution Timeline

Aug 12, 2022: Application Filed
Jul 22, 2025: Non-Final Rejection — §101, §103
Oct 15, 2025: Response Filed
Jan 26, 2026: Final Rejection — §101, §103
Mar 26, 2026: Applicant Interview (Telephonic)
Mar 26, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579429: DEEP LEARNING BASED EMAIL CLASSIFICATION (granted Mar 17, 2026; 2y 5m to grant)
Patent 12566953: AUTOMATED PROCESSING OF FEEDBACK DATA TO IDENTIFY REAL-TIME CHANGES (granted Mar 03, 2026; 2y 5m to grant)
Patent 12561563: AUTOMATED PROCESSING OF FEEDBACK DATA TO IDENTIFY REAL-TIME CHANGES (granted Feb 24, 2026; 2y 5m to grant)
Patent 12468939: OBJECT DISCOVERY USING AN AUTOENCODER (granted Nov 11, 2025; 2y 5m to grant)
Patent 12446600: TWO-STAGE SAMPLING FOR ACCELERATED DEFORMULATION GENERATION (granted Oct 21, 2025; 2y 5m to grant)
Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 62%
With Interview: 99% (+83.3%)
Median Time to Grant: 3y 11m
PTA Risk: Moderate
Based on 13 resolved cases by this examiner. Grant probability derived from career allow rate.
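The headline grant probability follows directly from the examiner's career allow rate shown above (8 granted of 13 resolved cases):

```python
# Grant probability as the examiner's career allow rate: 8 of 13
# resolved cases granted, rounded to the nearest whole percent.
granted, resolved = 8, 13
grant_probability = round(granted / resolved * 100)  # 62
```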
