DETAILED ACTION
This Office action is in response to the submission of the application on 06/29/2023.
Claims 1-15 are presented for examination.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: 701, 702, and 703. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 112
Claims 4, 9, and 12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 4, 9, and 12 recite the limitations “quantifiable layers” and “quantifiable second devices”. In the context of the application, it is unclear what it would mean for a layer or device to be quantifiable. For examination purposes, the term “quantifiable” will be interpreted as quantizable, in accordance with the examiner’s understanding of the applicant’s intention.
Claims 4 and 12 recite the limitation “a proportion of quantifiable layers in second model data”. It is unclear what is meant by a layer in model data. For examination purposes, this limitation will be interpreted as referring to layers of a neural network.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1:
Step 1: The claim is directed to a method, which falls within the statutory category of a process.
Step 2A Prong 1: The claim is directed to an abstract idea. Specifically, the claim recites:
determining, [by a first device], second information based on first information, wherein the second information is quantization information based on which [a second device] quantizes first model data, the first information comprises an evaluation loss corresponding to a current round of training, the second information comprises a quantization error threshold, and the first model data is model data after the current round of training; (Abstract idea – mental process. Determining a quantization error threshold for model data quantization based on an evaluation loss is a judgment which can practically be performed in the human mind or with the aid of pen and paper, for example, by viewing the evaluation loss on a sheet of paper and mentally determining a suitable quantization error threshold. The courts have recognized that claims can recite a mental process even if they are claimed as being performed on a computer. See MPEP 2106.04(a)(2)(III).)
Step 2A Prong 2: The additional elements recited in the claim do not integrate the abstract idea into a practical application, individually or in combination. Specifically, the claim recites the additional elements:
a first device and a second device (These limitations are interpreted as generic computing environments, and thus amount to adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
sending, by the first device, the second information to the second device; and (Sending information between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g).)
receiving, by the first device, a first message sent by the second device, wherein the first message comprises quantized first model data and first quantization configuration information. (Receiving quantized model data and quantization configuration information between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g).)
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Specifically, the claim recites the additional elements:
a first device and a second device (These limitations are interpreted as generic computing environments, and thus amount to adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
sending, by the first device, the second information to the second device; and (Sending information between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). Further, the limitation is directed to receiving or transmitting data over a network, which the courts have found to be well-understood, routine, and conventional in the computer arts – see MPEP 2106.05(d).)
receiving, by the first device, a first message sent by the second device, wherein the first message comprises quantized first model data and first quantization configuration information. (Receiving quantized model data and quantization configuration information between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). Further, the limitation is directed to receiving or transmitting data over a network, which the courts have found to be well-understood, routine, and conventional in the computer arts – see MPEP 2106.05(d).)
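By way of illustration only, and not as part of the claim mapping, the following is a minimal Python sketch of the kind of operations recited in claim 1: a first device derives a quantization error threshold from the current round's evaluation loss, and a second device quantizes its post-training model data and returns it with quantization configuration information. The heuristic and all names below are hypothetical assumptions, not the applicant's disclosed method:

```python
import numpy as np

def determine_error_threshold(eval_loss: float, base: float = 0.01) -> float:
    # Hypothetical heuristic: tolerate a larger quantization error while the
    # evaluation loss is still high, and tighten the threshold as it falls.
    return base * (1.0 + eval_loss)

def uniform_quantize(weights: np.ndarray, n_bits: int = 8):
    # Uniform quantization of model data; the returned configuration is the
    # kind of "quantization configuration information" sent back with it.
    zero_point = float(weights.min())
    scale = max((float(weights.max()) - zero_point) / (2 ** n_bits - 1), 1e-12)
    q = np.round((weights - zero_point) / scale).astype(np.int32)
    config = {"n_bits": n_bits, "scale": scale, "zero_point": zero_point}
    return q, config

# First device: second information derived from first information.
threshold = determine_error_threshold(eval_loss=0.42)
# Second device: quantized first model data plus configuration information.
weights = np.random.randn(1000).astype(np.float32)
quantized, config = uniform_quantize(weights)
```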
Claims 2-4 and 13-15:
Claim 2 recites The method according to claim 1, wherein the first information further comprises information about an accuracy requirement of the second device for model training and communication sensitivity information. This claim merely specifies that the information considered when determining the quantization error threshold (i.e. performing the mental process) includes an accuracy requirement and communication sensitivity information. Therefore, the claim merges with the abstract idea recited in claim 1, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
Claim 3 recites The method according to claim 2, wherein before the determining, by a first device, second information based on first information, the method further comprises: receiving, by the first device, a second message sent by the second device, wherein the second message comprises information about the accuracy requirement and the communication sensitivity information. Receiving an accuracy requirement and communication sensitivity information from the second device amounts to adding insignificant extra-solution activity (mere data-gathering) to the judicial exception – see MPEP 2106.05(g). Further, the limitation is directed to receiving or transmitting data over a network, which the courts have found to be well-understood, routine, and conventional in the computer arts – see MPEP 2106.05(d). Therefore, the claim does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
Claim 4 recites The method according to claim 2, wherein before the determining, by a first device, second information based on first information, the method further comprises: determining, by the first device, a proportion of quantifiable layers in second model data based on third information, wherein the third information comprises an evaluation loss corresponding to a previous round of training, the information about the accuracy requirement, and the communication sensitivity information, and the second model data is model data before the current round of training; quantizing, by the first device, the second model data based on the proportion of the quantifiable layers, to obtain quantized second model data; and sending, by the first device, a third message to the second device, wherein the third message comprises the quantized second model data and second quantization configuration information, and the third message is training information based on which the second device trains the second model data to obtain the first model data. Determining a proportion of quantizable layers in model data based on a previous evaluation loss, accuracy requirement, and communication sensitivity information and quantizing model data based on the proportion of quantizable layers can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by viewing the model data, previous evaluation loss, accuracy requirement, and communication sensitivity on a sheet of paper, mentally determining a suitable proportion of model layers which can be quantized, and for the layers which are determined to be quantizable, adjusting their weights by hand by mentally rounding them. Sending quantized model data and quantization configuration information between devices amounts to adding insignificant extra-solution activity to the judicial exception, and is directed to receiving or transmitting data over a network, which is well-understood, routine, and conventional in the computer arts. Training a generic model is standard in the field of machine learning, and thus amounts to adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea. Therefore, the claim merges with the abstract idea recited in claim 2, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
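For illustration only, a minimal Python sketch of the operations claim 4 recites: determining a proportion of quantizable layers from a prior evaluation loss, an accuracy requirement, and communication sensitivity, then quantizing only that fraction of layers. The heuristic, the choice of which layers fall within the proportion, and all names are hypothetical assumptions:

```python
import numpy as np

def quantizable_layer_proportion(prev_eval_loss: float,
                                 accuracy_requirement: float,
                                 comm_sensitivity: float) -> float:
    # Hypothetical heuristic: quantize a larger share of layers when the
    # link is communication-sensitive, a smaller share when the accuracy
    # requirement is tight or the previous round's loss is still high.
    p = comm_sensitivity / (1.0 + prev_eval_loss + accuracy_requirement)
    return min(max(p, 0.0), 1.0)

def quantize_layer_subset(layers, proportion, n_bits=8):
    # Quantize the first `proportion` of layers; leave the rest unchanged.
    k = int(len(layers) * proportion)
    out = []
    for i, w in enumerate(layers):
        if i < k:
            zp = float(w.min())
            scale = max((float(w.max()) - zp) / (2 ** n_bits - 1), 1e-12)
            out.append(np.round((w - zp) / scale) * scale + zp)
        else:
            out.append(w)
    return out
```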
Claim 13 recites The method according to claim 1, wherein the first quantization configuration information is information based on which the first device performs dequantization parsing on the quantized first model data. Performing dequantization parsing based on quantization configuration information can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by viewing the quantized model data and quantization configuration information, such as the number of bits in the pre-quantized data, on a sheet of paper, and adding bits to the quantized data by hand so that it has the same number of bits as the pre-quantized data. Therefore, the claim merges with the abstract idea recited in claim 1, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
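For illustration only, a minimal sketch of dequantization parsing driven by quantization configuration information, assuming the uniform-quantization configuration format shown in the earlier sketch (all names hypothetical):

```python
import numpy as np

def dequantization_parse(q: np.ndarray, config: dict) -> np.ndarray:
    # Invert uniform quantization using the configuration information
    # (scale and zero point) received with the quantized model data.
    return q.astype(np.float32) * config["scale"] + config["zero_point"]
```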
Claim 14 recites The method according to claim 1, wherein the first quantization configuration information includes a quantity of quantized bits of the first model data, uniform quantization or non-uniform quantization, a quantized zero point, an offset value, and/or a scaling factor. This claim merely specifies the type of quantization configuration information included in the message received by the first device, and amounts to adding insignificant extra-solution activity to the judicial exception which is well-understood, routine, and conventional in the computer arts. Therefore, the claim does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
Claim 15 recites The method according to claim 2, wherein the quantization error threshold is determined by the first device based on the evaluation loss corresponding to the current round of training, the information about the accuracy requirement of the second device for model training, and the communication sensitivity information. This claim merely specifies that the information considered when determining the quantization error threshold (i.e. performing the mental process) includes the evaluation loss, the accuracy requirement, and the communication sensitivity information. Therefore, the claim merges with the abstract idea recited in claim 2, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
Claim 5:
Step 1: The claim is directed to a method, which falls within the statutory category of a process.
Step 2A Prong 1: The claim is directed to an abstract idea. Specifically, the claim recites:
quantizing, [by the second device], the first model data based on the second information; (Abstract idea – mental process. Quantizing model data based on quantization information can practically be performed in the human mind or with the aid of pen and paper, for example, by viewing the model data and quantization information on a sheet of paper and adjusting the model weights by hand by mentally rounding them in accordance with the quantization information. The courts have recognized that claims can recite a mental process even if they are claimed as being performed on a computer. See MPEP 2106.04(a)(2)(III).)
Step 2A Prong 2: The additional elements recited in the claim do not integrate the abstract idea into a practical application, individually or in combination. Specifically, the claim recites the additional elements:
a first device and a second device (These limitations are interpreted as generic computing environments, and thus amount to adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
receiving, by a second device, second information sent by a first device, wherein the second information is quantization information based on which the second device quantizes first model data, the second information comprises a quantization error threshold, and the first model data is model data after a current round of training; (Receiving quantization information between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g).)
sending, by the second device, a first message to the first device, wherein the first message comprises quantized first model data and first quantization configuration information. (Sending quantized model data and quantization configuration information between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g).)
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Specifically, the claim recites the additional elements:
a first device and a second device (These limitations are interpreted as generic computing environments, and thus amount to adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
receiving, by a second device, second information sent by a first device, wherein the second information is quantization information based on which the second device quantizes first model data, the second information comprises a quantization error threshold, and the first model data is model data after a current round of training; (Receiving quantization information between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). Further, the limitation is directed to receiving or transmitting data over a network, which the courts have found to be well-understood, routine, and conventional in the computer arts – see MPEP 2106.05(d).)
sending, by the second device, a first message to the first device, wherein the first message comprises quantized first model data and first quantization configuration information. (Sending quantized model data and quantization configuration information between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). Further, the limitation is directed to receiving or transmitting data over a network, which the courts have found to be well-understood, routine, and conventional in the computer arts – see MPEP 2106.05(d).)
Claims 6-7:
Claim 6 recites The method according to claim 5, wherein the quantizing, by the second device, the first model data based on the second information comprises: quantizing, by the second device, the first model data in a first quantization manner; determining, by the second device, a first quantization error based on the quantized first model data and the first model data before the quantization; and when the first quantization error is less than the quantization error threshold, determining, by the second device, to use the first quantization manner to quantize the first model data. Quantizing model data, determining a quantization error, comparing the quantization error to a threshold, and determining to use that manner of quantization based on the comparison can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by viewing the model data on a sheet of paper, adjusting the model weights by hand by mentally rounding them to a certain number of bits, mentally calculating quantization error as a difference between the rounded and unrounded values, mentally comparing the quantization error to a quantization error threshold, and in the case that the error is below the threshold, determining that the number of bits rounded to is an acceptable manner of quantization. Therefore, the claim merges with the abstract idea recited in claim 5, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
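For illustration only, a minimal Python sketch of the sequence claim 6 recites: quantize in a candidate manner, measure the quantization error against the pre-quantization data, and accept the manner if the error is below the received threshold. The mean-absolute error metric is an assumption, since the claim does not specify how the quantization error is computed:

```python
import numpy as np

def try_quantization_manner(weights: np.ndarray, n_bits: int,
                            error_threshold: float):
    # Quantize in a candidate manner (here, a given uniform bit width).
    zero_point = float(weights.min())
    scale = max((float(weights.max()) - zero_point) / (2 ** n_bits - 1), 1e-12)
    dequantized = np.round((weights - zero_point) / scale) * scale + zero_point
    # Quantization error between the quantized data and the data before
    # quantization; accept the manner only if it is below the threshold.
    error = float(np.mean(np.abs(dequantized - weights)))
    return error < error_threshold, error
```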
Claim 7 recites The method according to claim 5, wherein before the receiving, by a second device, second information sent by a first device, the method further comprises: receiving, by the second device, a third message sent by the first device, wherein the third message comprises quantized second model data and second quantization configuration information, and second model data is model data before the current round of training; performing, by the second device, dequantization parsing based on the quantized second model data and the second quantization configuration information to obtain the second model data; and training, by the second device, the second model data, to obtain the first model data. Receiving quantized model data and quantization configuration information between devices amounts to adding insignificant extra-solution activity to the judicial exception, and is directed to receiving or transmitting data over a network, which is well-understood, routine, and conventional in the computer arts. Performing dequantization parsing based on quantized model data and quantization configuration information can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by viewing the quantized model data and quantization configuration information, such as the number of bits in the pre-quantized data, on a sheet of paper, and adding bits to the quantized data by hand so that it has the same number of bits as the pre-quantized data. Training a generic model is standard in the field of machine learning, and thus amounts to adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea. Therefore, the claim merges with the abstract idea recited in claim 5, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
Claim 8:
Step 1: The claim is directed to a method, which falls within the statutory category of a process.
Step 2A Prong 1: The claim is directed to an abstract idea. Specifically, the claim recites:
determining, [by the first device] based on the first quantization error and the first information, whether the second device is allowed to send quantized first model data; (Abstract idea – mental process. Determining whether a device is allowed to send quantized model data based on quantization error and evaluation loss can practically be performed in the human mind or with the aid of pen and paper, for example, by viewing the quantization error and evaluation loss associated with the device on a sheet of paper and mentally determining if quantization is suitable. The courts have recognized that claims can recite a mental process even if they are claimed as being performed on a computer. See MPEP 2106.04(a)(2)(III).)
Step 2A Prong 2: The additional elements recited in the claim do not integrate the abstract idea into a practical application, individually or in combination. Specifically, the claim recites the additional elements:
a first device and a second device (These limitations are interpreted as generic computing environments, and thus amount to adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
receiving, by a first device, a fourth message sent by a second device, wherein the fourth message comprises a first quantization error and first information, the first quantization error is determined after the second device quantizes first model data in a first quantization manner, the first information comprises an evaluation loss corresponding to a current round of training, and the first model data is model data after the current round of training; (Receiving a message comprising quantization error and evaluation loss between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g).)
sending, by the first device, indication information to the second device, wherein the indication information indicates whether the second device is allowed to send the quantized first model data. (Sending an indication of whether quantization is allowed between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g).)
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Specifically, the claim recites the additional elements:
a first device and a second device (These limitations are interpreted as generic computing environments, and thus amount to adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
receiving, by a first device, a fourth message sent by a second device, wherein the fourth message comprises a first quantization error and first information, the first quantization error is determined after the second device quantizes first model data in a first quantization manner, the first information comprises an evaluation loss corresponding to a current round of training, and the first model data is model data after the current round of training; (Receiving a message comprising quantization error and evaluation loss between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). Further, the limitation is directed to receiving or transmitting data over a network, which the courts have found to be well-understood, routine, and conventional in the computer arts – see MPEP 2106.05(d).)
sending, by the first device, indication information to the second device, wherein the indication information indicates whether the second device is allowed to send the quantized first model data. (Sending an indication of whether quantization is allowed between devices amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). Further, the limitation is directed to receiving or transmitting data over a network, which the courts have found to be well-understood, routine, and conventional in the computer arts – see MPEP 2106.05(d).)
Claims 9-12:
Claim 9 recites The method according to claim 8, wherein the determining, by the first device based on the first quantization error and the first information, whether the second device is allowed to send quantized first model data comprises: determining, by the first device, a proportion of quantifiable second devices based on the first information; and determining, by the first device based on the proportion of the quantifiable second devices, the first quantization error, and a threshold for a quantity of consecutive quantization times, whether the second device is allowed to send the quantized first model data. Determining a proportion of quantizable devices and whether a specific device is allowed to quantize can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by viewing the evaluation loss on a sheet of paper, mentally determining a suitable number n of quantizable devices, mentally sorting the devices by quantization error, and for the n devices with the lowest quantization error, if their number of consecutive quantizations is below a threshold, mentally determining that quantization is allowed. Therefore, the claim merges with the abstract idea recited in claim 8, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
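For illustration only, a minimal Python sketch of the selection described above: permit quantization for the lowest-error fraction of devices, subject to a cap on consecutive quantization rounds. The dictionary keys and the tie-breaking behavior are hypothetical assumptions:

```python
def select_quantizable_devices(devices, proportion, consecutive_limit):
    # devices: list of dicts with hypothetical keys "id", "quant_error",
    # and "consecutive_quant" (number of rounds quantized in a row).
    # Allow the lowest-error fraction of devices to send quantized model
    # data, unless a device has hit the consecutive-quantization limit.
    n = int(len(devices) * proportion)
    ranked = sorted(devices, key=lambda d: d["quant_error"])
    return {d["id"] for d in ranked[:n]
            if d["consecutive_quant"] < consecutive_limit}
```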
Claim 10 recites The method according to claim 8, wherein the first information further comprises information about an accuracy requirement of the second device for model training and communication sensitivity information. This claim merely specifies that the information considered when determining whether quantization is allowed (i.e. performing the mental process) includes an accuracy requirement and communication sensitivity information. Therefore, the claim merges with the abstract idea recited in claim 8, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
Claim 11 recites The method according to claim 10, wherein before the receiving, by a first device, a fourth message sent by a second device, the method further comprises: receiving, by the first device, a second message sent by the second device, wherein the second message comprises information about the accuracy requirement and the communication sensitivity information. Receiving an accuracy requirement and communication sensitivity information from the second device amounts to adding insignificant extra-solution activity (mere data-gathering) to the judicial exception – see MPEP 2106.05(g). Further, the limitation is directed to receiving or transmitting data over a network, which the courts have found to be well-understood, routine, and conventional in the computer arts – see MPEP 2106.05(d). Therefore, the claim does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
Claim 12 recites The method according to claim 10, wherein before the receiving, by a first device, a fourth message sent by a second device, the method further comprises: determining, by the first device, a proportion of quantifiable layers in second model data based on third information, wherein the third information comprises an evaluation loss corresponding to a previous round of training, the information about the accuracy requirement, and the communication sensitivity information, and the second model data is model data before the current round of training; quantizing, by the first device, the second model data based on the proportion of the quantifiable layers, to obtain quantized second model data; and sending, by the first device, a third message to the second device, wherein the third message comprises the quantized second model data and second quantization configuration information, and the third message is training information based on which the second device trains the second model data to obtain the first model data. Determining a proportion of quantizable layers in model data based on a previous evaluation loss, accuracy requirement, and communication sensitivity information and quantizing model data based on the proportion of quantizable layers can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by viewing the model data, previous evaluation loss, accuracy requirement, and communication sensitivity on a sheet of paper, mentally determining a suitable proportion of model layers which can be quantized, and for the layers which are determined to be quantizable, adjusting their weights by hand by mentally rounding them. Sending quantized model data and quantization configuration information between devices amounts to adding insignificant extra-solution activity to the judicial exception, and is directed to receiving or transmitting data over a network, which is well-understood, routine, and conventional in the computer arts. Training a generic model is standard in the field of machine learning, and thus amounts to adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea. Therefore, the claim merges with the abstract idea recited in claim 10, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over
Liu et al. (hereinafter Liu), China Patent Application CN-111401552-A (published 07/10/2020), in view of
Hou et al. (hereinafter Hou), “Loss-aware Weight Quantization of Deep Networks” (published 05/10/2018),
Liu, Shaoli et al. (hereinafter Shaoli), U.S. Patent Application Publication US-20200394522-A1 (published 12/17/2020), and
Xu et al. (hereinafter Xu), “Compressed Communication for Distributed Deep Learning: Survey and Quantitative Evaluation” (published 04/13/2020).
Regarding Claim 1,
Liu teaches A model data sending method, applied to federated learning and comprising: (0011: “The purpose of this invention is to provide a federated learning method and system.”)
determining, by a first device, second information based on first information, wherein the second information is quantization information based on which a second device quantizes first model data, (0014: “The edge server adjusts the batch size and gradient compression rate of the terminal based on the current batch size and gradient compression rate, combined with the terminal's computing power and the communication capability between the edge server and the terminal…” 0033: “In one possible implementation, when compressing gradient information, the gradient can be quantized…” The edge server (i.e. first device) determines the gradient compression rate (i.e. second information comprising quantization information) based on which the terminal (i.e. second device) will compress the gradient (i.e. model data) via quantization.)
the first model data is model data after the current round of training; (0015: “The terminal performs model learning according to the received batch size, and outputs the gradient information obtained by model learning to the edge server after compressing it according to the received extraction compression ratio.” The terminal compresses (i.e. quantizes) gradient information obtained by model learning (i.e. model data after the current round of training).)
sending, by the first device, the second information to the second device; and (0014: “The edge server… transmits the adjusted batch size and gradient compression rate to the terminal.” The edge server (i.e. first device) transmits the gradient compression rate (i.e. second information) to the terminal (i.e. second device).)
receiving, by the first device, a first message sent by the second device, wherein the first message comprises quantized first model data and [first quantization configuration information]. (0015-0016: “The terminal…outputs the gradient information obtained by model learning to the edge server after compressing it according to the received extraction compression ratio… The edge server averages all received gradient information…” The edge server (i.e. first device) receives compressed gradient information (i.e. a first message comprising quantized first model data) from the terminal (i.e. second device).)
Liu does not appear to explicitly disclose the first information comprises an evaluation loss corresponding to a current round of training,
However, Hou teaches the first information comprises an evaluation loss corresponding to a current round of training, (Pg. 3, section 3.1: “[W]e consider the loss explicitly during quantization and obtain the quantization thresholds and scaling parameter by solving an optimization problem.” The information considered when determining quantization parameters (i.e. first information) includes training loss.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu and Hou. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. One of ordinary skill would have motivation to combine Liu and Hou because loss-aware weight quantization “outperforms state-of-the-art weight quantization algorithms, and is as accurate (or even more accurate) than the full-precision network” (Hou, pg. 1, Abstract).
Liu and Hou do not appear to explicitly disclose the second information comprises a quantization error threshold,
However, Shaoli teaches the second information comprises a quantization error threshold, (0101: “In the present technical scheme, the data bit width n is adjusted according to the quantization error diffbit. Furthermore, the quantization error diffbit is compared with a threshold to obtain a comparison result. The threshold includes a first threshold and a second threshold… In practical applications, the first threshold and the second threshold may be empirical values or variable hyperparameters.” The information based on which quantization is performed (i.e. second information) includes a quantization error threshold.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Hou, and Shaoli. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. Shaoli teaches neural network model quantization including a quantization error threshold parameter. One of ordinary skill would have motivation to combine Liu, Hou, and Shaoli because “[i]n this way, the fixed-point computing speed may be greatly improved within a tolerance range of precision, which improves the resource utilization rate of an artificial intelligence processor chip” (Shaoli, 0102).
Liu, Hou, and Shaoli do not appear to explicitly disclose first quantization configuration information.
However, Xu teaches receiving first quantization configuration information (Pg. 6, section 4.2: “The framework considers context (ctx) as an opaque object that can carry any necessary metadata to allow for decompression… Below is an example function definition that takes a tensor with unique name and returns a list of compressed objects with the context needed to decompress them: compress: tensor, name → [comp], ctx”. The entity receiving the quantized model data (i.e. first device) also receives quantization context (i.e. quantization configuration information).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Hou, Shaoli, and Xu. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. Shaoli teaches neural network model quantization including a quantization error threshold parameter. Xu teaches methods for neural network compression for efficient communication in distributed learning, including transmitting a context object along with quantized model data. One of ordinary skill would have motivation to combine Liu, Hou, Shaoli, and Xu in order to allow for dequantization: “The context stores additional information (such as mean and different norms) needed to dequantize” (Xu, pg. 7, section 4.3).
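For illustration only, a minimal Python sketch of an interface of the kind Xu describes, where compress returns compressed objects together with the context (ctx) needed to dequantize. The concrete compressor, context fields, and names below are assumptions, not Xu's implementation:

```python
import numpy as np

def compress(tensor: np.ndarray, name: str):
    # Mirrors Xu's signature (compress: tensor, name -> [comp], ctx): return
    # compressed objects plus an opaque context carrying the metadata needed
    # to decompress. Here the compressor is simple 8-bit uniform quantization.
    zero_point = float(tensor.min())
    scale = max((float(tensor.max()) - zero_point) / 255.0, 1e-12)
    comp = np.round((tensor - zero_point) / scale).astype(np.uint8)
    ctx = {"name": name, "scale": scale, "zero_point": zero_point,
           "shape": tensor.shape}
    return [comp], ctx

def decompress(comps: list, ctx: dict) -> np.ndarray:
    # The context stores what is needed to dequantize back to an
    # approximation of the original values.
    q = comps[0].astype(np.float32).reshape(ctx["shape"])
    return q * ctx["scale"] + ctx["zero_point"]
```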
Regarding Claim 13, Liu, Hou, Shaoli, and Xu teach The method according to claim 1, as shown above.
Xu also teaches wherein the first quantization configuration information is information based on which the first device performs dequantization parsing on the quantized first model data. (Pg. 7, section 4.3: “The context stores additional information (such as mean and different norms) needed to dequantize… dequantize transforms quantized values to an approximation of the original values.” Based on the context (i.e. first quantization configuration information), the quantized values (i.e. quantized first model data) are dequantized (i.e. dequantization parsing is performed).)
Claims 2-3 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Hou, Shaoli, and Xu, and further in view of
Lowell et al. (hereinafter Lowell), U.S. Patent Application Publication US-20190188557-A1 (published 06/20/2019).
Regarding Claim 2, Liu, Hou, Shaoli, and Xu teach The method according to claim 1, as shown above.
Liu also teaches wherein the first information further comprises information about [an accuracy requirement of the second device for model training] and communication sensitivity information. (0014: “The edge server adjusts the batch size and gradient compression rate of the terminal based on the current batch size and gradient compression rate, combined with the terminal's computing power and the communication capability between the edge server and the terminal…” The information based on which quantization information is determined (i.e. first information) includes communication capability between the edge server and the terminal (i.e. communication sensitivity information – see specification paragraph 0066 of the instant application).)
Liu, Hou, Shaoli, and Xu do not appear to explicitly disclose an accuracy requirement of the second device for model training
However, Lowell teaches wherein the first information further comprises information about an accuracy requirement of the second device for model training (0015: “[T]he method includes recalculating the distribution of ANN information and reselecting the quantization function from the set of quantization functions based on the recalculated distribution, if the output does not sufficiently correlate with a known correct output.” 0045: “The difference between the output and the known correct output can be referred to as the training error. On condition 450 that the training error is acceptable (e.g., the difference is below an acceptable threshold, or a heuristic applied to the output and the known correct output satisfies a desired condition), ANN 300 can be considered to be trained on this training data set.” The information based on which quantization information is determined (i.e. first information) includes a model training error threshold (i.e. an accuracy requirement of the second device for model training).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Hou, Shaoli, Xu, and Lowell. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. Shaoli teaches neural network model quantization including a quantization error threshold parameter. Xu teaches methods for neural network compression for efficient communication in distributed learning, including transmitting a context object along with quantized model data. Lowell teaches neural network model quantization where quantization parameters are dynamically determined based on a training accuracy requirement. One of ordinary skill would have motivation to combine Liu, Hou, Shaoli, Xu, and Lowell in order to ensure that quantization results in an acceptable level of model accuracy.
Regarding Claim 3, Liu, Hou, Shaoli, Xu, and Lowell teach The method according to claim 2, as shown above.
Liu also teaches wherein before the determining, by a first device, second information based on first information, the method further comprises:
receiving, by the first device, a second message sent by the second device, wherein the second message comprises information about [the accuracy requirement] and the communication sensitivity information. (0093: “Initialization: Each terminal needs to upload relevant information, such as computing power and the status of the wireless channel, to the base station.” The base station/edge server (i.e. first device) receives information including status of the wireless channel (i.e. a second message comprising communication sensitivity information) from the terminal (i.e. second device).)
Lowell teaches that model information used to determine quantization parameters includes the accuracy requirement, as shown above in regard to claim 2.
Regarding Claim 15, Liu, Hou, Shaoli, Xu, and Lowell teach The method according to claim 2, as shown above.
Liu, Hou, Shaoli, and Lowell also teach wherein the quantization error threshold is determined by the first device based on the evaluation loss corresponding to the current round of training, the information about the accuracy requirement of the second device for model training, and the communication sensitivity information. (Shaoli teaches model quantization with a quantization error threshold parameter, Hou teaches determining model quantization parameters based on evaluation loss, Lowell teaches determining model quantization parameters based on an accuracy requirement, and Liu teaches determining model quantization parameters based on communication sensitivity, as shown above in regard to claims 1-2.)
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Hou, Shaoli, Xu, and Lowell, and further in view of
Amiri et al. (hereinafter Amiri), “Federated Learning With Quantized Global Model Updates” (published 10/07/2020).
Regarding Claim 4, Liu, Hou, Shaoli, Xu, and Lowell teach The method according to claim 2, as shown above.
Liu, Hou, and Lowell also teach determining quantization parameters based on (third) information, wherein the third information comprises an evaluation loss [corresponding to a previous round of training], the information about the accuracy requirement, and the communication sensitivity information, as shown above.
Lowell teaches determining, by the first device, a proportion of quantifiable layers in second model data based on third information, (0043: “Rather than determining a single quantization function for all link weights in ANN 300, quantization can be performed on a per-layer basis, or for each subset of a plurality of subsets of layers.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Hou, Shaoli, Xu, and Lowell. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. Shaoli teaches neural network model quantization including a quantization error threshold parameter. Xu teaches methods for neural network compression for efficient communication in distributed learning, including transmitting a context object along with quantized model data. Lowell teaches neural network model quantization where quantization parameters are dynamically determined based on a training accuracy requirement. One of ordinary skill would have motivation to combine Liu, Hou, Shaoli, Xu, and Lowell because “Selecting a quantization function that is appropriate for each layer can have the advantage of increasing the effectiveness of the link weight quantization in ANN 300 as compared to determining a single quantization function for all link weights” (Lowell, 0043).
Liu, Hou, Shaoli, Xu, and Lowell do not appear to explicitly disclose
wherein before the determining, by a first device, second information based on first information, the method further comprises:
determining quantization parameters based on information corresponding to a previous round of training
the second model data is model data before the current round of training;
quantizing, by the first device, the second model data based on the proportion of the quantifiable layers, to obtain quantized second model data; and
sending, by the first device, a third message to the second device, wherein the third message comprises the quantized second model data and second quantization configuration information, and the third message is training information based on which the second device trains the second model data to obtain the first model data.
However, Amiri teaches wherein before the determining, by a first device, second information based on first information, the method further comprises:
determining quantization parameters based on information corresponding to a previous round of training (Pg. 2, section 1: “To be precise, the PS [parameter server] exploits the knowledge of the last global model estimate available at the devices as side information to quantize the global model update.”)
the second model data is model data before the current round of training; (Pg. 2, section 1: “In this paper, we instead consider broadcasting a quantized version of the global model update by the PS, which provides the devices with a lossy estimate of the global model (rather than its accurate estimate) with which to perform local training.” The global model update (i.e. second model data) is provided to the devices to perform training (i.e. the data is model data before the current round of training).)
quantizing, by the first device, the second model data [based on the proportion of the quantifiable layers], to obtain quantized second model data; and (Pg. 2, section 1: “We introduce a lossy FL (LFL) algorithm, where at each iteration the PS broadcasts a compressed version of the global model update to all the devices through quantization.” The PS (i.e. first device) quantizes the global model update (i.e. second model data) to obtain a compressed version (i.e. quantized second model data). The proportion of quantizable layers is taught by Lowell, as shown above.)
sending, by the first device, a third message to the second device, wherein the third message comprises the quantized second model data and [second quantization configuration information], and the third message is training information based on which the second device trains the second model data to obtain the first model data. (See the portions of pg. 2, section 1 cited above. The PS (i.e. first device) sends the quantized version of the global model update (i.e. a third message comprising quantized second model data) to the devices (i.e. second device) to perform local training (i.e. train the second model data to obtain the first model data). Sending quantization configuration information is taught by Xu, as shown above in regard to claim 1.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Hou, Shaoli, Xu, Lowell, and Amiri. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. Shaoli teaches neural network model quantization including a quantization error threshold parameter. Xu teaches methods for neural network compression for efficient communication in distributed learning, including transmitting a context object along with quantized model data. Lowell teaches neural network model quantization where quantization parameters are dynamically determined based on a training accuracy requirement. Amiri teaches model quantization for federated learning where quantization is performed not only on the model updates sent by the edge devices, but also on the model broadcasts sent by the server. One of ordinary skill would have motivation to combine Liu, Hou, Shaoli, Xu, Lowell, and Amiri because “the proposed LFL scheme, which leads to a significant communication cost saving, provides a promising performance with no visible gap to the performance of the fully lossless scenario where the communication from both PS-to-device and device-to-PS directions is assumed to be perfect” (Amiri, pg. 3, section 1).
Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Shaoli and Xu.
Regarding Claim 5,
Liu teaches A model data sending method, applied to federated learning and comprising: (0011: “The purpose of this invention is to provide a federated learning method and system.”)
receiving, by a second device, second information sent by a first device, wherein the second information is quantization information based on which the second device quantizes first model data, (0014: “The edge server adjusts the batch size and gradient compression rate of the terminal based on the current batch size and gradient compression rate, combined with the terminal's computing power and the communication capability between the edge server and the terminal, and transmits the adjusted batch size and gradient compression rate to the terminal.” 0033: “In one possible implementation, when compressing gradient information, the gradient can be quantized…” The terminal (i.e. second device) receives the gradient compression rate (i.e. second information comprising quantization information) based on which the terminal will compress the gradient (i.e. first model data) via quantization).)
the first model data is model data after a current round of training; (0015: “The terminal performs model learning according to the received batch size, and outputs the gradient information obtained by model learning to the edge server after compressing it according to the received extraction compression ratio.” The terminal compresses (i.e. quantizes) gradient information obtained by model learning (i.e. model data after the current round of training).)
quantizing, by the second device, the first model data based on the second information; and (0015: “The terminal…outputs the gradient information obtained by model learning to the edge server after compressing it according to the received extraction compression ratio [gradient compression rate].” The terminal (i.e. second device) compresses (i.e. quantizes) the gradient information (i.e. first model data) based on the gradient compression rate (i.e. second information).)
sending, by the second device, a first message to the first device, wherein the first message comprises quantized first model data and [first quantization configuration information]. (0015-0016: “The terminal…outputs the gradient information obtained by model learning to the edge server after compressing it according to the received extraction compression ratio… The edge server averages all received gradient information…” The terminal (i.e. second device) sends compressed gradient information (i.e. a first message comprising quantized first model data) to the edge server (i.e. first device).)
Liu does not appear to explicitly disclose the second information comprises a quantization error threshold,
However, Shaoli teaches the second information comprises a quantization error threshold, (0101: “In the present technical scheme, the data bit width n is adjusted according to the quantization error diffbit. Furthermore, the quantization error diffbit is compared with a threshold to obtain a comparison result. The threshold includes a first threshold and a second threshold… In practical applications, the first threshold and the second threshold may be empirical values or variable hyperparameters.” The information based on which quantization is performed (i.e. second information) includes a quantization error threshold.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu and Shaoli. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Shaoli teaches neural network model quantization including a quantization error threshold parameter. One of ordinary skill would have motivation to combine Liu and Shaoli because “[i]n this way, the fixed-point computing speed may be greatly improved within a tolerance range of precision, which improves the resource utilization rate of an artificial intelligence processor chip” (Shaoli, 0102).
Liu and Shaoli do not appear to explicitly disclose first quantization configuration information.
However, Xu teaches sending first quantization configuration information (Pg. 6, section 4.2: “The framework considers context (ctx) as an opaque object that can carry any necessary metadata to allow for decompression… Below is an example function definition that takes a tensor with unique name and returns a list of compressed objects with the context needed to decompress them: compress: tensor, name → [comp], ctx”. The entity sending the quantized model data (i.e. second device) also sends quantization context (i.e. quantization configuration information).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Shaoli, and Xu. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Shaoli teaches neural network model quantization including a quantization error threshold parameter. Xu teaches methods for neural network compression for efficient communication in distributed learning, including transmitting a context object along with quantized model data. One of ordinary skill would have motivation to combine Liu, Shaoli, and Xu in order to allow for dequantization: “The context stores additional information (such as mean and different norms) needed to dequantize” (Xu, pg. 7, section 4.3).
Regarding Claim 6, Liu, Shaoli, and Xu teach The method according to claim 5, as shown above.
Shaoli also teaches wherein the quantizing, by the second device, the first model data based on the second information comprises:
quantizing, by the second device, the first model data in a first quantization manner; (0110: “[I]n each iteration in the initial stage of training, the weight of the corresponding layer in the current iteration is quantized by using the data bit width used in the quantization of the corresponding layer in the previous iteration, or the weight of the current layer is quantized based on the preset data bit width n of the current layer to obtain quantized fixed-point numbers.” The model weights (i.e. first model data) are quantized using data bit width n (i.e. in a first quantization manner).)
determining, by the second device, a first quantization error based on the quantized first model data and the first model data before the quantization; and (0110: “According to the quantized weight and the corresponding pre-quantized weight, the quantization error diffbit is determined.” The quantization error is determined based on the quantized data (i.e. quantized first model data) and the pre-quantized data (i.e. first model data before quantization).)
when the first quantization error is less than the quantization error threshold, determining, by the second device, to use the first quantization manner to quantize the first model data. (0110: “According to the comparison result of the quantization error diffbit and the threshold, the data bit width n used in the quantization of the corresponding layer in the previous iteration or the preset data bit width n of the current layer is adjusted, and the adjusted data bit width is applied to the quantization of the weight of the corresponding layer in the current iteration.” 0101: “The threshold includes a first threshold and a second threshold, and the first threshold is greater than the second threshold… If the quantization error diffbit is between the first threshold and the second threshold (situation three), the data bit width remains unchanged.” In the case that the quantization error is less than the first threshold (i.e. the quantization error threshold) and greater than the second threshold, the data bit width (i.e. first quantization manner) remains unchanged and is applied to the weight quantization (i.e. the second device determines to use the first quantization manner to quantize the first model data).)
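The examiner's reading of Shaoli's bit-width adjustment (0101, 0110) may be illustrated by the following minimal Python sketch. The symmetric fixed-point quantizer, the mean-absolute error metric, and all identifiers are assumptions, not Shaoli's disclosure.

import numpy as np

def fixed_point_quantize(weights, n_bits):
    # Symmetric fixed-point quantization at data bit width n
    # (a hypothetical stand-in for Shaoli's quantizer).
    qmax = 2 ** (n_bits - 1) - 1
    scale = float(np.abs(weights).max()) / qmax
    if scale == 0.0:
        scale = 1.0
    return np.clip(np.round(weights / scale), -qmax - 1, qmax) * scale

def adjust_bit_width(weights, n_bits, first_threshold, second_threshold):
    # Compare the quantization error against two thresholds and adjust
    # the data bit width; first_threshold > second_threshold.
    diff = float(np.mean(np.abs(fixed_point_quantize(weights, n_bits) - weights)))
    if diff > first_threshold:    # error too large: widen the bit width
        return n_bits + 1
    if diff < second_threshold:   # error comfortably small: narrow it
        return max(n_bits - 1, 1)
    return n_bits                 # situation three: bit width unchanged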
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Shaoli and Xu, and further in view of Amiri.
Regarding Claim 7, Liu, Shaoli, and Xu teach The method according to claim 5, as shown above.
Xu also teaches performing dequantization parsing based on second quantization configuration information (Pg. 7, section 4.3: “The context stores additional information (such as mean and different norms) needed to dequantize… dequantize transforms quantized values to an approximation of the original values.” Based on the context (i.e. second quantization configuration information), the quantized values are dequantized (i.e. dequantization parsing is performed).)
Liu, Shaoli, and Xu do not appear to explicitly disclose wherein before the receiving, by a second device, second information sent by a first device, the method further comprises:
receiving, by the second device, a third message sent by the first device, wherein the third message comprises quantized second model data and second quantization configuration information, and second model data is model data before the current round of training;
performing, by the second device, dequantization parsing based on the quantized second model data and the second quantization configuration information to obtain the second model data; and
training, by the second device, the second model data, to obtain the first model data.
However, Amiri teaches wherein before the receiving, by a second device, second information sent by a first device, the method further comprises:
receiving, by the second device, a third message sent by the first device, wherein the third message comprises quantized second model data and [second quantization configuration information], and second model data is model data before the current round of training; (Pg. 2, section 1: “In this paper, we instead consider broadcasting a quantized version of the global model update by the PS [parameter server], which provides the devices with a lossy estimate of the global model (rather than its accurate estimate) with which to perform local training.” The devices (i.e. second device) receive a quantized version of the global model update (i.e. a third message comprising quantized second model data) from the parameter server (i.e. first device) with which to perform local training (i.e. the second model data is data before the current round of training). Receiving quantization configuration information along with quantized data is taught by Xu, as shown above in regard to claim 5.)
performing, by the second device, dequantization parsing based on the quantized second model data and [the second quantization configuration information] to obtain the second model data; and (Pg. 2, section 1: “The devices recover an estimate of the current global model by combining the received quantized global model update with their previous estimate, and perform local training using their estimate, and return the local model updates…” The devices (i.e. second device) recover an estimate of the global model (i.e. obtain the second model data) by combining the quantized model update with their previous model estimate (i.e. performing dequantization parsing based on the quantized second model data). Dequantization parsing based on quantization configuration information is taught by Xu, as shown above.)
training, by the second device, the second model data, to obtain the first model data. (See the portion of pg. 2, section 1 cited above. The devices (i.e. second device) perform local training using their estimate of the global model (i.e. second model data) to obtain local model updates (i.e. first model data).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Shaoli, Xu, and Amiri. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Shaoli teaches neural network model quantization including a quantization error threshold parameter. Xu teaches methods for neural network compression for efficient communication in distributed learning, including transmitting a context object along with quantized model data. Amiri teaches model quantization for federated learning where quantization is performed not only on the model updates sent by the edge devices, but also on the model broadcasts sent by the server. One of ordinary skill would have motivation to combine Liu, Shaoli, Xu, and Amiri because “the proposed LFL scheme, which leads to a significant communication cost saving, provides a promising performance with no visible gap to the performance of the fully lossless scenario where the communication from both PS-to-device and device-to-PS directions is assumed to be perfect” (Amiri, pg. 3, section 1).
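For illustration only, a minimal Python sketch of the device-side flow as the examiner reads Amiri follows: recover a lossy global-model estimate from the broadcast quantized update and the previous estimate, train locally, and return the local update. The local_train callable is a hypothetical stand-in for local training.

import numpy as np

def device_iteration(prev_estimate, quantized_global_update, local_train):
    # Combine the received quantized global-model update with the previous
    # estimate to recover a lossy estimate of the current global model.
    global_estimate = prev_estimate + quantized_global_update
    local_model = local_train(global_estimate)     # local training
    local_update = local_model - global_estimate   # returned to the PS
    return local_update, global_estimate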
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Shaoli and Hou.
Regarding Claim 8, Liu teaches A model data sending method, applied to federated learning and comprising:
receiving, by a first device, a fourth message sent by a second device, wherein the fourth message comprises [a first quantization error] and first information, (0093: “Initialization: Each terminal needs to upload relevant information, such as computing power and the status of the wireless channel, to the base station.” The base station/edge server (i.e. first device) receives information including status of the wireless channel (i.e. first information) from the terminal (i.e. second device).)
determining, by the first device based on the [first quantization error] and the first information, whether the second device is allowed to send quantized first model data; and (0014: “The edge server adjusts the batch size and gradient compression rate of the terminal based on the current batch size and gradient compression rate, combined with the terminal's computing power and the communication capability between the edge server and the terminal…” 0033: “In one possible implementation, when compressing gradient information, the gradient can be quantized…” The edge server (i.e. first device) determines the gradient compression rate/ratio based on which the terminal (i.e. second device) will quantize the gradient (i.e. model data). When the gradient compression ratio is equal to 1, no quantization is performed. Otherwise, quantization is performed. Therefore, determining the gradient compression ratio amounts to determining whether the second device is allowed to quantize the model data.)
sending, by the first device, indication information to the second device, wherein the indication information indicates whether the second device is allowed to send the quantized first model data. (0014: “The edge server… transmits the adjusted batch size and gradient compression rate to the terminal.” The edge server (i.e. first device) transmits the gradient compression rate (i.e. an indication of whether the second device is allowed to send quantized model data) to the terminal (i.e. second device).)
the first model data is model data after the current round of training; (0015: “The terminal performs model learning according to the received batch size, and outputs the gradient information obtained by model learning to the edge server after compressing it according to the received extraction compression ratio.” The terminal compresses (i.e. quantizes) gradient information obtained by model learning (i.e. model data after the current round of training).)
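The examiner's reading of Liu's indication mechanism may be sketched as follows; adjust_policy is a hypothetical placeholder, since Liu's adjustment rule itself is not reproduced here.

def adjust_and_indicate(current_rate, first_information, adjust_policy):
    # The edge server adjusts the terminal's gradient compression rate
    # from the uploaded compute/channel information; the transmitted rate
    # doubles as the indication, since a rate of 1 means no quantization.
    new_rate = adjust_policy(current_rate, first_information)
    quantization_allowed = (new_rate != 1.0)
    return new_rate, quantization_allowed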
Liu does not appear to explicitly disclose a first quantization error, wherein the first quantization error is determined after the second device quantizes first model data in a first quantization manner,
However, Shaoli teaches determining quantization parameters based on a first quantization error, wherein the first quantization error is determined after the second device quantizes first model data in a first quantization manner, (0110: “[I]n each iteration in the initial stage of training, the weight of the corresponding layer in the current iteration is quantized by using the data bit width used in the quantization of the corresponding layer in the previous iteration, or the weight of the current layer is quantized based on the preset data bit width n of the current layer to obtain quantized fixed-point numbers. According to the quantized weight and the corresponding pre-quantized weight, the quantization error diffbit is determined. According to the comparison result of the quantization error diffbit and the threshold, the data bit width n used in the quantization of the corresponding layer in the previous iteration or the preset data bit width n of the current layer is adjusted…” The information based on which quantization parameters are determined includes a quantization error, which is determined based on a comparison between quantized and pre-quantized model data (i.e. after the second device quantizes the model data in a first quantization manner).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu and Shaoli. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Shaoli teaches neural network model quantization where quantization parameters are dynamically determined based on quantization error. One of ordinary skill would have motivation to combine Liu and Shaoli because “[i]n this way, the fixed-point computing speed may be greatly improved within a tolerance range of precision, which improves the resource utilization rate of an artificial intelligence processor chip” (Shaoli, 0102).
Liu and Shaoli do not appear to explicitly disclose the first information comprises an evaluation loss corresponding to a current round of training,
However, Hou teaches the first information comprises an evaluation loss corresponding to a current round of training, (Pg. 3, section 3.1: “[W]e consider the loss explicitly during quantization and obtain the quantization thresholds and scaling parameter by solving an optimization problem.” The information considered when determining quantization parameters includes evaluation loss.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Shaoli, and Hou. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Shaoli teaches neural network model quantization where quantization parameters are dynamically determined based on quantization error. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. One of ordinary skill would have motivation to combine Liu, Shaoli, and Hou because loss-aware weight quantization “outperforms state-of-the-art weight quantization algorithms, and is as accurate (or even more accurate) than the full-precision network” (Hou, pg. 1, Abstract).
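For illustration of loss-aware selection of the scaling parameter as the examiner reads Hou, a toy Python sketch follows. The binary-style quantizer and the grid search are assumptions standing in for Hou's optimization problem; loss_fn is a hypothetical callable returning the evaluation loss with the given (quantized) weights installed.

import numpy as np

def loss_aware_scaling(weights, loss_fn, candidate_scales):
    # Choose the scaling parameter by the loss it induces, rather than
    # by weight reconstruction error.
    best_scale, best_loss = candidate_scales[0], float("inf")
    for scale in candidate_scales:
        quantized = scale * np.sign(weights)   # binary-style quantization
        loss = loss_fn(quantized)
        if loss < best_loss:
            best_scale, best_loss = scale, loss
    return best_scale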
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Shaoli and Hou, and further in view of
Kluska et al. (hereinafter Kluska), “Post-training Quantization Methods for Deep Learning Models” (published 03/04/2020).
Regarding Claim 9, Liu, Shaoli, and Hou teach The method according to claim 8, as shown above.
Liu also teaches wherein the determining, by the first device based on the first quantization error and the first information, whether the second device is allowed to send quantized first model data comprises:
determining, by the first device, a proportion of quantifiable second devices based on the first information; and (0014: “The edge server adjusts the batch size and gradient compression rate of the terminal based on the current batch size and gradient compression rate, combined with the terminal's computing power and the communication capability between the edge server and the terminal…” 0033: “In one possible implementation, when compressing gradient information, the gradient can be quantized…” 0038: “…C is the average gradient compression rate for each terminal…” The edge server (i.e. first device) determines the gradient compression rate/ratio for each terminal based on its computing power and communication capability (i.e. first information), based on which the terminals (i.e. second devices) perform quantization. When the gradient compression ratio is equal to 1, no quantization is performed. Otherwise, quantization is performed. Therefore, determining the gradient compression ratio for each terminal implicitly determines a proportion of quantizable second devices.)
determining, by the first device based on the proportion of the quantifiable second devices, [the first quantization error], and [a threshold for a quantity of consecutive quantization times], whether the second device is allowed to send the quantized first model data. (See the portions of 0014, 0033, and 0038 cited above. Each terminal (i.e. second device) is assigned one of the gradient compression ratios determined for the terminals, so the determination of whether quantization of model data is allowed is made based on the set of ratios that defines the proportion of quantifiable second devices. Shaoli teaches determining quantization parameters based on quantization error, as shown above in regard to claim 8.)
Liu, Shaoli, and Hou do not appear to explicitly disclose a threshold for a quantity of consecutive quantization times
However, Kluska teaches a threshold for a quantity of consecutive quantization times (Pg. 472, section 3.3: “We propose a method that iteratively quantizes CNN with the aim to find a minimum number of bits per layer from the predefined integer set - {N_min ≤ n ≤ N_max}, where N_min and N_max define quantization search space - that does not introduce degradation loss within the given threshold.” Quantization parameters are determined based in part on a minimum number of bits N_min, which is effectively a maximum number of bits removed by quantization (i.e. a threshold for a quantity of consecutive quantization times).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Shaoli, Hou, and Kluska. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Shaoli teaches neural network model quantization where quantization parameters are dynamically determined based on quantization error. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. Kluska teaches adaptive neural network model quantization with a limit on the number of bits that can be removed by quantization. One of ordinary skill would have motivation to combine Liu, Shaoli, Hou, and Kluska in order to avoid accumulation of quantization error by quantizing to “the smallest possible integer precision that does not introduce accuracy degradation” (Kluska, pg. 468, section 1).
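Kluska's iterative search for the minimum per-layer bit width may be illustrated by the following minimal sketch; evaluate is a hypothetical callable returning model accuracy with the layer quantized to n bits.

def minimum_bits(layer, evaluate, baseline_accuracy, threshold,
                 n_min=2, n_max=8):
    # Search {N_min <= n <= N_max} for the smallest bit width whose
    # accuracy degradation stays within the given threshold.
    for n in range(n_min, n_max + 1):          # smallest widths first
        if baseline_accuracy - evaluate(layer, n) <= threshold:
            return n                           # minimal width within budget
    return n_max                               # fall back to the widest width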
Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Shaoli and Hou, and further in view of Lowell.
Regarding Claim 10, Liu, Shaoli, and Hou teach The method according to claim 8, as shown above.
Liu also teaches wherein the first information further comprises information about [an accuracy requirement of the second device for model training] and communication sensitivity information. (0014: “The edge server adjusts the batch size and gradient compression rate of the terminal based on the current batch size and gradient compression rate, combined with the terminal's computing power and the communication capability between the edge server and the terminal…” The information based on which quantization information is determined (i.e. first information) includes communication capability between the edge server and the terminal (i.e. communication sensitivity information – see specification paragraph 0066 of the instant application).)
Liu, Shaoli, and Hou do not appear to explicitly disclose an accuracy requirement of the second device for model training
However, Lowell teaches wherein the first information further comprises information about an accuracy requirement of the second device for model training (0015: “[T]he method includes recalculating the distribution of ANN information and reselecting the quantization function from the set of quantization functions based on the recalculated distribution, if the output does not sufficiently correlate with a known correct output.” 0045: “The difference between the output and the known correct output can be referred to as the training error. On condition 450 that the training error is acceptable (e.g., the difference is below an acceptable threshold, or a heuristic applied to the output and the known correct output satisfies a desired condition), ANN 300 can be considered to be trained on this training data set.” The information based on which quantization information is determined (i.e. first information) includes a model training error threshold (i.e. an accuracy requirement of the second device for model training).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Shaoli, Hou, and Lowell. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Shaoli teaches neural network model quantization where quantization parameters are dynamically determined based on quantization error. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. Lowell teaches neural network model quantization where quantization parameters are dynamically determined based on a training accuracy requirement. One of ordinary skill would have motivation to combine Liu, Shaoli, Hou, and Lowell in order to ensure that quantization results in an acceptable level of model accuracy.
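For illustration of Lowell's reselection loop (0015, 0045) as read by the examiner, a minimal sketch follows; the ANN interface and all callables are hypothetical stand-ins for the reference's machinery.

def reselect_quantization(ann, candidates, select, training_error, threshold,
                          max_rounds=10):
    # If the training error is not acceptable, recalculate the distribution
    # of ANN information and reselect the quantization function.
    quant_fn = select(candidates, ann.distribution())
    for _ in range(max_rounds):
        if training_error(ann, quant_fn) <= threshold:  # requirement met
            break
        dist = ann.recalculate_distribution()           # recompute information
        quant_fn = select(candidates, dist)
    return quant_fn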
Regarding Claim 11, Liu, Shaoli, Hou, and Lowell teach The method according to claim 10, as shown above.
Liu also teaches wherein before the receiving, by a first device, a fourth message sent by a second device, the method further comprises:
receiving, by the first device, a second message sent by the second device, wherein the second message comprises information about [the accuracy requirement] and the communication sensitivity information. (0093: “Initialization: Each terminal needs to upload relevant information, such as computing power and the status of the wireless channel, to the base station.” The base station/edge server (i.e. first device) receives information including status of the wireless channel (i.e. a second message comprising communication sensitivity information) from the terminal (i.e. second device).)
Lowell teaches that model information used to determine quantization parameters includes the accuracy requirement, as shown above in regard to claim 10.
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Shaoli, Hou, and Lowell and further in view of Amiri.
Regarding Claim 12, Liu, Shaoli, Hou, and Lowell teach The method according to claim 10, as shown above.
Liu, Hou, and Lowell also teach determining quantization parameters based on (third) information, wherein the third information comprises an evaluation loss [corresponding to a previous round of training], the information about the accuracy requirement, and the communication sensitivity information, as shown above.
Lowell teaches determining, by the first device, a proportion of quantifiable layers in second model data based on third information, (0043: “Rather than determining a single quantization function for all link weights in ANN 300, quantization can be performed on a per-layer basis, or for each subset of a plurality of subsets of layers.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Shaoli, Hou, and Lowell. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Shaoli teaches neural network model quantization where quantization parameters are dynamically determined based on quantization error. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. Lowell teaches neural network model quantization where quantization parameters are dynamically determined based on a training accuracy requirement. One of ordinary skill would have motivation to combine Liu, Shaoli, Hou, and Lowell because “Selecting a quantization function that is appropriate for each layer can have the advantage of increasing the effectiveness of the link weight quantization in ANN 300 as compared to determining a single quantization function for all link weights” (Lowell, 0043).
Liu, Shaoli, Hou, and Lowell do not appear to explicitly disclose wherein before the receiving, by a first device, a fourth message sent by a second device, the method further comprises:
determining quantization parameters based on information corresponding to a previous round of training
the second model data is model data before the current round of training;
quantizing, by the first device, the second model data based on the proportion of the quantifiable layers, to obtain quantized second model data; and
sending, by the first device, a third message to the second device, wherein the third message comprises the quantized second model data and second quantization configuration information, and the third message is training information based on which the second device trains the second model data to obtain the first model data.
However, Amiri teaches wherein before the receiving, by a first device, a fourth message sent by a second device, the method further comprises:
determining quantization parameters based on information corresponding to a previous round of training (Pg. 2, section 1: “To be precise, the PS [parameter server] exploits the knowledge of the last global model estimate available at the devices as side information to quantize the global model update.”)
the second model data is model data before the current round of training; (Pg. 2, section 1: “In this paper, we instead consider broadcasting a quantized version of the global model update by the PS, which provides the devices with a lossy estimate of the global model (rather than its accurate estimate) with which to perform local training.” The global model update (i.e. second model data) is provided to the devices to perform training (i.e. the data is model data before the current round of training).)
quantizing, by the first device, the second model data [based on the proportion of the quantifiable layers], to obtain quantized second model data; and (Pg. 2, section 1: “We introduce a lossy FL (LFL) algorithm, where at each iteration the PS broadcasts a compressed version of the global model update to all the devices through quantization.” The PS (i.e. first device) quantizes the global model update (i.e. second model data) to obtain a compressed version (i.e. quantized second model data). The proportion of quantizable layers is taught by Lowell, as shown above.)
sending, by the first device, a third message to the second device, wherein the third message comprises the quantized second model data and [second quantization configuration information], and the third message is training information based on which the second device trains the second model data to obtain the first model data. (See the portions of pg. 2, section 1 cited above. The PS (i.e. first device) sends the quantized version of the global model update (i.e. a third message comprising quantized second model data) to the devices (i.e. second device) to perform local training (i.e. train the second model data to obtain the first model data). Sending quantization configuration information is taught by Xu, as shown above in regard to claim 8.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Shaoli, Hou, Lowell, and Amiri. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Shaoli teaches neural network model quantization where quantization parameters are dynamically determined based on quantization error. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. Lowell teaches neural network model quantization where quantization parameters are dynamically determined based on a training accuracy requirement. Amiri teaches model quantization for federated learning where quantization is performed not only on the model updates sent by the edge devices, but also on the model broadcasts sent by the server. One of ordinary skill would have motivation to combine Liu, Shaoli, Hou, Lowell, and Amiri because “the proposed LFL scheme, which leads to a significant communication cost saving, provides a promising performance with no visible gap to the performance of the fully lossless scenario where the communication from both PS-to-device and device-to-PS directions is assumed to be perfect” (Amiri, pg. 3, section 1).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Hou, Shaoli, and Xu, and further in view of
Krishnamoorthi et al. (hereinafter Krishnamoorthi), “Quantizing deep convolutional networks for efficient inference: A whitepaper” (published 06/21/2018).
Regarding Claim 14, Liu, Hou, Shaoli, and Xu teach The method according to claim 1, as shown above.
Liu, Hou, Shaoli, and Xu do not appear to explicitly disclose wherein the first quantization configuration information includes a quantity of quantized bits of the first model data, uniform quantization or non-uniform quantization, a quantized zero point, an offset value, and/or a scaling factor.
However, Krishnamoorthi teaches wherein the first quantization configuration information includes a quantity of quantized bits of the first model data, uniform quantization or non-uniform quantization, a quantized zero point, an offset value, and/or a scaling factor. (Pg. 4-5, section 2.1: “Consider a floating point variable with range (x_min, x_max) that needs to be quantized to the range (0, N_levels − 1) where N_levels = 256 for 8-bits of precision. We derive two parameters: Scale (∆) and Zero-point (z) which map the floating point values to integers… The de-quantization operation is: x_float = (x_Q − z)∆” Parameters which are used for quantization and dequantization (i.e. quantization configuration information) include a scaling factor and zero-point.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Hou, Shaoli, Xu, and Krishnamoorthi. Liu teaches model quantization for federated learning where quantization parameters are dynamically determined by the server based on information from clients. Hou teaches neural network model quantization where quantization parameters are dynamically determined based on training loss. Shaoli teaches neural network model quantization including a quantization error threshold parameter. Xu teaches methods for neural network compression for efficient communication in distributed learning, including transmitting a context object along with quantized model data. Krishnamoorthi teaches methods for neural network model quantization, including uniform affine quantization, where quantization and dequantization operations are parameterized by a scaling factor and zero-point. One of ordinary skill would have motivation to combine Liu, Hou, Shaoli, Xu, and Krishnamoorthi because quantization using a scaling factor and zero-point allows for accurate dequantization and “ensure[s] that common operations like zero padding do not cause quantization error” (Krishnamoorthi, pg. 4, section 2.1).
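The scale/zero-point parameterization quoted above may be illustrated by the following minimal Python sketch of uniform affine quantization. Only the formulas quoted from section 2.1 are taken from the reference; the range handling and identifiers are illustrative assumptions.

import numpy as np

def affine_quantize(x, n_bits=8):
    # Derive Scale (delta) and Zero-point (z) mapping floats to integers;
    # N_levels = 256 for 8 bits of precision.
    n_levels = 2 ** n_bits
    x_min = min(float(x.min()), 0.0)   # keep 0 exactly representable
    x_max = max(float(x.max()), 0.0)
    delta = (x_max - x_min) / (n_levels - 1)
    if delta == 0.0:                   # constant-zero input
        delta = 1.0
    z = int(round(-x_min / delta))     # zero-point
    x_q = np.clip(np.round(x / delta) + z, 0, n_levels - 1).astype(np.uint8)
    return x_q, delta, z

def affine_dequantize(x_q, delta, z):
    # De-quantization: x_float = (x_Q - z) * delta.
    return (x_q.astype(np.float32) - float(z)) * delta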
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN M ROHD whose telephone number is (571)272-6445. The examiner can normally be reached Mon-Thurs 8:00-6:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/B.M.R./Examiner, Art Unit 2147
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147