Prosecution Insights
Last updated: April 19, 2026
Application No. 18/176,867

BI-DIRECTIONAL GRADIENT COMPRESSION FOR DISTRIBUTED AND FEDERATED LEARNING

Non-Final OA: §101, §103
Filed: Mar 01, 2023
Examiner: KIM, DAVID
Art Unit: 2141
Tech Center: 2100 — Computer Architecture & Software
Assignee: VMware, Inc.
OA Round: 1 (Non-Final)
Grant Probability: Favorable
Expected OA Rounds: 1-2
Time to Grant: 3y 3m

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal lift; based on resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline)
Total Applications: 9 across all art units (9 currently pending)

Statute-Specific Performance

§101: 29.2% (-10.8% vs TC avg)
§103: 54.2% (+14.2% vs TC avg)
§102: 12.5% (-27.5% vs TC avg)
§112: 4.2% (-35.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 0 resolved cases.

Office Action

§101, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 3/1/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

With regard to Claim 1, Step 2A, Prong 1: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04, subsection II, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim. Claim 1 recites:

A method comprising:
computing, by each client participating in a round of a distributed learning (DL) or federated learning (FL) procedure for training an artificial neural network (ANN), a gradient with respect to a local copy of the ANN;
compressing, by the client, the gradient using a linear quantization technique that is identical across all clients participating in the round;
transmitting, by the client, the compressed gradient to a parameter server;
receiving, by the client, a compressed global gradient from the parameter server;
decompressing, by the client, the compressed global gradient using the linear quantization technique; and
updating, by the client, one or more model weights of the local copy of the ANN based on the decompressed global gradient.
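For readers less familiar with the recited mathematics, the compress/decompress steps of claim 1 can be sketched as a uniform (linear) quantizer. This is an illustrative sketch only, not the applicant's implementation; the range [-1.0, 1.0] and the 8-bit width are assumed parameters.

```python
# Illustrative sketch of the claimed compress/decompress steps as uniform
# (linear) quantization. The range and bit width are assumptions, not taken
# from the application.
def quantize(grad, lo, hi, bits=8):
    """Map each float in [lo, hi] to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [min(max(round((g - lo) / step), 0), levels) for g in grad]

def dequantize(codes, lo, hi, bits=8):
    """Invert the linear map; round-trip error is at most half a step."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + c * step for c in codes]

grad = [-0.5, 0.0, 0.25, 0.9]
codes = quantize(grad, -1.0, 1.0)
recovered = dequantize(codes, -1.0, 1.0)
half_step = (2.0 / 255) / 2
assert all(abs(r - g) <= half_step + 1e-12 for r, g in zip(recovered, grad))
```

Because every client uses the same `lo`, `hi`, and `bits`, a code produced by one client decodes identically everywhere, which is what "identical across all clients" buys the claimed method.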
Under their broadest reasonable interpretation, the limitations identified above are directed to mathematical concepts: computing a gradient, compressing using linear quantization, and decompressing using the same technique are mathematical concepts. Step 2A, Prong 1 (Yes).

Step 2A, Prong 2: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception or whether the claim is “directed to” the judicial exception. This evaluation is performed by (1) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (2) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. See MPEP 2106.04(d).

The additional elements are the clients, the transmitting and receiving steps, and the updating step. The clients are generic computer components performing generic computer functions. The receiving/transmitting steps are mere data gathering and outputting recited at a high level of generality and thus are insignificant extra-solution activity. See MPEP 2106.05(g). The updating step amounts to no more than mere instructions to apply the exception using a generic computer. See MPEP 2106.05(f). Even when viewed in combination, the additional elements do not integrate the recited judicial exception into a practical application. Step 2A, Prong 2 (No).

Step 2B: This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. See MPEP 2106.05. As discussed above: The clients are generic computer components performing generic computer functions.
The receiving/transmitting steps are mere data gathering and outputting recited at a high level of generality and thus are insignificant extra-solution activity. See MPEP 2106.05(g). These elements amount to receiving or transmitting data over a network and are well-understood, routine, and conventional activity. The updating step amounts to no more than mere instructions to apply the exception using a generic computer. See MPEP 2106.05(f). Step 2B (No). Claim 1 is ineligible.

Claims 8 and 15 are similar in scope and rejected likewise.

Dependent Claims: Each of the dependent claims merely elaborates on the specific mathematical concepts and does not provide any additional elements. Thus these claims are ineligible.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 8, 10, 15, 17 are rejected under 35 U.S.C. 103 as being unpatentable over Lee (US 20230196205 A1), in view of Wu (US 20250086474 A1).
Regarding claim 1, Lee discloses:

“computing, by each client participating in a round of a distributed learning (DL) or federated learning (FL) procedure for training an artificial neural network (ANN), a gradient with respect to a local copy of the ANN;” (See [0100], [0102]; the gradient is computed by each wireless device (client), and this is done for each iteration of distributed learning.)

“compressing, by the client, the gradient;” (See [0010]; compressed gradients are generated from the parameters by the local device (client).)

“transmitting, by the client, the compressed gradient to a parameter server;” (See [0010]; the local device (client) transmits the compressed gradient to a parameter server.)

“receiving, by the client, a compressed global gradient from the parameter server;” (See [0024]; the compressed global model is received from the parameter server.)

“decompressing, by the client, the compressed global gradient using the linear quantization technique; and” (See [0024], [0135]; the gradients are reconstructed (decompressing gradients is also referred to as reconstructing) from the model received from the server using a linear quantization process (a non-linear quantization process transformed into a linear quantization process).)

“updating, by the client, one or more model weights of the local copy of the ANN based on the decompressed global gradient.” (See [0028]; the local model parameters are updated based on the global model from the server.)

Lee fails to explicitly disclose “compressing … using a linear quantization technique”. Wu teaches “compressing … using a linear quantization technique” (See [0026], [0065]; a linear quantization is used to compress gradients with a computation device (client)). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having Lee and Wu before them, to modify Lee to compress using linear quantization by the client.
One would be motivated to do so in order to specify which compression technique to use, and because linear quantization is one of the most common quantization techniques, intended to reduce model size through compression.

Regarding claim 3, Lee discloses “prior to compressing the gradient, the client pre-processes the gradient by applying a transform,” (See [0057]; Lee discloses that generating compressed gradients comprises transforming the gradients to prepare them for compression) “wherein subsequently to decompressing the compressed global gradient, the client applies an inverse transform to the decompressed global gradient that corresponds to the transform.” (See [0068]; Lee discloses reconstructing (decompressing) gradients using an inverse normalization transformation.)

Regarding claims 8 and 15, these claims are similar in scope to claim 1. Regarding claims 10 and 17, these claims are similar in scope to claim 3.

Claim Rejections - 35 USC § 103

Claims 2, 9, 16 are rejected under 35 U.S.C. 103 as being unpatentable over Lee (US 20230196205 A1), in view of Wu (US 20250086474 A1), and further in view of Marzban (US 20240171991 A1).

Regarding claim 2, Lee fails to explicitly disclose “upon receiving compressed gradients from said all clients participating in the round, the parameter server computes the compressed global gradient by aggregating the compressed gradients without performing any decompression.” Marzban teaches “upon receiving compressed gradients from said all clients participating in the round, the parameter server computes the compressed global gradient by aggregating the compressed gradients without performing any decompression” (See [0086]; Marzban discloses receiving gradients or compressed gradients to compute a global gradient by aggregating the gradients, and makes no mention of any decompression in the entire publication).
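A brief aside on why decompression-free aggregation of this kind is mathematically possible: a linear quantizer's decoding map is affine, so averaging integer codes and decoding once gives the same result as decoding each client's gradient and then averaging. A hypothetical sketch (the codes and range below are made up for illustration):

```python
# Sketch of decompression-free aggregation: because linear dequantization is
# the affine map code -> lo + code * step, averaging codes commutes with
# decoding. Codes and range are illustrative assumptions.
def dequant(code, lo, hi, bits=8):
    step = (hi - lo) / ((1 << bits) - 1)
    return lo + code * step

lo, hi = -1.0, 1.0
client_codes = [100, 120, 140]  # one gradient coordinate from three clients

# Server path: average the integer codes, decode once.
aggregated = dequant(sum(client_codes) / len(client_codes), lo, hi)
# Reference path: decode each client's code, then average.
reference = sum(dequant(c, lo, hi) for c in client_codes) / len(client_codes)
assert abs(aggregated - reference) < 1e-12
```

The equality holds only because every client used the same affine code book; with per-client scales the server would first have to reconcile the scales.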
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having Lee and Marzban before them, to modify Lee to aggregate the compressed gradients without decompressing them. One would be motivated to do so in order to avoid the computational work and runtime needed to decompress the gradients before aggregating them, as aggregation does not require the gradients to be decompressed if it is possible to aggregate them without decompression.

Regarding claims 9 and 16, these claims are similar in scope to claim 2.

Claim Rejections - 35 USC § 103

Claims 4, 11, 18 are rejected under 35 U.S.C. 103 as being unpatentable over Lee (US 20230196205 A1), in view of Wu (US 20250086474 A1), and further in view of Theodoridis (Pattern Recognition).

Regarding claim 4, Lee fails to explicitly disclose “the transform and the inverse transform are super-linear in time complexity.” Theodoridis teaches “the transform and the inverse transform are super-linear in time complexity” (See [Section 6.10, The Hadamard Transform, page 369]; Theodoridis discloses that the Hadamard transform, one of the transformations listed as an example in the application's specification, has a time complexity of O(N log N), which is super-linear (faster than exponential time, but slower than linear time)). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having Lee and Theodoridis before them, to modify Lee to specify the time complexity of the algorithms used for the transform and inverse transform. One would be motivated to do so in order to define what kinds of algorithms could be used for the transformations, as only a specific class of algorithms is able to run at a specific time complexity, and competing algorithms would have to run at either a slower or a faster time complexity.
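For context on the O(N log N) figure cited from Theodoridis: the Hadamard transform admits a butterfly-style fast algorithm (the fast Walsh-Hadamard transform). The following is a textbook-style sketch, not code drawn from any cited reference:

```python
# Fast Walsh-Hadamard transform: log2(N) passes of N/2 butterflies each,
# i.e. O(N log N) work, illustrating the "super-linear" complexity discussed
# for the transform step. Requires N to be a power of two.
def fwht(vec):
    x = list(vec)
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

x = [1.0, 0.0, 1.0, 0.0]
y = fwht(x)
# The unnormalized transform is self-inverse up to a factor of N.
assert [v / len(x) for v in fwht(y)] == x
```

The self-inverse property is why the same routine can serve as both the pre-compression transform and (after scaling) the post-decompression inverse transform.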
Other algorithms that run at a similar time complexity could potentially be determined to be too similar to the algorithm used by the applicant. Regarding claims 11 and 18, these claims are similar in scope to claim 4.

Claim Rejections - 35 USC § 103

Claims 5, 12, 19 are rejected under 35 U.S.C. 103 as being unpatentable over Lee (US 20230196205 A1), in view of Wu (US 20250086474 A1), and further in view of Alistarh (QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding).

Regarding claim 5, Lee fails to explicitly disclose “the compressing comprises: compressing the gradient using stochastic quantization with a quantization range that is common to said all clients participating in the round.” Alistarh teaches “the compressing comprises: compressing the gradient using stochastic quantization with a quantization range that is common to said all clients participating in the round.” (See [Section 3.1]; Alistarh discloses compressing a gradient using a stochastic quantization function and also shows that the number of quantization levels to use as the range is common to each value (client)). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having Lee and Alistarh before them, to modify Lee to compress gradients using stochastic quantization while using a quantization range that all clients would use for the current round. One would be motivated to compress using stochastic quantization to reduce the size of gradients for the purpose of reducing communication bandwidth between the server and clients. Additionally, using a common quantization range ensures that all gradients are compressed with stochastic quantization in a consistent manner.

Regarding claims 12 and 19, these claims are similar in scope to claim 5.

Claim Rejections - 35 USC § 103

Claims 6, 7, 13, 14, 20, 21 are rejected under 35 U.S.C.
103 as being unpatentable over Lee (US 20230196205 A1), in view of Wu (US 20250086474 A1), further in view of Alistarh (QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding), and further in view of Subramanian (Practical Quantization in PyTorch).

Regarding claim 6, Lee fails to explicitly disclose “the quantization range is statically set at the start of the DL or FL procedure”. Subramanian teaches “the quantization range is statically set at the start of the DL or FL procedure” (See [Post-Training Static Quantization (PTQ), page 12]; Subramanian discloses setting the range by pre-calibrating the range before the operations). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having Lee and Subramanian before them, to modify Lee to set the quantization range to a static range at the start of a DL or FL procedure before running the operations. One would be motivated to do so in order to pre-determine a range so the procedure has a set quantization range to adhere to during runtime.

Regarding claim 7, Lee fails to explicitly disclose “the quantization range is dynamically adjusted for each round of the DL or FL procedure”. Subramanian teaches “the quantization range is dynamically adjusted for each round of the DL or FL procedure” (See []; Subramanian discloses setting the range during inference for each input). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having Lee and Subramanian before them, to modify Lee to change the quantization range during runtime for each round of the procedure. One would be motivated to do so in order to adjust the range during runtime to adapt to changes in data distribution during the procedure, as this ensures that the model remains accurate as the data evolves.
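The stochastic quantization at issue in claims 5-7 can be sketched as unbiased randomized rounding over a range shared by all clients; whether [lo, hi] is fixed once before training ("static") or recomputed each round ("dynamic") is simply a question of how that shared range is chosen. The bit width, range, and values below are illustrative assumptions, not taken from any cited reference:

```python
import random

# Sketch of QSGD-style stochastic quantization: each value is rounded up or
# down at random with probability proportional to its fractional position, so
# the decoded value is unbiased. The shared range [lo, hi] may be fixed once
# ("static", claim 6) or recomputed each round ("dynamic", claim 7).
def stochastic_quantize(grad, lo, hi, bits=4, rng=random.Random(0)):
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    codes = []
    for g in grad:
        pos = (g - lo) / step
        low = int(pos)
        q = low + (1 if rng.random() < pos - low else 0)
        codes.append(min(max(q, 0), levels))
    return codes

def dequant(codes, lo, hi, bits=4):
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + c * step for c in codes]

# Unbiasedness: averaging many independent quantizations recovers the value.
lo, hi = -1.0, 1.0
mean = sum(dequant(stochastic_quantize([0.3], lo, hi), lo, hi)[0]
           for _ in range(20000)) / 20000
assert abs(mean - 0.3) < 0.01
```

Unbiasedness is what lets the server average many clients' coarsely quantized gradients and still converge, and the common range is what makes those averages comparable across clients.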
Regarding claims 13 and 20, these claims are similar in scope to claim 6. Regarding claims 14 and 21, these claims are similar in scope to claim 7.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID KIM whose telephone number is (571) 272-4331. The examiner can normally be reached 7:30 AM - 4:30 PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Ell, can be reached at (571) 270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/D.K./ Examiner, Art Unit 2141
/MATTHEW ELL/ Supervisory Patent Examiner, Art Unit 2141

Prosecution Timeline

Mar 01, 2023: Application Filed
Feb 26, 2026: Non-Final Rejection — §101, §103 (current)


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: Favorable
Median Time to Grant: 3y 3m
PTA Risk: Low
Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
