Last updated: April 19, 2026
Application No. 17/714,884
GRADIENT GROUPING FOR COMPRESSION IN FEDERATED LEARNING FOR MACHINE LEARNING MODELS

Final Rejection §101 §103
Filed: Apr 06, 2022
Examiner: ALGHAZZY, SHAMCY
Art Unit: 2128
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Qualcomm Incorporated
OA Round: 4 (Final)

Interview: Optional. +0.7% interview lift. This examiner has a relatively high allow rate; a written response may suffice.
Based on 62 resolved cases, 2023–2026
Examiner Intelligence

ALGHAZZY, SHAMCY
Career Allow Rate: 48% of resolved cases granted (30 granted / 62 resolved), -6.6% vs TC avg
Interview Lift: +0.7% (minimal), comparing resolved cases with and without an interview
Typical timeline: 3y 11m average prosecution
Currently pending: 25
Statute-Specific Performance

§101: 34.9% (-5.1% vs TC avg)
§103: 39.3% (-0.7% vs TC avg)
§102: 11.1% (-28.9% vs TC avg)
§112: 10.0% (-30.0% vs TC avg)
Based on career data from 62 resolved cases
Office Action

§101 §103
DETAILED ACTION
This final rejection is responsive to the claims filed on 23-APRIL-2025. Claims 1-30 are pending. Claims 1, 11, 18, and 28 are independent claims. Claims 1 and 18 are amended.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Examiner's Note
The Examiner respectfully requests that the Applicant, in preparing responses, fully consider the entirety of the reference(s) as potentially teaching all or part of the claimed invention. It is noted, REFERENCES ARE RELEVANT AS PRIOR ART FOR ALL THEY CONTAIN. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). A reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including non-preferred embodiments (see MPEP 2123). The Examiner has cited particular locations in the reference(s) as applied to the claim(s) above for the convenience of the Applicant. Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claim(s), typically other passages and figures will apply as well.

Response to Amendment
The amendments dated 12/15/2025 have been entered and considered by the examiner.
The amendments to overcome the rejections have been fully considered and are moot in light of the new rejections below. 

Response to Arguments
Applicant's arguments filed 12/15/2025, referenced herein as the Remarks, have been fully considered but they are not persuasive. The rejections under 35 U.S.C. are maintained, and updated as necessitated by amendment.

Applicant argument #1:
Applicant submits that claim 1 does not recite a mental process. It is noted that it is impossible to perform federated learning with a UE and a network entity in the human mind. More specifically, claim 1 recites wireless communication with analog over the air (OTA) signals on a multiple access channel. Such communication is impossible to perform in the human mind and is far more than a general linking to a particular technological environment. Applicant submits that it is also impossible to perform in the human mind: "transmitting the representative values to the network entity for the first communication round of the federated learning in an analog over the air (OTA) signal on a multiple access channel for OTA aggregation." Moreover, it is submitted that such a limitation is not extra-solution activity. Paragraph 78 discusses the disadvantages of digital communication. Claim 1 proposes a solution with analog communication, more specifically, transmitting an analog OTA signal on a multiple access channel for OTA aggregation, to overcome the shortcomings of digital transmissions.

Examiner’s response #1:
The examiner respectfully disagrees. Please refer to the 101 analysis below to review the claim limitations that are analyzed as mental processes and the limitations that are analyzed as additional elements. Each claim is analyzed based on the individual limitations as being a mental process or an additional element. Performing federated learning is analyzed as instructions to implement the exception using generic computer components. Transmission of data in any format is recited at a high level of generality and amounts to extra-solution activity of transmitting data (see MPEP 2106.05(g)).

Applicant argument #2:
FedZip requires digital communication. See page 2, 3rd paragraph "the weights ... of clients' models are aggregated inside the server." Digital communication permits sparsification and quantization. As noted in applicant's paragraph 77, quantization and sparsification are not applicable to analog transmission based over the air computation. As such, FedZip is not applicable, as it relies on techniques that will not operate with the claimed analog OTA signal. Similarly, CADSGD requires sparsification and quantization. As such, CADSGD is not applicable to claim 1 as CADSGD operates in a digital environment that is quite different from what is claimed.

Examiner’s response #2:
The examiner respectfully disagrees. The applicant makes a general allegation that the references used are inapplicable and not appropriate for rejecting the current claims without providing an explanation as to why that is. CADSGD does, as laid out in the prior art rejection section, teach transmitting analog signals OTA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-30 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1 For All Claims
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Regarding Step 1 of the Alice/Mayo framework, claims 1-10 are directed to a method (a process), claims 11-17 are directed to a method (a process), claims 18-27 are directed to an apparatus (a machine), and claims 28-30 are directed to an apparatus (a machine), which each fall within one of the four statutory categories.

Claim 1
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Claim 1 recites the following mental processes, each of which, under the broadest reasonable interpretation (BRI) as written and in light of the instant specification, covers performance of the limitation in the mind (including an observation, evaluation, judgment, opinion) or with the aid of pencil and paper but for the recitation of generic computer components (e.g., “user equipment”):
“computing a set of gradient vector parameters”. Paragraph [0063] of the instant specification states “At the top layer, the gradient may correspond directly to the value of a weight connecting an activated neuron in the penultimate layer and a neuron in the output layer. In lower layers, the gradient may depend on the value of the weights and on the computed error gradients of the higher layers”, but does not appear to explicitly define a gradient. Further, paragraph [0088] of the instant specification states “At block 704, the user equipment (UE) computes a set of gradient vector parameters during a first communication round of the federated learning for the machine learning model using a local dataset. For example, the UE (e.g., using the controller/processor 280, and/or memory 282) may compute the set of gradient vector parameters”, but does not appear to explicitly define calculating a gradient. As drafted and under its BRI in light of the instant specification, this limitation encompasses determining numerical values, which is reasonably understood to be directed to a mental concept (i.e., evaluation or judgement) based on a mathematical concept (i.e., mathematical calculations) and thus falls under the abstract idea of a mental process based on a mathematical concept (i.e., mathematical relationships, mathematical formulas or equations, and mathematical calculations). As drafted and under its BRI, this limitation encompasses utilizing previously calculated numerical values to determine a new numerical value.
“grouping the set of gradient vector parameters of the machine learning model into a plurality of subsets”. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mental process. As drafted and under its BRI in light of the instant specification, this limitation encompasses partitioning, grouping, or clustering vectors for a grouping pattern.
“computing a representative value of all gradients within each of the plurality of subsets to obtain representative values for each of the plurality of subsets”. Paragraph [0082] of the instant specification states “computing the representative value (e.g., averaging)”, but does not appear to explicitly define computing a representative value. As drafted and under its BRI in light of the instant specification, this limitation encompasses performing computational operations, which are reasonably understood to be directed to a mental concept (i.e., evaluation or judgement) based on a mathematical concept (i.e., mathematical calculations) and thus falls under the abstract idea of a mental process based on a mathematical concept (i.e., mathematical relationships, mathematical formulas or equations, and mathematical calculations). As drafted and under its BRI, this limitation encompasses applying a mathematical formula, such as average, mean, or median, to a set of values to determine a single value.
Because the claim recites limitations which can practically be implemented as mental processes and/or mathematical calculations, the claim recites an abstract idea.
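For illustration only, the grouping and representative-value limitations analyzed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the `group_size` parameter, and the choice of consecutive (contiguous) subsets are hypothetical, and averaging is used only because paragraph [0082] gives it as an example of a representative value.

```python
from statistics import mean


def group_gradients(gradients, group_size):
    """Split a flat gradient vector into consecutive subsets.

    Contiguous chunking is an assumption; the claim only requires
    that the parameters be grouped into a plurality of subsets.
    """
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return [gradients[i:i + group_size]
            for i in range(0, len(gradients), group_size)]


def representative_values(subsets):
    """Compute one representative value per subset (here: the mean)."""
    return [mean(subset) for subset in subsets]


# Hypothetical gradient vector with six entries, grouped in pairs.
gradients = [0.5, 1.5, -1.0, 3.0, 2.0, 4.0]
subsets = group_gradients(gradients, group_size=2)
reps = representative_values(subsets)
print(subsets)  # three subsets of two gradients each
print(reps)     # one value per subset
```

With a group size of 2, six gradient entries are reduced to three transmitted values, which is the compression effect the claim title refers to.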
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional elements of “receiving, from a network entity, a machine learning model for federated learning”, which is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
Furthermore, the claim recites the additional elements of “receiving, from the network entity, a grouping configuration for grouping a quantity of parameters in each layer of the machine learning model”, which is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
Regarding the “receiving, from the network entity, a grouping configuration for grouping a quantity of parameters in each layer of the machine learning model”, no details of the model are recited and the model and its training are recited at a high level of generality such that the model can be constructed by hand with pen and paper. The claimed “machine learning model for federated learning” under the BRI, in light of the specification, could be constructed by hand with pen and paper and then manually modified/trained based on a reasonable amount of observed data (i.e., training data). The model is recited at a high level of generality and therefore is being interpreted as performing a mental process on a generic computer. See MPEP 2106.04(a)(2) § III.C which states that “a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept” still recite a mental process. 
In particular, the claim additionally recites the further element of “during a first communication round of the federated learning for the machine learning model using a local dataset… for the first communication round of the federated learning in an analog over the air (OTA) signal on a multiple access channel for OTA aggregation”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
In particular, the claim additionally recites the further element of “in accordance with the grouping configuration received from the network entity”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
In particular, the claim further recites the additional element of “and transmitting the representative values to the network entity for the first communication round of the federated learning in an analog over the air (OTA) signal on a multiple access channel for OTA aggregation”, which is recited at a high level of generality and amounts to extra-solution activity of transmitting data, i.e., post-solution activity of outputting data from the claimed process (see MPEP 2106.05(g)). The examiner notes that, as written, the recitation of signal format is directed to requiring the transmission of the values be performed in the specified format, not to performing translation of the format or performing aggregation of the values.
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
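As background for the “OTA aggregation” language quoted above, the following sketch models an idealized multiple access channel in which simultaneously transmitted analog signals add element-wise, so the receiver observes their sum directly. It is a simplified illustration only: unit channel gain, perfect synchronization, the optional Gaussian noise term, and the names `ota_aggregate` and `average_from_superposition` are all assumptions and do not come from the application or the cited references.

```python
import random


def ota_aggregate(ue_signals, noise_std=0.0, seed=0):
    """Return the element-wise superposition of all UE signals.

    Assumes an idealized channel (unit gain, perfect alignment).
    Receiver noise is added only when noise_std > 0.
    """
    length = len(ue_signals[0])
    if any(len(signal) != length for signal in ue_signals):
        raise ValueError("all UEs must transmit the same number of values")
    received = [sum(values) for values in zip(*ue_signals)]
    if noise_std > 0:
        rng = random.Random(seed)
        received = [r + rng.gauss(0.0, noise_std) for r in received]
    return received


def average_from_superposition(received, num_ues):
    """Scale the received sum to obtain the average across UEs."""
    return [r / num_ues for r in received]


# Two hypothetical UEs, each sending three representative values.
ue_signals = [[1.0, 1.0, 3.0], [3.0, -1.0, 1.0]]
received = ota_aggregate(ue_signals)
averaged = average_from_superposition(received, num_ues=len(ue_signals))
print(received)
print(averaged)
```

In this model the network entity never sees the individual UE values, only their sum, which is why quantized or sparsified digital payloads do not map onto it directly.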
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “receiving, from a network entity, a machine learning model for federated learning” is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process.  The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
As discussed above, the further element of “receiving, from the network entity, a grouping configuration for grouping a quantity of parameters in each layer of the machine learning model”, is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)). The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
As discussed above, the further element of “during a first communication round of the federated learning for the machine learning model using a local dataset… for the first communication round of the federated learning in an analog over the air (OTA) signal on a multiple access channel for OTA aggregation” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)).
As discussed above, the further element of “in accordance with the grouping configuration received from the network entity” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)).
As discussed above, the additional element of “and transmitting the representative values to the network entity for the first communication round of the federated learning in an analog over the air (OTA) signal on a multiple access channel for OTA aggregation” is recited at a high level of generality and amounts to extra-solution activity of transmitting data, i.e., post-solution activity of outputting data from the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.

Claim 2
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the plurality of subsets each include a number of parameters, the number of parameters being global to the machine learning model”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the plurality of subsets each include a number of parameters, the number of parameters being global to the machine learning model” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 3
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the plurality of subsets each include a number of parameters, the number of parameters being local to a weight matrix at each neural network layer of the machine learning model”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the plurality of subsets each include a number of parameters, the number of parameters being local to a weight matrix at each neural network layer of the machine learning model” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 4
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the plurality of subsets each include a number of parameters, the number of parameters being local to a column of a weight matrix at each neural network layer of the machine learning model”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the plurality of subsets each include a number of parameters, the number of parameters being local to a column of a weight matrix at each neural network layer of the machine learning model” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.
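Claims 2 through 4 differ only in the scope at which the number of parameters per subset is set: the whole model, each layer's weight matrix, or each column of a weight matrix. One possible reading of those three scopes is sketched below. The interpretation, the dictionary layout of `layers`, and every function name are assumptions made for illustration; the office action does not define how the scopes are implemented.

```python
def chunk(values, size):
    """Split a list into consecutive groups of at most `size` entries."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def group_global(layers, size):
    """Claim 2 reading: one group size for the whole model."""
    flat = [g for matrix in layers.values() for row in matrix for g in row]
    return chunk(flat, size)


def group_per_matrix(layers, sizes):
    """Claim 3 reading: a group size per layer's weight matrix."""
    return {name: chunk([g for row in matrix for g in row], sizes[name])
            for name, matrix in layers.items()}


def group_per_column(layers, sizes):
    """Claim 4 reading: groups are formed within each matrix column."""
    return {name: [chunk(list(col), sizes[name]) for col in zip(*matrix)]
            for name, matrix in layers.items()}


# Hypothetical two-layer model; each matrix is a list of rows.
layers = {"fc1": [[1.0, 2.0], [3.0, 4.0]], "fc2": [[5.0, 6.0]]}
sizes = {"fc1": 2, "fc2": 1}
print(group_global(layers, 4))
print(group_per_matrix(layers, sizes))
print(group_per_column(layers, sizes))
```

Under this reading, narrower scopes keep each subset inside one layer or one column, so gradients that are averaged together belong to structurally related weights.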

Claim 5
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim recites in part: 
“adaptively adjusting a learning rate to reduce a level of distortion resulting from a mismatch between computed gradients and transmitted gradients”. Paragraph [0082] of the instant specification states “A learning rate is a tuning parameter in an optimization algorithm, such as gradient descent, used for training machine learning models. According to aspects of the present disclosure, the learning rate can be adaptively adjusted to reduce or even minimize the distortion due to not transmitting the full gradient vector. For example, the learning rate may be adjusted to reduce a level of distortion resulting from a mismatch between computed gradients and transmitted gradients”, but does not appear to explicitly define adaptively adjusting a learning rate. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mental process. As drafted and under its BRI in light of the instant specification, this limitation encompasses evaluating or judging modifications (equivalently, adjustments) of the learning rate parameter, such as increasing or decreasing the value based on the measured distance between two sets of values.
Because the claim recites a limitation which can practically be implemented as mental processes and/or mathematical calculations, the claim recites an abstract idea.
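The quoted passage from paragraph [0082] says the learning rate may be adjusted to reduce distortion from the mismatch between computed and transmitted gradients, but it gives no formula. The sketch below is therefore purely hypothetical: it measures the mismatch as a relative squared error and shrinks the learning rate as that error grows. The distortion measure, the scaling rule, and the function names are assumptions chosen only to make the idea concrete.

```python
def relative_distortion(computed, transmitted):
    """Squared error between the two vectors, relative to the computed energy."""
    error = sum((c - t) ** 2 for c, t in zip(computed, transmitted))
    energy = sum(c ** 2 for c in computed)
    return error / energy if energy > 0 else 0.0


def adjusted_learning_rate(base_lr, computed, transmitted):
    """Hypothetical rule: damp the step size when the mismatch is large."""
    return base_lr / (1.0 + relative_distortion(computed, transmitted))


# Two gradients averaged into one representative value of 2.0.
computed = [1.0, 3.0]
transmitted = [2.0, 2.0]
print(relative_distortion(computed, transmitted))
print(adjusted_learning_rate(0.1, computed, transmitted))
```

When the transmitted values equal the computed ones, the distortion is zero and the base learning rate is returned unchanged.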
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “due to grouping the set of gradient vector parameters”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “due to grouping the set of gradient vector parameters” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 6
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim recites in part: 
“computing a difference between true gradients and the transmitted representative values”. Paragraph [0086] of the instant specification states “Gradient accumulation refers to finding a mismatch between the computed and transmitted gradients in each UE at time t (or in the current round) and adding this difference to the computed gradient for the related UE at time t+1 (or in the next round)”, but does not appear to explicitly define computing a difference between gradients. As drafted and under its BRI in light of the instant specification, this limitation encompasses performing comparison operations, which are reasonably understood to be directed to a mental concept (i.e., evaluation) based on a mathematical concept (i.e., mathematical calculations) and thus falls under the abstract idea of a mental process based on a mathematical concept (i.e., mathematical relationships, mathematical formulas or equations, and mathematical calculations). As drafted and under its BRI, this limitation encompasses determining a measure of distance between numerical values.
“and adding the difference to a next gradient vector”. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mathematical calculation. As drafted and under its BRI in light of the instant specification, this limitation encompasses adding a numerical value to a numerical vector.
Because the claim recites limitations which can practically be implemented as mental processes and/or mathematical calculations, the claim recites an abstract idea.
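The two limitations above track the gradient accumulation described in paragraph [0086]: the mismatch between computed and transmitted gradients in one round is added to the computed gradient in the next round. A minimal sketch of that arithmetic follows. It assumes mean-based grouping with contiguous groups, and the helper names `compress`, `residual`, and `carry_forward` are hypothetical.

```python
from statistics import mean


def compress(gradients, group_size):
    """Replace each entry by its group's mean (the value effectively sent)."""
    sent = []
    for i in range(0, len(gradients), group_size):
        group = gradients[i:i + group_size]
        sent.extend([mean(group)] * len(group))
    return sent


def residual(true_gradients, transmitted):
    """Difference between true gradients and the transmitted values."""
    return [g - t for g, t in zip(true_gradients, transmitted)]


def carry_forward(next_gradients, res):
    """Add the previous round's difference to the next gradient vector."""
    return [g + r for g, r in zip(next_gradients, res)]


# Round t: compute, compress, and record what was lost.
round1 = [1.0, 3.0, 2.0, 6.0]
sent1 = compress(round1, group_size=2)
res1 = residual(round1, sent1)

# Round t+1: fold the lost portion into the freshly computed gradients.
round2 = carry_forward([0.5, 0.5, 1.0, 1.0], res1)
print(sent1, res1, round2)
```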
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “for a second communication round of the federated learning”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “for a second communication round of the federated learning” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 7
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim recites in part: 
“grouping the gradient vector parameters of the machine learning model into different subsets with a different grouping pattern”. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mental process. As drafted and under its BRI in light of the instant specification, this limitation encompasses partitioning or clustering numerical parameters based on a specified pattern.
Because the claim recites a limitation which can practically be implemented as mental processes and/or mathematical calculations, the claim recites an abstract idea.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “for a second communication round”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “for a second communication round” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 8
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim recites in part: 
“grouping the gradient vector parameters of the machine learning model sequentially”. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mental process. As drafted and under its BRI in light of the instant specification, this limitation encompasses partitioning or clustering numerical parameters sequentially.
Because the claim recites a limitation which can practically be implemented as mental processes and/or mathematical calculations, the claim recites an abstract idea.
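A minimal sketch of sequential grouping, under the interpretation stated above (partitioning numerical parameters into consecutive groups), might look as follows. The helper names and the choice of the arithmetic mean as the per-group representative value are hypothetical.

```python
def group_sequentially(params, group_size):
    """Split a flat parameter list into consecutive, non-overlapping groups."""
    return [params[i:i + group_size] for i in range(0, len(params), group_size)]


def representative_values(groups):
    """One value per group; the arithmetic mean is an assumed choice."""
    return [sum(g) / len(g) for g in groups]


grads = [0.5, 1.5, -2.0, 4.0, 3.0]
groups = group_sequentially(grads, group_size=2)
reps = representative_values(groups)
```

The last group is shorter when the vector length is not a multiple of the group size; how such a remainder is handled is not addressed by the quoted limitation.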
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “for a random access channel (RACH) communication round”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “for a random access channel (RACH) communication round” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 9
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim recites in part: 
“interleaving the gradient vector parameters”. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mental process. As drafted and under its BRI in light of the instant specification, this limitation encompasses ordering the numerical gradient vector parameters in a specified manner.
Because the claim recites a limitation which can practically be implemented as mental processes and/or mathematical calculations, the claim recites an abstract idea.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional elements of “prior to grouping the gradient vector parameters” and “an interleaving pattern determined deterministically in accordance with a function of known parameters”. Such limitations amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the elements of “prior to grouping the gradient vector parameters” and “an interleaving pattern determined deterministically in accordance with a function of known parameters” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.
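To make the Claim 9 limitations concrete, the sketch below derives an interleaving pattern deterministically from parameters that both sides are assumed to know (here, the vector length and a round index) and applies it before any grouping. The specific formula, the parameter names, and the helper functions are hypothetical and are not drawn from the specification.

```python
import math


def interleaving_pattern(n, round_index):
    """Permutation of range(n) computed only from n and the round index."""
    # Choose the smallest stride >= 2 that is coprime with n so the mapping is one-to-one.
    stride = 2
    while math.gcd(stride, n) != 1:
        stride += 1
    offset = round_index % n
    return [(i * stride + offset) % n for i in range(n)]


def interleave(params, pattern):
    """Reorder the parameters according to the pattern."""
    return [params[j] for j in pattern]


def deinterleave(shuffled, pattern):
    """Undo interleave() given the same pattern."""
    restored = [None] * len(shuffled)
    for position, j in enumerate(pattern):
        restored[j] = shuffled[position]
    return restored


grads = [10, 11, 12, 13, 14]
pattern = interleaving_pattern(len(grads), round_index=1)
shuffled = interleave(grads, pattern)
restored = deinterleave(shuffled, pattern)
```

Because the pattern is a pure function of known values, any party holding the same values can recompute it and reverse the reordering without receiving the pattern itself.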

Claim 10
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “sampling the gradient vector parameters”, which is recited at a high level of generality and amounts to extra-solution activity of collecting data, i.e., pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
In particular, the claim further recites the additional element of “a sampling pattern determined deterministically in accordance with a function of known parameters”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “sampling the gradient vector parameters” is recited at a high level of generality and amounts to extra-solution activity of collecting data, i.e., pre-solution activity of gathering data for use in the claimed process.  The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
As discussed above, the additional element of “a sampling pattern determined deterministically in accordance with a function of known parameters” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)).
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.
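As a hedged illustration of the Claim 10 elements, the following selects a subset of gradient entries using a pattern computed from values assumed to be known in advance (vector length, a sampling interval, and a round index). The stride-and-offset rule and all names are assumptions chosen for simplicity.

```python
def sampling_pattern(n, keep_every, round_index):
    """Indices to keep, derived only from values both sides already know."""
    offset = round_index % keep_every
    return list(range(offset, n, keep_every))


def sample(params, pattern):
    """Keep only the entries at the indices named by the pattern."""
    return [params[i] for i in pattern]


grads = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
pattern = sampling_pattern(len(grads), keep_every=3, round_index=4)
kept = sample(grads, pattern)
```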

Claim 11
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Claim 11 recites the following mental processes, each of which, under its BRI as written and in light of the instant specification, covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper, but for the recitation of generic computer components (e.g., “network entity”):
“reconstructing full dimensional gradient vectors based on the representative values”. Paragraph [0081] of the instant specification states “Because the parameter server is aware of the grouping structure, the parameter server can reconstruct the full dimensional gradient vector from the received signal”, but does not appear to explicitly define reconstructing full dimensional gradient vectors based on the received values. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mental process. As drafted and under its BRI in light of the instant specification, this limitation encompasses duplicating the representative value once for each item in its corresponding group (or cluster) of gradient parameters within the user equipment.
“and updating the machine learning model based on the full dimensional gradient vectors”. Paragraph [0007] of the instant specification states “The method includes updating the machine learning model based on the full dimensional gradient vectors” and paragraph [0073] of the instant specification states “Between each layer 556, 558, 560, 562, 564 of the deep convolutional network 550 are weights (not shown) that are to be updated”, but does not appear to explicitly define updating the machine learning model. As drafted and under its BRI in light of the instant specification, this limitation encompasses performing update operations for numerical parameters within a defined model, which are reasonably understood to be directed to a mental concept (i.e., optimization) based on a mathematical concept (i.e., mathematical calculations) and thus falls under the abstract idea of a mental process based on a mathematical concept (i.e., mathematical relationships, mathematical formulas or equations, and mathematical calculations). As drafted and under its BRI, this limitation encompasses modifying numerical parameters by applying mathematical functions with pre-computed data.
Because the claim recites limitations which can practically be implemented as mental processes, the claim recites an abstract idea.
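The interpretation given above for the two limitations (duplicating each representative value once per group member, then modifying numerical parameters with the result) can be sketched as follows. The group sizes, the plain gradient-descent update rule, and the learning rate are assumptions for illustration; the claim does not specify an optimizer.

```python
def reconstruct(representatives, group_sizes):
    """Repeat each representative value once per member of its group."""
    full = []
    for value, size in zip(representatives, group_sizes):
        full.extend([value] * size)
    return full


def sgd_update(weights, gradient, learning_rate):
    """Plain gradient-descent step (assumed update rule)."""
    return [w - learning_rate * g for w, g in zip(weights, gradient)]


group_sizes = [2, 2, 1]           # known from the grouping structure
aggregated = [1.0, -2.0, 4.0]     # one received value per subset
full_gradient = reconstruct(aggregated, group_sizes)
weights = sgd_update([1.0, 1.0, 1.0, 1.0, 1.0], full_gradient, learning_rate=0.5)
```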
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional elements of “transmitting, to a plurality of user equipment (UEs), a machine learning model for federated learning” and “transmitting, to the plurality of UEs, a grouping structure to enable the plurality of UEs to group sets of gradient vector parameters for the machine learning model into a plurality of subsets”, which are both recited at a high level of generality and amount to extra-solution activity of transmitting data, i.e., pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
In particular, the claim recites the additional element of “the grouping structure comprising at least one of a grouping pattern for a specific round of federated learning, an interleaving pattern, or a sampling pattern”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
In particular, the claim recites the additional element of “receiving, from each of the plurality of UEs in an analog over the air (OTA) aggregated signal on a multiple access channel, representative values for each of the plurality of subsets”, which is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)). The examiner notes that, as written, the recitation of signal format is directed to requiring the values are in the specified format, not to performing translation of the format or performing aggregation of the values.
In particular, the claim recites the additional element of “from each of the plurality of UEs in an analog over the air (OTA) aggregated signal on a multiple access channel, representative values for each of the plurality of subsets”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
In particular, the claim recites the additional element of “and updating the machine learning model based on the full dimensional gradient vectors”, which can be characterized as insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(g). This additional element repeats mental steps (based on mathematical concepts), as discussed above regarding the mental process, based on a mathematical computation, of updating the machine learning model by performing mathematical computations between the model’s parameters and pre-computed numerical values.
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the elements of “transmitting, to a plurality of user equipment (UEs), a machine learning model for federated learning” and “transmitting, to the plurality of UEs, a grouping structure to enable the plurality of UEs to group sets of gradient vector parameters for the machine learning model into a plurality of subsets” are recited at a high level of generality and amount to extra-solution activity of transmitting data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
As discussed above, the element of “the grouping structure comprising at least one of a grouping pattern for a specific round of federated learning, an interleaving pattern, or a sampling pattern” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)).
As discussed above, the element of “receiving, from each of the plurality of UEs in an analog over the air (OTA) aggregated signal on a multiple access channel, representative values for each of the plurality of subsets” is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
As discussed above, the element of “from each of the plurality of UEs in an analog over the air (OTA) aggregated signal on a multiple access channel, representative values for each of the plurality of subsets” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)).
As discussed above, the element of “and updating the machine learning model based on the full dimensional gradient vectors” can be characterized as insignificant extra-solution activity that is well-understood, routine, and conventional. MPEP 2106.05(d)(II), example (ii), provides that performing repetitive calculations has been recognized by the courts as well-understood, routine, and conventional.
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.

Claim 12
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the grouping structure is global to the machine learning model”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the grouping structure is global to the machine learning model” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 13
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the grouping structure is local to a weight matrix at each neural network layer of the machine learning model”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the grouping structure is local to a weight matrix at each neural network layer of the machine learning model” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 14
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the grouping structure is local to a column of a weight matrix at each neural network layer of the machine learning model”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the grouping structure is local to a column of a weight matrix at each neural network layer of the machine learning model” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.
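For context on the scope distinction drawn in Claims 12 through 14, the sketch below shows one possible reading of a grouping structure that is local to a column of a layer's weight matrix: groups are formed within each column and never span two columns. The matrix layout (a list of rows), the group size, and the mean as representative value are all assumptions.

```python
def column_local_representatives(grad_matrix, group_size):
    """For each column, group its entries sequentially and average each group."""
    n_rows = len(grad_matrix)
    n_cols = len(grad_matrix[0])
    reps = []
    for c in range(n_cols):
        column = [grad_matrix[r][c] for r in range(n_rows)]
        col_reps = []
        for start in range(0, n_rows, group_size):
            chunk = column[start:start + group_size]
            col_reps.append(sum(chunk) / len(chunk))
        reps.append(col_reps)
    return reps


# Gradient of a hypothetical 4x2 weight matrix, stored as a list of rows.
layer_grad = [
    [1.0, 10.0],
    [3.0, 30.0],
    [5.0, 50.0],
    [7.0, 70.0],
]
reps = column_local_representatives(layer_grad, group_size=2)
```

Under this reading, a grouping that is global to the model would instead flatten every layer into one vector before grouping, and a grouping local to a weight matrix would flatten each layer's matrix separately.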

Claim 15
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the grouping structure is different for different communication rounds of the federated learning”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the grouping structure is different for different communication rounds of the federated learning” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 16
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “in which the interleaving pattern to the plurality of UEs enables the plurality of UEs to interleave the gradient vector parameters prior to grouping the gradient vector parameters”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional element of “in which the interleaving pattern to the plurality of UEs enables the plurality of UEs to interleave the gradient vector parameters prior to grouping the gradient vector parameters” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 17
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “in which the sampling pattern to the plurality of UEs enables the plurality of UEs to sample the gradient vector parameters”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional element of “in which the sampling pattern to the plurality of UEs enables the plurality of UEs to sample the gradient vector parameters” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 18
Claim 18 is substantially similar to claim 1 and is therefore rejected on similar grounds as claim 1.

Claim 19
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the plurality of subsets each include a number of parameters, the number of parameters being global to the machine learning model”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the plurality of subsets each include a number of parameters, the number of parameters being global to the machine learning model” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 20
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the plurality of subsets each include a number of parameters, the number of parameters being local to a weight matrix at each neural network layer of the machine learning model”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the plurality of subsets each include a number of parameters, the number of parameters being local to a weight matrix at each neural network layer of the machine learning model” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 21
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the plurality of subsets each include a number of parameters, the number of parameters being local to a column of a weight matrix at each neural network layer of the machine learning model”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the plurality of subsets each include a number of parameters, the number of parameters being local to a column of a weight matrix at each neural network layer of the machine learning model” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 22
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim recites in part: 
“to adaptively adjust a learning rate to reduce a level of distortion resulting from a mismatch between computed gradients and transmitted gradients”. Paragraph [0082] of the instant specification states “A learning rate is a tuning parameter in an optimization algorithm, such as gradient descent, used for training machine learning models. According to aspects of the present disclosure, the learning rate can be adaptively adjusted to reduce or even minimize the distortion due to not transmitting the full gradient vector. For example, the learning rate may be adjusted to reduce a level of distortion resulting from a mismatch between computed gradients and transmitted gradients”, but does not appear to explicitly define adaptively adjusting a learning rate. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mental process. As drafted and under its BRI in light of the instant specification, this limitation encompasses evaluating or judging modifications (equivalently, adjustments) of the learning rate parameter, such as increasing or decreasing the value based on the measured distance between two sets of values.
Because the claim recites a limitation which can practically be implemented as mental processes and/or mathematical calculations, the claim recites an abstract idea.
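For illustration only, the reading above (raising or lowering the learning rate based on a measured distance between two sets of values) can be sketched as follows. The specific rule shown, dividing a base rate by one plus a normalized squared distance, is an assumption made for this sketch; neither the claim nor paragraph [0082] specifies a formula, and the function names are hypothetical.

```python
def relative_distortion(computed, transmitted):
    """Squared distance between the two vectors, normalized by the
    squared norm of the computed gradient (assumed distortion measure)."""
    num = sum((c - t) ** 2 for c, t in zip(computed, transmitted))
    den = sum(c ** 2 for c in computed)
    return num / den if den > 0 else 0.0


def adjusted_learning_rate(base_lr, computed, transmitted):
    # Hypothetical rule: take a smaller step as the mismatch grows.
    return base_lr / (1.0 + relative_distortion(computed, transmitted))


# No mismatch leaves the rate unchanged; a mismatch lowers it.
lr_same = adjusted_learning_rate(0.1, [1.0, 3.0], [1.0, 3.0])
lr_lower = adjusted_learning_rate(0.1, [1.0, 3.0], [2.0, 2.0])
```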
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “due to grouping the set of gradient vector parameters”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “due to grouping the set of gradient vector parameters” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 23
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim recites in part: 
“to compute a difference between true gradients and the transmitted representative values”. Paragraph [0086] of the instant specification states “Gradient accumulation refers to finding a mismatch between the computed and transmitted gradients in each UE at time t (or in the current round) and adding this difference to the computed gradient for the related UE at time t+1 (or in the next round)”, but does not appear to explicitly define computing a difference between gradients. As drafted and under its BRI in light of the instant specification, this limitation encompasses performing comparison operations, which are reasonably understood to be directed to a mental concept (i.e., evaluation) based on a mathematical concept (i.e., mathematical calculations) and thus falls under the abstract idea of a mental process based on a mathematical concept (i.e., mathematical relationships, mathematical formulas or equations, and mathematical calculations). As drafted and under its BRI, this limitation encompasses determining a measure of distance between numerical values.
“and to add the difference to a next gradient vector”. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mathematical calculation. As drafted and under its BRI in light of the instant specification, this limitation encompasses adding a numerical value to a numerical vector.
Because the claim recites limitations which can practically be implemented as mental processes and/or mathematical calculations, the claim recites an abstract idea.
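For illustration only, the two recited operations (computing a difference between true gradients and transmitted representative values, then adding that difference to a next gradient vector) can be sketched as below. The sketch assumes the representative value of each group is the group mean and that groups are equal-sized runs of consecutive entries; the record does not confirm either assumption, and all function names are hypothetical.

```python
def group_means(grad, group_size):
    """Replace each run of group_size entries with the group's mean
    (assumed representative value)."""
    out = []
    for start in range(0, len(grad), group_size):
        group = grad[start:start + group_size]
        mean = sum(group) / len(group)
        out.extend([mean] * len(group))
    return out


def residual(true_grad, transmitted):
    """Element-wise difference between true and transmitted values."""
    return [g - t for g, t in zip(true_grad, transmitted)]


def add_residual(next_grad, carried):
    """Add the carried difference to the next round's gradient vector."""
    return [g + r for g, r in zip(next_grad, carried)]


true_grad = [1.0, 3.0, 2.0, 6.0]
sent = group_means(true_grad, 2)       # [2.0, 2.0, 4.0, 4.0]
carried = residual(true_grad, sent)    # [-1.0, 1.0, -2.0, 2.0]
corrected = add_residual([0.5, 0.5, 0.5, 0.5], carried)
```

Each step is element-wise arithmetic on short numerical vectors.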
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “for a second communication round of the federated learning”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “for a second communication round of the federated learning” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 24
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim recites in part: 
“to group the gradient vector parameters of the machine learning model into different subsets with a different grouping pattern”. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mental process. As drafted and under its BRI in light of the instant specification, this limitation encompasses partitioning or clustering numerical parameters based on a specified pattern.
Because the claim recites a limitation which can practically be implemented as mental processes and/or mathematical calculations, the claim recites an abstract idea.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “for a second communication round”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “for a second communication round” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 25
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim recites in part: 
“to group the gradient vector parameters of the machine learning model sequentially”. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mental process. As drafted and under its BRI in light of the instant specification, this limitation encompasses partitioning or clustering numerical parameters sequentially.
Because the claim recites a limitation which can practically be implemented as mental processes and/or mathematical calculations, the claim recites an abstract idea.
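For illustration only, sequential grouping under the reading above can be sketched as splitting a parameter list into consecutive runs. The fixed group size and the function name are assumptions made for this sketch, not details taken from the claim.

```python
def group_sequentially(params, group_size):
    """Partition params into consecutive subsets of at most group_size."""
    return [params[i:i + group_size] for i in range(0, len(params), group_size)]


groups = group_sequentially([0.1, 0.2, 0.3, 0.4, 0.5], 2)
```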
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “for a random access channel (RACH) communication round”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “for a random access channel (RACH) communication round” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.


Claim 26
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim recites in part: 
“to interleave the gradient vector parameters”. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mental process. As drafted and under its BRI in light of the instant specification, this limitation encompasses ordering the numerical gradient vector parameters in a specified manner.
Because the claim recites a limitation which can practically be implemented as mental processes and/or mathematical calculations, the claim recites an abstract idea.
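For illustration only, ordering the parameters according to a pattern that is a deterministic function of known values can be sketched as below. The choice of known values (the parameter count and a round index), the seed formula, and the function names are assumptions made for this sketch; the claim does not identify them.

```python
import random


def interleaving_pattern(num_params, round_index):
    """Derive a permutation from values both sides are assumed to know."""
    seed = round_index * 1_000_003 + num_params  # hypothetical seed formula
    order = list(range(num_params))
    random.Random(seed).shuffle(order)  # same seed gives the same order
    return order


def interleave(params, pattern):
    """Reorder params so that position k holds params[pattern[k]]."""
    return [params[i] for i in pattern]


def deinterleave(interleaved, pattern):
    """Invert interleave() using the same pattern."""
    restored = [None] * len(pattern)
    for pos, src in enumerate(pattern):
        restored[src] = interleaved[pos]
    return restored


pattern = interleaving_pattern(8, round_index=3)
mixed = interleave(list("abcdefgh"), pattern)
original = deinterleave(mixed, pattern)
```

Because the pattern depends only on the two inputs, any party holding those inputs can regenerate it and undo the reordering.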
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional elements of “prior to grouping the gradient vector parameters” and “an interleaving pattern determined deterministically in accordance with a function of known parameters”. Such limitations amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the elements of “prior to grouping the gradient vector parameters” and “an interleaving pattern determined deterministically in accordance with a function of known parameters” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.

Claim 27
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomena, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “to sample the gradient vector parameters”, which is recited at a high level of generality and amounts to extra-solution activity of collecting data, i.e., pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
In particular, the claim further recites the additional element of “a sampling pattern determined deterministically in accordance with a function of known parameters”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “to sample the gradient vector parameters” is recited at a high level of generality and amounts to extra-solution activity of collecting data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, “electronic record keeping”, and “storing and retrieving information in memory”).
As discussed above, the additional element of “a sampling pattern determined deterministically in accordance with a function of known parameters” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)).
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.

Claim 28
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Claim 28 recites the following mental processes, each of which, under the BRI as written and in light of the instant specification, covers performance of the limitation in the mind (including an observation, evaluation, judgment, opinion) or with the aid of pencil and paper but for the recitation of generic computer components (e.g., “apparatus for wireless communication”, “network entity”, “memory”):
“to reconstruct full dimensional gradient vectors based on the representative values”. Paragraph [0081] of the instant specification states “Because the parameter server is aware of the grouping structure, the parameter server can reconstruct the full dimensional gradient vector from the received signal”, but does not appear to explicitly define reconstructing full dimensional gradient vectors based on the received values. As drafted and under its BRI in light of the instant specification, this limitation falls under the abstract idea of a mental process. As drafted and under its BRI in light of the instant specification, this limitation encompasses duplicating the representative value once for each item in its corresponding group (or cluster) of gradient parameters within the user equipment.
“and to update the machine learning model based on the full dimensional gradient vectors”. Paragraph [0007] of the instant specification states “The method includes updating the machine learning model based on the full dimensional gradient vectors” and paragraph [0073] of the instant specification states “Between each layer 556, 558, 560, 562, 564 of the deep convolutional network 550 are weights (not shown) that are to be updated”, but does not appear to explicitly define updating the machine learning model. As drafted and under its BRI in light of the instant specification, this limitation encompasses performing update operations for numerical parameters within a defined model, which are reasonably understood to be directed to a mental concept (i.e., optimization) based on a mathematical concept (i.e., mathematical calculations) and thus falls under the abstract idea of a mental process based on a mathematical concept (i.e., mathematical relationships, mathematical formulas or equations, and mathematical calculations). As drafted and under its BRI, this limitation encompasses modifying numerical parameters by applying mathematical functions with pre-computed data.
Because the claim recites limitations which can practically be implemented as mental processes, the claim recites an abstract idea.
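For illustration only, the reading of the two limitations above (duplicating each representative value once per item in its group, then modifying numerical parameters with the result) can be sketched as below. The plain gradient-descent update, the learning rate, and the function names are assumptions made for this sketch; the claim does not specify an update rule.

```python
def reconstruct(representatives, group_sizes):
    """Expand one representative value per group into a full-length vector."""
    full = []
    for value, size in zip(representatives, group_sizes):
        full.extend([value] * size)
    return full


def update_weights(weights, grad, lr):
    """Assumed update rule: a single gradient-descent step."""
    return [w - lr * g for w, g in zip(weights, grad)]


full_grad = reconstruct([2.0, 4.0], [2, 3])   # [2.0, 2.0, 4.0, 4.0, 4.0]
new_weights = update_weights([1.0] * 5, full_grad, lr=0.5)
```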
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional elements of “to transmit, to a plurality of user equipment (UEs), a machine learning model for federated learning” and “to transmit, to the plurality of UEs, a grouping structure to enable the plurality of UEs to group sets of gradient vector parameters for the machine learning model into a plurality of subsets”, which are both recited at a high level of generality and amount to extra-solution activity of transmitting data, i.e., pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
In particular, the claim recites the additional element of “to receive, from each of the plurality of UEs in an analog over the air (OTA) aggregated signal on a multiple access channel, representative values for each of the plurality of subsets”, which is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)). The examiner notes that, as written, the recitation of signal format is directed to requiring the values are in the specified format, not to performing translation of the format or performing aggregation of the values.
In particular, the claim recites the additional element of “from each of the plurality of UEs in an analog over the air (OTA) aggregated signal on a multiple access channel, representative values for each of the plurality of subsets”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
In particular, the claim recites the additional element of “and to update the machine learning model based on the full dimensional gradient vectors”, which can be characterized as insignificant extra solution activity that is well understood routine and conventional. See MPEP 2106.05(g). This additional element repeats mental steps (based on mathematical concepts), as discussed above regarding the mental process based on a mathematical computation to update the machine learning model by performing mathematical computations between the model’s parameters and pre-computed numerical values.
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the elements of “to transmit, to a plurality of user equipment (UEs), a machine learning model for federated learning” and “to transmit, to the plurality of UEs, a grouping structure to enable the plurality of UEs to group sets of gradient vector parameters for the machine learning model into a plurality of subsets” are recited at a high level of generality and amount to extra-solution activity of transmitting data, i.e., pre-solution activity of gathering data for use in the claimed process.  The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
As discussed above, the element of “to receive, from each of the plurality of UEs in an analog over the air (OTA) aggregated signal on a multiple access channel, representative values for each of the plurality of subsets” is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
As discussed above, the element of “from each of the plurality of UEs in an analog over the air (OTA) aggregated signal on a multiple access channel, representative values for each of the plurality of subsets” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)).
As discussed above, the element of “and to update the machine learning model based on the full dimensional gradient vectors” can be characterized as insignificant extra-solution activity that is well-understood, routine, and conventional. MPEP 2106.05(d)(II), example (ii), provides that performing repetitive calculations has been recognized by the courts as well-understood, routine, and conventional.
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.

Claim 29
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomenon, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the grouping structure is global to the machine learning model”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the grouping structure is global to the machine learning model” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.

Claim 30
Step 2A, prong 1 – Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
The claim does not recite additional laws of nature, natural phenomenon, or abstract ideas.
Step 2A, prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element of “the grouping structure is local to a weight matrix at each neural network layer of the machine learning model”. Such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." (Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981)). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.
Accordingly, at Step 2A, prong two, the additional element individually or in combination does not integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the element of “the grouping structure is local to a weight matrix at each neural network layer of the machine learning model” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which is not significantly more than the judicial exception. (See MPEP 2106.05(h)). 
Accordingly, at Step 2B, the additional element individually or in combination does not amount to significantly more than the judicial exception.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

FEDZIP, CADSGD, REN, and KOBAYASHI
Claims 1-3, 6-7, 9-10, 12-13, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over “FEDZIP: A COMPRESSION FRAMEWORK FOR COMMUNICATION-EFFICIENT FEDERATED LEARNING” by Amirhossein Malekijoo et al., referenced herein as FEDZIP, in view of “Federated Learning over Wireless Fading Channels” by Mohammad Mohammadi Amiri et al., referenced herein as CADSGD, in view of “US20250141758A1” by Yuwei REN et al., referenced herein as REN, in view of Kobayashi (US20220175331A1).

Claim 1
FEDZIP teaches “A method of wireless communication for federated learning, by a user equipment (UE), comprising:” (page 1, ABSTRACT, FEDZIP: “Federated learning… decentralized machine learning (especially deep learning) for wireless devices… It assigns the learning process independently to each client… In this work, we propose a novel framework, FedZip, that significantly decreases the size of the updates while transferring weights from the deep learning model between clients and their servers”; Examiner’s Note (EN): Paragraph [0034] of the instant specification states “UEs 120(e.g., 120a, 120b, 120c) may be dispersed throughout the wireless network 100, and each UE may be stationary or mobile. A UE may also be referred to as an access terminal, a terminal, a mobile station, a subscriber unit, a station, and/or the like. A UE may be a cellular phone (e.g., a smart phone), a personal digital assistant (PDA), a wireless modem, a wireless communications device, a handheld device, a laptop computer, a cordless phone, a wireless local loop (WLL) station, a tablet, a camera, a gaming device, a netbook, a smartbook, an ultrabook, a medical device or equipment, biometric sensors/devices, wearable devices (smart watches, smart clothing, smart glasses, smart wrist bands, smart jewelry (e.g., smart ring, smart bracelet)), an entertainment device (e.g., a music or video device, or a satellite radio), a vehicular component or sensor, smart meters/sensors, industrial manufacturing equipment, a global positioning system device, or any other suitable device that is configured to communicate via a wireless or wired medium”, but does not appear to explicitly define a user equipment. As written and in light of the instant specification, the BRI of a user equipment encompasses a client, or “client device” (page 2, section 1, paragraph 3, FEDZIP) as taught by FEDZIP. 
As written and in light of the instant specification, the BRI of wireless communication includes transferring data between wireless devices).
FEDZIP further teaches “receiving, from a network entity, a machine learning model for federated learning” (page 3, figure 1, FEDZIP: [Figure 1 of FEDZIP; image not reproduced]; (EN): As written and in light of the instant specification, the BRI of receiving a machine learning model from a network entity encompasses a cloud server which broadcasts a machine learning model to the client device, where the network entity corresponds to the cloud server).
FEDZIP further teaches “computing a set of gradient vector parameters during a first communication round of the federated learning for the machine learning model using a local dataset” (page 2, section 1, paragraph 3, FEDZIP: “FL clients run a DNN model locally, employing the raw data stored locally, and builds a classifier model, an approach called local learning… improving accuracy in each round of communication”; (EN): [0025] of the instant specification states “For example, at each communication round of the federated learning process, a parameter server, such as a base station, selects a number of users and sends a copy of a global machine learning model”, but does not appear to explicitly define a first communication round. FEDZIP specifies that “A major novelty of our framework is compressing bits per communication round” (page 7, section 4, paragraph 1, FEDZIP), demonstrating that FEDZIP performs the process for every communication round, which would necessarily include the first communication round. A person of reasonable skill in the art will appreciate that building a classifier model includes computing gradients, including FEDZIP’s definition of a local weight update in Equation 2 ($w_{t+1} \leftarrow w_t - \eta \sum_{m \in S_t} \frac{n_m}{n} (w_t - w_{t+1,m})$), which is explicitly reliant on the gradient ($(w_t - w_{t+1,m})$) (page 6, section 3, Equation 2, FEDZIP). As written and in light of the instant specification, the BRI of computing gradients in a first round of communication encompasses creating a classifier model in each round of communication).
FEDZIP further teaches “grouping the set of gradient vector parameters of the machine learning model into a plurality of subsets” (page 7, section 4, paragraph 1, FEDZIP: “FedZip quantizes each tensor in $\Delta w$. We use a statistical quantization form, in which each tensor includes biases and weights (separately for each layer of the model) is quantized by a k-mean clustering algorithm. We use clustering”; (EN): FEDZIP specifies that in cases which “start from the same global model (i.e. the weights which initialize the clients in each round), averaging the gradients is equivalent to averaging the weights themselves” (page 3, section 2.1, paragraph 4, FEDZIP), due to the relationship between gradient and weight vectors. As written and in light of the instant specification, grouping tensors which express the weight and bias values for the layer of the model is encompassed by the BRI of grouping sets of gradients of the model).
FEDZIP further teaches “computing a representative value of all gradients within each of the plurality of subsets to obtain representative values for each of the plurality of subsets” (page 7, section 4.2, paragraph 1, FEDZIP: “We use clustering because the centroids of each cluster offer a reasonable reflection of the distribution for the tensor’s value… converting the value of a tensor to a number of clusters and using the centroid of those clusters as the best representative”; and page 9, section 4.3, paragraph 1, FEDZIP: “Our Quantization, which is achieved through clustering, results in $\Delta w$ represented by three centroids; each element is clustered and represented by one of the centroids”).
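For illustration only, the centroid-based quantization described in the quoted FEDZIP passages can be sketched as follows. This is a minimal sketch under stated assumptions, not FEDZIP's implementation: the function name kmeans_1d, the deterministic initialization, and the sample values in delta_w are hypothetical, and a plain one-dimensional Lloyd's algorithm stands in for whatever k-means routine FEDZIP actually uses.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int = 3, iters: int = 50) -> Tuple[List[float], List[int]]:
    """Cluster scalar values with Lloyd's algorithm; return (centroids, labels)."""
    ordered = sorted(values)
    # Hypothetical deterministic initialization: spread the starting centroids
    # evenly over the sorted values so the sketch is reproducible.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid (its subset).
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical flattened update tensor for a single layer.
delta_w = [-0.52, -0.48, -0.50, 0.01, -0.01, 0.00, 0.49, 0.51, 0.50]
centroids, labels = kmeans_1d(delta_w, k=3)
# Every element is replaced by the centroid of the cluster it belongs to.
quantized = [centroids[lab] for lab in labels]
```

In this sketch each cluster plays the role of a subset and its centroid plays the role of the representative value; only the k centroids and the per-element cluster labels would need to be conveyed.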
FEDZIP further teaches “and transmitting the representative values to the network entity for the first communication round of the federated learning” (page 3, figure 1, FEDZIP: as attached above; and page 8, algorithm 1, lines 20-22, FEDZIP: “encoding ($\Delta w$): $msg_{t+1}^{m}, \Theta_{t+1}^{m} \leftarrow \mathrm{encoding}(w)$ return $msg_{t+1}^{m}, \Theta_{t+1}^{m}$ to the Server”; (EN): As discussed above, the cloud server is encompassed by the BRI of the network entity, as written and in light of the specification. Further, the Server is encompassed by the BRI of the network entity, as written and in light of the instant specification).
FEDZIP does not appear to explicitly disclose “and transmitting the representative values to the network entity for the first communication round of the federated learning in an analog over the air (OTA) signal on a multiple access channel for OTA aggregation”.
However, analogous art CADSGD provides this additional functionality by teaching “and transmitting the representative values to the network entity for the first communication round of the federated learning in an analog over the air (OTA) signal on a multiple access channel for OTA aggregation” (page 4, bottom paragraph, CADSGD: “We then study analog transmission from the devices to the PS motivated by the signal-superposition property of the wireless MAC… the gradient estimate sent by each device… analog over-the-air computation”; and page 7, second paragraph, CADSGD: “We remark that the goal is to recover the average of the local gradient estimates of the devices at the PS, which is a distributed lossy computation problem over a noisy MAC… an analog transmission approach, where the gradients are transmitted simultaneously over the wireless MAC in an analog fashion… Analog transmission has been well studied”; (EN): The gradient estimate sent by each device corresponds to the representative value. Averaging is a form of aggregation).
FEDZIP and CADSGD are analogous art because they are from the same field of endeavor as the claimed invention, federated learning. FEDZIP teaches transmitting the representative values to the network entity for the first communication round of the federated learning, but does not appear to distinctly disclose transmitting the representative values to the network entity for the first communication round of the federated learning in an analog over the air (OTA) signal on a multiple access channel for OTA aggregation as taught by CADSGD. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of FEDZIP with CADSGD’s analog transmission on a multiple access channel for aggregation because “Numerical results for the MNIST classification task show that the proposed CA-DSDG scheme improves upon the other analog and digital schemes under consideration with the same average power constraint and bandwidth resources, with the improvement more significant when the datasets across devices are non-independent and identically distributed (i. i. d.). Its performance is also shown to be robust against imperfect channel state information (CSI) at the devices… This highlights the energy efficiency of over-the-air computation, and makes it particularly attractive for FL across low-power IoT sensors” (pages 4-5, bottom paragraph of page-first paragraph of page, CADSGD), as suggested by CADSGD.
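For illustration only, the signal-superposition property that the quoted CADSGD passages rely on can be sketched as follows. This is a minimal sketch under stated assumptions, not CADSGD's scheme: the function name ota_aggregate is hypothetical, the channel is modeled as an ideal sum of the simultaneously transmitted values plus optional Gaussian noise, and fading, power constraints, and CADSGD's projection step are all omitted.

```python
import random
from typing import List


def ota_aggregate(device_values: List[List[float]], noise_std: float = 0.0, seed: int = 0) -> List[float]:
    """Return the server-side estimate of the average of the devices' values.

    Assumption: all devices transmit at the same time, so the multiple access
    channel delivers the element-wise sum of their analog values plus noise.
    """
    rng = random.Random(seed)
    num_devices = len(device_values)
    dim = len(device_values[0])
    received = []
    for j in range(dim):
        # Superposition: the channel itself adds the simultaneously sent values.
        superposed = sum(dev[j] for dev in device_values)
        received.append(superposed + rng.gauss(0.0, noise_std))
    # The server only rescales the received sum; it never sees individual values.
    return [y / num_devices for y in received]


# Two hypothetical devices, each sending two representative values.
average = ota_aggregate([[1.0, 2.0], [3.0, 4.0]])
```

With noise_std left at zero the estimate equals the exact average, which is the quantity the quoted passage says the parameter server aims to recover.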
FEDZIP is not relied upon to explicitly teach “receiving, from the network entity, a grouping configuration for grouping a quantity of parameters in each layer of the machine learning model”. However, KOBAYASHI teaches “receiving, from the network entity, a grouping configuration for grouping a quantity of parameters in each layer of the machine learning model” ([0078] For learning of a convolutional neural network (CNN) 303, for example, it is preferable that learning is performed before introduction to the use environment of the user, and the parameter group of the learned CNN 303 is obtained in advance. The examiner notes that FEDZIP and KOBAYASHI are both directed to machine learning and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified FEDZIP’s data collection to incorporate “receiving, from the network entity, a grouping configuration for grouping a quantity of parameters in each layer of the machine learning model” as taught by KOBAYASHI [0078] in order to train a CNN before introduction to the use environment of the user [0078].)
FEDZIP is not relied upon to explicitly teach “in accordance with the grouping configuration received from the network entity”. However, Ren teaches “in accordance with the grouping configuration received from the network entity” ([0117] In some aspects, the UEs may follow one or more rules, or criteria, to determine how to form a cluster or UE group. As an example, the UEs 902 and 904 may form the UE group at 916 based on the exchange of one or more sidelink messages, whereas the signaling with the network may be based on an access link, or Uu link. As another example, the UE 902 may join a UE group with the UE 904 through the exchange of one or more sidelink messages. FIGS. 11 and 12 illustrate example aspects of signaling in order to join, or form, a UE group for model update reporting. The examiner notes that FEDZIP and Ren are both directed to machine learning and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified FEDZIP’s clustering to incorporate “in accordance with the grouping configuration received from the network entity” as taught by Ren [0117] to determine how to form a cluster or UE group [0117].)

Claim 2
FEDZIP teaches “The method of claim 1”, as discussed above.
FEDZIP further teaches “in which the plurality of subsets each include a number of parameters, the number of parameters being global to the machine learning model” (page 7, section 4.2, paragraph 2, FEDZIP: “To identify the optimal number of clusters, we use the silhouette index and identify $k = 3$ as the best value for the k-mean hyper-parameter for both datasets”; (EN): Paragraph [0049] of the instant specification states “system parameters associated with a computational device (e.g., neural network with weights)”, but does not appear to explicitly disclose a parameter, a number of parameters, or global parameter(s). As written and in light of the instant specification, a system specification which is shared across user devices, such as the value defining the number of clusters ($k$ of FEDZIP), is encompassed by the BRI of a global parameter).

Claim 3
FEDZIP teaches “The method of claim 1”, as discussed above.
FEDZIP further teaches “in which the plurality of subsets each include a number of parameters, the number of parameters being local to a weight matrix at each neural network layer of the machine learning model” (page 7, section 4.2, paragraph 1, FEDZIP: “We use a statistical quantization form, in which each tensor includes biases and weights (separately for each layer of the model) is quantized by a k-mean clustering algorithm. We use clustering because the centroids of each cluster offer a reasonable reflection of the distribution for the tensor’s value”; (EN): Cluster centroids, which are computed separately for each layer, are encompassed by the BRI, as written and in light of the instant specification, of a parameter being local to a weight matrix at each neural network layer of the model).
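For illustration only, the per-layer clustering quantization described in the FEDZIP passage quoted above can be sketched as follows. This is a minimal sketch under stated assumptions and is not FEDZIP's implementation: the function names kmeans_1d and quantize_layers, the seeded initialization, and the fixed iteration count are hypothetical choices made for this example. Each layer is clustered separately with k = 3, and every value is replaced by the centroid of its cluster.

```python
import random


def kmeans_1d(values, k=3, iters=20, seed=0):
    """Plain Lloyd's k-means on scalars (hypothetical helper).

    Requires at least k distinct values; returns (centroids, assignments).
    """
    rng = random.Random(seed)
    # Assumption: initialize centroids from k distinct observed values.
    centroids = rng.sample(sorted(set(values)), k)

    def assign():
        # Index of the nearest centroid for every value.
        return [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]

    for _ in range(iters):
        assignments = assign()
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    return centroids, assign()


def quantize_layers(layers, k=3):
    """Quantize each layer separately: every value becomes its cluster centroid."""
    quantized = {}
    for name, values in layers.items():
        centroids, assignments = kmeans_1d(values, k)
        quantized[name] = [centroids[a] for a in assignments]
    return quantized
```

Because the clustering runs once per layer, the centroids (the representative values) are local to each layer, while the number of clusters k is shared across all layers.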

Claim 6
FEDZIP teaches “The method of claim 1”, as discussed above.
CADSGD further teaches “further comprising: computing a difference between true gradients and the transmitted representative values” (page 4, second paragraph of page, CADSGD: “With CA-DSGD, we exploit the similarity in the sparsity patterns of the gradient estimates at different devices to speed up the computations, where each device projects its gradient estimate to a low-dimensional vector and transmits only the important gradient entries while accumulating the error”; (EN): Accumulating an error corresponding to an estimation of reduced dimensionality gradients is encompassed by the BRI, as written and in light of the instant specification, of computing a difference between true gradient values and the transmitted values which represent the gradients).
CADSGD further teaches “and adding the difference to a next gradient vector for a second communication round of the federated learning” (pages 8-9, bottom paragraph of page-first paragraph of page, CADSGD: “the gradient estimate g_m(Θ_t), computed at device m, is added to the error accumulated from previous iterations”; (EN): Adding a computed measure of gradient error to the next iteration is reasonably understood to be equivalent to adding the current gradient to the computed measure of gradient error from the previous iteration).
FEDZIP and CADSGD are analogous art because they are from the same field of endeavor as the claimed invention. FEDZIP teaches a method for federated learning which compresses updates via clustering with representative values, but does not appear to distinctly disclose adding a distance between computed and transmitted values, or a measure of error, from previous iterations to gradient estimations in current iterations. CADSGD provides the additional functionality by disclosing an accumulation of error across gradient estimations and utilizing the accumulated error in subsequent iterations. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of FEDZIP with CADSGD’s method of error aggregation and adjustment because the “CA-DSGD scheme provides the flexibility of adjusting the dimension of the gradient estimate sent by each device, which is particularly important for bandwidth-limited wireless channels, where the bandwidth available for transmission may not be sufficient to send the entire gradient vector at a single time slot” (page 4, second paragraph of page, CADSGD), as suggested by CADSGD.
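For illustration only, the error accumulation discussed for Claim 6 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of CADSGD or of the instant application: the function name compress_with_error_feedback is hypothetical, and the use of a per-group mean over consecutive entries as the transmitted representative value is an assumption made for this example. The difference between the corrected gradient and the transmitted values is carried forward and added to the gradient of the next round.

```python
def compress_with_error_feedback(gradient, residual, group_size):
    """One communication round of group-mean compression with error feedback.

    gradient:  true gradient entries for this round
    residual:  error carried over from the previous round (same length)
    Returns (transmitted, new_residual).
    """
    # Add the error accumulated in earlier rounds to the current gradient.
    corrected = [g + r for g, r in zip(gradient, residual)]

    # Assumption: each group of consecutive entries is represented by its mean.
    transmitted = []
    for start in range(0, len(corrected), group_size):
        group = corrected[start:start + group_size]
        mean = sum(group) / len(group)
        transmitted.extend([mean] * len(group))

    # The mismatch between corrected and transmitted values becomes the next residual.
    new_residual = [c - t for c, t in zip(corrected, transmitted)]
    return transmitted, new_residual


# Usage: round 1 starts with a zero residual; round 2 reuses the residual from round 1.
sent_1, err_1 = compress_with_error_feedback([1.0, 3.0, 2.0, 6.0], [0.0] * 4, 2)
sent_2, err_2 = compress_with_error_feedback([0.0, 0.0, 0.0, 0.0], err_1, 2)
```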

Claim 7
FEDZIP teaches “The method of claim 1”, as discussed above.
FEDZIP further teaches “further comprising grouping the gradient vector parameters of the machine learning model into different subsets with a different grouping pattern for a second communication round” (page 7, section 4.2, paragraph 1, FEDZIP: “We use a statistical quantization form, in which each tensor includes biases and weights (separately for each layer of the model) is quantized by a k-mean clustering algorithm. We use clustering because the centroids of each cluster offer a reasonable reflection of the distribution for the tensor’s value”; (EN): Paragraph [0083] of the instant specification states “In some aspects of the present disclosure, different grouping patterns may be used at different communication rounds. For example, a base station may signal several different grouping patterns, and at each round, the base station indicates which grouping pattern to use. The grouping pattern may be determined deterministically as a function of the round index or other known parameters”, but does not appear to explicitly define a different grouping pattern. As such, a grouping of different numerical values would lead to updated clusters, resulting in differing calculated centroids. By determining cluster centroids for each layer of the model for every round, FEDZIP teaches the BRI, as written and in light of the instant specification, of a different grouping pattern for the second round of communication).

Claim 9
FEDZIP teaches “The method of claim 1”, as discussed above.
FEDZIP further teaches “further comprising interleaving the gradient vector parameters prior to grouping the gradient vector parameters” (page 8, section 4.2, equation 5, FEDZIP: “[equation 5, reproduced as image media_image2.png]”; (EN): Paragraph [0084] of the instant specification states “In some aspects of the present disclosure, interleaving may be applied to the entries of the gradient vector, followed by grouping of consecutive entries. The interleaving pattern may change across rounds. In some aspects, the interleaving pattern may be indicated by the base station. In other aspects, the interleaving pattern may be derived deterministically as a function of known parameters. Interleaving may be confined within each layer or may be across layers”, but does not appear to explicitly define interleaving the gradient values. As written and in light of the instant specification, the BRI of this limitation is directed to ordering the values before grouping, such as by processing each parameter one by one when associating a representative value).
FEDZIP further teaches “an interleaving pattern determined deterministically in accordance with a function of known parameters” (page 8, section 4.2, equation 5, FEDZIP: “[equation 5, reproduced as image media_image2.png]”; (EN): As discussed above, the instant specification does not appear to explicitly define interleaving the gradient values. Thus, as written and in light of the instant specification, this claim limitation encompasses an interleaving as a sequential processing of known model parameters, otherwise known as deterministically determined in accordance with a function of known parameters).
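For illustration only, the interleaving followed by grouping of consecutive entries that paragraph [0084] of the instant specification describes can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's or FEDZIP's implementation: the function names are hypothetical, and deriving the interleaving pattern from a shuffle seeded with the round index is only one assumed example of a pattern that is a deterministic function of a known parameter.

```python
import random


def interleave_pattern(length, round_index):
    """Permutation of range(length) derived deterministically from the round index."""
    pattern = list(range(length))
    # Assumption: a shuffle seeded by the round index, so every device that
    # knows the round index can derive the same pattern without signaling.
    random.Random(round_index).shuffle(pattern)
    return pattern


def interleave_and_group(gradient, round_index, group_size):
    """Interleave the gradient entries, then group consecutive entries."""
    pattern = interleave_pattern(len(gradient), round_index)
    interleaved = [gradient[i] for i in pattern]
    return [interleaved[s:s + group_size]
            for s in range(0, len(interleaved), group_size)]
```

Calling interleave_and_group with the same round index always yields the same groups, which is what allows the receiving side to undo the interleaving.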

Claim 10
FEDZIP teaches “The method of claim 1”, as discussed above.
FEDZIP further teaches “further comprising sampling the gradient vector parameters” (page 8, algorithm 1, lines 14-16, FEDZIP: “B ← (split P_m into batches of size B) for local epoch i from 1 to E do for batch b ∈ B do”).
FEDZIP further teaches “a sampling pattern determined deterministically in accordance with a function of known parameters” (page 5, section 3.1, paragraph 2, FEDZIP: “Element B is the local mini-batch size and describes the amount of the local dataset that is covered. B = ∞ means all data points are fed to the model”; (EN): As written and in light of the instant specification, sampling all i data elements is encompassed by the BRI of a sampling pattern determined deterministically in accordance with a function of known parameters, where the number of data elements is the known parameter).

Claim 12
FEDZIP teaches “The method of claim 11”, as discussed above.
FEDZIP further teaches “in which the grouping structure is global to the machine learning model” (page 7, section 4.2, paragraph 2, FEDZIP: “To identify the optimal number of clusters, we use the silhouette index and identify k = 3 as the best value for the k-mean hyper-parameter for both datasets”; (EN): Using the same number of clusters for each machine is encompassed by the BRI of a global grouping structure, as written and in light of the instant specification).

Claim 13
FEDZIP teaches “The method of claim 11”, as discussed above.
FEDZIP further teaches “in which the grouping structure is local to a weight matrix at each neural network layer of the machine learning model” (page 7, section 4.2, paragraph 1, FEDZIP: “We use a statistical quantization form, in which each tensor includes biases and weights (separately for each layer of the model) is quantized by a k-mean clustering algorithm. We use clustering because the centroids of each cluster offer a reasonable reflection of the distribution for the tensor’s value”; (EN): Cluster centroids, which are computed separately for each layer, are encompassed by the BRI, as written and in light of the instant specification, of a grouping structure being local to a weight matrix at each neural network layer of the model).

Claim 15
FEDZIP teaches “The method of claim 11”, as discussed above.
FEDZIP further teaches “in which the grouping structure is different for different communication rounds of the federated learning” (page 7, section 4.2, paragraph 1, FEDZIP: “We use a statistical quantization form, in which each tensor includes biases and weights (separately for each layer of the model) is quantized by a k-mean clustering algorithm. We use clustering because the centroids of each cluster offer a reasonable reflection of the distribution for the tensor’s value”; (EN): Paragraph [0083] of the instant specification states “In some aspects of the present disclosure, different grouping patterns may be used at different communication rounds. For example, a base station may signal several different grouping patterns, and at each round, the base station indicates which grouping pattern to use. The grouping pattern may be determined deterministically as a function of the round index or other known parameters”, but does not appear to explicitly define a different grouping pattern. As such, a grouping of different numerical values would lead to updated clusters, resulting in differing calculated centroids. By determining cluster centroids for each layer of the model for every round, FEDZIP teaches the BRI, as written and in light of the instant specification, of different grouping patterns for different rounds of communication).

Claim 16
FEDZIP teaches “The method of claim 11”, as discussed above.
FEDZIP further teaches “in which the interleaving pattern to the plurality of UEs enables the plurality of UEs to interleave the gradient vector parameters prior to grouping the gradient vector parameters” (page 8, section 4.2, equation 5, FEDZIP: “[equation 5, reproduced as image media_image2.png]”; (EN): Paragraph [0084] of the instant specification states “In some aspects of the present disclosure, interleaving may be applied to the entries of the gradient vector, followed by grouping of consecutive entries. The interleaving pattern may change across rounds. In some aspects, the interleaving pattern may be indicated by the base station. In other aspects, the interleaving pattern may be derived deterministically as a function of known parameters. Interleaving may be confined within each layer or may be across layers”, but does not appear to explicitly define an interleaving pattern for the gradient values. As written and in light of the instant specification, the BRI of this limitation is directed to enabling each user equipment to order the values before grouping, such as by processing each parameter one by one when associating a representative value. The examiner further notes that the number of gradient vectors may similarly be understood as encompassed by the BRI of an interleaving pattern as written and in light of the instant specification).

Claim 17
FEDZIP teaches “The method of claim 11”, as discussed above.
FEDZIP further teaches “in which the sampling pattern to the plurality of UEs enables the plurality of UEs to sample the gradient vector parameters” (page 6, section 3.1, paragraph 4, FEDZIP: “In FedAvg, we consider the element B as a finite batch (e.g. B = 32 data points)”; (EN): Specifying a number of elements to include in the batch is encompassed by the BRI of a sampling pattern as written and in light of the instant specification).

FEDZIP, CADSGD, REN, and SURVEY
Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of FEDZIP, CADSGD, and REN, in view of “An [sic] Survey of Neural Network Compression” by James T. O’Neill, referenced herein as SURVEY.

Claim 4
FEDZIP teaches “The method of claim 1”, as discussed above.
FEDZIP further teaches “at each neural network layer of the machine learning model” (page 7, section 4.2, paragraph 1, FEDZIP: “separately for each layer of the model”). 
FEDZIP does not appear to explicitly disclose “in which the plurality of subsets each include a number of parameters, the number of parameters being local to a column of a weight matrix at each neural network layer of the machine learning model”.
However, in the same field, analogous art SURVEY provides this additional functionality by teaching “in which the plurality of subsets each include a number of parameters, the number of parameters being local to a column of a weight matrix at each neural network layer of the machine learning model” (page 31, section 4.1, paragraph 2, SURVEY: “The Khatri-Rao product between two matrices A ∈ R^{I×K} and B ∈ R^{J×K}… corresponds to the column-wise Kronecker product”; (EN): A column-wise parameter is reasonably understood to be encompassed by the BRI of parameters being local to a column of a weight matrix, as written and in light of the instant specification).
FEDZIP and SURVEY are analogous art because they are from the same field of endeavor as the claimed invention, namely compression for neural network approaches to machine learning. FEDZIP teaches a method for federated learning which compresses updates via clustering with representative values, but does not appear to distinctly disclose a parameter which is local to the column of the weight matrix for each layer of the model. SURVEY provides the additional functionality by disclosing the Khatri-Rao product. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of FEDZIP with SURVEY’s method of tensor decomposition because “Overparameterized networks trained to convergence have shown impressive performance in domains such as computer vision and natural language processing. Pushing state of the art on salient tasks within these domains corresponds to these models becoming larger and more difficult for machine learning practitioners to use given the increasing memory and storage requirements, not to mention the larger carbon footprint. Thus, in recent years there has been a resurgence in model compression techniques” (page 1, ABSTRACT, SURVEY), where this method supports practitioners in “Generalizing A to higher order tensors” (page 31, section 4.1, paragraph 1, SURVEY), as suggested by SURVEY.
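For illustration only, the column-wise Kronecker (Khatri-Rao) product named in the SURVEY passage quoted above can be sketched as follows. This is a minimal sketch using the standard definition of the product for A ∈ R^{I×K} and B ∈ R^{J×K}; the function name khatri_rao and the nested-list matrix representation are assumptions made for this example and are not taken from SURVEY.

```python
def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x K) and B (J x K).

    Column k of the (I*J) x K result is the Kronecker product of
    column k of A with column k of B.
    """
    I, J, K = len(A), len(B), len(A[0])
    if any(len(row) != K for row in A) or any(len(row) != K for row in B):
        raise ValueError("A and B must have the same number of columns")
    # Row i*J + j of the result holds A[i][k] * B[j][k] for each column k.
    return [[A[i][k] * B[j][k] for k in range(K)]
            for i in range(I) for j in range(J)]


# Usage: two 2 x 2 matrices give a 4 x 2 result.
product = khatri_rao([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```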

Claim 14
FEDZIP teaches “The method of claim 11”, as discussed above. 
FEDZIP further teaches “at each neural network layer of the machine learning model” (page 7, section 4.2, paragraph 1, FEDZIP: “separately for each layer of the model”).
FEDZIP does not appear to explicitly disclose “in which the grouping structure is local to a column of a weight matrix”.
However, in the same field, analogous art SURVEY provides this additional functionality by teaching “in which the grouping structure is local to a column of a weight matrix” (page 31, section 4.1, paragraph 2, SURVEY: “The Khatri-Rao product between two matrices A ∈ R^{I×K} and B ∈ R^{J×K}… corresponds to the column-wise Kronecker product”; (EN): A column-wise parameter is reasonably understood to be encompassed by the BRI of parameters being local to a column of a weight matrix, as written and in light of the instant specification).
FEDZIP and SURVEY are analogous art because they are from the same field of endeavor as the claimed invention, namely compression for neural network approaches to machine learning. FEDZIP teaches a method for federated learning which compresses updates via clustering with representative values, but does not appear to distinctly disclose a parameter which is local to the column of the weight matrix for each layer of the model. SURVEY provides the additional functionality by disclosing the Khatri-Rao product. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of the combination of FEDZIP with SURVEY’s method of tensor decomposition because “Overparameterized networks trained to convergence have shown impressive performance in domains such as computer vision and natural language processing. Pushing state of the art on salient tasks within these domains corresponds to these models becoming larger and more difficult for machine learning practitioners to use given the increasing memory and storage requirements, not to mention the larger carbon footprint. Thus, in recent years there has been a resurgence in model compression techniques” (page 1, ABSTRACT, SURVEY), where this method supports practitioners in “Generalizing A to higher order tensors” (page 31, section 4.1, paragraph 1, SURVEY), as suggested by SURVEY.

FEDZIP, CADSGD, REN, and RADAM
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of FEDZIP, CADSGD, and REN, in view of “ON THE VARIANCE OF THE ADAPTIVE LEARNING RATE AND BEYOND” by Liyuan Liu et al., referenced herein as RADAM.

Claim 5
FEDZIP teaches “The method of claim 1”, as discussed above.
FEDZIP does not appear to explicitly teach “to reduce a level of distortion resulting from a mismatch between computed gradients and transmitted gradients due to grouping the set of gradient vector parameters”.
However, analogous art described in FEDZIP’s other embodiments provides this additional functionality by teaching “to reduce a level of distortion resulting from a mismatch between computed gradients and transmitted gradients due to grouping the set of gradient vector parameters” (page 4, section 2.2, paragraph 4, FEDZIP: “carrying the error from each previous round (e.g. errors in the round one and two will be carried to round three) to ensure high convergence speed while still considering all gradients during model training”; (EN): A person of reasonable skill in the art will appreciate that, as written and in light of the instant specification, the BRI of reducing a level of distortion when processing gradients encompasses utilizing previous measures of error to ensure high convergence speed while processing all gradients).
FEDZIP’s embodiments are analogous art because they are from the same field of endeavor as the claimed invention, namely optimization for federated learning. The combination of the first embodiment of FEDZIP and CADSGD teaches a method for federated learning which compresses updates via clustering with representative values, but does not appear to distinctly disclose reducing a level of distortion of the gradients. FEDZIP’s other embodiment provides the additional functionality by disclosing a method which carries error from each round of communication. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of the combination of the first embodiment of FEDZIP and CADSGD with this second embodiment of FEDZIP’s method of gradient error maintenance in order “to ensure high convergence speed while still considering all gradients during model training” (page 4, section 2.2, paragraph 4, FEDZIP), as suggested by FEDZIP’s second embodiment.
FEDZIP does not appear to explicitly teach “further comprising adaptively adjusting a learning rate”.
However, analogous art RADAM provides this additional functionality by teaching “further comprising adaptively adjusting a learning rate” (page 2, section 2, paragraphs 2-3, RADAM: “Instead of setting the learning rate α_t as a constant or in decreasing order, a learning rate warmup strategy sets α_t as some small values in the first few steps… We observe that, without applying warmup, the gradient distribution is distorted”).
FEDZIP and RADAM are analogous art because they are from the same field of endeavor as the claimed invention, namely machine learning optimization. The combination of FEDZIP and CADSGD teaches a method for federated learning which compresses updates via clustering with representative values and distortion awareness, but does not appear to distinctly disclose adaptively adjusting a learning rate. RADAM provides the additional functionality by disclosing an adaptive learning rate. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of the combination of FEDZIP and CADSGD with RADAM’s method of adaptive learning rates because “researchers typically use different settings in different applications and have to take a trial-and-error approach, which can be very tedious and time-consuming” (page 1, section 1, paragraph 3, RADAM), and thus automated, though still accurate, methods are desired, as suggested by RADAM.
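For illustration only, the learning rate warmup strategy mentioned in the RADAM passage quoted above can be sketched as follows. This is a minimal sketch under stated assumptions and is not RADAM's proposed method: the function name warmup_learning_rate, the linear ramp, and the parameter names base_lr and warmup_steps are hypothetical. It only shows a learning rate α_t that takes small values in the first few steps before reaching its base value.

```python
def warmup_learning_rate(step, base_lr, warmup_steps):
    """Learning rate for a zero-indexed training step with linear warmup.

    Assumption: the rate ramps linearly from base_lr / warmup_steps up to
    base_lr over the first warmup_steps steps, then stays at base_lr.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps


# Usage: the schedule for the first 12 steps with a 10-step warmup.
schedule = [warmup_learning_rate(t, 0.1, 10) for t in range(12)]
```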

FEDZIP, CADSGD, Ren, and STACK
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of FEDZIP, REN, and CADSGD, in view of “Protocol Stack Perspective For Low Latency and Massive Connectivity in Future Cellular Networks” by Syed Waqas Haider Shah et al., referenced herein as STACK.

Claim 8
FEDZIP teaches “The method of claim 1”, as discussed above.
FEDZIP further teaches “further comprising grouping the gradient vector parameters of the machine learning model sequentially” (page 8, section 4.2, equation 5, FEDZIP: “[equation 5, reproduced as image media_image2.png]”; (EN): Paragraph [0083] of the instant specification states “The grouping of the gradient vector parameters of the machine learning model may be sequential… Alternatively, the grouping may be in accordance with a more complex policy”, but does not appear to explicitly define a sequential grouping. The examiner notes that, as written and in light of the instant specification, the language of this claim limitation is vague in that it does not distinctly specify if the sequential ordering modifies the input or the output of the grouping process claimed. The examiner notes that numerical values which are organized in a vector may be considered sequentially grouped, as a vector inherently orders values sequentially by nature of the sequential storage encoding utilized by vectors. This inherently teaches a sequentially ordered output. Further, by teaching a method in which each parameter in the set Δw_m is assigned to a representative value (expressed by the centroids of groups), FEDZIP teaches processing the parameters sequentially or a sequential input).
FEDZIP does not appear to explicitly teach “for a random access channel (RACH) communication round”.
However, analogous art STACK provides this additional functionality by teaching “for a random access channel (RACH) communication round” (page 5, section 3.E, paragraph 1, STACK: “In LTE, user-equipments (UEs) and IoT devices contend to access data channel using physical random access channel (PRACH). The devices connected to the network can perform RACH process in contention-based, contention-free, and hybrid fashions”).
FEDZIP and STACK are analogous art because they are from the same field of endeavor as the claimed invention, namely distributed data processing and networking. The combination of FEDZIP and CADSGD teaches a method for federated learning which compresses updates via clustering with representative values, but does not appear to distinctly disclose utilizing a random access channel communication round. STACK provides the additional functionality by disclosing a protocol stack involving RACH communication. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of the combination of FEDZIP and CADSGD with STACK’s method of wireless channel communication because “This approach is suitable for delay-tolerant access and can also accommodate a large number of users/devices” (page 5, section 3.E, paragraph 1, STACK), given “as IoT reaches ubiquity, the massive amount of data generated as a result, would need to be processed and stored efficiently, in order to make a profound sense out of it” (page 1, section 1, paragraph 2, STACK), as suggested by STACK.

FEDZIP, CADSGD, Ren, and LEE
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of FEDZIP, REN, and CADSGD, in view of “Protocol Stack Perspective For Low Latency and Massive Connectivity in Future Cellular Networks” by Syed Waqas Haider Shah et al., referenced herein as STACK.

Claim 11
FEDZIP teaches “A method of wireless communication for federated learning, by a network entity, comprising” (page 1, ABSTRACT, FEDZIP: “Federated learning… decentralized machine learning (especially deep learning) for wireless devices… In this work, we propose a novel framework, FedZip, that significantly decreases the size of the updates while transferring weights from the deep learning model between clients and their servers”; (EN): The instant specification does not appear to explicitly define a network entity. As written and in light of the instant specification, the BRI of a network entity encompasses FEDZIP’s server. As written and in light of the instant specification, the BRI of wireless communication includes transferring data between wireless devices).
FEDZIP further teaches “transmitting, to a plurality of user equipment (UEs), a machine learning model for federated learning” (page 3, figure 1, FEDZIP: as attached above; (EN): As written and in light of the instant specification, the BRI of transmitting a machine learning model includes broadcasting the global model).
FEDZIP further teaches “transmitting, to the plurality of UEs, a grouping structure to enable the plurality of UEs to group sets of gradient vector parameters for the machine learning model into a plurality of subsets” (page 7, section 4.2, paragraph 2, FEDZIP: “To identify the optimal number of clusters, we use the silhouette index and identify k = 3 as the best value for the k-mean hyper-parameter for both datasets”; (EN): Identifying a value for a hyperparameter defining the number of clusters is encompassed by the BRI, as written and in light of the instant specification, of transmitting a grouping structure).
FEDZIP further teaches “receiving, from each of the plurality of UEs…, representative values for each of the plurality of subsets” (page 8, algorithm 1, lines 6-7 and 20-22, FEDZIP: “for client mth $\in S_t$ in parallel do… ClientUpdate(m, $w_t$)… encoding($\Delta w$):… encoding(w) return… to the Server”; (EN): The encoding calculated and returned to the server for the specific machine corresponds to the representative values associated with the UE).
FEDZIP further teaches “reconstructing full dimensional gradient vectors based on the representative values” (page 8, algorithm 1, line 8, FEDZIP: “$w_{t+1}^{m} \leftarrow \mathrm{decoding}(\mathrm{msg}_{t+1}^{m}, w_{t+1}^{m})$”; (EN): As written and in light of the instant specification, the BRI of this limitation is directed to reconstructing the full dimensional gradient vector for each UE of the plurality of UEs).
FEDZIP further teaches “and updating the machine learning model based on the full dimensional gradient vectors” (page 8, algorithm 1, line 11, FEDZIP: “$w_{t+1} \leftarrow \sum_{m=1}^{M} \frac{n_m}{n} w_{t+1}^{m}$”).
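For illustration only, the aggregation rule quoted from line 11 of FEDZIP's algorithm 1 (a weighted average of the per-client weights, with client m weighted by n_m / n) can be sketched as follows. This is a minimal sketch under stated assumptions and is not FEDZIP's code; the function name `federated_average` and the use of plain Python lists as weight vectors are hypothetical choices made here.

```python
def federated_average(client_weights, client_sizes):
    """Return sum over m of (n_m / n) * w_m for equal-length weight vectors.

    client_weights: list of per-client weight vectors (lists of floats).
    client_sizes:   list of per-client sample counts n_m (hypothetical input).
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client weight vector")
    n = sum(client_sizes)  # total sample count across all clients
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for w_m, n_m in zip(client_weights, client_sizes):
        if len(w_m) != dim:
            raise ValueError("all weight vectors must have the same length")
        for i, value in enumerate(w_m):
            averaged[i] += (n_m / n) * value  # weight each client by n_m / n
    return averaged
```

A client holding three times as many samples as another contributes three times the weight to the updated model.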
FEDZIP is not relied upon to explicitly teach “the grouping structure comprising at least one of a grouping pattern for a specific round of federated learning, an interleaving pattern, or a sampling pattern”. However, LEE teaches “the grouping structure comprising at least one of a grouping pattern for a specific round of federated learning, an interleaving pattern, or a sampling pattern” ([0178], LEE: “That is, the device may receive a hyper parameter related to grouping of UEs participating in the federated learning from the server”). The examiner notes that LEE teaches a server that sends messages to UEs containing hyper parameters relating to the grouping of UEs involved in federated learning. The examiner further notes that FEDZIP and LEE are both directed to machine learning and are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified FEDZIP’s data collection to incorporate “the grouping structure comprising at least one of a grouping pattern for a specific round of federated learning, an interleaving pattern, or a sampling pattern” as taught by LEE [0178] in order to overcome the large size of the variance value of the distances calculated (LEE, [0178]).
FEDZIP does not appear to explicitly disclose “receiving, from each of the plurality of UEs in an analog over the air (OTA) aggregated signal on a multiple access channel, representative values for each of the plurality of subsets”.
However, analogous art CADSGD provides this additional functionality by teaching “receiving, from each of the plurality of UEs in an analog over the air (OTA) aggregated signal on a multiple access channel, representative values for each of the plurality of subsets” (page 4, bottom paragraph, CADSGD: “We then study analog transmission from the devices to the PS motivated by the signal-superposition property of the wireless MAC… the gradient estimate sent by each device… analog over-the-air computation”; and page 7, second paragraph, CADSGD: “We remark that the goal is to recover the average of the local gradient estimates of the devices at the PS, which is a distributed lossy computation problem over a noisy MAC… an analog transmission approach, where the gradients are transmitted simultaneously over the wireless MAC in an analog fashion… Analog transmission has been well studied”; (EN): The gradient estimate sent by each device corresponds to the representative value. Averaging is a form of aggregation).
FEDZIP and CADSGD are analogous art because they are from the same field of endeavor as the claimed invention, federated learning. FEDZIP teaches receiving, from each of the plurality of UEs, representative values for each of the plurality of subsets, but does not appear to distinctly disclose receiving, from each of the plurality of UEs in an analog over the air (OTA) aggregated signal on a multiple access channel, representative values for each of the plurality of subsets as taught by CADSGD. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of FEDZIP with CADSGD’s analog transmission on a multiple access channel for aggregation because “Numerical results for the MNIST classification task show that the proposed CA-DSGD scheme improves upon the other analog and digital schemes under consideration with the same average power constraint and bandwidth resources, with the improvement more significant when the datasets across devices are non-independent and identically distributed (i.i.d.). Its performance is also shown to be robust against imperfect channel state information (CSI) at the devices… This highlights the energy efficiency of over-the-air computation, and makes it particularly attractive for FL across low-power IoT sensors” (pages 4-5, bottom paragraph of page-first paragraph of page, CADSGD), as suggested by CADSGD.
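As a rough illustration of the signal-superposition idea quoted from CADSGD, the sketch below models a multiple access channel that delivers the sum of the simultaneously transmitted values plus additive Gaussian noise, from which the receiver recovers the average. This is a simplified model assumed here for clarity, not CADSGD's scheme: it ignores fading, power constraints, sparsification, and channel state information, and the names `ota_aggregate`, `noise_std`, and `seed` are hypothetical.

```python
import random


def ota_aggregate(client_values, noise_std=0.0, seed=0):
    """Average per-client vectors as received over an idealized analog MAC.

    Each channel use carries one vector entry from every client at once;
    the receiver observes only the superposed sum plus noise (assumption).
    """
    rng = random.Random(seed)
    num_clients = len(client_values)
    dim = len(client_values[0])
    received = []
    for i in range(dim):
        # Superposition: simultaneous analog transmissions add up on the channel.
        superposed = sum(values[i] for values in client_values)
        received.append(superposed + rng.gauss(0.0, noise_std))
    # The receiver never sees individual values, only the aggregate.
    return [y / num_clients for y in received]
```

With `noise_std=0.0` the result equals the exact average, which makes the aggregation step easy to check in isolation.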

FEDZIP, CADSGD, REN, and SURVEY
Claims 18-21, 23-24, and 26-30 are rejected under 35 U.S.C. 103 as being unpatentable over FEDZIP, in view of REN, in view of SURVEY, in view of CADSGD.

Claim 18
Claim 18 is rejected based upon the same rationale as claim 1 as it is the apparatus claim corresponding to the method claim.

Claim 19
FEDZIP teaches “The apparatus of claim 18”, as discussed above.
FEDZIP further teaches “in which the plurality of subsets each include a number of parameters, the number of parameters being global to the machine learning model” (page 7, section 4.2, paragraph 2, FEDZIP: “To identify the optimal number of clusters, we use the silhouette index and identify $k = 3$ as the best value for the k-mean hyper-parameter for both datasets”; (EN): Paragraph [0049] of the instant specification states “system parameters associated with a computational device (e.g., neural network with weights)”, but does not appear to explicitly disclose a parameter, a number of parameters, or global parameter(s). As written and in light of the instant specification, a system specification which is shared across user devices, such as the value defining the number of clusters ($k$ of FEDZIP), is encompassed by the BRI of a global parameter).

Claim 20
FEDZIP teaches “The apparatus of claim 18”, as discussed above.
FEDZIP further teaches “in which the plurality of subsets each include a number of parameters, the number of parameters being local to a weight matrix at each neural network layer of the machine learning model” (page 7, section 4.2, paragraph 1, FEDZIP: “We use a statistical quantization form, in which each tensor includes biases and weights (separately for each layer of the model) is quantized by a k-mean clustering algorithm. We use clustering because the centroids of each cluster offer a reasonable reflection of the distribution for the tensor’s value”; (EN): Cluster centroids, which are computed separately for each layer, are encompassed by the BRI, as written and in light of the instant specification, of a parameter being local to a weight matrix at each neural network layer of the model).
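The per-layer clustering described in the quoted FEDZIP passage (each layer's tensor quantized by k-means, with centroids standing in for the original values) can be sketched as below. This is a minimal one-dimensional sketch under stated assumptions, not FEDZIP's implementation: the centroid initialization, the fixed iteration count, and the names `kmeans_1d`, `quantize_per_layer`, and `dequantize` are hypothetical.

```python
def kmeans_1d(values, k=3, iterations=20):
    """Cluster scalar values into k groups; return (centroids, assignments)."""
    ordered = sorted(values)
    # Deterministic initialization spread across the sorted values (assumption).
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    assignments = [min(range(k), key=lambda j: abs(v - centroids[j]))
                   for v in values]
    return centroids, assignments


def quantize_per_layer(layers, k=3):
    """Run the clustering separately for each layer's flattened tensor."""
    return {name: kmeans_1d(values, k) for name, values in layers.items()}


def dequantize(centroids, assignments):
    """Rebuild a full-length vector from centroids and cluster indices."""
    return [centroids[a] for a in assignments]
```

Only the k centroids and the per-entry cluster indices need to be sent for each layer, and `dequantize` restores a vector of the original length on the receiving side.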

Claim 21
FEDZIP teaches “The apparatus of claim 18”, as discussed above.
SURVEY further teaches “in which the plurality of subsets each include a number of parameters, the number of parameters being local to a column of a weight matrix” (page 31, section 4.1, paragraph 2, SURVEY: “The Khatri-Rao product between two matrices $A \in \mathbb{R}^{I \times K}$ and $B \in \mathbb{R}^{J \times K}$… corresponds to the column-wise Kronecker product”; (EN): A column-wise parameter is reasonably understood to be encompassed by the BRI of parameters being local to a column of a weight matrix, as written and in light of the instant specification).
FEDZIP further teaches “at each neural network layer of the machine learning model” (page 7, section 4.2, paragraph 1, FEDZIP: “separately for each layer of the model”).
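For readers unfamiliar with the operation named in the SURVEY citation, the column-wise Kronecker (Khatri-Rao) product of an I x K matrix and a J x K matrix can be sketched as follows. This is an illustrative sketch only; the function name `khatri_rao` and the list-of-rows matrix representation are assumptions made here, not taken from SURVEY.

```python
def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x K) and b (J x K).

    Matrices are lists of rows. Column c of the (I*J x K) result is the
    Kronecker product of column c of a with column c of b.
    """
    k = len(a[0])
    if any(len(row) != k for row in a) or any(len(row) != k for row in b):
        raise ValueError("both matrices must have the same number of columns")
    # Row (i * J + j) of the result holds a[i][c] * b[j][c] for each column c.
    return [[a_row[c] * b_row[c] for c in range(k)]
            for a_row in a for b_row in b]
```

Each output column depends only on the matching input columns, which is the column-wise property the citation relies on.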

Claim 23
FEDZIP teaches “The apparatus of claim 18”, as discussed above.
SURVEY further teaches “in which the at least one processor is further configured”, as discussed above.
CADSGD further teaches “to compute a difference between true gradients and the transmitted representative values” (page 4, second paragraph of page, CADSGD: “With CA-DSGD, we exploit the similarity in the sparsity patterns of the gradient estimates at different devices to speed up the computations, where each device projects its gradient estimate to a low-dimensional vector and transmits only the important gradient entries while accumulating the error”; (EN): Accumulating an error corresponding to an estimation of reduced dimensionality gradients is encompassed by the BRI, as written and in light of the instant specification, of computing a difference between true gradient values and the transmitted values which represent the gradients).
CADSGD further teaches “and to add the difference to a next gradient vector for a second communication round of the federated learning” (pages 8-9, bottom paragraph of page-first paragraph of page, CADSGD: “the gradient estimate $g_m(\Theta_t)$, computed at device $m$, is added to the error accumulated from previous iterations”; (EN): Adding a computed measure of gradient error to the next iteration is reasonably understood to be equivalent to adding the current gradient to the computed measure of gradient error from the previous iteration).
FEDZIP and CADSGD are analogous art because they are from the same field of endeavor as the claimed invention. The combination of FEDZIP and SURVEY teaches a method for federated learning which compresses updates via clustering with representative values, but does not appear to distinctly disclose adding a distance between computed and transmitted values, or a measure of error, from previous iterations to gradient estimations in current iterations. CADSGD provides the additional functionality by disclosing an accumulation of error across gradient estimations and utilizing the accumulated error in subsequent iterations. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of the combination of FEDZIP and SURVEY with CADSGD’s method of error aggregation and adjustment because the “CA-DSGD scheme provides the flexibility of adjusting the dimension of the gradient estimate sent by each device, which is particularly important for bandwidth-limited wireless channels, where the bandwidth available for transmission may not be sufficient to send the entire gradient vector at a single time slot” (page 4, second paragraph of page, CADSGD), as suggested by CADSGD.
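The error accumulation discussed above (the part of the gradient that is not transmitted in one round is carried into the next) can be sketched as follows. This is a minimal sketch under stated assumptions rather than CADSGD's scheme: a simple keep-the-largest-entries compressor stands in for CADSGD's sparsification and projection, and the names `compress_with_error_feedback` and `keep_largest` are hypothetical.

```python
def keep_largest(vector, count=1):
    """Stand-in compressor (assumption): zero all but the largest-magnitude entries."""
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(ranked[:count])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


def compress_with_error_feedback(gradient, residual, compress=keep_largest):
    """Return (transmitted, new_residual) for one communication round."""
    # Add the error accumulated from previous rounds to the fresh gradient.
    corrected = [g + r for g, r in zip(gradient, residual)]
    transmitted = compress(corrected)
    # The difference between what was computed and what was sent is kept
    # locally and added back in the next round.
    new_residual = [c - t for c, t in zip(corrected, transmitted)]
    return transmitted, new_residual
```

Calling the function once per round and feeding `new_residual` back in as `residual` means an entry that is dropped in one round still influences a later transmission.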

Claim 24
FEDZIP teaches “The apparatus of claim 18”, as discussed above.
SURVEY further teaches “in which the at least one processor is further configured” (page 4, section 1, paragraph 3, SURVEY: “Graphics Processing Units (GPUs)”).
FEDZIP further teaches “to group the gradient vector parameters of the machine learning model into different subsets with a different grouping pattern for a second communication round” (page 7, section 4.2, paragraph 1, FEDZIP: “We use a statistical quantization form, in which each tensor includes biases and weights (separately for each layer of the model) is quantized by a k-mean clustering algorithm. We use clustering because the centroids of each cluster offer a reasonable reflection of the distribution for the tensor’s value”; (EN): Paragraph [0083] of the instant specification states “In some aspects of the present disclosure, different grouping patterns may be used at different communication rounds. For example, a base station may signal several different grouping patterns, and at each round, the base station indicates which grouping pattern to use. The grouping pattern may be determined deterministically as a function of the round index or other known parameters”, but does not appear to explicitly define a different grouping pattern. As such, a grouping of different numerical values would lead to updated clusters, resulting in differing calculated centroids. By determining cluster centroids for each layer of the model for every round, FEDZIP teaches the BRI, as written and in light of the instant specification, of a different grouping pattern for the second round of communication).

Claim 26
FEDZIP teaches “The apparatus of claim 18”, as discussed above.
SURVEY further teaches “in which the at least one processor is further configured”, as discussed above.
FEDZIP further teaches “to interleave the gradient vector parameters prior to grouping the gradient vector parameters” (page 8, section 4.2, equation 5, FEDZIP: [image of equation 5: media_image3.png]; (EN): Paragraph [0084] of the instant specification states “In some aspects of the present disclosure, interleaving may be applied to the entries of the gradient vector, followed by grouping of consecutive entries. The interleaving pattern may change across rounds. In some aspects, the interleaving pattern may be indicated by the base station. In other aspects, the interleaving pattern may be derived deterministically as a function of known parameters. Interleaving may be confined within each layer or may be across layers”, but does not appear to explicitly define interleaving the gradient values. As written and in light of the instant specification, the BRI of this limitation is directed to ordering the values before grouping, such as by processing each parameter one by one when associating a representative value).
FEDZIP further teaches “an interleaving pattern determined deterministically in accordance with a function of known parameters” (page 8, section 4.2, equation 5, FEDZIP: [image of equation 5: media_image2.png]; (EN): As discussed above, the instant specification does not appear to explicitly define interleaving the gradient values. Thus, as written and in light of the instant specification, this claim limitation encompasses an interleaving as a sequential processing of known model parameters, otherwise known as deterministically determined in accordance with a function of known parameters).
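To make the quoted language of paragraph [0084] concrete (interleaving the entries of the gradient vector, followed by grouping of consecutive entries, with the pattern derived deterministically from known parameters), one possible reading is sketched below. This is a hypothetical illustration and is not drawn from the instant specification or from FEDZIP: seeding a pseudo-random permutation with the round index and using the group mean as the representative value are both assumptions made here.

```python
import random


def interleaving_pattern(length, round_index):
    """Permutation of range(length) fixed entirely by the round index (assumption)."""
    order = list(range(length))
    random.Random(round_index).shuffle(order)  # same seed gives the same pattern
    return order


def group_means(gradient, group_size, round_index):
    """Interleave the entries, group consecutive entries, return (means, order)."""
    order = interleaving_pattern(len(gradient), round_index)
    interleaved = [gradient[i] for i in order]
    groups = [interleaved[start:start + group_size]
              for start in range(0, len(interleaved), group_size)]
    # One representative value per group; the mean is an illustrative choice.
    return [sum(group) / len(group) for group in groups], order
```

Because the pattern depends only on the round index, a sender and a receiver that both know the index can derive the same ordering without exchanging it.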

Claim 27
The combination of FEDZIP and SURVEY teaches “The apparatus of claim 18”, as discussed above.
SURVEY further teaches “in which the at least one processor is further configured”, as discussed above.
FEDZIP further teaches “to sample the gradient vector parameters” (page 8, algorithm 1, lines 14-16, FEDZIP: “$B \leftarrow$ (split $P_m$ into batches of size $B$) for local epoch i from 1 to E do for batch $b \in B$ do”).
FEDZIP further teaches “a sampling pattern determined deterministically in accordance with a function of known parameters” (page 5, section 3.1, paragraph 2, FEDZIP: “Element $B$ is the local mini-batch size and describes the amount of the local dataset that is covered. $B = \infty$ means all data points are fed to the model”; (EN): As written and in light of the instant specification, sampling all i data elements is encompassed by the BRI of a sampling pattern determined deterministically in accordance with a function of known parameters, where the number of data elements is the known parameter).

Claim 28
Claim 28 is rejected based upon the same rationale as claim 11 as it is the apparatus claim corresponding to the method claim.
 
Claim 29
FEDZIP teaches “The apparatus of claim 28”, as discussed above.
FEDZIP further teaches “in which the grouping structure is global to the machine learning model” (page 7, section 4.2, paragraph 2, FEDZIP: “To identify the optimal number of clusters, we use the silhouette index and identify $k = 3$ as the best value for the k-mean hyper-parameter for both datasets”; (EN): Using the same number of clusters for each machine is encompassed by the BRI of a global grouping structure, as written and in light of the instant specification).

Claim 30
FEDZIP teaches “The apparatus of claim 28”, as discussed above.
FEDZIP further teaches “in which the grouping structure is local to a weight matrix at each neural network layer of the machine learning model” (page 7, section 4.2, paragraph 1, FEDZIP: “We use a statistical quantization form, in which each tensor includes biases and weights (separately for each layer of the model) is quantized by a k-mean clustering algorithm. We use clustering because the centroids of each cluster offer a reasonable reflection of the distribution for the tensor’s value”; (EN): Cluster centroids, which are computed separately for each layer, are encompassed by the BRI, as written and in light of the instant specification, of a grouping structure being local to a weight matrix at each neural network layer of the model).

FEDZIP, REN, SURVEY, CADSGD, and RADAM
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of FEDZIP, REN, SURVEY, and CADSGD, and in view of RADAM.

Claim 22
FEDZIP teaches “The apparatus of claim 18”, as discussed above.
SURVEY further teaches “in which the at least one processor is further configured” (page 4, section 1, paragraph 3, SURVEY: “Graphics Processing Units (GPUs)”).
The combination of this embodiment of FEDZIP, SURVEY, and CADSGD does not appear to explicitly teach “to reduce a level of distortion resulting from a mismatch between computed gradients and transmitted gradients due to grouping the set of gradient vector parameters”.
However, analogous art described in FEDZIP’s other embodiments provides this additional functionality by teaching “to reduce a level of distortion resulting from a mismatch between computed gradients and transmitted gradients due to grouping the set of gradient vector parameters” (page 4, section 2.2, paragraph 4, FEDZIP: “carrying the error from each previous round (e.g. errors in the round one and two will be carried to round three) to ensure high convergence speed while still considering all gradients during model training”; (EN): A person of reasonable skill in the art will appreciate that, as written and in light of the instant specification, the BRI of reducing a level of distortion when processing gradients encompasses utilizing previous measures of error to ensure high convergence speed while processing all gradients).
FEDZIP’s embodiments are analogous art because they are from the same field of endeavor as the claimed invention, namely optimization for federated learning. The combination of the first embodiment of FEDZIP, SURVEY, and CADSGD teaches a method for federated learning which compresses updates via clustering with representative values, but does not appear to distinctly disclose reducing a level of distortion of the gradients. FEDZIP’s other embodiment provides the additional functionality by disclosing a method which carries error from each round of communication. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of the combination of the first embodiment of FEDZIP, SURVEY, and CADSGD with this second embodiment of FEDZIP’s method of gradient error maintenance in order “to ensure high convergence speed while still considering all gradients during model training” (page 4, section 2.2, paragraph 4, FEDZIP), as suggested by FEDZIP’s second embodiment.
The combination of FEDZIP, SURVEY, and CADSGD does not appear to explicitly teach “to adaptively adjust a learning rate”.
However, analogous art RADAM provides this additional functionality by teaching “to adaptively adjust a learning rate” (page 2, section 2, paragraphs 2-3, RADAM: “Instead of setting the learning rate $\alpha_t$ as a constant or in decreasing order, a learning rate warmup strategy sets $\alpha_t$ as some small values in the first few steps… We observe that, without applying warmup, the gradient distribution is distorted”).
FEDZIP and RADAM are analogous art because they are from the same field of endeavor as the claimed invention, namely machine learning optimization. The combination of FEDZIP, SURVEY, and CADSGD teaches a method for federated learning which compresses updates via clustering with representative values and distortion awareness, but does not appear to distinctly disclose adaptively adjusting a learning rate. RADAM provides the additional functionality by disclosing an adaptive learning rate. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of the combination of FEDZIP, SURVEY, and CADSGD with RADAM’s method of adaptive learning rates because “researchers typically use different settings in different applications and have to take a trial-and-error approach, which can be very tedious and time-consuming” (page 1, section 1, paragraph 3, RADAM), and thus automated, though still accurate, methods are desired, as suggested by RADAM.
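The warmup strategy described in the quoted RADAM passage (small learning rates in the first few steps instead of a constant rate) can be illustrated with a simple linear schedule. This is a generic sketch of the warmup heuristic only and does not reproduce any method proposed by RADAM; the linear ramp, the function name `warmup_learning_rate`, and its parameters are assumptions made here.

```python
def warmup_learning_rate(step, base_rate, warmup_steps):
    """Return the learning rate for a zero-based step under linear warmup.

    The rate ramps from base_rate / warmup_steps up to base_rate over the
    first warmup_steps steps, then stays at base_rate (assumed schedule).
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_rate
    return base_rate * (step + 1) / warmup_steps
```

For example, with `base_rate=0.1` and `warmup_steps=10`, the first step uses one tenth of the base rate and the tenth step reaches the full rate.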

FEDZIP, REN, SURVEY, CADSGD, and STACK
Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of FEDZIP, REN, SURVEY, and CADSGD in view of STACK.

Claim 25
FEDZIP teaches “The apparatus of claim 18”, as discussed above.
SURVEY further teaches “in which the at least one processor is further configured”, as discussed above.
FEDZIP further teaches “to group the gradient vector parameters of the machine learning model sequentially” (page 8, section 4.2, equation 5, FEDZIP: [image of equation 5: media_image2.png]; (EN): Paragraph [0083] of the instant specification states “The grouping of the gradient vector parameters of the machine learning model may be sequential… Alternatively, the grouping may be in accordance with a more complex policy”, but does not appear to explicitly define a sequential grouping. The examiner notes that, as written and in light of the instant specification, the language of this claim limitation is vague in that it does not distinctly specify if the sequential ordering modifies the input or the output of the grouping process claimed. The examiner notes that numerical values which are organized in a vector may be considered sequentially grouped, as a vector inherently orders values sequentially by nature of the sequential storage encoding utilized by vectors. This inherently teaches a sequentially ordered output. Further, by teaching a method in which each parameter in the set $\Delta w_m$ is assigned to a representative value (expressed by the centroids of groups), FEDZIP teaches processing the parameters sequentially, or a sequential input).
The combination of FEDZIP, SURVEY, and CADSGD does not appear to explicitly teach “for a random access channel (RACH) communication round”.
However, analogous art STACK provides this additional functionality by teaching “for a random access channel (RACH) communication round” (page 5, section 3.E, paragraph 1, STACK: “In LTE, user-equipments (UEs) and IoT devices contend to access data channel using physical random access channel (PRACH). The devices connected to the network can perform RACH process in contention-based, contention-free, and hybrid fashions”).
FEDZIP and STACK are analogous art because they are from the same field of endeavor as the claimed invention, namely distributed data processing and networking. The combination of FEDZIP, SURVEY, and CADSGD teaches a method for federated learning which compresses updates via clustering with representative values, but does not appear to distinctly disclose utilizing a random access channel communication round. STACK provides the additional functionality by disclosing a protocol stack involving RACH communication. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of the combination of FEDZIP, SURVEY, and CADSGD with STACK’s method of wireless channel communication because “This approach is suitable for delay-tolerant access and can also accommodate a large number of users/devices” (page 5, section 3.E, paragraph 1, STACK), given “as IoT reaches ubiquity, the massive amount of data generated as a result, would need to be processed and stored efficiently, in order to make a profound sense out of it” (page 1, section 1, paragraph 2, STACK), as suggested by STACK.
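The distinction drawn in the Claim 25 rejection above, between grouping parameters by their position in the vector and assigning each parameter to a representative value such as a cluster centroid, can be illustrated with a minimal sketch. This is not FEDZIP's algorithm and not the applicant's claimed method. The function names, the group size of 2, and the fixed centroids are hypothetical choices made only for illustration.

```python
from statistics import mean

def group_sequentially(grad, group_size):
    """Position-based grouping: contiguous slices taken in index order."""
    return [grad[i:i + group_size] for i in range(0, len(grad), group_size)]

def compress_sequential(grad, group_size):
    """Replace every entry with the mean of its contiguous group."""
    out = []
    for group in group_sequentially(grad, group_size):
        representative = mean(group)
        out.extend([representative] * len(group))
    return out

def assign_to_centroids(grad, centroids):
    """Value-based grouping: map each entry to its nearest centroid."""
    return [min(centroids, key=lambda c: abs(c - g)) for g in grad]

# Hypothetical gradient update vector (illustrative values only).
grad = [0.9, -1.0, 1.1, -0.8, 0.1, -0.1]

groups = group_sequentially(grad, 2)
seq = compress_sequential(grad, 2)
clustered = assign_to_centroids(grad, [-1.0, 0.0, 1.0])

print(groups)
print(seq)
print(clustered)
```

In this sketch the position-based grouping pairs neighbouring entries regardless of their values, while the centroid assignment groups entries by value regardless of position, so the two compressed vectors differ for the same input.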

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
The following references have been determined to be related to the application, but were not applied in any specific rejection. They are nonetheless listed below for reference.
HU (US 2023/0106985 A1)
“HU teaches a method for using federated learning to develop a machine-learning model”
BUTT (US 2024/0152768 A1)
“BUTT teaches a method for enabling/realizing efficient model training, including model collection and/or aggregation, for federated learning, including hierarchical federated learning, in a wireless communication system”
Shaloudegi (US 2021/0365841 A1)
“Shaloudegi teaches a method for implementing federated learning”
Bega (US 2022/0046410 A1)
“Bega teaches a method that enables representative user equipment (UE) sampling for UE-related analytics services”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAMCY ALGHAZZY whose telephone number is (571) 272-8824. The examiner can normally be reached Monday-Friday 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS can be reached on (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAMCY ALGHAZZY/Examiner, Art Unit 2128 

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128
Prosecution Timeline

Apr 06, 2022: Application Filed
Jan 29, 2025: Non-Final Rejection (§101, §103)
Apr 23, 2025: Response Filed
Apr 23, 2025: Examiner Interview Summary
May 01, 2025: Final Rejection (§101, §103)
Jun 20, 2025: Response after Non-Final Action
Jul 02, 2025: Request for Continued Examination
Jul 08, 2025: Response after Non-Final Action
Sep 27, 2025: Non-Final Rejection (§101, §103)
Dec 11, 2025: Examiner Interview Summary
Dec 11, 2025: Applicant Interview (Telephonic)
Dec 15, 2025: Response Filed
Feb 21, 2026: Final Rejection (§101, §103) (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/613,773 (Patent 12596925): SINGLE-STAGE MODEL TRAINING FOR NEURAL ARCHITECTURE SEARCH. 2y 5m to grant; granted Apr 07, 2026
18/612,881 (Patent 12596922): ACCELERATING NEURAL NETWORKS IN HARDWARE USING INTERCONNECTED CROSSBARS. 2y 5m to grant; granted Apr 07, 2026
19/236,733 (Patent 12579408): ADAPTIVELY TRAINING OF NEURAL NETWORKS VIA AN INTELLIGENT LEARNING MANAGEMENT SYSTEM. 2y 5m to grant; granted Mar 17, 2026
17/704,176 (Patent 12572847): SYSTEMS AND METHODS FOR RESOURCE-AWARE MODEL RECALIBRATION. 2y 5m to grant; granted Mar 10, 2026
16/678,038 (Patent 12566966): TRAINING ADAPTABLE NEURAL NETWORKS BASED ON EVOLVABILITY SEARCH. 2y 5m to grant; granted Mar 03, 2026
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 48%
With Interview (+0.7%): 49%
Median Time to Grant: 3y 11m
PTA Risk: High
Based on 62 resolved cases by this examiner. Grant probability derived from career allow rate.
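As a rough check on the figures above, the grant probability can be recomputed from the examiner's career counts reported on this page (30 granted out of 62 resolved). The sketch below assumes the interview lift is added in percentage points and that the displayed values are rounded to whole percents. The page does not state either convention, and the variable names are illustrative.

```python
# Career counts as reported on this page.
granted = 30
resolved = 62

# Assumption: the +0.7% interview lift is additive, in percentage points.
interview_lift_points = 0.7

allow_rate = 100 * granted / resolved
with_interview = allow_rate + interview_lift_points

print(f"Allow rate: {allow_rate:.1f}%")
print(f"With interview: {with_interview:.1f}%")
```

Under these assumptions the unrounded values round to the 48% and 49% shown above.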