Prosecution Insights
Last updated: April 19, 2026
Application No. 17/999,802

COLLABORATIVE MACHINE LEARNING

Final Rejection: §101, §102, §103, §112

Filed: Nov 23, 2022
Examiner: GOLAN, MATTHEW BRYCE
Art Unit: 2123
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Nokia Technologies Oy
OA Round: 2 (Final)

Grant Probability: 0% (At Risk)
OA Rounds: 3-4
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (grants only 0% of cases: 0 granted / 3 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal lift; based on resolved cases with interview)
Typical Timeline: 3y 3m avg prosecution; 36 applications currently pending
Career History: 39 total applications across all art units

Statute-Specific Performance

§101: 27.5% (-12.5% vs TC avg)
§102: 8.3% (-31.7% vs TC avg)
§103: 37.5% (-2.5% vs TC avg)
§112: 23.7% (-16.3% vs TC avg)

Tech Center averages are estimates, based on career data from 3 resolved cases.
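
The arithmetic behind these panels is simple differences against a Tech Center average; a minimal sketch in Python, assuming that reading (the variable names are ours, and the Tech Center averages below are implied by the displayed deltas rather than stated in the source):

    # Career allow rate and its delta against the Tech Center average.
    granted, resolved = 0, 3
    career_allow_rate = 100.0 * granted / resolved   # 0.0%
    implied_tc_allow_avg = career_allow_rate + 55.0  # the -55.0% delta implies ~55%

    # Statute-specific rates; note every displayed delta is consistent with a
    # 40.0% Tech Center average for each statute.
    examiner_rate = {"101": 27.5, "102": 8.3, "103": 37.5, "112": 23.7}
    implied_tc_avg = 40.0
    for statute, rate in sorted(examiner_rate.items()):
        print(f"§{statute}: {rate:.1f}% ({rate - implied_tc_avg:+.1f}% vs TC avg)")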

Office Action

Rejections under §101, §102, §103, §112

DETAILED ACTION

This Office Action is in response to communications filed on November 11, 2025 for Application No. 17/999,802, in which claims 60, 63-70, and 73-79 are presented for examination. The amendments filed on November 11, 2025 have been entered, where claims 60, 63-70, 73, and 75-79 are amended and claims 61-62 and 71-72 are canceled, with claims 1-59 having been previously canceled.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claims 74 and 79 are objected to because of the following informalities: “the data representation updated” should be “the data representation is updated” (Claim 74, Ln. 3). “and” (Claim 79, Ln. 7) should be removed because the conjunctive “and” should only be used to separate the penultimate and ultimate elements in a sequence. Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 77 and 78 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention. Specifically, as currently formulated, both claims 77 and 78 depend upon claim 72 (Claim 77, ln. 1, “The method of claim 72”; Claim 78, ln. 1, “The method of claim 72”). However, claim 72 is canceled. As a result, the meaning of “[t]he method”, as recited in both claim 77 and claim 78, is unclear because one of ordinary skill in the art would not be reasonably apprised of its scope. As a result, both claim 77 and claim 78 are indefinite. Therefore, they are rejected. Applicant should amend the claims so they are both dependent upon a claim currently presented for examination.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 60, 63-70, and 73-79 are rejected under 35 U.S.C. 101 because the claimed invention is directed to abstract ideas without significantly more.
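
The analysis that follows tracks the USPTO's two-step eligibility framework (MPEP 2106): Step 1 asks whether the claim falls into a statutory category; Step 2A Prong One asks whether it recites a judicial exception (here, mental processes); Step 2A Prong Two asks whether the exception is integrated into a practical application; and Step 2B asks whether additional elements supply significantly more. A minimal sketch of that decision flow in Python (the flags and names are ours, used only to make the branching explicit):

    from dataclasses import dataclass

    @dataclass
    class ClaimAnalysis:
        statutory_category: bool     # Step 1: process, machine, manufacture, or composition
        recites_exception: bool      # Step 2A Prong One: e.g., a mental process
        practical_application: bool  # Step 2A Prong Two: integrated into a practical application
        significantly_more: bool     # Step 2B: inventive concept in the additional elements

    def eligible_under_101(c: ClaimAnalysis) -> bool:
        if not c.statutory_category:
            return False             # fails Step 1
        if not c.recites_exception:
            return True              # eligible at Step 2A Prong One
        if c.practical_application:
            return True              # eligible at Step 2A Prong Two
        return c.significantly_more  # Step 2B decides

    # Claim 60 as characterized in this Office Action: a machine claim (Step 1 pass)
    # reciting mental processes, with no integration and no inventive concept.
    print(eligible_under_101(ClaimAnalysis(True, True, False, False)))  # False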

Regarding Claim 60:

Step 1: Claim 60 is a machine claim. Therefore, Claims 60 and 63-69 are directed to a statutory category of eligible subject matter.

Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Here, the claimed limitations are mental processes. Specifically, the claim recites “determine one or more properties associated with one or more processing nodes” (mental process – apart from the “processing nodes” themselves, amounts to evaluating observed data to determine properties); “determine, based on the one or more properties, one or more of the particular processing nodes” (mental process – apart from the “processing nodes” themselves, amounts to exercising judgment, based on the evaluated properties, to form an opinion); “determine a similarity between dataset properties of a particular first processing node and corresponding dataset properties of one or more known processing nodes” (mental process – apart from the various “processing nodes”, amounts to evaluating observed data to form an opinion on similarity) and “determine that the first processing node is to be used . . . with the known processing nodes only in response to the determined similarity being within a predetermined threshold” (mental process – apart from the various “processing nodes”, amounts to exercising judgment to form an opinion, based on the mental evaluations and a known threshold); “in response to the determined similarity being within the predetermined threshold, update . . . based upon the first processing node” (mental process – apart from “the first processing node” itself, amounts to exercising judgment, in response to the determination that a condition is satisfied, to form an opinion that an information source should be used for updating).

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “Apparatus, comprising: at least one processor; and at least one memory storing computer program code which, when executed by the at least one processor, causes the apparatus at least to . . . update the collaboratively learned model” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on the abstract idea) and “processing nodes, the one or more processing nodes configured to utilize respective data based on a local dataset of one or more particular processing nodes for updating a collaboratively learned model . . . wherein the one or more properties associated with the one or more particular processing node are based on one or more properties of its local dataset . . . processing nodes for use in updating the learned model . . . a particular first processing node . . . one or more known processing nodes already used to update the collaboratively learned model . . . the first processing node . . . for updating the collaboratively learned model . . . the known processing nodes . . . based upon the first processing node” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on the abstract idea).

Step 2B: The claim does not include additional elements that, considered individually and in combination, are sufficient to amount to significantly more than the judicial exception. The claim recites the additional elements: “Apparatus, comprising: at least one processor; and at least one memory storing computer program code which, when executed by the at least one processor, causes the apparatus at least to . . . update the collaboratively learned model” (merely reciting instructions to apply the exception using generic computer components does not provide an inventive concept that amounts to significantly more) and “processing nodes, the one or more processing nodes configured to utilize respective data based on a local dataset of one or more particular processing nodes for updating a collaboratively learned model . . . wherein the one or more properties associated with the one or more particular processing node are based on one or more properties of its local dataset . . . processing nodes for use in updating the learned model . . . a particular first processing node . . . one or more known processing nodes already used to update the collaboratively learned model . . . the first processing node . . . for updating the collaboratively learned model . . . the known processing nodes . . . based upon the first processing node” (merely generally linking to a particular technological environment or field of use does not provide an inventive concept that amounts to significantly more).

For the reasons above, Claim 60 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 63-69. The additional limitations of the dependent claims are addressed below.

Regarding Claim 63:

Step 2A Prong 1: See the rejection of Claim 60 above, upon which Claim 63 depends. Here, the claim recites additional elements that are mental processes. Specifically, the claim recites “associating one or more sub-models associated with the learned model with a respective set of one or more known processing nodes already used to update a particular one of said sub-models” (mental process – apart from the “sub-models”, “learned model”, and “processing nodes”, amounts to forming an opinion on associations of observed data) and “wherein, responsive to identifying that the particular first processing node is not currently used to update any one of the sub-models, identifying a known processing node of the representation having the most similar dataset properties to that of the first processing node, and determining that the first processing node is subsequently to be used for updating the particular sub-model updated by said most-similar known processing node” (mental process – apart from the “sub-models” and “processing nodes”, amounts to forming an opinion on observed data, based on knowledge of its current use and similarity to other observed data).

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the at least one memory storing the computer program code which, when executed by the at least one processor, further causes the apparatus at least to” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on the abstract idea); “access a data representation” (accessing stored data is insignificant extra-solution activity that is incidental to the claimed subject matter); and “sub-models . . . learned model . . . processing nodes . . . sub-models . . . first processing node . . . sub-models . . . processing node . . . first processing node . . . first processing node . . . sub-model . . . processing node” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on the abstract idea).

Step 2B: The claim does not include additional elements that, considered individually and in combination, are sufficient to amount to significantly more than the judicial exception. The claim recites the additional elements: “wherein the at least one memory storing the computer program code which, when executed by the at least one processor, further causes the apparatus at least to” (merely reciting instructions to apply the exception using generic computer components does not provide an inventive concept that amounts to significantly more); “access a data representation” (accessing stored data in memory is well-understood, routine, and conventional, see generally Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); see also OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93; therefore, the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration); and “sub-models . . . learned model . . . processing nodes . . . sub-models . . . first processing node . . . sub-models . . . processing node . . . first processing node . . . first processing node . . . sub-model . . . processing node” (merely generally linking to a particular technological environment or field of use does not provide an inventive concept that amounts to significantly more).

Accordingly, Claim 63 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 64:

Step 2A Prong 1: See the rejection of Claim 63 above, upon which Claim 64 depends. Here, the claim recites additional elements that are mental processes. Specifically, the claim recites “wherein the determined sub-model is subsequently updated using data . . . and the data representation is updated to include the first processing node” (mental process – apart from the “sub-model” and “first processing node”, amounts to evaluating data to make modifications based on observed data, which may be aided by pen and paper).

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “sub-model . . . already used to update the sub-model” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on the abstract idea) and “data from the first processing node and all other known processing nodes” (transmission of the data amounts to insignificant extra-solution activity that is incidental to the claimed subject matter).

Step 2B: The claim does not include additional elements that, considered individually and in combination, are sufficient to amount to significantly more than the judicial exception. The claim recites the additional elements: “sub-model . . . already used to update the sub-model” (merely generally linking to a particular technological environment or field of use does not provide an inventive concept that amounts to significantly more) and “data from the first processing node and all other known processing nodes” (transmitting data is well-understood, routine, and conventional, see generally Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362; see also buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014); therefore the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration).

Accordingly, Claim 64 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 65:

Step 2A Prong 1: See the rejection of Claim 64 above, upon which Claim 65 depends. Here, the claim recites additional elements that are mental processes. Specifically, the claim recites “wherein the updatable data representation comprises a hierarchical representation of the known processing nodes, including a root node associated with the learned model and one or more descending levels including one or more leaf nodes associated with a respective sub-model, the one or more leaf nodes being linked to a higher-level node having the most similar dataset properties” (mental process – apart from the “processing nodes”, “learned model”, and “sub-model[s]”, amounts to exercising judgment to associate observed data in a specific hierarchy) and “wherein identifying the known processing node of the representation having the most similar dataset properties is performed only with respect to a set of candidate nodes comprising the root node and the one or more leaf nodes” (mental process – apart from the “processing node”, amounts to evaluating a specific subset of observed data to form an opinion on similarity to another observed data object).

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional element: “processing node . . . learned model . . . sub-model . . . processing node” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on the abstract idea).

Step 2B: The claim does not include additional elements that, considered individually and in combination, are sufficient to amount to significantly more than the judicial exception. The claim recites the additional element: “processing node . . . learned model . . . sub-model . . . processing node” (merely generally linking to a particular technological environment or field of use does not provide an inventive concept that amounts to significantly more).

Accordingly, Claim 65 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 66:

Step 2A Prong 1: See the rejection of Claim 63 above, upon which Claim 66 depends.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the data representation is stored” (storage of the data amounts to insignificant extra-solution activity that is incidental to the claimed subject matter) and “at a centralized collaborative server for access by the one or more processing nodes” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on the abstract idea).

Step 2B: The claim does not include additional elements that, considered individually and in combination, are sufficient to amount to significantly more than the judicial exception. The claim recites the additional elements: “wherein the data representation is stored” (storing and retrieving data in memory is well-understood, routine, and conventional, see generally Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); see also OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93; therefore the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration) and “at a centralized collaborative server for access by the one or more processing nodes” (merely generally linking to a particular technological environment or field of use does not provide an inventive concept that amounts to significantly more).

Accordingly, Claim 66 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 67:

Step 2A Prong 1: See the rejection of Claim 63 above, upon which Claim 67 depends.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “wherein the data representation is stored” (storage of the data amounts to insignificant extra-solution activity that is incidental to the claimed subject matter); “at one or more of the processing nodes” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on the abstract idea); and “and transmitted to other ones of the one or more processing nodes” (transmission of the data amounts to insignificant extra-solution activity that is incidental to the claimed subject matter).

Step 2B: The claim does not include additional elements that, considered individually and in combination, are sufficient to amount to significantly more than the judicial exception. The claim recites the additional elements: “wherein the data representation is stored” (storing and retrieving data in memory is well-understood, routine, and conventional, see generally Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); see also OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93; therefore the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration); “at one or more of the processing nodes” (merely generally linking to a particular technological environment or field of use does not provide an inventive concept that amounts to significantly more); and “and transmitted to other ones of the one or more processing nodes” (transmitting data is well-understood, routine, and conventional, see generally Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362; see also buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014); therefore the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration).

Accordingly, Claim 67 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 68:

Step 2A Prong 1: See the rejection of Claim 60 above, upon which Claim 68 depends. Here, the claim recites additional elements that are mental processes. Specifically, the claim recites “wherein the similarity is determined based on a statistical distribution of data in the local datasets” (mental process – amounts to forming an opinion based on a statistical characteristic of data, which may be observed or generated with the aid of pen and paper).

Step 2A Prong 2 & Step 2B: There are no elements left for consideration of integration into a practical application or for consideration of significantly more.

Accordingly, Claim 68 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 69:

Step 2A Prong 1: See the rejection of Claim 60 above, upon which Claim 69 depends. Here, the claim recites additional elements that are mental processes. Specifically, the claim recites “wherein performing said determinations, responsive to a request” (mental process – amounts to exercising evaluation and judgment to form an opinion in response to a specific condition).

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “received from the first processing node” (amounts to insignificant extra-solution activity, as merely receiving data is incidental to the subject matter claimed) and “the request including the one or more properties of its local dataset” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on the abstract idea).

Step 2B: The claim does not include additional elements that, considered individually and in combination, are sufficient to amount to significantly more than the judicial exception. The claim recites the additional elements: “received from the first processing node” (receiving data is well-understood, routine, and conventional, see generally Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362; see also buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014); therefore the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration) and “the request including the one or more properties of its local dataset” (merely generally linking to a particular technological environment or field of use does not provide an inventive concept that amounts to significantly more).

Accordingly, Claim 69 is rejected as being directed to an abstract idea without significantly more.

Regarding Claim 70:

Step 1: Claim 70 is a process claim. Therefore, Claims 70 and 73-78 are directed to a statutory category of eligible subject matter.

Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Here, the claim recites limitations that are substantially the same as the limitations of Claim 60, in the form of a method for determining. As a result, and as elaborated above, these method limitations are abstract ideas because they are mental processes.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “processing nodes, the one or more processing nodes configured to utilize respective data based on a local dataset of one or more particular processing nodes for updating a collaboratively learned model, wherein the one or more properties associated with the one or more particular processing node are based on one or more properties of its local dataset . . . processing nodes for use in updating the learned model . . . first processing node . . . processing nodes already used to update the collaboratively learned model . . . first processing node . . . for updating the collaboratively learned model . . . processing nodes . . . first processing node” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on the abstract idea) and “updating the collaboratively learned model” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on the abstract idea).

Step 2B: The claim does not include additional elements that, considered individually and in combination, are sufficient to amount to significantly more than the judicial exception. The claim recites the additional elements: “processing nodes, the one or more processing nodes configured to utilize respective data based on a local dataset of one or more particular processing nodes for updating a collaboratively learned model, wherein the one or more properties associated with the one or more particular processing node are based on one or more properties of its local dataset . . . processing nodes for use in updating the learned model . . . first processing node . . . processing nodes already used to update the collaboratively learned model . . . first processing node . . . for updating the collaboratively learned model . . . processing nodes . . . first processing node” (merely generally linking to a particular technological environment or field of use does not provide an inventive concept that amounts to significantly more) and “updating the collaboratively learned model” (merely reciting instructions to apply the exception using generic computer components does not provide an inventive concept that amounts to significantly more).

For the reasons above, Claim 70 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 73-78. The additional limitations of the dependent claims are addressed below.

Regarding Claim 73, the claim recites limitations that are all substantially the same as limitations of Claim 63, in the form of a method. The claim is also directed to performing mental processes without integration into a practical application or significantly more. Accordingly, Claim 73 is rejected under the same rationale.

Regarding Claim 74, the claim recites limitations that are all substantially the same as limitations of Claim 64, in the form of a method. The claim is also directed to performing mental processes without integration into a practical application or significantly more. Accordingly, Claim 74 is rejected under the same rationale.

Regarding Claim 75, the claim recites limitations that are all substantially the same as limitations of Claim 66, in the form of a method. The claim is also directed to performing mental processes without integration into a practical application or significantly more. Accordingly, Claim 75 is rejected under the same rationale.

Regarding Claim 76, the claim recites limitations that are all substantially the same as limitations of Claim 67, in the form of a method. The claim is also directed to performing mental processes without integration into a practical application or significantly more. Accordingly, Claim 76 is rejected under the same rationale.

Regarding Claim 77, the claim recites limitations that are all substantially the same as limitations of Claim 68, in the form of a method. The claim is also directed to performing mental processes without integration into a practical application or significantly more. Accordingly, Claim 77 is rejected under the same rationale.

Regarding Claim 78, the claim recites limitations that are all substantially the same as limitations of Claim 69, in the form of a method. The claim is also directed to performing mental processes without integration into a practical application or significantly more. Accordingly, Claim 78 is rejected under the same rationale.

Regarding Claim 79:

Step 1: Claim 79 is a machine claim. Therefore, it is directed to a statutory category of eligible subject matter.

Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Here, the claim recites limitations that are substantially the same as the limitations of Claim 60, in the form of a non-transitory computer-readable medium for determining. As a result, and as elaborated above, these limitations are abstract ideas because they are mental processes.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the additional elements: “A non-transitory computer-readable medium comprising program instructions stored thereon for performing the method of . . . updating the collaboratively learned model” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on the abstract idea) and “processing nodes, the one or more processing nodes configured to utilize respective data based on a local dataset of one or more particular processing nodes for updating a collaboratively learned model, wherein the one or more properties associated with the one or more particular processing node are based on one or more properties of its local dataset . . . processing nodes for use in updating the learned model . . . first processing node . . . processing nodes already used to update the collaboratively learned model . . . first processing node . . . known processing nodes . . . first processing node” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on the abstract idea).

Step 2B: The claim does not include additional elements that, considered individually and in combination, are sufficient to amount to significantly more than the judicial exception. The claim recites the additional elements: “A non-transitory computer-readable medium comprising program instructions stored thereon for performing the method of . . . updating the collaboratively learned model” (merely reciting instructions to apply the exception using generic computer components does not provide an inventive concept that amounts to significantly more) and “processing nodes, the one or more processing nodes configured to utilize respective data based on a local dataset of one or more particular processing nodes for updating a collaboratively learned model, wherein the one or more properties associated with the one or more particular processing node are based on one or more properties of its local dataset . . . processing nodes for use in updating the learned model . . . first processing node . . . processing nodes already used to update the collaboratively learned model . . . first processing node . . . known processing nodes . . . first processing node” (merely generally linking to a particular technological environment or field of use does not provide an inventive concept that amounts to significantly more).

For the reasons above, Claim 79 is rejected as being directed to an abstract idea without significantly more.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 60, 63-66, 68, 70, 73-75, 77, and 79 are rejected under 35 U.S.C. 103 as being unpatentable over Sattler et al. (hereinafter Sattler) (“Clustered Federated Learning: Model-Agnostic Distributed Multi-Task Optimization under Privacy Constraints”) in view of Li et al. (hereinafter Li) (“Learning to Detect Malicious Clients for Robust Federated Learning”).

Regarding Claim 60, Sattler teaches an apparatus, comprising: at least one processor; and at least one memory storing computer program code which, when executed by the at least one processor, causes the apparatus at least to (Pg. 7-8, Col. 2-1, Para. 3-1, “the server, while running CFL . . . At every edge . . . ∆e are cached”, where the “server” is an apparatus, which must have a memory storing computer program code corresponding to the pseudocode algorithms for “running CFL [Clustered Federated Learning]” to “build a parameter tree”, store values so that “∆e are cached”, and to store program instructions corresponding to pseudocode algorithms, see Pg. 6, Algo. 2-3, Pg. 8, Algo. 4-5, and must have at least one processor to execute the pseudocode algorithms, such as Pg. 8, Algo. 5, “Server does [Ln. 11-23]”, where Ln. 11-23 include computations that require a processor; see generally Pg. 5, Col. 2, Para. 1, “Since in Federated Learning it is assumed that the server has far greater computational power than the clients, the overhead of clustering will typically be negligible”):

determine one or more properties associated with one or more processing nodes (Pg. 1, Abstract, “we present Clustered Federated Learning (CFL), a novel Federated Multi-Task Learning (FMTL) framework, which . . . group[s] the client population into clusters with jointly trainable data distributions”, where the “client population” are processing nodes; Pg. 7, Para. 1, “we will compute cosine similarities between weight-updates instead of gradients according to [Equation 44]”, where the “weight-updates” are properties of the “client” processing nodes, see generally Pg. 6-7, Col. 2-1, Para. 4-1, “In Federated Learning however, due to constraints on both the memory of the client devices and their communication budget, instead commonly weight-updates as defined in (1) are computed and communicated”), the one or more processing nodes configured to utilize respective data based on a local dataset of one or more particular processing nodes for updating a collaboratively learned model (Pg. 7-8, Col. 2-1, Para. 2-1, “Clustered Federated Learning however is flexible enough to handle client populations that vary over time. In order to incorporate this functionality, the server, while running CFL, needs to build a parameter tree . . . [where clusters of client devices,] cv and the corresponding stationary solution θv* obtained by running the Federated Learning Algorithm 2 on cluster cv are cached”, where “Algorithm 2” is a collaborative model updating algorithm that uses local data, see Pg. 1, Col. 1-2, Para. 2-1, “Federated Learning . . . is a distributed training framework, which allows multiple clients . . . to jointly train a single deep learning model . . . Every client then proceeds to improve the downloaded model, by performing multiple iterations of stochastic gradient descent . . . with mini-batches sampled from it’s local data Di, resulting in a weight-update vector . . . The procedure is summarized in Algorithm 2”), wherein the one or more properties associated with the one or more particular processing node (Pg. 7, Para. 1, “In the remainder of this work we will compute cosine similarities between weight-updates instead of gradients according to [Equation 44]”) are based on one or more properties of its local dataset (Pg. 1, Col. 1-2, Para. 2-1, “by performing multiple iterations of stochastic gradient descent with mini-batches sampled from it’s local data Di, resulting in a weight-update vector”, where the “weight-update[s]” are computed based on “local data”, which must therefore include at least one property of local data, see also Pg. 2, Col. 2, Para. 4, “we will derive a computationally efficient tool based on the cosine similarity between the clients’ gradient updates that provably allows us to infer whether two members of the client population have the same data generating distribution” and Pg. 7, Col. 1, Para. 1, “Our experiments in section VI will demonstrate that computing cosine similarities based on weight-updates in practice surprisingly achieves even better separations than computing cosine similarities based on gradients”);

determine, based on the one or more properties, one or more of the particular processing nodes for use in updating the learned model (Pg. 7, Fig. 4, “An exemplary parameter tree created by Clustered Federated Learning. At the root node resides the conventional Federated Learning model, obtained by converging to a stationary point θ∗ of the FL objective over all clients {1, .., m}. In the next layer, the client population has been split up into two groups, according to their cosine similarities . . . This way the new client can be moved down the tree along the path of highest similarity”, where the “group[ings]” and tree “path” of the “client” processing nodes for use in updating a learned model, which could be the “conventional Federated Learning model”, are determined based on “cosine similarities”, which are based on the “weight-updates” property, see Pg. 9, Col. 2, Para. 2, “In all following experiments we will compute cosine similarities based on weight-updates instead of gradients”);

determine a similarity between dataset properties of a particular first processing node and corresponding dataset properties of one or more known processing nodes (Pg. 2, Col. 2, Para. 4, “we will derive a computationally efficient tool based on the cosine similarity between the clients’ gradient updates that provably allows us to infer whether two members of the client population have the same data generating distribution”; Pg. 7, Col. 1, Para. 1, “In the remainder of this work we will compute cosine similarities between weight-updates instead of gradients”, where “cosine similarity” is used to determine the dataset property similarity, such as “distribution”, between “weight-updates” of “clients”, which, as discussed below, includes the “new” first processing node and known processing nodes already in use) already used to update the collaboratively learned model (Pg. 7, Col. 2, Para. 2-3, “Clustered Federated Learning however is flexible enough to handle client populations that vary over time. In order to incorporate this functionality, the server, while running CFL, needs to build a parameter tree T = (V, E) with the following properties . . .”, where new “clients” can be added to a “tree” of known and used “clients”);

determine that the first processing node is to be used for updating the collaboratively learned model with the known processing nodes (Pg. 8, Col. 1, Para. 1-2, “An exemplary parameter tree is shown in Figure 4. When a new client joins the training it can get assigned to a leaf cluster by iteratively traversing the parameter tree from the root to a leaf, always moving to the branch which contains the more similar client updates according to Algorithm 4 . . . On the path from root to leaf, the models get more specialized”, where the “new client” is determined to be used in collaboration with other processing nodes to update the “model” at the “leaf” with “more similar client[s]” that it is “assigned” to; see also Pg. 7, Fig. 4, “the client population has been split up according to their cosine similarities”) only . . . [if] the determined similarity being within a predetermined threshold (Pg. 8, Col. 1, Para. 1-2, “When a new client joins the training it can get assigned to a leaf cluster by iteratively traversing the parameter tree from the root to a leaf, always moving to the branch which contains the more similar client updates”; Pg. 7, Fig. 4, “In order to quickly assign new clients to a leaf model, at each edge e of the tree the server caches the pre-split weight-updates ∆e of all clients belonging to the two different sub-branches. This way the new client can be moved down the tree along the path of highest similarity”, where the “new client” is only assigned to a particular “leaf model”, as opposed to another “leaf model”, if its update is most similar to the “pre-split weight-updates ∆e” at a particular split, which is within the broadest reasonable interpretation of a predetermined threshold because it involves a value that must be exceeded for inclusion in a particular group, and which is predetermined based on the existing “client devices” and “update[s]”, which are determined prior to the “assign[ment]” of the model); and in . . . [a subsequent action] to the determined similarity being within the predetermined threshold, update the collaboratively learned model based upon the first processing node (Pg. 8, Col. 1, Para. 2, “When a new client joins the training it can get assigned to a leaf cluster by iteratively traversing the parameter tree from the root to a leaf, always moving to the branch which contains the more similar client updates according to Algorithm 4” and Pg. 8, Col. 1, Algo. 4, where, after the similarity is determined to be within the predetermined threshold, the first processing node, “a new client”, “joins the training” at “the branch which contains the more similar client updates”, and where, once added to the training, the new node updates the collaboratively learned model, see Pg. 1, Col. 1, Para. 2, “Federated Learning . . . is a distributed training framework, which allows multiple clients (typically mobile or IoT devices) to jointly train a single deep learning model on their combined data in a communication-efficient way”).
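
The similarity signal the examiner maps onto these limitations is Sattler's cosine similarity between clients' weight-update vectors, which the paper treats as a proxy for whether two clients share a data-generating distribution. A minimal sketch under that reading (the vectors are illustrative; this is not Sattler's code):

    import numpy as np

    def cosine_similarity(du: np.ndarray, dv: np.ndarray) -> float:
        # Cosine similarity between two flattened weight-update vectors; values
        # near 1 suggest similar local data distributions, values near -1
        # suggest conflicting ones.
        return float(du @ dv / (np.linalg.norm(du) * np.linalg.norm(dv)))

    u_new   = np.array([0.9, -0.2, 0.4])   # the "new" (first) client's update
    u_known = np.array([1.0, -0.1, 0.5])   # a known client already in the cluster
    u_other = np.array([-0.8, 0.3, -0.6])  # a client from a divergent cluster
    print(cosine_similarity(u_new, u_known))  # high -> likely same distribution
    print(cosine_similarity(u_new, u_other))  # negative -> likely different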

Sattler does not explicitly disclose . . . in response to . . . response . . . (where the step of determining similarity is a functional necessary condition for the steps of determining and updating; whereas the recitation of “in response to” and “response” is interpreted to require a degree of determinative effect on the steps of determining and updating, which Sattler does not explicitly disclose).

However, Li teaches . . . [determine use of an entity for updating a collaboratively learned model only] . . . in response to [a property of the entity being within a predetermined threshold, and updating the collaboratively learned model in] . . . response . . . [to the determined similarity] . . . (Pg. 1, Col. 1, Para. 2, “Federated learning (FL) comes as a new distributed machine learning (ML) paradigm where multiple clients (e.g., mobile devices) collaboratively train an ML model without revealing their private data”, where entities, “multiple clients (e.g., mobile devices)”, are used to update a collaboratively learned model, “collaboratively train an ML model”, and where the determination of whether to use the entity for updating, and the subsequent updating, is in response to determining that a property of the entity is within a predetermined threshold, see Pg. 3-4, Col. 1-2, Para. 5-2, “Remove the Malicious Updates . . . each client’s update will incur a reconstruction error. Note that malicious updates result in much larger reconstruction errors than the benign ones. This reconstruction error is the key to detect malicious updates . . . Updates with higher reconstruction errors than the threshold are deemed as malicious and are excluded from the aggregation step. The aggregation process only takes the benign updates into consideration”, where entities with similarity outside of a predetermined threshold, “Updates with higher reconstruction errors than the threshold”, are not determined to be used for, or actually used for, updating, i.e., “deemed as malicious and are excluded from the aggregation step”).
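
Li's gate, as quoted, admits an update to aggregation only when its reconstruction error (from a detection model applied to the updates) stays within a threshold. A small sketch of that filter (the error values are made up, and the error function is a stand-in for Li's detector):

    def benign_updates(updates, error_of, threshold):
        # Keep only updates whose reconstruction error is within the threshold;
        # higher-error updates are deemed malicious and excluded from aggregation.
        return [u for u in updates if error_of(u) <= threshold]

    errors = {"client_a": 0.02, "client_b": 0.03, "client_c": 0.91}
    kept = benign_updates(list(errors), lambda cid: errors[cid], threshold=0.10)
    print(kept)  # ['client_a', 'client_b'] -> only these enter aggregation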

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the determining that a first processing node will be used for updating a collaboratively learned model, and subsequently updating the collaboratively learned model, only if a determined similarity is within a predetermined threshold of Sattler with the determining whether to use an entity for updating a collaboratively learned model, and subsequently performing the updating, only in response to a property of the entity being within a predetermined threshold of Li in order to exclude significantly dissimilar updates, which may indicate the updates are from malicious actors, from the training of the collaborative model (Li, Pg. 4, Col. 1, Para. 2, “Updates with higher reconstruction errors than the threshold are deemed as malicious and are excluded from the aggregation step”; Li, Pg. 3, Col. 1, Para. 3, “Considering that the noise generated by the malicious clients is unknown to the central server, the most effective way of eliminating the malicious impact is to exclude their updates in model aggregation, i.e., setting fa to 0. Accurately removing malicious clients calls for an accurate anomaly detection mechanism, which plays an essential role in achieving robust FL”), which will eliminate the negative impacts of significantly dissimilar model updates and will contribute to model accuracy (Li, Pg. 6, Col. 2, Para. 3, “We have conducted extensive experiments, and the numerical results show that our method outperforms the existing defense-based methods in terms of model accuracy”; Li, Pg. 2, Col. 1, Para. 2, “by detecting and removing the malicious updates in the central server, their negative impacts can be fully eliminated”).

Regarding Claim 63, Sattler in view of Li teach the apparatus of claim 60, wherein the at least one memory storing the computer program code which, when executed by the at least one processor, further causes the apparatus at least to (Sattler, Pg. 7-8, Col. 2-1, Para. 3-1, “the server, while running CFL, needs to build a parameter tree T = (V, E) . . . At every edge . . . ∆e are cached”, where the “server” apparatus must have at least one memory storing computer code to store values to “build a parameter tree”, store values so “∆e are cached”, and to store program instructions corresponding to pseudocode algorithms, see Sattler, Pg. 6, Algo. 2-3, Pg. 8, Algo. 4-5; and where the “server” apparatus must have a processor to execute program code corresponding to the multiple pseudocode algorithms to run “CFL” and “build a parameter tree”, see Sattler, Pg. 6, Algo. 2-3, Pg. 8, Algo. 4-5, and see generally Sattler, Pg. 5, Col. 2, Para. 1, “Since in Federated Learning it is assumed that the server has far greater computational power than the clients, the overhead of clustering will typically be negligible”): access a data representation (Sattler, Pg. 7, Col. 2, Para. 3, “the server, while running CFL, needs to build a parameter tree”, where the “parameter tree” is a data representation, which once built is accessible to the “server”), associating one or more sub-models associated with the learned model (Sattler, Pg. 8, Col. 1, Para. 2, “Another feature of building a parameter tree is . . . [o]n the path from root to leaf, the models get more specialized with the most general model being the FL model at the root. Depending on application and context, a CFL client could switch between models of different generality. Furthermore a parameter tree allows us to ensemble multiple models of different specificity together”, where any parent model can be considered the learned model, such as the “FL model at the root”, and any child model, such as all of the “specialized . . . models”, can be considered sub-models; see also Sattler, Pg. 7, Fig. 4) with a respective set of one or more known processing nodes already used to update a particular one of said sub-models (Sattler, Pg. 7, Fig. 4, “At the root node resides the conventional Federated Learning model, obtained by converging to a stationary point θ∗ of the FL objective over all clients {1, .., m}. In the next layer, the client population has been split up”, where the “all clients {1, .., m}” set and its subsets, such as “C0” or “C1”, are processing nodes that are already used to update the “model[s]”), wherein, responsive to identifying that the particular first processing node is not currently used to update any one of the sub-models (Sattler, Pg. 8, Col. 1, Para. 1, “When a new client joins the training it can get assigned to a leaf cluster by iteratively traversing the parameter tree from the root to a leaf”, where “new” indicates the processing node was identified as not previously used and the “server” is configured for this functionality, see Sattler, Pg. 7, Col. 2, Para. 2-3, “Clustered Federated Learning however is flexible enough to handle client populations that vary over time. In order to incorporate this functionality, the server”), identifying a known processing node of the representation having the most similar dataset properties to that of the first processing node (Sattler, Pg. 8, Col. 1, Para. 1, “When a new client joins the training it can get assigned to a leaf cluster by iteratively traversing the parameter tree from the root to a leaf, always moving to the branch which contains the more similar client updates according to Algorithm 4”; Sattler, Pg. 7, Fig. 4, “This way the new client can be moved down the tree along the path of highest similarity”, where the endpoint of “the path of highest similarity” will be the node with the most similar dataset, see Sattler, Pg. 3, Col. 2, Para. 1, “we can distinguish clients based on their hidden data generating distribution by inspecting the cosine similarity between their gradient updates”; Sattler, Pg. 11, Col. 1, Para. 2, “the cosine similarity between the weight-updates of different clients is highly indicative of the similarity of their data distributions”), and determining that the first processing node is subsequently to be used for updating the particular sub-model updated by said most-similar known processing node (Sattler, Pg. 8, Col. 1, Para. 1, “When a new client joins the training it can get assigned to a leaf cluster by iteratively traversing the parameter tree from the root to a leaf, always moving to the branch which contains the more similar client updates according to Algorithm 4”, where “assigned” demonstrates the “new client” is used for updating the particular sub-model associated with the “leaf cluster” that contained the most similar “client”, see also Sattler, Pg. 7, Fig. 4, “In order to quickly assign new clients to a leaf model, at each edge e of the tree the server caches the pre-split weight-updates ∆e of all clients belonging to the two different sub-branches. This way the new client can be moved down the tree along the path of highest similarity”).
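
The traversal described in those quotes can be summarized as: a new client starts at the root of the parameter tree and, at each split, moves to the branch whose cached pre-split weight-updates are most similar to its own, ending at the leaf whose sub-model it will then help update. A sketch under that reading (the node layout, averaging rule, and names are ours, not Sattler's):

    import numpy as np

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    class TreeNode:
        def __init__(self, model_id, children=(), cached_updates=()):
            self.model_id = model_id           # model or sub-model at this node
            self.children = list(children)     # empty -> leaf
            self.cached_updates = list(cached_updates)  # pre-split updates under this branch

    def assign_new_client(root, client_update):
        node = root
        while node.children:
            # Move down the branch whose cached updates are most similar on average.
            node = max(node.children,
                       key=lambda ch: np.mean([cos(client_update, u)
                                               for u in ch.cached_updates]))
        return node  # leaf whose sub-model the new client will help update

    left  = TreeNode("sub_model_L", cached_updates=[np.array([1.0, 0.0])])
    right = TreeNode("sub_model_R", cached_updates=[np.array([-1.0, 0.0])])
    root  = TreeNode("fl_root", children=[left, right])
    print(assign_new_client(root, np.array([0.9, 0.1])).model_id)  # sub_model_L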

Regarding Claim 64, Sattler in view of Li teach the apparatus of claim 63, wherein the determined sub-model is subsequently updated using data from the first processing node and all other known processing nodes already used to update the sub-model (Sattler, Pg. 7, Col. 2, Para. 2-3, “Clustered Federated Learning however is flexible enough to handle client populations that vary over time . . . The tree contains a node v ∈ V for every (intermediate) cluster cv computed by CFL. Both cv and the corresponding stationary solution θv* obtained by running the Federated Learning Algorithm 2 on cluster cv are cached at node v”, where each “cluster” can include “new” and old “client population” clients, see Sattler, Pg. 7, Fig. 4, “the client population has been split . . . In order to quickly assign new clients to a leaf model”, and “Algorithm 2” is the process of “federated learning”; see Sattler, Pg. 1, Col. 1-2, Para. 2-1, “Federated Learning . . . is a distributed training framework, which allows multiple clients . . . to jointly train a single deep learning model . . . all clients upload their computed weight-updates to the server, where they are aggregated by weighted averaging . . . The procedure is summarized in Algorithm 2”, where the sub-models must be updated using the “aggregated” “θv*” of all the similar processing nodes in order to become “more specialized”, see Sattler, Pg. 8, Col. 1, Para. 2, “On the path from root to leaf, the models get more specialized with the most general model being the FL model at the root”), and the data representation is updated to include the first processing node (Sattler, Pg. 7, Fig. 4, “In order to quickly assign new clients to a leaf model . . . This way the new client can be moved down the tree along the path of highest similarity”, where “new clients” are added to the “tree” data representation by “assign[ing]” and “mov[ing]” the “new clients” down “the path of highest similarity”).
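
The aggregation step quoted there (“aggregated by weighted averaging”) is the standard federated-averaging update. A minimal sketch, weighting each node's update by its local sample size (the numbers are illustrative):

    import numpy as np

    def weighted_average(updates, sample_sizes):
        # Aggregate weight-updates from the first node and all known nodes
        # already used to update the sub-model, weighted by local dataset size.
        w = np.asarray(sample_sizes, dtype=float)
        w /= w.sum()
        return sum(wi * ui for wi, ui in zip(w, updates))

    updates = [np.array([0.2, -0.1]), np.array([0.4, 0.0]), np.array([0.3, -0.2])]
    print(weighted_average(updates, sample_sizes=[100, 50, 150]))  # aggregated update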

Regarding Claim 65, Sattler in view of Li teach the apparatus of claim 64, wherein the updatable data representation comprises a hierarchical representation of the known processing nodes, including a root node associated with the learned model and one or more descending levels including one or more leaf nodes associated with a respective sub-model (Sattler, Pg. 7, Fig. 4, “An exemplary parameter tree created by Clustered Federated Learning. At the root node resides the conventional Federated Learning model . . . In the next layer, the client population has been split up into two groups, according to their cosine similarities . . . Branching continues recursively until no stationary solution satisfies the splitting criteria. In order to quickly assign new clients to a leaf model”, where the “parameter tree” is the updatable data representation of “client” processing nodes, where the “root node” is directly associated with the learned model if it is “the conventional Federated Learning model” or indirectly associated if it is a “leaf model”, and where the “layer[s]” are descending levels of “leaf” nodes directly associated with a specialized “leaf model” that is a sub-model), the one or more leaf nodes being linked to a higher-level node having the most similar dataset properties, wherein identifying the known processing node of the representation having the most similar dataset properties (Sattler, Pg. 7, Fig. 4, “In order to quickly assign new clients to a leaf model . . . the new client can be moved down the tree along the path of highest similarity”, which is similarity of dataset properties, see Sattler, Pg. 11, Col. 1, Para. 2, “the cosine similarity between the weight-updates of different clients is highly indicative of the similarity of their data distributions” and see generally Sattler, Pg. 2, Col. 2, Para. 4, “cosine similarity between the clients’ gradient updates that provably allows us to infer whether two members of the client population have the same data generating distribution”) is performed only with respect to a set of candidate nodes comprising the root node and the one or more leaf nodes (Sattler, Pg. 8, Col. 1, Para. 1-2, “When a new client joins the training it can get assigned to a leaf cluster by iteratively traversing the parameter tree from the root to a leaf, always moving to the branch which contains the more similar client updates . . . path from root to leaf”, where the “similarity” determinations are based on the “cache[d] . . . pre-split weight-updates ∆e” at only the root and leaf nodes, as indicated by the arrow splits, see Sattler, Pg. 7, Fig. 4, “at each edge e of the tree the server caches the pre-split weight-updates ∆e of all clients belonging to the two different sub-branches. This way the new client can be moved down the tree along the path of highest similarity”).

Regarding Claim 66, Sattler in view of Li teach the apparatus of claim 63, wherein the data representation is stored at a centralized collaborative server for access by the one or more processing nodes (Sattler, Pg. 1, Abstract, “we present Clustered Federated Learning”, where “federated learning” is defined as a method that requires data, the “master model”, to be stored at a collaborative server that is accessed, “download[ed]”, by the processing nodes, “clients”, see Sattler, Pg. 1, Col. 1, Para. 2, “Federated Learning realizes this goal via an iterative three-step protocol where in every communication round t, the clients first synchronize with the server by downloading the latest master model”; Sattler, Pg. 7-8, Col. 2-1, Para. 3-1, “the server, while running CFL, needs to build a parameter tree T . . . at every edge . . . the pre-split weight updates . . . are cached”, where the “parameter tree” must be stored, at least temporarily, at a server distinct from the processor nodes for “the server” to build it “while running CFL” and “cache” the “weight updates”; Cf. Sattler, Pg. 6, Col. 2, Para. “In Federated Learning however, due to constraints on both the memory of the client devices and their communication budget”, where “client devices” are defined by their lack of “memory”; see Sattler, Pg. 8, Col. 1, Para. 1, “When a new client joins the training it can get assigned to a leaf cluster by iteratively traversing the parameter tree from the root to a leaf”, where the “client” processing nodes must access the data representation in order to “travers[e] the parameter tree”).

Regarding Claim 68, Sattler in view of Li teach the apparatus of claim 60, wherein the similarity is determined based on a statistical distribution of data in the local datasets (Sattler, Pg. 7, Col. 1, Para. 1, “Our experiments in section VI will demonstrate that computing cosine similarities based on weight-updates in practice surprisingly achieves even better separations than computing cosine similarities based on gradients”, where the “similarities” are determined based on “weight-updates”, which are in turn based on the “statistics of client [data]” as “data generating distribution[s]”, which is within the broadest reasonable interpretation of a statistical distribution, see Sattler, Pg. 2, Col. 1-2, Para. 3-4, “Assume a number of clients are trying to jointly train a language model for next-word prediction on private text messages. In this scenario the statistics of a client’s text messages will likely vary a lot based on demographic factors, interests, etc. For instance, text messages composed by teenagers will typically exhibit different statistics than those composed by elderly people. An insufficiently expressive model will not be able to fit the data of all clients at the same time . . . In the next section (II) we will derive a computationally efficient tool based on the cosine similarity between the clients’ gradient updates that provably allows us to infer whether two members of the client population have the same data generating distribution”).
Regarding Claim 70, Sattler teaches a method (Pg. 1, Abstract, “CFL can be viewed as a post-processing method”; Pg. 2, Col. 2, Para. 4, “we present the Clustered Federated Learning Algorithm . . . our novel method can be implemented without making modifications to the Federated Learning communication protocol . . . (section V-A). We furthermore show that our method can be implemented in a privacy preserving way (section V-B) and is flexible enough to handle client populations that vary over time (section V-C)”) . . . . The remaining limitations are substantially the same as limitations of Claim 60; therefore, it is rejected under the same rationale.

Regarding Claim 73, the additional elements of the dependent claim are substantially the same as limitations of Claim 63; therefore, it is rejected under the same rationale.

Regarding Claim 74, the additional elements of the dependent claim are substantially the same as limitations of Claim 64; therefore, it is rejected under the same rationale.

Regarding Claim 75, the additional elements of the dependent claim are substantially the same as limitations of Claim 66; therefore, it is rejected under the same rationale.

Regarding Claim 77, the additional elements of the dependent claim are substantially the same as limitations of Claim 68; therefore, it is rejected under the same rationale.

Regarding Claim 79, Sattler teaches a non-transitory computer-readable medium comprising program instructions stored thereon (Pg. 7-8, Col. 2-1, Para. 3-1, “the server, while running CFL . . . At every edge . . . ∆_e are cached”, where the “server” apparatus must have access to a non-transitory computer-readable medium with program instructions corresponding to the pseudocode algorithms for “CFL”, see Pg. 6, Algo. 2-3, Pg. 8, Algo. 4-5) for performing the method of (Pg. 1, Abstract, “CFL can be viewed as a post-processing method”; Pg. 2, Col. 2, Para. 4, “we present the Clustered Federated Learning Algorithm . . . our novel method can be implemented without making modifications to the Federated Learning communication protocol . . . (section V-A). We furthermore show that our method can be implemented in a privacy preserving way (section V-B) and is flexible enough to handle client populations that vary over time (section V-C)”) . . . . The remaining limitations are substantially the same as limitations of Claim 60; therefore, it is rejected under the same rationale.

Claims 67 and 76 are rejected under 35 U.S.C. 103 as being unpatentable over Sattler in view of Li and Roy et al. (hereinafter Roy) (“BrainTorrent: A Peer-to-Peer Environment for Decentralized Federated Learning”).
Regarding Claim 67, Sattler in view of Li teach the apparatus of claim 63, wherein the data representation is stored at [a centralized collaborative server] (Sattler, Pg. 7-8, Col. 2-1, Para. 3-1, “the server, while running CFL, needs to build a parameter tree T . . . at every edge . . . the pre-split weight updates . . . are cached”, where the “parameter tree” must be stored, at least temporarily, at a server distinct from the processing nodes for “the server” to build it “while running CFL” and “cache” the “weight updates”) . . . [and used by] one or more of the processing nodes and . . . [the] other ones of the one or more processing nodes (Sattler, Pg. 8, Col. 1, Para. 1, “When a new client joins the training it can get assigned to a leaf cluster by iteratively traversing the parameter tree from the root to a leaf”, where the “parameter tree” is used by each processing node when it “joins” in order to determine its “assign[ment]”).

Sattler in view of Li does not explicitly disclose . . . is stored at one or more of the processing nodes and transmitted to the other ones . . . (where data representation storage and the processing nodes are taught separately, but storage at the processing nodes and transmission between the processing nodes is not taught). However, Roy teaches . . . [a federated learning method (Pg. 4, Para. 1, “We introduce BrainTorrent, a peer-to-peer FL environment”) where a data representation is stored] at one or more of the processing nodes (Pg. 5, Algo. 1, where “v_old” is a data representation containing information of the other “client” processing nodes, as known by “client i”, and “v_new” is a data representation of the fully up-to-date information of the other “client” processing nodes, collectively stored at the other “client” processing nodes, see “v_new ← ping_request(C_i → C)”, for use in determining whether the client should be used to update the model, see Pg. 4, Para. 1, “Unlike FLS, along with the model, each client maintains a vector v ∈ N^N containing its own version and the last versions of models it used during merging . . . All clients C_j with updates v_j^old < v_j^new . . . send their weights W_j and the training sample size a_j to C_i”) and transmitted to other ones of the one or more processing nodes (Pg. 4, Para. 1, “A random client C_i from the environment initiates the training process. It sends out a ‘ping request’ to the rest of the clients to get their latest model versions”, where, for each of the “randomly selected” processing nodes, the collectively stored data representation, “v_new”, is transmitted) . . . .

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the storage of a processing node data representation at a centralized collaborative server for use by processing nodes for federated learning, of Sattler in view of Li, with the storage and transmission of a data representation between processing nodes, where the data representation is used to determine whether a processing node should be used in federated learning to update a model, of Roy, in order to allow access to and communication of the data representation without full reliance on the collaborative server, which would reduce disruption of federated learning processes in the event of server failure (Roy, Pg. 1, Abstract, “A disadvantage of FL is the dependence on a central server, which requires all clients to agree on one trusted central body, and whose failure would disrupt the training process of all clients”).

Regarding Claim 76, the additional elements of the dependent claim are substantially the same as limitations of Claim 67; therefore, it is rejected under the same rationale.
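The version-vector bookkeeping quoted from Roy's Algorithm 1 can be sketched as follows (illustrative only; the Client class and ping_request helper are stand-ins assumed for exposition, not Roy's code):

import numpy as np

class Client:
    def __init__(self, n_clients):
        self.version = 0                          # incremented after each local update
        self.v = np.zeros(n_clients, dtype=int)   # last peer versions used when merging

def ping_request(clients):
    # "v_new ← ping_request(C_i → C)": ask every peer for its latest model version
    return np.array([c.version for c in clients])

def peers_with_updates(i, clients):
    """Peers C_j whose models advanced since client i last merged them
    (v_j^old < v_j^new); only those peers send W_j and a_j to C_i."""
    v_old, v_new = clients[i].v, ping_request(clients)
    return [j for j in range(len(clients)) if j != i and v_old[j] < v_new[j]]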
Claims 69 and 78 are rejected under 35 U.S.C. 103 as being unpatentable over Sattler in view of Li and Nishio et al. (hereinafter Nishio) (“Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge”).

Regarding Claim 69, Sattler in view of Li teach the apparatus of claim 60, wherein performing said determinations . . . [is based on] one or more properties of its local dataset (Sattler, Pg. 8, Col. 1, Para. 1-2, “An exemplary parameter tree is shown in Figure 4. When a new client joins the training it can get assigned to a leaf cluster by iteratively traversing the parameter tree from the root to a leaf, always moving to the branch which contains the more similar client updates according to Algorithm 4 . . . On the path from root to leaf, the models get more specialized”, where the “assign[ment]” determinations are based on the “updates”, which are based on the local dataset, see generally Sattler, Pg. 1, Col. 2, Para. 1, “mini-batches sampled from it’s local data D_i, resulting in a weight-update vector”).

Sattler in view of Li does not explicitly disclose . . . responsive to a request received from the first processing node, the request including the . . . . However, Nishio teaches . . . responsive to a request received from the first processing node, the request including the [one or more properties of its local dataset] (Pg. 3, Col. 1-2, Para. 6-1, “we propose the following two-step client selection scheme. First, the new Resource Request step asks random clients to inform the MEC operator of their resource information such as . . . the size of data resources relevant to the current training task (e.g., if the server is going to train a ‘dog-vs-cat’ classifier, the number of images containing dogs or cats)”, where the “resource information” relating to the “size of data” is within the broadest reasonable interpretation of local dataset properties).

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the use of local dataset properties by an apparatus to make determinations for combination model learning, of Sattler in view of Li, with the receiving of a request, including properties of a local dataset, from a client to a server, of Nishio, in order to allow the server to manage clients based on their provided dataset properties, which will accelerate performance in instances where specific client datasets are required (Nishio, Pg. 1, Abstract, “The FL protocol . . . can become inefficient when some clients are with limited computational resources (i.e., requiring longer update time) or under poor wireless channel conditions (longer upload time). Our new FL protocol, which we refer to as FedCS, mitigates this problem and performs FL efficiently while actively managing clients based on their resource conditions. Specifically, FedCS solves a client selection problem with resource constraints, which allows the server to aggregate as many client updates as possible and to accelerate performance improvement in ML models”).

Regarding Claim 78, the additional elements of the dependent claim are substantially the same as limitations of Claim 69; therefore, it is rejected under the same rationale.
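Nishio's two-step scheme, as quoted, can be sketched as follows (a simplified, illustrative rendering; the report fields and the greedy selection loop are assumptions, and FedCS itself specifies a more detailed selection algorithm under per-round time constraints):

import random

def resource_request(clients, n_poll=10):
    """Resource Request step: poll random clients for resource information,
    e.g. the amount of data relevant to the current task and a time estimate."""
    polled = random.sample(clients, k=min(n_poll, len(clients)))
    return {c["id"]: (c["relevant_samples"], c["round_time_s"]) for c in polled}

def select_clients(reports, deadline_s):
    """Client Selection step: keep as many clients as fit within the round
    deadline, preferring those reporting more task-relevant data."""
    chosen, used = [], 0.0
    for cid, (n_samples, t) in sorted(reports.items(), key=lambda kv: -kv[1][0]):
        if used + t <= deadline_s:
            chosen.append(cid)
            used += t
    return chosen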
Response to Arguments

Applicant’s arguments filed on November 21, 2025 have been fully considered. Each argument is addressed in detail below.

I. Applicant indicates the objections to the specification should be withdrawn (Applicant’s Remarks, 11/21/2025, Pg. 1, Section “Amendments to the Specification”). Applicant’s amendments have overcome each and every objection to the specification previously set forth in the August 22, 2025 Office Action. As a result, these objections to the specification have been withdrawn.

II. Applicant argues the objections to the claims should be withdrawn (Applicant’s Remarks, 11/21/2025, Pg. 10, Section “Claim Objection”). Applicant’s amendments have overcome some, but not all, of the objections to the claims previously set forth in the August 22, 2025 Office Action. Specifically, while Claim 64 was amended to correct the minor informality of the recitation of “the data representation updated” instead of “the data representation is updated”, no such amendment was submitted to correct the same issue in claim 74. As a result, the objection to claim 74 is maintained. Applicant’s amendments have overcome every other objection to the claims previously set forth in the August 22, 2025 Office Action; those objections are withdrawn. However, as discussed in detail above, Applicant’s amendments create additional minor informalities, which require new grounds for objection to the claims.

III. Applicant argues the rejections of the claims under 35 USC § 112 should be withdrawn (Applicant’s Remarks, 11/21/2025, Pg. 10, Section “Claim Rejections - 35 USC § 112”). Applicant’s amendments have overcome each and every rejection of the claims under 35 USC § 112 previously set forth in the August 22, 2025 Office Action. As a result, these rejections have been withdrawn. However, as discussed in detail above, Applicant’s amendments create additional indefiniteness, which requires new grounds for rejection under 35 USC § 112.

IV. Applicant argues the rejections of the claims under 35 USC § 101 should be withdrawn (Applicant’s Remarks, 11/21/2025, Pg. 11-15, Section “Claim Rejections - 35 USC § 101”). First, Applicant argues the claims cannot reasonably be considered mental processes (Step 2A, Prong 1). Specifically, Applicant emphasizes that the recitations of a collaboratively learned model and various processing nodes demonstrate that at least the amended claims cannot be performed mentally. According to MPEP 2106.04(a), “A Claim That Requires a Computer May Still Recite a Mental Process . . . examiners should review the specification to determine if the claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. In these situations, the claim is considered to recite a mental process”. Here, the recitations of a collaboratively learned model and various processing nodes require a computer or a computer environment. However, these recitations amount to merely describing generic computer components and environments that are used as a tool to perform the mental processes related to the steps of determining. Therefore, the claims still recite mental processes. As a result, the argument is not persuasive.

Next, Applicant argues the claims are integrated into a practical application (Step 2A, Prong 2). Specifically, Applicant argues the amended claims take into account similarities between datasets when determining a particular processing node for use in collaborative model learning, which imposes meaningful limits on the claims that go beyond generally linking the use of the judicial exception to a particular technological environment and constitutes an improvement to a technological field. In support of this position, Applicant cites excerpts from MPEP 2106.04(d)(1), which details approaches and relevant factors for the analysis of integration into a practical application, including excerpts relevant to the assessment of a technological improvement. Relevant excerpts from MPEP 2106.04(d)(1) are reproduced below to facilitate discussion. Additionally, Applicant cites passages from the specification and limitations recited in the claims in order to argue that the claims and the specification set forth an improvement to the technology utilized to increase convergence during training and to create a more accurate learned model.
According to MPEP 2106.04(d)(1), “A claim reciting a judicial exception is not directed to the judicial exception if it also recites additional elements demonstrating that the claim as a whole integrates the exception into a practical application. One way to demonstrate such integration is when the claimed invention improves the functioning of a computer or improves another technology or technical field . . . The specification need not explicitly set forth the improvement, but it must describe the invention such that the improvement would be apparent to one of ordinary skill in the art. Conversely, if the specification explicitly sets forth an improvement but in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art), the examiner should not determine the claim improves technology. Second, if the specification sets forth an improvement in technology, the claim must be evaluated to ensure that the claim itself reflects the disclosed improvement”. According to MPEP 2106.05(f), “Another consideration when determining whether a claim integrates a judicial exception into a practical application in Step 2A Prong Two . . . [is that] A claim having broad applicability across many fields of endeavor may not provide meaningful limitations that integrate a judicial exception into a practical application”.

Here, the Specification sets forth the asserted improvements in a conclusory manner, without the detail necessary to be apparent to a person of ordinary skill in the art. For example, Applicant asserts collaborative model learning can alleviate storage concerns on local devices, without providing the details necessary to ascertain how this is carried out in application. Additionally, it is asserted that various distance metrics can be used to determine similarity between datasets, which will lead to better convergence during training and more accurate models, and that this can be achieved while preserving privacy. However, no additional details are provided for how this improvement is carried out or achieved in application. Therefore, the additional elements do not constitute an improvement to a technological field. Instead, the claims recite generic processing nodes and a collaboratively learned model. Processing nodes have broad applicability across many fields of computing endeavor. Additionally, collaboratively learned models have broad applicability across many research areas within the field of machine learning. Furthermore, processes to assess the similarity of datasets and to reduce storage requirements at local devices also have broad applicability across many fields of endeavor. Therefore, the additional elements do not integrate the judicial exception into a practical application. As a result, the argument is not persuasive.

Finally, Applicant argues the claims include elements that are significantly more than the judicial exception (Step 2B). Specifically, Applicant reasserts that the claims amount to a technological improvement for the reasons outlined above.
According to MPEP 2106.05(I), “an "inventive concept" is furnished by an element or combination of elements that is recited in the claim in addition to (beyond) the judicial exception . . . Evaluating additional elements to determine whether they amount to an inventive concept requires considering them both individually and in combination to ensure that they amount to significantly more than the judicial exception itself . . . A. Relevant Considerations For Evaluating Whether Additional Elements Amount To An Inventive Concept . . . Limitations that the courts have found to qualify as "significantly more" when recited in a claim with a judicial exception include: i. Improvements to the functioning of a computer”. Additionally, according to MPEP 2106.05(f), “Another consideration when determining whether a claim integrates a judicial exception into a practical application in Step 2A Prong Two or recites significantly more than a judicial exception in Step 2B is whether the additional elements amount to more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer . . . A claim having broad applicability across many fields of endeavor may not provide meaningful limitations that integrate a judicial exception into a practical application or amount to significantly more”.

Here, for the reasons outlined above, the claims do not amount to a technological improvement. Specifically, the additional elements and their purported benefits have broad applicability across many fields of endeavor, and the additional elements recite generic computer components that do not amount to a technological improvement. As a result, the argument is not persuasive.

V. Applicant argues the rejections of the claims under 35 USC § 102 and 35 USC § 103 should be withdrawn (Applicant’s Remarks, 11/21/2025, Pg. 16-19, Sections “Claim Rejections - 35 USC §§ 102 and 103” and “Patentability of the Dependent Claims”). In response to Applicant’s amendments, the previously communicated rejections under 35 USC § 102 and 35 U.S.C. § 103 have been withdrawn. However, Applicant’s arguments are not persuasive in light of the new rejection, under 35 U.S.C. § 103, discussed in detail above. The new grounds of rejection add new prior art to the record in order to teach the new combination of elements in the amended independent claims, which was not presented in this arrangement in any of the previously presented claims. As a result, Applicant’s arguments against the previously communicated rejections under 35 USC § 102 and 35 U.S.C. § 103, which are premised on the grounds that the previous prior art of record fails to teach every element of the amended claims, are rendered moot. However, for clarity of the record and to expedite prosecution, any arguments with continued relevance to the new grounds of rejection are discussed below. Arguments addressing whether the disclosed methods of Sattler teach the elements of the amended claims required by the recitation of the limitation “. . . in response to . . . response . . .”, which this Office Action relies on Li to teach, are not relevant to the new grounds of rejection.
Applicant argues Sattler fails to disclose a step of determining a similarity between dataset properties. Specifically, Applicant argues Sattler discloses a cosine similarity between local updates, which cannot be considered a property of the dataset because the cosine similarity is “determined by starting from the global model and performing stochastic gradient descent using samples from the dataset” (Pg. 17). Based on this assertion, Applicant reasons that Sattler cannot disclose the limitation of “determine a similarity between dataset properties of a particular first processing node and corresponding dataset properties of one or more known processing nodes already used to update the collaboratively learned model”, as recited verbatim or in essentially the same manner in each of the independent claims (see Claims 60, 70, and 79). Furthermore, Applicant argues that determining similarity between dataset properties that are not contained in local gradient updates allows for the similarity determinations to be made “without performing a computationally intensive stochastic gradient descent step, and without obtaining a global model” (Pg. 18).

According to MPEP 2111, “During patent examination, the pending claims must be given their broadest reasonable interpretation consistent with the specification” (internal quotation marks omitted) (see also Phillips v. AWH Corp., 415 F.3d 1303, 1316, 75 USPQ2d 1321, 1329 (Fed. Cir. 2005)). Additionally, according to MPEP 2111.01, “Under a broadest reasonable interpretation (BRI), words of the claim must be given their plain meaning, unless such meaning is inconsistent with the specification. The plain meaning of a term means the ordinary and customary meaning given to the term by those of ordinary skill in the art at the relevant time. The ordinary and customary meaning of a term may be evidenced by a variety of sources, including the words of the claims themselves, the specification, drawings, and prior art”. Furthermore, according to MPEP 2111.01, “II. IT IS IMPROPER TO IMPORT CLAIM LIMITATIONS FROM THE SPECIFICATION Though understanding the claim language may be aided by explanations contained in the written description, it is important not to import into a claim limitations that are not part of the claim” (internal quotation marks omitted) (see also Superguide Corp. v. DirecTV Enterprises, Inc., 358 F.3d 870, 875, 69 USPQ2d 1865, 1868 (Fed. Cir. 2004)).

Here, the point of disagreement is whether the cosine similarity computed from the local updates is within the broadest reasonable interpretation of a similarity between dataset properties. Applicant argues it cannot be because the cosine similarity is “determined by starting from the global model and performing stochastic gradient descent using samples from the dataset” (Pg. 17). However, even if Applicant’s assessment of Sattler’s methods were assumed to be true, there are no positively recited elements that would preclude values determined by starting from a global model and performing stochastic gradient descent, using local dataset samples, from containing local dataset properties. As a result, any limitations requiring such prohibitions are not read into the claims. Instead, the relevant question is whether the local updates contain information that can reasonably be described as local dataset properties, in light of the specification and given the plain meanings of these terms. As discussed in detail above, Sattler’s local updates contain local dataset properties because the “weight-update[s]” are computed based on “local data”, and must therefore include at least one property of the local data (Pg. 1, Col. 1-2, Para. 2-1, “by performing multiple iterations of stochastic gradient descent with mini-batches sampled from it’s local data D_i, resulting in a weight-update vector”, see also Pg. 2, Col. 2, Para. 4, “we will derive a computationally efficient tool based on the cosine similarity between the clients’ gradient updates that provably allows us to infer whether two members of the client population have the same data generating distribution” and Pg. 7, Col. 1, Para. 1, “Our experiments in section VI will demonstrate that computing cosine similarities based on weight-updates in practice surprisingly achieves even better separations than computing cosine similarities based on gradients”).
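For context (an editorial illustration, not part of the claim mapping): under the reading of Sattler quoted above, the disputed weight-update is produced roughly as sketched below, where grad_fn, the parameter names, and the learning rate are assumptions.

import numpy as np

def weight_update(theta_global, local_batches, grad_fn, lr=0.01):
    """Start from the shared global model and run SGD on mini-batches sampled
    from the client's local data D_i; the returned delta is the 'weight-update
    vector' whose cosine similarity to other clients' updates is compared."""
    theta = theta_global.copy()
    for batch in local_batches:
        theta = theta - lr * grad_fn(theta, batch)  # grad_fn is an assumed stand-in
    return theta - theta_global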
Furthermore, regardless of whether determining similarity using local dataset properties obtained through means other than the local updates would lead to positive benefits, such as a reduction of computationally intensive gradient descent steps, the claims do not positively recite any limitations prohibiting this approach. Therefore, these limitations are not read into the claims. As a result, the arguments are not persuasive.

Finally, Applicant argues the dependent claims are patentably distinct from the prior art of record because they depend upon either independent claim 60 or independent claim 70, both of which Applicant asserts are patentable. However, as discussed in detail above, the arguments in support of the asserted patentability of claims 60 and 70 are not persuasive. As a result, the argument is not persuasive.

Conclusion

Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW BRYCE GOLAN, whose telephone number is (571) 272-5159. The examiner can normally be reached Monday through Friday, 8:00 AM to 5:00 PM ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MATTHEW BRYCE GOLAN/
Examiner, Art Unit 2123

/ALEXEY SHMATOV/
Supervisory Patent Examiner, Art Unit 2123

Prosecution Timeline

Nov 23, 2022
Application Filed
Aug 14, 2025
Non-Final Rejection — §101, §102, §103
Nov 21, 2025
Response Filed
Feb 06, 2026
Final Rejection — §101, §102, §103 (current)


Prosecution Projections

3-4
Expected OA Rounds
0%
Grant Probability
0%
With Interview (+0.0%)
3y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
