Prosecution Insights
Last updated: April 19, 2026
Application No. 17/774,201

METHODS AND APPARATUS FOR MACHINE LEARNING MODEL LIFE CYCLE

Non-Final OA (§101, §103)
Filed: May 04, 2022
Examiner: MAIDO, MAGGIE T
Art Unit: 2129
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Telefonaktiebolaget LM Ericsson (publ)
OA Round: 3 (Non-Final)
Grant Probability: 64% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 4y 3m
Grant Probability with Interview: 85%

Examiner Intelligence

Career Allow Rate: 64% (grants 64% of resolved cases; 23 granted / 36 resolved; +8.9% vs TC avg)
Interview Lift: +20.7% (strong; allowance rate with vs. without an interview, among resolved cases with an interview)
Avg Prosecution: 4y 3m (typical timeline; 51 applications currently pending)
Total Applications: 87 (career history, across all art units)

Statute-Specific Performance

§101: 25.6% (-14.4% vs TC avg)
§103: 56.1% (+16.1% vs TC avg)
§102: 2.6% (-37.4% vs TC avg)
§112: 15.3% (-24.7% vs TC avg)

Tech Center averages are estimates; figures are based on career data from 36 resolved cases.
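Read literally, each "vs TC avg" delta above is the examiner's statute-specific rate minus the Tech Center average, so the implied baseline can be recovered from the published figures. A minimal sketch (the rate-minus-delta reading is an assumption, not something the report states):

```python
# Derive the implied Tech Center averages from the statute-specific
# rates and their "vs TC avg" deltas shown above. Values are taken
# from the report; the interpretation (rate - delta = TC average)
# is an assumption.

rates = {   # examiner's statute-specific rate, percent
    "101": 25.6,
    "103": 56.1,
    "102": 2.6,
    "112": 15.3,
}
deltas = {  # signed difference vs Tech Center average, percent
    "101": -14.4,
    "103": +16.1,
    "102": -37.4,
    "112": -24.7,
}

tc_avg = {s: round(rates[s] - deltas[s], 1) for s in rates}
print(tc_avg)  # {'101': 40.0, '103': 40.0, '102': 40.0, '112': 40.0}
```

Under that reading, all four statutes imply the same ~40% baseline, consistent with the Tech Center average being reported as a single estimate.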

Office Action

Rejections: §101, §103
DETAILED ACTION

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 22 January 2026 has been entered.

Response to Amendment

The amendment filed on 22 December 2025 has been entered. Claims 1-9, 11-12, 47-54, and 56 are pending. Claims 1, 12, 47, and 56 are amended.

Response to Arguments

Applicant's remarks regarding the rejections of claims under 35 USC 101 have been fully considered. Applicant submits that the claims are directed to specific improvements in distributed machine learning systems, including federated learning, and to techniques that control network and compute resource utilization in such systems. Applicant submits that the limitations of Claims 1 and 12 are not "mental processes" because they are performed in the context of a federated learning system having multiple worker entities and require actions that determine and implement a system configuration that governs data transmission and computation across the system (e.g., selecting between centralized and federated training topologies). The claimed method therefore effects a practical application that improves the operation of the distributed computing system, particularly with respect to bandwidth usage and resource constraints.

Examiner respectfully disagrees. As outlined in the Final Office Action mailed 27 October 2025 and below, Claim 1, and similarly Claim 12, recites a judicial exception (abstract idea) under Step 2A Prong One.
Further, considering Claim 1 (and similarly Claim 12) as a whole to determine whether the claim integrates the recited judicial exception into a practical application under Step 2A Prong Two, Examiner submits that the additional limitations are directed to mere data gathering and mere instructions indicating a field of use or technological environment in which to apply the judicial exception, as outlined in the Final Office Action mailed 27 October 2025 and below. These additional limitations do not provide specific steps and details for improving the operation of the distributed computing system with respect to bandwidth usage and resource constraints; they merely recite selecting information for collection, analysis, and display, MPEP 2106.05(g)(II)(iv), and employ generic computer functions. Even when limiting the use of the idea (such as the listing of a plurality of different topologies) to one particular environment, the claim does not add significantly more, because the token addition to the claim does not alter or affect how training is performed, MPEP 2106.05(h). Further, the judicial exception alone cannot provide the improvement, MPEP 2106.05(a). Finally, the claims do not include specific details or steps of how to apply the judicial exception in order to implement and improve the operation of the distributed machine learning systems as presented by Applicant, MPEP 2106.04(d)(III).

Examiner submits that the additional claim elements of the claimed invention are insufficient to transform a judicial exception into a patentable invention. The limitations of the claimed inventions do not appear to recite steps for a specific solution to a problem in an existing technology area, where Applicant's Specification has set forth an improvement in technology in a non-conclusory manner.
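For orientation only, the flow recited in Claims 1 and 12 (generate candidate topologies, estimate a cost for each, select a low-cost subset, determine instantiability, then select one for implementation) can be sketched in plain Python. All names, cost figures, and the instantiability check below are hypothetical illustrations, not the application's disclosed implementation.

```python
# Illustrative sketch of the topology-selection flow recited in Claims 1/12.
# Every name, cost figure, and check below is hypothetical.

def select_topology(topologies, estimate_cost, can_instantiate, k=2):
    """Pick an implementable topology from among the k cheapest candidates."""
    # For respective topologies, estimate a cost of operating the system.
    by_cost = sorted(topologies, key=estimate_cost)
    # Select a subset of the topologies based on the estimated cost.
    subset = by_cost[:k]
    # Determine whether the topologies in the subset can be instantiated,
    # and select a topology for implementation on that basis.
    for topology in subset:
        if can_instantiate(topology):
            return topology
    return None  # nothing in the subset is instantiable

candidates = ["centralized", "federated", "hierarchical", "round-robin"]
costs = {"centralized": 9.0, "federated": 4.0,
         "hierarchical": 6.0, "round-robin": 7.5}
chosen = select_topology(candidates, costs.get, lambda t: t != "federated")
print(chosen)  # hierarchical: federated is cheapest but not instantiable here
```

The eligibility dispute above turns on whether steps like these, absent further implementation detail, amount to mental evaluations and judgments; the sketch is only meant to make the claimed flow concrete.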
Under Step 2B, Claim 1, and similarly Claim 12, as a whole, does not amount to significantly more than the exception itself (there is no inventive concept in the claim); see MPEP § 2106.05(B)(II).

Applicant submits that Claims 47 and 56 recite worker-side training round behavior in a federated learning system, including evaluating model performance and making protocol decisions based on a comparison to performance following a previous training round. Applicant submits the claims recite additional elements that amount to significantly more than any abstract idea. In particular, the worker-centric protocol logic of independent Claims 47 and 56 requires a comparison of local performance across rounds and a corresponding control decision governing continued participation and/or transmission of model updates. This is not merely generic post-solution activity, but rather a specific distributed-training protocol that improves the functioning of the federated learning system by controlling contributions based on observed performance changes. These specific limitations, taken as a whole, provide a technical contribution to the field of federated learning that moves beyond well-understood, routine, or conventional activities.

Examiner respectfully disagrees. As outlined in the Final Office Action mailed 27 October 2025 and below, Claim 47, and similarly Claim 56, recites a judicial exception (abstract idea) under Step 2A Prong One. Further, considering Claim 47 (and similarly Claim 56) as a whole to determine whether the claim integrates the recited judicial exception into a practical application under Step 2A Prong Two, Examiner submits that the additional limitations are directed to mere data gathering and mere instructions indicating a field of use or technological environment in which to apply the judicial exception, as outlined in the Final Office Action mailed 27 October 2025 and below.
These additional limitations do not provide specific steps and details for improving the functioning of the federated learning system by controlling contributions based on observed performance changes; they merely recite generic receiving and transmitting of model weights, which are well-understood, routine, conventional activity, MPEP 2106.05(d)(II)(iv), and employ generic computer functions. Even when limiting the use of the idea (such as training the neural network model in the federated learning entity) to one particular environment, the claim does not add significantly more, because the token addition to the claim does not alter or affect how training is performed, MPEP 2106.05(h). Further, the judicial exception alone cannot provide the improvement, MPEP 2106.05(a). Finally, the claims do not include specific details or steps of how to apply the judicial exception in order to implement and improve the operation of the distributed machine learning systems as presented by Applicant, MPEP 2106.04(d)(III).

Examiner submits that the additional claim elements of the claimed invention are insufficient to transform a judicial exception into a patentable invention. The limitations of the claimed inventions do not appear to recite steps for a specific solution to a problem in an existing technology area, where Applicant's Specification has set forth an improvement in technology in a non-conclusory manner. Under Step 2B, Claim 47, and similarly Claim 56, as a whole, does not amount to significantly more than the exception itself (there is no inventive concept in the claim); see MPEP § 2106.05(B)(II). The rejections of Claims 1, 12, 47, and 56 under 35 USC 101 have been maintained.
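For orientation, the worker-side round behavior that Claims 47 and 56 recite, and that the parties dispute above, can be sketched as follows. The train(), evaluate(), and send() callables and the 0.05 tolerance are hypothetical stand-ins, not the application's actual implementation.

```python
# Hypothetical sketch of the worker-side round behavior recited in
# Claims 47/56; train(), evaluate(), send() and the 0.05 tolerance
# are stand-ins, not the application's disclosed implementation.

def worker_round(weights_from_master, prev_score, train, evaluate, send):
    """One training round at a worker entity in a federated learning system."""
    modified = train(weights_from_master)  # local training on received weights
    score = evaluate(modified)             # evaluate model performance
    if score > prev_score:                 # improved vs the previous round:
        send(modified)                     # transmit updates for federation
        return modified, score, True       # adopt the modified weights
    # degraded: keep the previous weights and decide whether to
    # participate in future rounds of training
    participate = (prev_score - score) < 0.05
    return weights_from_master, prev_score, participate

sent = []
w, s, keep = worker_round(
    [0.1, 0.2], prev_score=0.70,
    train=lambda w: [x + 0.01 for x in w],
    evaluate=lambda w: 0.75,   # toy: pretend local validation improved
    send=sent.append,
)
print(keep, len(sent))  # True 1 (improved round: weights were transmitted)
```

A degraded round with a drop larger than the tolerance would return the previous weights and a False participation flag, mirroring the "determining whether to participate in future rounds" limitation.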
The rejections of Claims 2-9, 11, and 48-54 under 35 USC 101, which depend directly or indirectly from Claims 1, 12, 47, and 56, have been maintained.

Applicant's remarks regarding the rejections of claims under 35 USC 103 have been fully considered. Applicant submits that Claims 1 and 12 have been amended to recite that the plurality of different topologies comprises at least one centralized training topology and at least one federated training topology. Applicant submits the cited references, alone or in combination, fail to teach or suggest generating candidate topologies that include both of these distinct training topology types, and further fail to teach or suggest selecting between these topology types. Applicant submits that Claims 47 and 56 have been amended to recite that, in response to determining that the performance of the model has degraded relative to the previous performance, determining whether to participate in future rounds of training in the federated learning system. Applicant submits McMahan, alone or in combination with Szeto, fails to teach or suggest this autonomous participation control logic. Applicant's arguments have been considered, but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claim 56 is objected to because of the following informalities: "the federated learning system" in line 19 should be "the federated learning entity". Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-9, 11-12, 47-54, and 56 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea) without significantly more.

Step 1: This part of the eligibility analysis evaluates whether the claim(s) falls within any statutory category, MPEP 2106.03. According to the first part of the Alice analysis, in the instant case the claims were determined to be directed to one of the four statutory categories: a method/process (Claims 1, 47) and a machine/system/product (Claims 2-9, 11-12, 48-54, 56). Based on the claims being determined to be within one of the four categories (i.e., process, machine, manufacture, or composition of matter) (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, natural phenomenon, or abstract idea).

Step 2A Prong One: This part of the eligibility analysis evaluates whether the claim(s) recites a judicial exception. Regarding independent Claims 1, 12, 47, and 56, the claims recite a judicial exception (i.e., an abstract idea enumerated in the 2019 PEG) without significantly more (Step 2A Prong One). Applicant's claim limitations, under the broadest reasonable interpretation, cover activities classified under mental processes: concepts performed in the human mind (including an observation, evaluation, judgment, opinion); see MPEP § 2106.04(a)(2), subsection III, and the 2019 PEG.
As evaluated below, Claims 1, 12:

“for respective ones of the plurality of topologies, estimating a cost of operating the federated learning system using the topology” (mental process of evaluation)
“selecting a subset of the topologies based on the estimated cost” (mental process of judgement)
“determining whether the topologies in the selected subset of topologies can be instantiated” (mental process of judgement)
“selecting a topology for implementation based on the determination of whether the topologies in the selected subset of topologies can be instantiated” (mental process of judgement)

If the identified limitation(s) falls within at least one of the groupings of abstract ideas, it is reasonable to conclude that the claim(s) recites an abstract idea in Step 2A Prong One.

Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claim(s) as a whole integrates the recited judicial exception into a practical application of the exception. As evaluated below:

“generating a plurality of different topologies for a federated learning system, wherein the federated learning system includes a plurality of worker entities”
These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g).

“wherein the plurality of different topologies comprises at least one centralized training topology and at least one federated training topology”
These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.

Step 2B: This part of the eligibility analysis evaluates whether the claim, as a whole, amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim, MPEP 2106.05. First, the additional elements considered as part of the preamble and the additional elements directed to the use of computer technology are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technology environment; see MPEP 2106.05(h). Second, the additional elements directed to mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply the use of a generic computer and/or process with the judicial exception; see MPEP 2106.05(f). Third, the claims are directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; the courts have found these types of limitations insufficient to transform the judicial exception into a patentable invention, see MPEP 2106.05(h). Lastly, the claims directed to data gathering activity, as noted above, are deemed directed to insignificant extra-solution activity; the courts have found these types of limitations insufficient to qualify as "significantly more", see MPEP 2106.05(g). Furthermore, the evidence has been considered in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018); see USPTO Berkheimer Memorandum (April 2018).
Examiner notes Berkheimer Option 2: a citation to one or more of the court decisions discussed in MPEP § 2106.05(d)(II) noting the well-understood, routine, conventional nature of the additional element(s) (e.g., limitations directed to mere data gathering). The courts have recognized such computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity; see MPEP 2106.05(d). The additional limitations, as analyzed, fail to integrate a judicial exception into a practical application at Step 2A and do not provide an inventive concept in Step 2B, per the analysis above. Thus, considering the additional elements individually and in combination, and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. The claims are not patent eligible. Therefore, in examining the elements as recited by the limitations, individually and as an ordered combination, as a whole, Claims 1 and 12 do not recite what the courts have identified as "significantly more".

Claims 47, 56:

“evaluating performance of the model” (mental process of evaluation)
“determining whether the performance of the model has improved or degraded relative to a previous performance of the model following a previous training round of training neural networks in the federated learning entity” (mental process of judgement)
“in response to determining that the performance of the model has degraded relative to the previous performance, determining whether to participate in future rounds of training in the federated learning system” (mental process of judgement)

If the identified limitation(s) falls within at least one of the groupings of abstract ideas, it is reasonable to conclude that the claim(s) recites an abstract idea in Step 2A Prong One.
Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claim(s) as a whole integrates the recited judicial exception into a practical application of the exception. As evaluated below:

“in response to determining that the performance of the model has improved relative to the previous performance of the model following the previous training rounds, transmitting the modified model weights to the master entity for federation by the master entity and selecting the modified model weights for use in operating the neural network”
These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g).

“receiving a set of model weights from a master entity”
“training a neural network model using the set of model weights to obtain a set of modified model weights”
These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h).

Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.

Step 2B: This part of the eligibility analysis evaluates whether the claim, as a whole, amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim, MPEP 2106.05.
First, the additional elements considered as part of the preamble and the additional elements directed to the use of computer technology are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technology environment; see MPEP 2106.05(h). Second, the additional elements directed to mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply the use of a generic computer and/or process with the judicial exception; see MPEP 2106.05(f). Third, the claims are directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; the courts have found these types of limitations insufficient to transform the judicial exception into a patentable invention, see MPEP 2106.05(h). Lastly, the claims directed to data gathering activity, as noted above, are deemed directed to insignificant extra-solution activity; the courts have found these types of limitations insufficient to qualify as "significantly more", see MPEP 2106.05(g). Furthermore, the evidence has been considered in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018); see USPTO Berkheimer Memorandum (April 2018).
Examiner notes Berkheimer Option 2: a citation to one or more of the court decisions discussed in MPEP § 2106.05(d)(II) noting the well-understood, routine, conventional nature of the additional element(s) (e.g., limitations directed to mere data gathering). The courts have recognized such computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity; see MPEP 2106.05(d). The additional limitations, as analyzed, fail to integrate a judicial exception into a practical application at Step 2A and do not provide an inventive concept in Step 2B, per the analysis above. Thus, considering the additional elements individually and in combination, and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. The claims are not patent eligible. Therefore, in examining the elements as recited by the limitations, individually and as an ordered combination, as a whole, Claims 47 and 56 do not recite what the courts have identified as "significantly more".

Furthermore, regarding dependent claims 2-9 and 11, which depend from claim 12, and claims 48-54, which depend from claim 56, the claims are directed to a judicial exception (i.e., an abstract idea enumerated in the 2019 PEG, a law of nature, or a natural phenomenon) without significantly more, as highlighted below by evaluating the claim limitations under Steps 2A and 2B.

Claim 2: Incorporates the rejection of claim 12.
“estimating the cost of operating the federated learning system using a selected topology comprises estimating a cost of data transfer associated with the selected topology” (mental process of evaluation)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim 3: Incorporates the rejection of claim 12.
“estimating the cost of operating the federated learning system using a selected topology comprises estimating a computer performance metric associated with the selected topology” (mental process of evaluation)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim 4: Incorporates the rejection of claim 3.
“estimating the computer performance metric comprises estimating, for a layer in a neural network, a number of floating point operations per second, FLOPS, needed based on a scalar product operation on the layer” (mental process of evaluation)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim 5: Incorporates the rejection of claim 4.
“estimating the number of FLOPS associated with the selected topology comprises estimating the number of FLOPS according to the following formula for a layer in the neural network: 2*m*N(l+1)*[N(l)+1] FLOPS per layer, where m is a number of samples in a data set processed by the neural network and N(l) is a layer size of the neural network” (mental process of evaluation using the abstract idea of a mathematical function)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim 6: Incorporates the rejection of claim 12.
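The claim-5 expression is the standard multiply-add count for a dense layer, with the +1 term covering the bias input. A quick numeric check, using hypothetical layer sizes (the network below is not from the application):

```python
# Numeric check of the per-layer estimate recited in claim 5:
#   2 * m * N(l+1) * (N(l) + 1)  floating point operations per layer,
# where m is the number of samples and N(l) is the size of layer l
# (the +1 accounts for the bias term). Layer sizes are hypothetical.

def flops_per_layer(m, n_in, n_out):
    # one multiply and one add per weight, plus the bias column
    return 2 * m * n_out * (n_in + 1)

layers = [784, 128, 10]   # hypothetical MLP: input, hidden, output sizes
m = 1000                  # samples in the data set

total = sum(flops_per_layer(m, layers[i], layers[i + 1])
            for i in range(len(layers) - 1))
print(total)  # 2*1000*(128*785) + 2*1000*(10*129) = 203540000
```

Estimates like this, summed over layers and topologies, are the kind of cost figure the claimed selection step would compare.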
“estimating an availability of resources needed to implement the network footprint associated with the topology” (mental process of evaluation)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f).
“for selected topologies, generating a network footprint associated with the topology”
These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g).
Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, or directed to instructions for mere data gathering or data output, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim 7: Incorporates the rejection of claim 3.
“estimating an availability of resources needed to obtain the computer performance metric associated with the selected topology” (mental process of evaluation)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim 8: Incorporates the rejection of claim 12.
“wherein the topology comprises one of a centralized training topology, an isolated training topology, a federated training topology, a round robin topology and a hierarchical topology”
These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h). Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim 9: Incorporates the rejection of claim 12.
“determining that the selected topology is no longer sustainable” (mental process of judgement)
“selecting a different topology for instantiation” (mental process of judgement)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f).
“instantiating the different topology; wherein the different topology is selected based on a cost of the different topology and an ability of the different topology to be instantiated”
These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception; see MPEP 2106.05(h).
Limitations directed to mere instructions to implement an abstract idea on a computer, or mere instructions indicating a field of use or technological environment in which to apply a judicial exception, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim 11: Incorporates the rejection of claim 12.
“determining that the selected topology is no longer sustainable comprises determining that the selected topology is no longer sustainable based on cost and/or available resources” (mental process of judgement)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim 48: Incorporates the rejection of claim 56.
“evaluating the performance of the model comprises evaluating the performance of the model against local validation data that is local to a worker entity” (mental process of evaluation)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim 49: Incorporates the rejection of claim 56.
“in response to determining that the performance of the model has degraded relative to the previous performance of the model following the previous training round, determining whether a level of performance degradation is less than a threshold” (mental process of judgement)
“selecting the modified model weights for use in operating the neural network” (mental process of judgement)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f).
“in response to determining that the level of performance degradation is less than the threshold, transmitting the modified model weights to the master entity for federation by the master entity”
These recitations are deemed insufficient to transform the judicial exception into a patentable invention because the recitation is directed to instructions for mere data gathering or data output; see MPEP 2106.05(g).
Limitations directed to mere instructions to implement an abstract idea on a computer, or to using a computer as a tool, or directed to instructions for mere data gathering or data output, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim 50: Incorporates the rejection of claim 56.
“in response to determining that the level of performance degradation is greater than a threshold” (mental process of judgement)
“selecting the previous set of model weights for use in operating the neural network” (mental process of judgement)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f).
“transmitting a previous set of model weights generated in connection with the previous training round to the master entity for federation by the master entity”. This recitation is deemed insufficient to transform the judicial exception into a patent-eligible invention because it is directed to instructions for mere data gathering or data output, see MPEP 2106.05(g). Limitations directed to mere instructions to implement an abstract idea on a computer/using a computer as a tool, or directed to instructions for mere data gathering or data output, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 51: Incorporates the rejection of claim 56. “in response to determining that the level of performance has degraded relative to the previous performance of the model following the previous training round, determining whether or not to participate in future rounds of training neural networks in the federated learning system” (mental process of judgement). The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to be adding the words "apply it" (or an equivalent) with the judicial exception, see MPEP 2106.05(f). Limitations directed to mere instructions to implement an abstract idea on a computer/using a computer as a tool cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 52: Incorporates the rejection of claim 56.
“evaluating the performance of the neural network using the final model weights” (mental process of evaluation); “evaluating the performance of the neural network using the modified final model weights” (mental process of evaluation); “comparing performance of the neural network using the final model weights to performance of the neural network using the modified final neural network weights” (mental process of evaluation); “selecting a set of neural network weights from among the final neural network weights and the modified final neural network weights in response to the comparison” (mental process of judgement). These recitations are directed to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea, and are considered to be adding the words "apply it" (or an equivalent) with the judicial exception, see MPEP 2106.05(f).

“receiving a set of final model weights from the master entity”; “re-training the neural network starting with the final model weights to obtain modified final model weights”. These recitations are deemed insufficient to transform the judicial exception into a patent-eligible invention because they are directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, see MPEP 2106.05(h). Limitations directed to mere instructions to implement an abstract idea on a computer, or mere instructions indicating a field of use or technological environment in which to apply a judicial exception, cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 53: Incorporates the rejection of claim 56.
“comparing performance of the neural network using final model weights to performance of the neural network using modified final neural network weights and intermediate neural network weights generated during a federated learning round” (mental process of evaluation); “selecting a set of neural network weights from among the final neural network weights, the modified final neural network weights and the intermediate neural network weights in response to the comparison” (mental process of judgement). These recitations are directed to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea, and are considered to be adding the words "apply it" (or an equivalent) with the judicial exception, see MPEP 2106.05(f). Limitations directed to mere instructions to implement an abstract idea on a computer/using a computer as a tool cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 54: Incorporates the rejection of claim 56. “combining final neural network weights with modified final neural network weights and/or one or more sets of intermediate neural network weights to obtain a combined set of neural network weights”; “operating the neural network using the combined set of neural network weights”; “wherein combining first neural network weights with the modified final neural network weights and/or one or more sets of intermediate neural network weights comprises generating a weighted average of the first neural network weights and the modified final neural network weights and/or one or more sets of intermediate neural network weights”. These recitations are deemed insufficient to transform the judicial exception into a patent-eligible invention because they are directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, see MPEP 2106.05(h).
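For context, the weighted-average combination recited in claim 54 can be sketched as follows. This is an illustrative sketch only, not part of the claims or the record; the function and variable names are hypothetical.

```python
def combine_weight_sets(weight_sets, coefficients):
    """Element-wise weighted average of several sets of model weights.

    weight_sets: list of (flattened) weight vectors, e.g.
    [final, modified_final, intermediate]; all the same length.
    coefficients: one non-negative coefficient per set (normalized to sum to 1).
    """
    total = sum(coefficients)
    coeffs = [c / total for c in coefficients]
    # Average corresponding elements across the sets.
    return [sum(c * w for c, w in zip(coeffs, elems))
            for elems in zip(*weight_sets)]

# Hypothetical example: equal-weight average of final and modified final weights.
final = [1.0, 0.0, 2.0]
modified_final = [3.0, 1.0, 0.0]
combined = combine_weight_sets([final, modified_final], [0.5, 0.5])
```

The same pattern extends to per-layer arrays by averaging layer-by-layer instead of element-by-element.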
Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

The dependent claims, as analyzed above, do not recite limitations that integrate the judicial exception into a practical application. In addition, the claim limitations do not include additional elements that are sufficient to amount to significantly more than the judicial exception (Step 2B). Therefore, the claims do not recite any limitations, when considered individually or as a whole, that amount to what the courts have identified as "significantly more", see MPEP 2106.05; and therefore, as a whole, the claims are not patent eligible.

As shown above, the dependent claims do not provide any additional elements that, when considered individually or as an ordered combination, amount to significantly more than the abstract idea identified. Therefore, as a whole, the dependent claims do not recite what the courts have identified as "significantly more" than the recited judicial exception.

Therefore, claims 2-9, 11, 48-54 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception and does not recite, when the claim elements are examined individually and as a whole, elements that the courts have identified as "significantly more" than the recited judicial exception.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 9, 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (U.S. Pre-Grant Publication No. 20190312772, hereinafter 'Zhao'), in view of Zeng et al. (NPL: "Resource Management at the Network Edge: A Deep Reinforcement Learning Approach", hereinafter 'Zeng') and He et al.
(NPL: "Central Server Free Federated Learning over Single-sided Trust Social Networks", hereinafter 'He'). Regarding claim 1, Zhao teaches A method of generating a federated learning topology including a plurality of worker entities for training a neural network, the method comprising ([0085] In this regard, the system memory 710 resources, local storage resources 730, and other memory or storage media as described herein, which have program code and data tangibly embodied thereon, are examples of what is more generally referred to herein as “processor-readable storage media” that store executable program code of one or more software programs. Articles of manufacture comprising such processor-readable storage media are considered embodiments of the invention. An article of manufacture may comprise, for example, a storage device such as a storage disk, a storage array or an integrated circuit containing memory. The term “article of manufacture” as used herein should be understood to exclude transitory, propagating signals.; [0086] The processors 702 may comprise one or more processors that are configured to process program instructions and data to execute a native operating system (OS) and applications that run on the GPU server node 700. 
For example, the processors 702 may comprise one or more central processing units (CPUs), a microprocessor, a microcontroller, an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), and other types of processors, as well as portions or combinations of such processors.): generating a plurality of different topologies for a federated learning system, wherein the federated learning system includes a plurality of worker entities ([0014] Illustrative embodiments of the invention will now be explained in further detail with regard to systems and methods to provide generating a plurality of different topologies for a federated learning system topology-aware provisioning of computing resources (e.g., hardware accelerator resources such as GPU device) in a wherein the federated learning system distributed heterogeneous computing environment. As explained in further detail below, systems and methods for dynamically scheduling and provisioning computing resources in a heterogeneous server cluster are configured to maintain information regarding the hardware connection topology of server nodes within a heterogeneous cluster, as well as current bandwidth usage information regarding intra-node and inter-node communication includes a plurality of worker entities links of the server nodes, and utilize such information to provision computing devices (e.g., GPUs) in a way that optimizes communication bus and networking resources (mitigates or eliminates waste of network resources), and which optimally utilizes bidirectional connection topologies, in a balanced manner, to mitigate communication bottlenecks between computing resources.), Zhao fails to teach wherein the plurality of different topologies comprises at least one centralized training topology and at least one federated training topology; for respective ones of the plurality of topologies, estimating a cost of operating the federated learning system using the topology; selecting a subset 
of the topologies based on the estimated cost; determining whether the topologies in the selected subset of topologies can be instantiated; and selecting a topology for implementation based on the determination of whether the topologies in the selected subset of topologies can be instantiated. Zeng teaches for respective ones of the plurality of topologies, estimating a cost of operating the federated learning system using the topology ([RL-Based Algorithm Design, pg. 30] Hence, the state space S consists of all the possible combinations of v and (u1, u2, u3, …, ui , …). A: Suppose that the VM is located in vt = BSi at time t, and it can migrate to vt+1 = BSj . We can simply regard the action at as “move to BSj ,” denoted by at ! BSj, Hence, we have A = {BSk|k = 1, 2, 3, …} as the VM placement candidate set. r: The agent, after taking action at on state st at time t, shall receive a reward as r(st , at ) = 1/(W(st , at ) + M(st , at )), where W(st , at ) and M(st , at ) are the for respective ones of the plurality of topologies, estimating a cost of operating the federated learning system using the topology communication cost for the data transmission and VM migration during time slot t, respectively. Obviously, maximizing the reward is equivalent to minimizing the overall cost.); selecting a subset of the topologies based on the estimated cost ([The Inference Phase, pg. 31-32] Therefore, the VM shall always be located in the base station that can minimize the data transfer cost from the MCS data sources to the VM. Although our algorithm performs the same as the two other competitors, it proves that our algorithm indeed observes this phenomenon and can automatically make the right decision at runtime. This is further verified in the probabilistic movement case, in which we notice that the VM location is frequently changed at runtime. 
Due to the uneven distribution of users in the network, the VM shall always find the right location to selecting a subset of the topologies based on the estimated cost minimize both the VM migration cost and the data transferring cost. In this case, the decision shall not only be made on current state but shall look ahead to take future possible states into consideration. The definition of the reward in our DQN-based algorithm already considers this issue and therefore can compromise between the VM migration cost and the data transfer cost. Sometimes even the immediate cost in a time slot is larger than the competitors’, so choosing the right location saves VM migration cost in the future. As a result, in the long run, our framework can finally outperform the competitors.); determining whether the topologies in the selected subset of topologies can be instantiated; and selecting a topology for implementation based on the determination of whether the topologies in the selected subset of topologies can be instantiated ([Framework Overview, pg. 28] The action executor on each controllable edge computing element (e.g., edge server, base-station, user equipment, router, and so on) can communicate with the RL-based controller to determining whether the topologies in the selected subset of topologies can be instantiated obtain the control decisions, and accordingly execute the derived action. Once the action is taken, the executor selecting a topology for implementation based on the determination of whether the topologies in the selected subset of topologies can be instantiated calculates the reward obtained on each network node and reports it back to the RL-based controller to update the control agent so as to make it more intelligent and efficient.). Zhao and Zeng are considered to be analogous to the claimed invention because they are in the same field of machine learning. 
In view of the teachings of Zhao, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Zeng to Zhao before the effective filing date of the claimed invention in order to efficiently manage the resources at the network edge (cf. Zeng, [Abstract, pg. 26] We think it is desirable to introduce a model-free approach that can fit the network dynamics well without any prior knowledge. To this end, we introduce a model-free DRL approach to efficiently manage the resources at the network edge. Following the design principle of DRL, we design and implement a mobility-aware data processing service migration management agent. The experiments show that our agent can automatically learn the user mobility pattern and accordingly control the service migration among the edge servers to minimize the operational cost at runtime. Some potential future research challenges are also presented.).

He teaches wherein the plurality of different topologies comprises at least one centralized training topology and at least one federated training topology [Fig. 1 of He reproduced] ([1 Introduction] Federated learning has been well recognized as a framework able to protect data privacy (Konecny et al., 2016; Smith et al., 2017a; Yang et al., 2019). State-of-the-art federated learning adopts the centralized network architecture where a centralized node collects the gradients sent from child agents to update the global model.; To further protect the data privacy and avoid the communication bottleneck, the decentralized architecture has been recently proposed (Vanhaesebrouck et al., 2017; Bellet et al., 2018), where the centralized node has been removed, and each node only communicates with its neighbors (with mutual trust) by exchanging their local models.; As shown in Fig. 1 above, He teaches a plurality of different topologies comprises at least one centralized training topology Fig.
1 (a) and at least one federated training topology Fig. 1 (b) and (c)); Zhao, Zeng, and He are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Zhao and Zeng, it would have been obvious for a person of ordinary skill in the art to apply the teachings of He to Zhao before the effective filing date of the claimed invention in order to benefit from communication with trusted users in the federated learning scenario, building upon the fundamental algorithm framework and theoretical guarantees for federated learning in the generic social network scenario (cf. He, [Abstract] Federated learning has become increasingly important for modern machine learning, especially for data privacy-sensitive scenarios. Existing federated learning mostly adopts the central server-based architecture or centralized architecture. However, in many social network scenarios, centralized federated learning is not applicable (e.g., a central agent or server connecting all users may not exist, or the communication cost to the central server is not affordable). In this paper, we consider a generic setting: 1) the central server may not exist, and 2) the social network is unidirectional or of single-sided trust (i.e., user A trusts user B but user B may not trust user A). We propose a central server free federated learning algorithm, named Online Push-Sum (OPS) method, to handle this challenging but generic scenario. A rigorous regret analysis is also provided, which shows very interesting results on how users can benefit from communication with trusted users in the federated learning scenario. This work builds upon the fundamental algorithm framework and theoretical guarantees for federated learning in the generic social network scenario.).

Regarding claim 2, Zhao, as modified by Zeng and He, teaches The computing device of claim 12.
Zeng teaches wherein estimating the cost of operating the federated learning system using a selected topology comprises estimating a cost of data transfer associated with the selected topology ([RL-Based Algorithm Design, pg. 30] Hence, the state space S consists of all the possible combinations of v and (u1, u2, u3, …, ui , …). A: Suppose that the VM is located in vt = BSi at time t, and it can migrate to vt+1 = BSj . We can simply regard the action at as “move to BSj ,” denoted by at ! BSj, Hence, we have A = {BSk|k = 1, 2, 3, …} as the VM placement candidate set. r: The agent, after taking action at on state st at time t, shall receive a reward as r(st , at ) = 1/(W(st , at ) + M(st , at )), where W(st , at ) and M(st , at ) are the communication estimating a cost of data transfer associated with the selected topology cost for the data transmission and VM migration during time slot t, respectively. Obviously, maximizing the reward is equivalent to minimizing the overall cost.). Zhao, Zeng, and He are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 9, Zhao, as modified by Zeng and He, teaches The computing device of claim 12. 
Zhao teaches the operations further comprising: determining that the selected topology is no longer sustainable; selecting a different topology for instantiation; and instantiating the different topology; wherein the different topology is selected based on a cost of the different topology and an ability of the different topology to be instantiated ([0029] As explained in further detail below, the computing determining that the selected topology is no longer sustainable resource scheduling and provisioning module 142 will access the information (connection topology and performance metrics) within the topology database during a provisioning operation, to dynamically identify and allocate a set of accelerator devices (e.g., GPU devices) which can be provisioned for a given job.; [0021] In one embodiment, the computing resource scheduling and provisioning module 142 implements methods to selecting a different topology for instantiation; and instantiating the different topology perform a topology-aware resource provisioning process (e.g., FIG. 6) which dynamically schedules and provisions hardware accelerator resources (e.g., GPU resources) for pending jobs over one or more of the GPU server nodes 160-1, 160-2, . . . , 160-n in the GPU server cluster 160 to execute HPC workloads associated with service requests received from the client systems 110. The computing resource scheduling and provisioning module 142 will allocate either a single GPU server node or multiple GPU server nodes within the cluster of GPU server nodes 160 to handle a given service request wherein the different topology is selected based on a cost of the different topology and an ability of the different topology to be instantiated depending on, e.g., the available GPU devices and processing resources of the GPU server nodes, the nature of the GPU processing tasks associated with the service request, and other factors as discussed below.). 
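For context, the selection flow addressed by claims 1 and 9 — estimate a cost per candidate topology, keep a low-cost subset, check whether each can be instantiated, and re-select when the current topology is no longer sustainable — can be sketched as follows. This is an illustrative sketch only, not drawn from the application or the cited references; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Topology:
    name: str
    kind: str  # "centralized" or "federated"
    estimated_cost: float = 0.0

def select_topology(candidates: List[Topology],
                    cost_fn: Callable[[Topology], float],
                    can_instantiate: Callable[[Topology], bool],
                    subset_size: int = 2) -> Optional[Topology]:
    # Estimate a cost of operating the system under each candidate topology.
    for t in candidates:
        t.estimated_cost = cost_fn(t)
    # Keep the lowest-cost subset, then pick the first that can be instantiated.
    subset = sorted(candidates, key=lambda t: t.estimated_cost)[:subset_size]
    for t in subset:
        if can_instantiate(t):
            return t
    return None  # no sustainable choice: the caller would re-run selection

# Hypothetical costs and a feasibility check that rejects one candidate.
candidates = [Topology("central", "centralized"),
              Topology("fed-ring", "federated"),
              Topology("fed-star", "federated")]
costs = {"central": 3.0, "fed-ring": 1.0, "fed-star": 2.0}
chosen = select_topology(candidates,
                         cost_fn=lambda t: costs[t.name],
                         can_instantiate=lambda t: t.name != "fed-ring")
```

Re-running `select_topology` with updated costs or resource checks models the claim 9 behavior of replacing a topology that is no longer sustainable.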
Zhao, Zeng, and He are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 11, Zhao, as modified by Zeng and He, teaches The computing device of claim 12. Zhao teaches wherein determining that the selected topology is no longer sustainable comprises determining that the selected topology is no longer sustainable based on cost and/or available resources ([0015] FIG. 1 is a high-level schematic illustration of a system 100 which comprises a computing service platform that is configured to provide topology-aware provisioning of computing resources in a distributed heterogeneous computing environment, according to an embodiment of the invention.; [0031] The resource usage database 148 maintains information regarding current bus networking usage in terms of bandwidth (MB/s). The bandwidth usage of communication links between provisioned accelerator devices executing a pending job can be continually measured/tracked and periodically reported by the reporting agents 162 (e.g., every 5 second). The computing resource scheduling and provisioning module 142 is configured to consider the current status of bus/networking connection usage (bandwidth) to fully utilize bidirectional bus/networking between provisioned devices.; [0061] To address these issues in a shared, heterogeneous computing environment, systems and methods according to embodiments of the invention are provided to intelligently and dynamically provision accelerator devices (e.g., GPU device) in a way that optimizes resource usage. 
The term “dynamically” as used herein refers to determining that the selected topology is no longer sustainable comprises provisioning functionalities that include (1) determining a current interconnection topology and current bandwidth usage of computing resources over a server cluster, and (2) utilizing performance scores of different topologies in conjunction with heuristic rules to determine an optimal set of accelerator devices to provision for a given HPC job. As demonstrated in further detail below, provisioning methods are configured to dynamically schedule and provision a set of accelerator devices (e.g., GPU devices) for a given job such that all or most of the accelerator devices within the set belong to a same interconnect domain, to thereby determining that the selected topology is no longer sustainable based on cost and/or available resources optimize performance and resource usage, while avoiding the scheduling and provisioning of a set of accelerator devices for the given job, which would require cross-domain interconnections, and result in potential waste of resources and degraded performance.). Zhao, Zeng, and He are combinable for the same rationale as set forth above with respect to claim 1. Regarding claim 12, Zhao teaches A computing device, comprising: a processing circuit; and a memory coupled to the processing circuit and comprising non-transitory computer readable program instructions that, when executed by the processing circuit, cause the computing device to perform operations of ([0085] In this regard, the system memory 710 resources, local storage resources 730, and other memory or storage media as described herein, which have program code and data tangibly embodied thereon, are examples of what is more generally referred to herein as “processor-readable storage media” that store executable program code of one or more software programs. 
Articles of manufacture comprising such processor-readable storage media are considered embodiments of the invention. An article of manufacture may comprise, for example, a storage device such as a storage disk, a storage array or an integrated circuit containing memory. The term “article of manufacture” as used herein should be understood to exclude transitory, propagating signals.; [0086] The processors 702 may comprise one or more processors that are configured to process program instructions and data to execute a native operating system (OS) and applications that run on the GPU server node 700. For example, the processors 702 may comprise one or more central processing units (CPUs), a microprocessor, a microcontroller, an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), and other types of processors, as well as portions or combinations of such processors.): generating a plurality of different topologies for a federated learning system, wherein the federated learning system includes a plurality of worker entities ([0014] Illustrative embodiments of the invention will now be explained in further detail with regard to systems and methods to provide generating a plurality of different topologies for a federated learning system topology-aware provisioning of computing resources (e.g., hardware accelerator resources such as GPU device) in a wherein the federated learning system distributed heterogeneous computing environment. 
As explained in further detail below, systems and methods for dynamically scheduling and provisioning computing resources in a heterogeneous server cluster are configured to maintain information regarding the hardware connection topology of server nodes within a heterogeneous cluster, as well as current bandwidth usage information regarding intra-node and inter-node communication includes a plurality of worker entities links of the server nodes, and utilize such information to provision computing devices (e.g., GPUs) in a way that optimizes communication bus and networking resources (mitigates or eliminates waste of network resources), and which optimally utilizes bidirectional connection topologies, in a balanced manner, to mitigate communication bottlenecks between computing resources.), Zhao fails to teach wherein the plurality of different topologies comprises at least one centralized training topology and at least one federated training topology; for respective ones of the plurality of topologies, estimating a cost of operating the federated learning system using the topology; selecting a subset of the topologies based on the estimated cost; determining whether the topologies in the selected subset of topologies can be instantiated; and selecting a topology for implementation based on the determination of whether the topologies in the selected subset of topologies can be instantiated. Zeng teaches for respective ones of the plurality of topologies, estimating a cost of operating the federated learning system using the topology ([RL-Based Algorithm Design, pg. 30] Hence, the state space S consists of all the possible combinations of v and (u1, u2, u3, …, ui , …). A: Suppose that the VM is located in vt = BSi at time t, and it can migrate to vt+1 = BSj . We can simply regard the action at as “move to BSj ,” denoted by at ! BSj, Hence, we have A = {BSk|k = 1, 2, 3, …} as the VM placement candidate set. 
r: The agent, after taking action at on state st at time t, shall receive a reward as r(st , at ) = 1/(W(st , at ) + M(st , at )), where W(st , at ) and M(st , at ) are the for respective ones of the plurality of topologies, estimating a cost of operating the federated learning system using the topology communication cost for the data transmission and VM migration during time slot t, respectively. Obviously, maximizing the reward is equivalent to minimizing the overall cost.); selecting a subset of the topologies based on the estimated cost ([The Inference Phase, pg. 31-32] Therefore, the VM shall always be located in the base station that can minimize the data transfer cost from the MCS data sources to the VM. Although our algorithm performs the same as the two other competitors, it proves that our algorithm indeed observes this phenomenon and can automatically make the right decision at runtime. This is further verified in the probabilistic movement case, in which we notice that the VM location is frequently changed at runtime. Due to the uneven distribution of users in the network, the VM shall always find the right location to selecting a subset of the topologies based on the estimated cost minimize both the VM migration cost and the data transferring cost. In this case, the decision shall not only be made on current state but shall look ahead to take future possible states into consideration. The definition of the reward in our DQN-based algorithm already considers this issue and therefore can compromise between the VM migration cost and the data transfer cost. Sometimes even the immediate cost in a time slot is larger than the competitors’, so choosing the right location saves VM migration cost in the future. 
As a result, in the long run, our framework can finally outperform the competitors.); determining whether the topologies in the selected subset of topologies can be instantiated; and selecting a topology for implementation based on the determination of whether the topologies in the selected subset of topologies can be instantiated ([Framework Overview, pg. 28] The action executor on each controllable edge computing element (e.g., edge server, base-station, user equipment, router, and so on) can communicate with the RL-based controller to determining whether the topologies in the selected subset of topologies can be instantiated obtain the control decisions, and accordingly execute the derived action. Once the action is taken, the executor selecting a topology for implementation based on the determination of whether the topologies in the selected subset of topologies can be instantiated calculates the reward obtained on each network node and reports it back to the RL-based controller to update the control agent so as to make it more intelligent and efficient.). Zhao and Zeng are combinable for the same rationale as set forth above with respect to claim 1. He teaches wherein the plurality of different topologies comprises at least one centralized training topology and at least one federated training topology [Fig. 1 of He reproduced] ([1 Introduction] Federated learning has been well recognized as a framework able to protect data privacy (Konecny et al., 2016; Smith et al., 2017a; Yang et al., 2019).
State-of-the-art federated learning adopts the centralized network architecture where a centralized node collects the gradients sent from child agents to update the global model.; To further protect the data privacy and avoid the communication bottleneck, the decentralized architecture has been recently proposed (Vanhaesebrouck et al., 2017; Bellet et al., 2018), where the centralized node has been removed, and each node only communicates with its neighbors (with mutual trust) by exchanging their local models.; As shown in Fig. 1 above, He teaches a plurality of different topologies comprises at least one centralized training topology Fig. 1 (a) and at least one federated training topology Fig. 1 (b) and (c)); Zhao, Zeng, and He are combinable for the same rationale as set forth above with respect to claim 1. Claims 3-5, 7 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao, in view of Zeng, He, and further in view of Chen et al. (NPL: "MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning", hereinafter 'Chen1'). Regarding claim 3, Zhao, as modified by Zeng and He, teaches The computing device of claim 12. Zhao, as modified by Zeng and He, fails to teach wherein estimating the cost of operating the federated learning system using a selected topology comprises estimating a computer performance metric associated with the selected topology. Chen1 teaches wherein estimating the cost of operating the federated learning system using a selected topology comprises estimating a computer performance metric associated with the selected topology ([1. Introduction] In this work, we try to keep the dense topology of the network while maintaining the computational cost within a small budget. To this end, we innovate a network structure named MUlti-layer Feature Federation Network (MuffNet). In the MuffNet, the output channels of each convolutional layer are split into non-overlapped groups.
The input channels are concatenated from channel groups of multiple previous layers. Each output channel is served as the input of a higher layer only once. The topology is illustrated in Figure 5. In this way, we guarantee that the computational cost will not blow up by the dense connection.; [Computational Metric] The estimating a computer performance metric associated with the selected topology FLOP is widely adopted in previous works to measure the computational cost of convolutional neural network. Some works find that smaller FLOP does not imply faster inference [60, 1] since the hardware implementation varies from device to device. In this work we suggest to use FLOP due to three reasons: 1) In most cases FLOP is positively related to the final performance on hardware; 2) The hardware-dependent metrics cannot be fairly compared across different types of devices; 3) Testing network performance on hardware usually requires engineering efforts not affordable to everyone. Therefore, using FLOP is a more meaningful way to compare performance among different methods on different devices.). Zhao, Zeng, He, and Chen1 are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Zhao, Zeng, and He, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Chen1 to Zhao before the effective filing date of the claimed invention in order to better approximate network ability in frequency domain under given computation budget (cf. Chen1, [1. Introduction] Comparing to previous works that sparsify network links between layers, the MuffNet does not compromise the link density for computational efficiency. We argue that our solution has theoretical advantages. We study the approximation ability of network in the frequency domain. It is proved in [39] that a deep convolutional network behaves like a low-pass filter.
The MuffNet ensembles convolutional sub-networks of all depths such that both low and high frequencies could pass through the structure. This makes the MuffNet a better universal functional approximator under the given computational budget. From this viewpoint, the link-sparsification approach does not improve the network spectrum in proportion since it only ensembles a few subnetworks of the selected depths.). Regarding claim 4, Zhao, as modified by Zeng, He and Chen1, teaches The computing device of claim 3. Chen1 teaches wherein estimating the computer performance metric comprises estimating, for a layer in a neural network, a number of floating point operations per second, FLOPS, needed based on a scalar product operation on the layer ([Structure FLOPs] Given M_{i,j}, we could estimating the computer performance metric comprises estimating, for a layer in the neural network, a number of floating point operations per second, FLOPS, needed based on a scalar product operation on the layer derive its FLOPs as following. Suppose the feature map size of the input layer is s_i × s_i and that of the output layer is s_j × s_j. Define m_i as the number of input channels and m_j as the number of output channels. For the normal block and the residual block, FLOP_i = s_i^2 (m_i^2 + 9 m_i + m_i m_j). For the reduction block, FLOP_i = s_i^2 ((1/2) m_i^2 + (1/2) m_i m_j) + s_j^2 ((75/2) m_i^2 + (53/2) m_i m_j + 13 m_j^2). The total FLOPs of the structure parameterized by M is FLOP(M) = Σ_{i=1}^{L−1} FLOP_i.). Zhao, Zeng, He, and Chen1 are combinable for the same rationale as set forth above with respect to claim 3. Regarding claim 5, Zhao, as modified by Zeng, He, and Chen1, teaches The computing device of claim 4.
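As a sanity check on the Chen1 per-block FLOP formulas quoted above for claim 4, the counts can be computed directly. This is a minimal sketch; the function names are illustrative and not taken from the reference:

```python
def flops_normal(s_i, m_i, m_j):
    # Normal/residual block, per Chen1:
    # FLOP_i = s_i^2 * (m_i^2 + 9*m_i + m_i*m_j)
    return s_i ** 2 * (m_i ** 2 + 9 * m_i + m_i * m_j)

def flops_reduction(s_i, s_j, m_i, m_j):
    # Reduction block, per Chen1:
    # FLOP_i = s_i^2 * (m_i^2/2 + m_i*m_j/2)
    #        + s_j^2 * (75*m_i^2/2 + 53*m_i*m_j/2 + 13*m_j^2)
    return (s_i ** 2 * (m_i ** 2 / 2 + m_i * m_j / 2)
            + s_j ** 2 * (75 * m_i ** 2 / 2 + 53 * m_i * m_j / 2 + 13 * m_j ** 2))

def total_flops(per_block_flops):
    # FLOP(M) is the sum of FLOP_i over the L-1 blocks of the structure.
    return sum(per_block_flops)
```

For example, a normal block with a 4x4 input feature map, 2 input channels, and 3 output channels costs 16 * (4 + 18 + 6) = 448 FLOPs under this formula.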
Chen1 teaches wherein estimating the number of FLOPS associated with the selected topology comprises estimating the number of FLOPS according to the following formula for a layer in the neural network: [Formula image reproduced in the claim] ([Structure FLOPs] Given M_{i,j}, we could derive its FLOPs as following. Suppose the feature map size of the input layer is s_i × s_i and that of the output layer is s_j × s_j. Define m_i as the number of input channels and m_j as the number of output channels. For the normal block and the residual block, FLOP_i = s_i^2 (m_i^2 + 9 m_i + m_i m_j). For the reduction block, FLOP_i = s_i^2 ((1/2) m_i^2 + (1/2) m_i m_j) + s_j^2 ((75/2) m_i^2 + (53/2) m_i m_j + 13 m_j^2). The total FLOPs of the structure parameterized by M is FLOP(M) = Σ_{i=1}^{L−1} FLOP_i.) Zhao, Zeng, He, and Chen1 are combinable for the same rationale as set forth above with respect to claim 3. Regarding claim 7, Zhao, as modified by Zeng, He, and Chen1, teaches The computing device of claim 3. Zeng teaches the operations further comprising: estimating an availability of resources needed to obtain the computer performance metric associated with the selected topology ([RL and Its Applications, pg. 27] Any RL algorithm consists of two phases: the training phase and the inference phase. The training phase is to tell the agent which action shall be taken under a given environment from a series of trials. Imagine a baby agent is to control the edge computing platform (environment) toward the goal of improving the user experience (reward). The baby agent will first look around and construct its own representation of the environment as the state, for example, estimating an availability of resources needed to obtain the computer performance metric associated with the selected topology current available resource amount, user demands, and service latency.
Sadly, the curious baby agent has no knowledge about what to do and will start to explore the environment by making random decisions (actions). Then these actions will be carried out and applied to the network, like selecting the associated edge server, scheduling user tasks, and allocating edge resources.; [Experiments and Discussions, pg. 30] To verify the feasibility of our framework and the efficiency of our proposed DQN-based algorithm, we have conducted extensive simulation-based experiments. We initialize an edge computing platform consisting of 50 servers on different base stations in a randomly generated topology. The communication cost between any servers are set as the number of hops between them. There are 500 users who randomly move between these base stations.). Zhao, Zeng, He, and Chen1 are combinable for the same rationale as set forth above with respect to claim 3. Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao, in view of Zeng, He, and further in view of Chen et al. (U.S. Pre-Grant Publication No. 20210182077, hereinafter 'Chen2'). Regarding claim 8, Zhao, as modified by Zeng and He, teaches The computing device of claim 12. Zhao, as modified by Zeng and He, fails to teach wherein the topology comprises one of a centralized training topology, an isolated training topology, a federated training topology, a round robin topology and a hierarchical topology.
Chen2 teaches wherein the topology comprises one of a centralized training topology, an isolated training topology, a federated training topology, a round robin topology and a hierarchical topology ([0693] When the neural network operation device includes a plurality of the computation devices, the plurality of the computation devices are connected to each other in a specific structure and transfer data to each other, where; [0694] through an express external device interconnection bus, in other words, a PCIE bus, the plurality of the computation devices are interconnected and transfer data to each other to support large scale neural network operations; the plurality of the computation devices share a same control system, or have separate control systems; the plurality of the computation devices share a memory, or have their own memories; and an the topology comprises one of a centralized training topology, an isolated training topology, a federated training topology, a round robin topology and a hierarchical topology interconnection method of the plurality of the computation devices can be any interconnection topology.; [2500] The interconnection module 4 is configured to connect the primary operation module and the secondary operation modules, and can be implemented into different interconnection topologies (such as tree structure, ring structure, grid structure, hierarchical interconnection, bus structure, etc.).). Zhao, Zeng, He, and Chen2 are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Zhao, Zeng, and He, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Chen2 to Zhao before the effective filing date of the claimed invention in order to improve the transaction data processing speed of the interconnection circuit, achieving good data flow control and improving the data throughput rate in interconnection circuit (cf. 
Chen2, [0064] Therefore, the present disclosure can select a corresponding transfer channel for multiple transaction data arriving at the aggregation nodes in the interconnection circuit according to their destinations, and can arbitrate the data transfer requests competing for the same transfer channel at the same time, thereby improving the transaction data processing speed of the interconnection circuit, achieving good data flow control and improving the data throughput rate in interconnection circuit.). Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao, in view of Zeng, He, and further in view of Wang et al. (NPL: "CMFL: Mitigating Communication Overhead for Federated Learning", hereinafter 'Wang'). Regarding claim 6, Zhao, as modified by Zeng and He, teaches The computing device of claim 12. Zhao, as modified by Zeng and He, fails to teach the operations further comprising: for selected topologies, generating a network footprint associated with the topology; and estimating an availability of resources needed to implement the network footprint associated with the topology. Wang teaches the operations further comprising: for selected topologies, generating a network footprint associated with the topology; and estimating an availability of resources needed to implement the network footprint associated with the topology (As shown in image below, Wang teaches generating a network footprint associated with the topology (4) and estimating availability of resources needed to implement the network footprint.; [B. Simulation of Federated Multi-Task Learning, pg. 961]; CMFL provides general improvements for almost all the follow-up FL designs in further reducing their network footprint. To illustrate this, we applied CMFL to recently proposed MOCHA [12] and developed the following simulation. 
Specifically, CMFL identifies local updates’ relevance in MOCHA’s federated multi-Task learning by locally calculating the changing of the global matrix based on the local update and the record of the relationship matrix among clients.; [Communication overhead., pg. 963] Fig. 7a depicts the communication overhead under various prediction accuracies. We can see a similar trend to Fig. 4b, i.e., CMFL continuously outperforms Gaia, substantially reducing the uploading rounds. To better illustrate CMFL’s efficiency, we estimating availability of resources needed to implement the network footprint measured the consumed network footprint during the learning procedure among these schemes, as shown in Fig. 7b. Specifically, CMFL reduces the size of the uploaded data by 7.1x, 6.4x and 6.9x given the three learning accuracy values, respectively.). [Image: Wang, network footprint comparison figures] Zhao, Zeng, He, and Wang are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Zhao, Zeng, and He, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Wang to Zhao before the effective filing date of the claimed invention in order to improve communication efficiency for almost all of the existing federated learning schemes (cf. Wang, [Abstract, pg. 954] By avoiding uploading those irrelevant updates to the server, CMFL can substantially reduce the communication overhead while still guaranteeing the learning convergence. CMFL is shown to achieve general improvement in communication efficiency for almost all of the existing federated learning schemes. We evaluate CMFL through extensive simulations and EC2 emulations. Compared with vanilla Federated Learning, CMFL yields 13.97x communication efficiency in terms of the reduction of network footprint.
When applied to Federated Multi-Task Learning, CMFL improves the communication efficiency by 5.7x with 4% higher prediction accuracy.). Claims 47-53, 56 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan et al. (U.S. Pre-Grant Publication No. 20190227980, hereinafter ‘McMahan'), in view of Szeto et al. (U.S. Pre-Grant Publication No. 20180018590, hereinafter ‘Szeto') and Duan et al. (NPL: "Astraea: Self-balancing Federated Learning for Improving Classification Accuracy of Mobile Deep Learning Applications", hereinafter 'Duan'). Regarding claim 47, McMahan teaches A method of training a neural network at a worker entity in a federated learning system, the method comprising ([0004] One example aspect of the present disclosure is directed to a computing system. The computing system can include one or more server computing devices. The one or more server computing devices can include one or more processors and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors cause the one or more server computing devices to perform operations.): receiving a set of model weights from a master entity ([0023] A machine-learned model can then be provided to the selected client computing devices. 
For example, one or more server computing devices can receiving a set of model weights from a master entity provide a global set of parameters for a machine-learned model to each selected client computing device.); training the neural network model using the set of model weights to obtain a set of modified model weights ([0024] In some implementations, after receiving the machine-learned model, each selected client computing device can then determine a local update based at least in part on a local dataset stored locally on the selected client computing device and provide the local update to the one or more server computing devices.; [0025] For example, in some implementations, determining the local update can include training the machine-learned model based at least in part on the local dataset to generate a locally-trained model. In some implementations, the machine-learned model can be trained via stochastic gradient descent. For example, each selected client computing device can perform some number of mini-batch stochastic gradient descent steps to generate the training the neural network model using the set of model weights to obtain a set of modified model weights locally-trained model, such as updated local values for the global set of parameters for the machine-learned model.); evaluating performance of the model; determining whether the performance of the model has improved or degraded relative to a previous performance of the model following a previous training round of training neural networks in the federated learning system ([0026] In some implementations, determining the local update can include evaluating performance of the model determining a difference between the locally-trained model and the machine-learned model. 
For example, in some implementations, the difference between the locally-trained model and the machine-learned model can be determined by determining whether the performance of the model has improved or degraded relative to a previous performance of the model following a previous training round of training neural networks in the federated learning system determining a difference between the global set of parameters for the machine-learned model provided by the one or more server computing devices and the updated local values for the global set of parameters determined by training the machine-learned model with the local dataset.); in response to determining that the performance of the model has improved relative to the previous performance of the model following the previous training rounds, transmitting the modified model weights to the master entity for federation by the master entity ([0030] Each client computing device can then in response to determining that the performance of the model has improved relative to the previous performance of the model following the previous training rounds provide the local update to the one or more server computing devices. For example, in some implementations, the local update can be clipped before transmitting the modified model weights to the master entity for federation by the master entity being provided to the one or more server computing devices, as described herein. In various implementations, the local update can be provided as one or more vectors, matrices, parameters, or other formats, and may be encoded before being provided to the one or more server computing devices.) and McMahan fails to teach selecting the modified model weights for use in operating the neural network, and in response to determining that the performance of the model has degraded relative to the previous performance, determining whether to participate in future rounds of training in the federated learning system. 
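The local-update mechanics McMahan describes (train locally from the global parameters, then report the difference between the locally trained values and the global values) can be illustrated with a toy sketch. The least-squares objective, hyperparameters, and names below are assumptions for illustration only, not McMahan's implementation:

```python
import numpy as np

def local_update(global_weights, local_data, lr=0.1, steps=5):
    """Train locally starting from the global parameters, then return the
    difference between the locally trained weights and the global weights,
    i.e. the "local update" the client reports back to the server. A plain
    full-batch least-squares gradient step stands in for mini-batch SGD."""
    w = np.asarray(global_weights, dtype=float).copy()
    X, y = local_data
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of (1/2n) ||Xw - y||^2
        w -= lr * grad
    return w - np.asarray(global_weights, dtype=float)
```

The server would then aggregate such updates across the selected clients to advance the global model, as in the federated averaging scheme the reference describes.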
Szeto teaches selecting the modified model weights for use in operating the neural network ([0104] At operation 560, the modeling engine calculates a model similarity score as a function of the proxy model parameters and actual model parameters. As discussed above, the selecting the modified model weights parameters can be compared pairwise considering that each model is built from the same implementation of the machine learning algorithm and considering that the proxy data has similar features as the private data. In addition to using the proxy and actual model parameters, the modeling engine can also use other factors available in calculating the similarity score. Example additional factors can include accuracies of the model, cross fold validation, accuracy gain, sensitivities, specificities, distributions of the pairwise comparisons (e.g., average value, distributions about zero, etc.). In some embodiments, the actual private data training set can be used to cross-validate the proxy model. If the accuracy of the predictions from the trained proxy model on the actual private data training set is sufficiently high (e.g., within 10%, 5%, 1%, or closer), then the trained proxy model could be considered similar to the trained actual model. Further, if the similarity score fails to satisfy similarity criteria (e.g., falls below a threshold, etc.), then the modeling engine can repeat operations 540 through 560.; [0107] In other embodiments, the global modeling engine also transmits the trained global model back to one or more of the private data servers. The private data servers can then for use in operating the neural network leverage the global trained model to conduct local prediction studies in support of local clinical decision making workflows. In addition, the private data servers can also use the global model as a foundation for continued online learning. 
Thus, the global model becomes a basis for continued machine learning as new private data becomes available. As new data becomes available, method 500 can be repeated to improve the global modeling engine.), and McMahan and Szeto are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of McMahan, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Szeto to McMahan before the effective filing date of the claimed invention in order to construct communication channels among computing devices over a network to exchange machine learning data while respecting data privacy of the underlying raw data (cf. Szeto, [0029] One should appreciate that the disclosed techniques provide many advantageous technical effects including construction of communication channels among computing devices over a network to exchange machine learning data while respecting data privacy of the underlying raw data. The computing devices are able to exchange “learned” information or knowledge among each other without comprising privacy. More specifically, rather than transmitting private or secured data to remote computing devices, the disclosed private data servers attempt to “learn” information automatically about the local private data via computer-based implementations of one or more machine learning algorithms. The learned information is then exchanged with other computers lacking authorization to access the private data. Further, it should be appreciated that the technical effects include computationally building trained proxy models from distributed, private data and their corresponding data distributions.). Duan teaches in response to determining that the performance of the model has degraded relative to the previous performance, determining whether to participate in future rounds of training in the federated learning system ([B. 
Effect of Accuracy] The experimental results on imbalanced EMNIST are shown in Figure 6. In the first 100 rounds, the training of model converges faster and the accuracy of the model increases with the increase of c. However, in response to determining that the performance of the model has degraded relative to the previous performance after 150 rounds, the accuracy is slightly reduced, especially for the models trained with a large c. For example, the accuracy is reduced from 79.03% to 77.79% when c=100 and γ=20. It means that the CNN models are over-training and suffered from overfitting. determining whether to participate in future rounds of training in the federated learning system In order to remedy the loss of accuracy caused by overfitting, we can use the regularization strategy early stopping [30], in which optimization is halted based on the performance on a validation set, during training. Furthermore, experimental results show that a larger γ does not help improving the accuracy of the model.). McMahan, Szeto, and Duan are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of McMahan and Szeto, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Duan to McMahan before the effective filing date of the claimed invention in order to relieve global imbalance by runtime data augmentation, and for averaging the local imbalance, rescheduling the training of clients based on Kullback-Leibler divergence (KLD) of their data distribution (cf. Duan, [Abstract] Federated learning (FL) is a distributed deep learning method which enables multiple participants, such as mobile phones and IoT devices, to contribute a neural network model while their private training data remains in local devices. This distributed approach is promising in the edge computing system which have a large corpus of decentralized data and require high privacy.
However, unlike the common training dataset, the data distribution of the edge computing system is imbalanced which will introduce biases in the model training and cause a decrease in accuracy of federated learning applications. In this paper, we demonstrate that the imbalanced distributed training data will cause accuracy degradation in FL. To counter this problem, we build a self-balancing federated learning framework called Astraea, which alleviates the imbalances by 1) Global data distribution based data augmentation, and 2) Mediator based multi-client rescheduling. The proposed framework relieves global imbalance by runtime data augmentation, and for averaging the local imbalance, it creates the mediator to reschedule the training of clients based on Kullback-Leibler divergence (KLD) of their data distribution. Compared with FedAvg, the state-of-the-art FL algorithm, Astraea shows +5.59% and +5.89% improvement of top-1 accuracy on the imbalanced EMNIST and imbalanced CINIC-10 datasets, respectively. Meanwhile, the communication traffic of Astraea can be 92% lower than that of FedAvg.). Regarding claim 48, McMahan, as modified by Szeto and Duan, teaches The federated learning entity of claim 56. Szeto teaches wherein evaluating the performance of the model comprises evaluating the performance of the model against local validation data that is local to a worker entity ([0121] The disclosed approach of distributed, online machine learning can leverage numerous techniques for validating trained models. One approach includes a first private data server sending its trained actual model to other private data servers. The other private data servers can then wherein evaluating the performance of the model comprises evaluating the performance of the model against local validation data that is local to the worker entity validate the trained actual model on their own local data and send the results back to the first private data server.
Additionally, a global modeling engine could also execute one or more cross-fold validation steps on the trained actual models using the global collection of aggregated proxy data. The reverse is also true. The global modeling engine can send the global mode to one or more private data servers to have the global model validated on each private data server's local data. One should appreciate that the validation of the various models is to be performed on data sets selected according to the same data selection requirements to ensure a proper analysis.). McMahan, Szeto, and Duan are combinable for the same rationale as set forth above with respect to claim 47. Regarding claim 49, McMahan, as modified by Szeto and Duan, teaches The federated learning entity of claim 56. Szeto teaches the operations further comprising: in response to determining that the performance of the model has degraded relative to the previous performance of the model following the previous training round, determining whether a level of performance degradation is less than a threshold ([0104] In addition to using the proxy and actual model parameters, the modeling engine can also use other factors available in calculating the similarity score. Example additional factors can include accuracies of the model, cross fold validation, accuracy gain, sensitivities, specificities, distributions of the pairwise comparisons (e.g., average value, distributions about zero, etc.). In some embodiments, the actual private data training set can be used to cross-validate the proxy model. If the accuracy of the predictions from the trained proxy model on the actual private data training set is sufficiently high (e.g., within 10%, 5%, 1%, or closer), then the trained proxy model could be considered similar to the trained actual model. 
Further, if the in response to determining that the performance of the model has degraded relative to the previous performance of the model following the previous training round, determining whether a level of performance degradation is less than a threshold similarity score fails to satisfy similarity criteria (e.g., falls below a threshold, etc.), then the modeling engine can repeat operations 540 through 560.); and in response to determining that the level of performance degradation is less than the threshold, transmitting the modified model weights to the master entity for federation by the master entity and selecting the modified model weights for use in operating the neural network ([0105] in response to determining that the level of performance degradation is less than the threshold Under the condition that the similarity score satisfies similarity criteria, the modeling engine can proceed to operation 570. Operation 570 includes transmitting the modified model weights to the master entity for federation by the master entity transmitting the set of proxy data, possibly along with other information, over the network to at least one non-private computing device. The non-private computing device could be a centralized hub that selecting the modified model weights for use in operating the neural network aggregates proxy data from private servers or peer hubs or a combination of both. The proxy data can be transmitted over the network as a file (e.g., HDF5), serialized in a mark-up language (e.g., XML, YAML, JSON, etc.), a zip archive, or other format. Additional information beyond the proxy data can also be sent to the remote computing device, a global modeling engine or peer machine for example, including the actual model parameters, proxy model parameters, data distributions, similarity score(s), or other information. 
Providing the model parameters enables the remote computing device to re-instantiate the trained models and conduct localized validation of the work performed by the private data server's modeling engine. One should note that the actual private data is not transmitted thereby respecting privacy.). McMahan, Szeto, and Duan are combinable for the same rationale as set forth above with respect to claim 47. Regarding claim 50, McMahan, as modified by Szeto and Duan, teaches The federated learning entity of claim 56. Szeto teaches the operations further comprising: in response to determining that the level of performance degradation is greater than a threshold, transmitting a previous set of model weights generated in connection with the previous training round to the master entity for federation by the master entity, and selecting the previous set of model weights for use in operating the neural network ([0050] As proxy data 260 is generated and relayed to the global model server 130, the global model server aggregates the data and generates an updated global model. Once the global model is updated, it can be determined whether the updated global model is an improvement over the previous version of the global model. If the updated global model is an improvement (e.g., the predictive accuracy is improved), new parameters may be provided to the private data servers via the updated model instructions 230. At the private data server 124, the in response to determining that the level of performance degradation is greater than the threshold performance of the trained actual model (e.g., whether the model improves or worsens) can be evaluated to determine whether the models instructions provided by the updated global model result in an improved trained actual model. 
[transmitting a previous set of model weights generated in connection with the previous training round to the master entity for federation by the master entity, and selecting the previous set of model weights for use in operating the neural network] Parameters associated with various machine learning model versions may be stored so that earlier machine learning models may be later retrieved, if needed.). McMahan, Szeto, and Duan are combinable for the same rationale as set forth above with respect to claim 47. Regarding claim 51, McMahan, as modified by Szeto and Duan, teaches The federated learning entity of claim 56. Szeto teaches the operations further comprising, in response to determining that the level of performance has degraded relative to the previous performance of the model following the previous training round, determining whether or not to participate in future rounds of training neural networks in the federated learning system ([0068] For example, the [in response to determining that the level of performance has degraded relative to the previous performance of the model following the previous training round, determining whether or not to participate in future rounds of training neural networks in the federated learning system] conditions could include a number of iterations or epochs to execute on the training data, learning rates, convergence requirements, time limits for training, initial conditions, sensitivity, specificity or other types of conditions that are required or optional. Convergence requirements can include first order derivatives such as “rates of change”, second order derivatives such as “acceleration”, or higher order time derivatives or even higher order derivatives of other dimensions in the attribute space of the data, etc.). McMahan, Szeto, and Duan are combinable for the same rationale as set forth above with respect to claim 47. Regarding claim 52, McMahan, as modified by Szeto and Duan, teaches The federated learning entity of claim 56. 
Szeto teaches the operations further comprising: receiving a set of final model weights from the master entity; evaluating the performance of the neural network using the final model weights; re-training the neural network starting with the final model weights to obtain modified final model weights; evaluating the performance of the neural network using the modified final model weights; comparing performance of the neural network using the final model weights to performance of the neural network using the modified final neural network weights; and selecting a set of neural network weights from among the final neural network weights and the modified final neural network weights in response to the comparison ([0050] As proxy data 260 is generated and relayed to the global model server 130, the global model server aggregates the data and generates an updated global model. Once the global model is updated, it can be determined whether the updated global model is an improvement over the previous version of the global model. If the updated global model is an improvement (e.g., the predictive accuracy is improved), [receiving a set of final model weights from the master entity] new parameters may be provided to the private data servers via the updated model instructions 230. At the private data server 124, the [evaluating the performance of the neural network using the final model weights] performance of the trained actual model (e.g., [comparing performance of the neural network using the final model weights to performance of the neural network using the modified final neural network weights; and] whether the model improves or worsens) can be [evaluating the performance of the neural network using the modified final model weights] evaluated to determine whether the models instructions provided by the updated global model result in an [re-training the neural network starting with the final model weights to obtain modified final model weights] improved trained actual model. 
[selecting a set of neural network weights from among the final neural network weights and the modified final neural network weights in response to the comparison] Parameters associated with various machine learning model versions may be stored so that earlier machine learning models may be later retrieved, if needed.). McMahan, Szeto, and Duan are combinable for the same rationale as set forth above with respect to claim 47. Regarding claim 53, McMahan, as modified by Szeto and Duan, teaches The federated learning entity of claim 56. Szeto teaches the operations further comprising: comparing performance of the neural network using final model weights to performance of the neural network using modified final neural network weights and intermediate neural network weights generated during a federated learning round; and selecting a set of neural network weights from among the final neural network weights, the modified final neural network weights and the intermediate neural network weights in response to the comparison ([0099] Upon reception of the package, the modeling engine can, if configured to do so, execute training in a secured container. In other embodiments, the model instructions provide a pointer to a locally stored implementation of the machine learning algorithm. 
Further, the model instructions can include additional information that permit the modeling engine to complete its local training tasks including similarity criteria, similarity score definition, query conversion instructions for selecting private data from a local database, a pre-trained model as a base-line, or other information.; [0100] Operation 520 includes the modeling engine creating the trained actual model according to the model instructions and as a function of at least some of the local private data by training the implementation of the machine learning algorithm on the local private data.; [0104] At operation 560, the modeling engine calculates a model similarity score as a function of the proxy model parameters and actual model parameters. As discussed above, the parameters can be [comparing performance of the neural network using the final model weights to performance of the neural network using the modified final neural network weights and intermediate sets of neural network weights generated] compared pairwise considering that each model is built from the same implementation of the machine learning algorithm and considering that the proxy data has similar features as the private data. In addition to using the proxy and actual model parameters, the modeling engine can also use other factors available in calculating the similarity score. Example additional factors can include accuracies of the model, cross fold validation, accuracy gain, sensitivities, specificities, distributions of the pairwise comparisons (e.g., average value, distributions about zero, etc.). In some embodiments, the actual private data training set can be used to cross-validate the proxy model. If the accuracy of the predictions from the trained proxy model on the actual private data training set is sufficiently high (e.g., within 10%, 5%, 1%, or closer), then the trained proxy model could be considered similar to the trained actual model. 
Further, if the similarity score fails to satisfy similarity criteria (e.g., falls below a threshold, etc.), then the modeling engine can repeat operations 540 through 560.; [0107] In other embodiments, the global modeling engine also transmits the trained global model back to one or more of the private data servers. The private data servers can then [selecting a set of neural network weights from among the final neural network weights, the modified final neural network weights and the intermediate neural network weights in response to the comparison] leverage the global trained model to conduct local prediction studies in support of local clinical decision making workflows. In addition, the private data servers can also use the global model as a foundation for continued online learning. Thus, the global model becomes a basis for continued machine learning as new private data becomes available. [during a federated learning round] As new data becomes available, method 500 can be repeated to improve the global modeling engine.). McMahan, Szeto, and Duan are combinable for the same rationale as set forth above with respect to claim 47. Regarding claim 56, McMahan teaches A federated learning entity, comprising: a processing circuit; and a memory coupled to the processing circuit and comprising non-transitory computer readable program instructions that, when executed by the processing circuit, cause the federated learning entity to perform operations of ([0004] One example aspect of the present disclosure is directed to a computing system. The computing system can include one or more server computing devices. 
The one or more server computing devices can include one or more processors and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors cause the one or more server computing devices to perform operations.): receiving a set of model weights from a master entity ([0023] A machine-learned model can then be provided to the selected client computing devices. For example, one or more server computing devices can [receiving a set of model weights from a master entity] provide a global set of parameters for a machine-learned model to each selected client computing device.); training a neural network model using the set of model weights to obtain a set of modified model weights ([0024] In some implementations, after receiving the machine-learned model, each selected client computing device can then determine a local update based at least in part on a local dataset stored locally on the selected client computing device and provide the local update to the one or more server computing devices.; [0025] For example, in some implementations, determining the local update can include training the machine-learned model based at least in part on the local dataset to generate a locally-trained model. In some implementations, the machine-learned model can be trained via stochastic gradient descent. 
For example, each selected client computing device can perform some number of mini-batch stochastic gradient descent steps to generate the [training the neural network model using the set of model weights to obtain a set of modified model weights] locally-trained model, such as updated local values for the global set of parameters for the machine-learned model.); evaluating performance of the model; determining whether the performance of the model has improved or degraded relative to a previous performance of the model following a previous training round of training neural networks in the federated learning entity ([0026] In some implementations, determining the local update can include [evaluating performance of the model] determining a difference between the locally-trained model and the machine-learned model. For example, in some implementations, the difference between the locally-trained model and the machine-learned model can be determined by [determining whether the performance of the model has improved or degraded relative to a previous performance of the model following a previous training round of training neural networks in the federated learning entity] determining a difference between the global set of parameters for the machine-learned model provided by the one or more server computing devices and the updated local values for the global set of parameters determined by training the machine-learned model with the local dataset.); in response to determining that the performance of the model has improved relative to the previous performance of the model following the previous training rounds, transmitting the modified model weights to the master entity for federation by the master entity ([0030] Each client computing device can then [in response to determining that the performance of the model has improved relative to the previous performance of the model following the previous training rounds] provide the local update to the one or more server computing devices. 
For example, in some implementations, the local update can be clipped before [transmitting the modified model weights to the master entity for federation by the master entity] being provided to the one or more server computing devices, as described herein. In various implementations, the local update can be provided as one or more vectors, matrices, parameters, or other formats, and may be encoded before being provided to the one or more server computing devices.) and McMahan fails to teach selecting the modified model weights for use in operating the neural network; and in response to determining that the performance of the model has degraded relative to the previous performance, determining whether to participate in future rounds of training in the federated learning system. Szeto teaches selecting the modified model weights for use in operating the neural network ([0104] At operation 560, the modeling engine calculates a model similarity score as a function of the proxy model parameters and actual model parameters. As discussed above, the [selecting the modified model weights] parameters can be compared pairwise considering that each model is built from the same implementation of the machine learning algorithm and considering that the proxy data has similar features as the private data. In addition to using the proxy and actual model parameters, the modeling engine can also use other factors available in calculating the similarity score. Example additional factors can include accuracies of the model, cross fold validation, accuracy gain, sensitivities, specificities, distributions of the pairwise comparisons (e.g., average value, distributions about zero, etc.). In some embodiments, the actual private data training set can be used to cross-validate the proxy model. 
If the accuracy of the predictions from the trained proxy model on the actual private data training set is sufficiently high (e.g., within 10%, 5%, 1%, or closer), then the trained proxy model could be considered similar to the trained actual model. Further, if the similarity score fails to satisfy similarity criteria (e.g., falls below a threshold, etc.), then the modeling engine can repeat operations 540 through 560.; [0107] In other embodiments, the global modeling engine also transmits the trained global model back to one or more of the private data servers. The private data servers can then [for use in operating the neural network] leverage the global trained model to conduct local prediction studies in support of local clinical decision making workflows. In addition, the private data servers can also use the global model as a foundation for continued online learning. Thus, the global model becomes a basis for continued machine learning as new private data becomes available. As new data becomes available, method 500 can be repeated to improve the global modeling engine.); and McMahan and Szeto are combinable for the same rationale as set forth above with respect to claim 47. Duan teaches in response to determining that the performance of the model has degraded relative to the previous performance, determining whether to participate in future rounds of training in the federated learning system ([B. Effect of Accuracy] The experimental results on imbalanced EMNIST are shown in Figure 6. In the first 100 rounds, the training of model converges faster and the accuracy of the model increases with the increase of c. However, [in response to determining that the performance of the model has degraded relative to the previous performance] after 150 rounds, the accuracy is slightly reduced, especially for the models trained with a large c. For example, the accuracy is reduced from 79.03% to 77.79% when c=100 and γ=20. 
It means that the CNN models are over-training and suffered from overfitting. [determining whether to participate in future rounds of training in the federated learning system] In order to remedy the loss of accuracy caused by overfitting, we can use the regularization strategy early stopping [30], in which optimization is halted based on the performance on a validation set, during training. Furthermore, experimental results show that a larger γ does not help improving the accuracy of the model.). McMahan, Szeto, and Duan are combinable for the same rationale as set forth above with respect to claim 47. Claim 54 is rejected under 35 U.S.C. 103 as being unpatentable over McMahan, in view of Szeto, Duan, and further in view of Samek et al. (U.S. Pre-Grant Publication No. 20210065002, hereinafter ‘Samek'). Regarding claim 54, McMahan, as modified by Szeto and Duan, teaches The federated learning entity of claim 56. Szeto teaches the operations further comprising: combining final neural network weights with modified final neural network weights and/or one or more sets of intermediate neural network weights to obtain a combined set of neural network weights ([0050] Once the global model is updated, it can be determined whether the updated global model is an improvement over the previous version of the global model. If the updated global model is an improvement (e.g., the predictive accuracy is improved), new parameters may be provided to the private data servers via the updated model instructions 230. At the private data server 124, the performance of the trained actual model (e.g., whether the model improves or worsens) can be evaluated to determine whether the models instructions provided by the updated global model result in an improved trained actual model. 
Parameters associated with various machine learning model versions may be stored so that earlier machine learning models may be later retrieved, if needed.; [0081] The non-private computing device that receives the knowledge can then aggregate the knowledge with knowledge gained from other private data servers 224. One should appreciate that the non-private computing device (see FIG. 1, non-private computing device 130) could also be a different private data server in the ecosystem, a centralized machine learning hub or service, a global modeling engine, a cloud-based service, or other type of computing device suitably configured to receive the data. From the perspective of a central modeling service operating as the non-private computing device, the central modeling service can aggregate all the proxy data sets as a new aggregated training data set to create a trained global aggregated model. The aggregated model can then be transmitted back to interested stakeholders, private data server 224 for example, for use as a classifier or predictor of patient treatments and outcomes.; [0108] However, by [combining the final neural network weights with the modified final neural network weights and/or one or more sets of intermediate neural network weights to obtain a combined set of neural network weights] identifying which parameters are most predictive using the machine learning systems as described herein, data sets having in common these key predictive parameters may be combined. In other embodiments, model instructions may be modified, e.g., limited to include key predictive features, and used to regenerate proxy data, proxy data distributions, and other types of learned information. 
This regenerated information can then be sent to the global model server, where it is aggregated.); and McMahan, as modified by Szeto and Duan, fails to teach operating the neural network using the combined set of neural network weights; wherein combining first neural network weights with the modified final neural network weights and/or one or more sets of intermediate neural network weights comprises generating a weighted average of the first neural network weights and the modified final neural network weights and/or one or more sets of intermediate neural network weights. Samek teaches operating the neural network using the combined set of neural network weights; wherein combining first neural network weights with the modified final neural network weights and/or one or more sets of intermediate neural network weights comprises generating a weighted average of the first neural network weights and the modified final neural network weights and/or one or more sets of intermediate neural network weights ([0044] In step 38, the server 12 then [operating the neural network using the combined set of neural network weights] merges all the parameterization updates received from the clients 14, the [combining first neural network weights with the modified final neural network weights and/or one or more sets of intermediate neural network weights comprises generating a weighted average of the first neural network weights and the modified final neural network weights and/or one or more sets of intermediate neural network weights] merging representing a kind of averaging such as by use of a weighted average with the weights considering, for instance, the amount of training data using which the parameterization update of a respective client has been obtained in step 34. The parameterization update thus obtained at step 38 at this end of cycle i indicates the parameterization setting for the download 32 at the beginning of the subsequent cycle i+1.). 
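The merging Samek describes, a weighted average of the clients' parameterization updates with weights reflecting each client's amount of training data, is the standard federated-averaging step. A minimal sketch of that step (illustrative only; the function name and the sample-count weighting are assumptions for exposition, not code from Samek or the application):

```python
def merge_client_updates(client_updates, sample_counts):
    """Weighted average of per-client parameter updates; each client's
    weight is proportional to its local training-data volume
    (cf. Samek [0044])."""
    total = float(sum(sample_counts))
    weights = [n / total for n in sample_counts]
    dim = len(client_updates[0])
    # Coordinate-wise weighted sum of the client update vectors.
    return [sum(w * u[i] for w, u in zip(weights, client_updates))
            for i in range(dim)]

# Three clients; the middle client trained on twice as much data,
# so its update dominates the merge.
merged = merge_client_updates([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                              [1, 2, 1])
print(merged)  # [0.5, 0.75]
```

Weighting by sample count (rather than a uniform average) is what keeps the merged update an unbiased estimate of training on the pooled data, which is the efficiency rationale the rejection attributes to Samek.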
McMahan, Szeto, Duan, and Samek are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of McMahan, Szeto, and Duan, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Samek to McMahan before the effective filing date of the claimed invention in order to perform the download of information on parameterization settings to the individual clients/nodes by downloading merged parameterization updates resulting from merging the parameterization updates of the clients in each cycle to improve efficiency of distributed learning scenarios (cf. Samek, [0018] In accordance with a further aspect of the present application, distributed learning scenarios, irrespective of being of the federated or data-parallel learning type, are made more efficient by performing the download of information on parameterization settings to the individual clients/nodes by downloading merged parameterization updates resulting from merging the parameterization updates of the clients in each cycle and, additionally, performing this download of merged parameterization updates using lossy coding of an accumulated merge parameterization update.). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zhu et al. (NPL: “Multi-objective Evolutionary Federated Learning”) teaches a scalable method for encoding network connectivity adapted to federated learning to enhance the efficiency in evolving deep neural networks. Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAGGIE MAIDO whose telephone number is (703) 756-1953. The examiner can normally be reached M-Th: 6am - 4pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. 
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /MM/Examiner, Art Unit 2129 /MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129
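Stepping back from the mapping, the client-side behavior recited across claims 50 and 56 (train locally, compare against the previous round, report the modified weights when any degradation stays under a threshold, otherwise fall back to the stored previous weights) can be sketched as follows. This is an illustrative paraphrase of the claim language; the function signature, threshold value, and scoring callbacks are hypothetical, not taken from the application or the cited art:

```python
def select_weights_for_round(prev_weights, prev_score, train_fn, eval_fn,
                             degradation_threshold=0.02):
    """One client-side round: retrain, evaluate, then pick which weight
    set to send to the master entity for federation."""
    modified = train_fn(prev_weights)   # local training on private data
    score = eval_fn(modified)           # e.g., held-out validation accuracy
    if score >= prev_score:
        return modified, score          # improved: federate the new weights
    if prev_score - score < degradation_threshold:
        return modified, score          # mild degradation: still acceptable
    return prev_weights, prev_score     # too degraded: revert (claim 50 path)

# Toy round where "training" hurts accuracy, triggering the revert path.
w, s = select_weights_for_round([0.5], prev_score=0.9,
                                train_fn=lambda w: [x * 2 for x in w],
                                eval_fn=lambda w: 0.6)
print(w, s)  # [0.5] 0.9
```

Keeping the previous round's weights on hand, as Szeto's stored model versions do, is what makes the revert branch possible without retransmission from the master entity.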

Prosecution Timeline

May 04, 2022
Application Filed
Apr 25, 2025
Non-Final Rejection — §101, §103
Aug 04, 2025
Response Filed
Oct 23, 2025
Final Rejection — §101, §103
Dec 22, 2025
Response after Non-Final Action
Jan 22, 2026
Request for Continued Examination
Jan 28, 2026
Response after Non-Final Action
Mar 03, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602603
MULTI-AGENT INFERENCE
2y 5m to grant • Granted Apr 14, 2026
Patent 12596933
CONTEXT-AWARE ENTITY LINKING FOR KNOWLEDGE GRAPHS TO SUPPORT DECISION MAKING
2y 5m to grant • Granted Apr 07, 2026
Patent 12579463
GENERATIVE REASONING FOR SYMBOLIC DISCOVERY
2y 5m to grant • Granted Mar 17, 2026
Patent 12579452
EVALUATION SCORE DETERMINATION MACHINE LEARNING MODELS WITH DIFFERENTIAL PERIODIC TIERS
2y 5m to grant • Granted Mar 17, 2026
Patent 12566941
EXTENSION OF EXISTING NEURAL NETWORKS WITHOUT AFFECTING EXISTING OUTPUTS
2y 5m to grant • Granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
64%
Grant Probability
85%
With Interview (+20.7%)
4y 3m
Median Time to Grant
High
PTA Risk
Based on 36 resolved cases by this examiner. Grant probability derived from career allow rate.
