DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This action is in response to the submission filed 16 December 2025 for application 17/963,496. Claim 12 is canceled. Claims 1, 13, 14, and 17 are amended. Claims 1-11 and 13-21 are pending and have been examined.
The §112(b) rejection of claim 20 has been withdrawn in view of the amendments made.
Response to Arguments
Regarding Applicant's arguments, filed 16 December 2025 (pages 7-11), with respect to the rejection of the claims under 35 USC § 101: Applicant respectfully disagrees with the rejection. Without conceding anything with respect to the Office Action's analysis under Step 2A, Prong One of the Alice/Mayo framework, Applicant submits that the claims encompass patent-eligible subject matter for at least the reason that each independent claim, viewed as a whole, integrates the subject matter into a practical application and therefore should be found patent eligible at least under Step 2A, Prong Two. Under Step 2A, Prong Two, it is determined whether the judicial exception is integrated into a practical application.
On page 8, Applicant submits that the additional elements reflect an improvement in the functioning of a computer, or an improvement to another technology or technical field. The claims of the present application relate to the field of machine learning, and in particular to federated learning, in which a plurality of client devices collaborate to learn a global model. For greater clarity, the independent claims have been amended to positively recite "training a global model based on the global predictive posterior to obtain a trained global model".
On page 8, Applicant argues that, as described in paragraphs [0005]-[0013] of the application as filed, embodiments of the present claims provide improvements in the performance of Bayesian inference in a federated machine learning system. Such improvements may include reduced communication costs, reduced time and computational resources, and/or increased accuracy of predictions. Paragraphs [0085]-[0091] further discuss the details of obtaining the trained global model based on an aggregation of local predictive posteriors from a plurality of client computing systems. Embodiments of the claimed subject matter were evaluated and found to provide improvements over baseline techniques (see the discussion in paragraphs [0144]-[0148]). Accordingly, Applicant submits that the claimed subject matter clearly provides improvements in machine learning.
Lastly, on page 11, Applicant argues that, in view of the PTAB decision in Ex parte Desjardins et al., the present claims are clearly directed to improved methods and systems for Bayesian federated learning, which reflect various technical advantages and improvements in the field of machine learning and which integrate the claimed subject matter into a practical application. Therefore, the present claims should be found patent eligible at least under Step 2A, Prong Two.
Examiner's response: Applicant's arguments have been fully considered but they are not persuasive. Examiner respectfully disagrees that the claims integrate the abstract idea into a practical application, because the asserted improvement lies in the method of generating the global predictive posterior, which is abstract: under its broadest reasonable interpretation it recites a mathematical calculation. Hence, it covers mathematical concepts but for the recitation of generic computer components and falls within the "Mathematical concepts" grouping of abstract ideas.
Although there is a training step, it is identified as an additional element in Step 2A, Prong Two and is recited merely to apply the abstract idea (the global predictive posterior) to a global model to obtain a trained global model as the very last step of claim 1. The limitations prior to the training step recite a method for calculating the global predictive posterior and do not recite any training. Hence, the training is like a black box, with no recitation of the actual details of the training itself. As explained in MPEP 2106.05(f), a claim that generically recites an effect of the judicial exception, or that claims every mode of accomplishing that effect, amounts to a claim that merely adds the words "apply it" to the judicial exception. Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception.
Lastly, Applicant acknowledges on page 8 (penultimate paragraph) that the present claims provide improvements in the performance of Bayesian inference. Bayesian inference is a mathematical concept, so Applicant acknowledges that the improvement is in the abstract idea itself. As noted in MPEP 2106.05(a), the judicial exception alone cannot provide the improvement. Hence, the claims are not patent eligible.
Regarding Applicant's arguments, filed 16 December 2025 (pages 11 and 12), with respect to the claims rejected under 35 USC § 102: Applicant argues that, in contrast, the present claims describe an approach based on predictive posteriors, including "obtaining a local predictive posterior of each client computing system of a plurality of client computing systems" and "aggregating the local predictive posteriors of the computing system and the plurality of client computing systems to generate a global predictive posterior" as recited in claim 17. It should be appreciated that a predictive posterior is not equivalent to a weight space posterior.
While a weight space posterior is a distribution over the weights of the model, a predictive posterior is a distribution over the predictions generated by the model. Considering the large number of weights in many deep learning models (e.g., millions or billions of weights), an approach based on a weight space posterior becomes impractical to implement for large models. However, the prediction space may be much smaller, so the use of predictive posteriors allows for practical implementation and scalability.
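To make the distinction Applicant draws concrete, the following is an illustrative sketch only; it is hypothetical and forms no part of the claims or the record, and the sizes and the stand-in model_predict function are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

n_weights = 10_000   # weight space dimension (millions or billions in practice)
n_queries = 100      # number of query inputs for the predictive posterior
n_samples = 50       # posterior samples to be represented or communicated

# Weight space posterior: each sample is an entire weight vector.
weight_samples = rng.normal(size=(n_samples, n_weights))        # 50 x 10,000

def model_predict(w, x):
    """Stand-in forward pass: one prediction per query input."""
    return np.tanh(w[: x.shape[0]] * x)

x_query = np.linspace(-1.0, 1.0, n_queries)

# Predictive posterior: each sample is only a vector of predictions at the
# query inputs, independent of how many weights the model has.
prediction_samples = np.stack([model_predict(w, x_query) for w in weight_samples])

print(weight_samples.shape, prediction_samples.shape)   # (50, 10000) vs (50, 100)
```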
Kassab fails to disclose any federated learning system based on predictive posteriors, and in particular fails to disclose at least "obtaining a local predictive posterior of each client computing system of a plurality of client computing systems" and "aggregating the local predictive posteriors of the computing system and the plurality of client computing systems to generate a global predictive posterior", as recited in claim 17. Applicant notes that independent claims 1 and 14 recite similar elements that likewise are not disclosed by Kassab.
For at least the foregoing reasons, claims 17-19 are not anticipated by Kassab. The Applicant respectfully asks that the rejection under 35 U.S.C. 102 be withdrawn.
Examiner's response: Applicant's arguments have been fully considered but they are not persuasive. Examiner respectfully disagrees that the rejection of the claims under 35 USC § 102 should be withdrawn, because Kassab teaches each and every element of amended claim 17, as shown below in the detailed rejection. The Abstract of Kassab states that DSVGD maintains a number of non-random and interacting particles at a central server to represent the current iterate of the model global posterior. Furthermore, Page 10, Paragraph 2 of Kassab states that this result hinges on the fact that Bayesian learning provides a predictive distribution that is a more accurate estimate of the ground-truth posterior distribution. Also, Page 18 includes the section headings "A.3.1 PREDICTIVE DISTRIBUTION FOR BAYESIAN LOGISTIC REGRESSION WITH SVGD AND DSVGD" and "A.3.2 PREDICTIVE DISTRIBUTION FOR BAYESIAN NEURAL NETWORKS WITH SVGD AND DSVGD". Taken all together under the broadest reasonable interpretation, Kassab teaches both global posteriors and predictive distributions. Hence, Kassab teaches each and every limitation of claim 17.
Furthermore, claims 18, 19, and 21 depend from independent claim 17 and are rejected under 35 USC § 102 for the same reasons.
Regarding applicant’s arguments, filed 16 December 2025, see pages 12 and 13, with respect to Claims being rejected under 35 USC § 103, Applicant argues that as explained previously, Kassab fails to disclose any predictive posterior, and in particular fails to disclose "processing the model space prior and the local model to generate a local predictive posterior" and "aggregating the local predictive posteriors of the plurality of client computing systems to generate a global predictive posterior" as recited in claim 1. Claim 14 recites similar elements.
The deficiencies of Kassab are not remedied by further combination with Rollo and/or Pastore. For example, neither Rollo nor Pastore disclose the use of any predictive posterior in Bayesian federated learning. Thus, the combination of Kassab with Rollo and/or Pastore still fails to disclose "processing the model space prior and the local model to generate a local predictive posterior" and "aggregating the local predictive posteriors of the plurality of client computing systems to generate a global predictive posterior".
Accordingly, claim 1 is not disclosed by Kassab, Rollo and/or Pastore, whether taken alone or in any combination. Similarly, claims 14 and 17 are not disclosed by Kassab, Rollo and/or Pastore, whether taken alone or in any combination. Claims 1, 14 and 17, and their respective dependent claims, are therefore allowable over Kassab, Rollo and/or Pastore, whether taken alone or in any combination. The Applicant respectfully asks that the rejections under 35 U.S.C. 103 be withdrawn.
Examiner's response: Applicant's arguments have been fully considered but they are not persuasive. Examiner respectfully disagrees that the rejections of the claims under 35 USC § 103 should be withdrawn, because Kassab teaches the disputed predictive-posterior limitations, as shown below in the detailed rejection. The Abstract of Kassab states that DSVGD maintains a number of non-random and interacting particles at a central server to represent the current iterate of the model global posterior. Furthermore, Page 10, Paragraph 2 of Kassab states that this result hinges on the fact that Bayesian learning provides a predictive distribution that is a more accurate estimate of the ground-truth posterior distribution. Also, Page 18 includes the section headings "A.3.1 PREDICTIVE DISTRIBUTION FOR BAYESIAN LOGISTIC REGRESSION WITH SVGD AND DSVGD" and "A.3.2 PREDICTIVE DISTRIBUTION FOR BAYESIAN NEURAL NETWORKS WITH SVGD AND DSVGD". Taken all together under the broadest reasonable interpretation, Kassab teaches both global posteriors and predictive distributions. Furthermore, Rollo is relied upon to teach only the "processing a local dataset of the client computing system to adjust one or more of the plurality of learnable parameters of the local model" limitation. Hence, the combination of Kassab and Rollo teaches each and every limitation of claims 1 and 14, as shown in the detailed rejection below.
Claim Objections
Claim 21 is objected to because of the following informalities: Claim 21 is not a sentence because it is missing a period at the end. Appropriate correction is required.
Claim 17 is objected to because of the following informalities: The phrase “training global model…” is awkwardly worded. Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1 – 11 and 13-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed towards abstract ideas without significantly more.
Regarding claims 1-11, 13, and 20:
According to the first step (Step 1) of the 101 analysis, claims 1-11, 13, and 20 are directed to a method for Bayesian federated learning (process) and fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Regarding claim 1:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
processing a local dataset of the client computing system to adjust one or more of the plurality of learnable parameters of the local model;
Under the broadest reasonable interpretation, the above limitation is a process step that covers a mental process, including an observation, evaluation, judgment, or opinion, that could be performed in the mind or with the aid of pencil and paper but for the recitation of a generic computer component. If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.
In the same step (Step 2A, prong 1) of the analysis, the limitations of:
and processing the model space prior and the local model to generate a local predictive posterior;
and aggregating the local predictive posteriors of the plurality of client computing systems to generate a global predictive posterior.
under the broadest reasonable interpretation, are process steps that recite mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the limitation:
comprising: at each client computing system of a plurality of client computing systems:
training a global model based on the global predictive posterior to obtain a trained global model.
is considered to be an additional element and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a method that comprises at each client computing system of a plurality of client computing systems and training a global model based on the global predictive posterior to obtain a trained global model) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer as a tool to perform an abstract idea is not indicative of integration into a practical application.
In the same step (Step 2A, prong 2) of the analysis, the limitation:
obtaining a model space prior comprising a prior probability distribution over a plurality of learnable parameters of a local model of the client computing system;
is considered to be an additional element and as recited represents insignificant extra-solution activity because it is mere data gathering. See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra-solution activity.
In the last step (Step 2B) of the analysis, the additional element of “comprising: at each client computing system of a plurality of client computing systems; and training a global model based on the global predictive posterior to obtain a trained global model” does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A Prong Two, such a method is at best the equivalent of merely adding the words “apply it” to the judicial exception. See MPEP 2106.05(f). Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception.
In the same step (Step 2B) of the analysis, as discussed above, the additional element of obtaining a model space prior comprising a prior probability distribution over a plurality of learnable parameters of a local model of the client computing system is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory"). These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. The claim is not patent eligible.
Regarding claim 2:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
wherein: the local predictive posterior is generated using a Markov Chain Monte Carlo algorithm to process the model space prior and the local model.
under the broadest reasonable interpretation, is a process step that recites mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that integrate the abstract idea into a practical application.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
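Purely as a hypothetical illustration of the kind of computation such a limitation describes (not the application's disclosed algorithm), a random-walk Metropolis sampler, one form of Markov Chain Monte Carlo, could process a model space prior together with a local model and dataset, and push the resulting weight samples through the model at query inputs to form a local predictive posterior; the toy model, data, and tuning constants below are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy local dataset for a one-parameter model f(x; w) = w * x (all hypothetical).
x_local = rng.normal(size=20)
y_local = 2.0 * x_local + rng.normal(scale=0.5, size=20)
x_query = np.linspace(-1.0, 1.0, 5)        # query inputs for the predictive posterior

def log_prior(w):
    # model space prior over the learnable parameter: w ~ N(0, 1)
    return -0.5 * w ** 2

def log_likelihood(w):
    # Gaussian likelihood with known noise variance 0.25
    resid = y_local - w * x_local
    return -0.5 * np.sum(resid ** 2) / 0.25

# Random-walk Metropolis: an MCMC pass over the model space prior and the
# local model/dataset, yielding samples of the weight.
w, samples = 0.0, []
for _ in range(2000):
    w_prop = w + 0.1 * rng.normal()
    log_accept = (log_prior(w_prop) + log_likelihood(w_prop)
                  - log_prior(w) - log_likelihood(w))
    if np.log(rng.uniform()) < log_accept:
        w = w_prop
    samples.append(w)

# Local predictive posterior: push post-burn-in weight samples through the
# model at the query inputs.
predictive = np.array([w_i * x_query for w_i in samples[1000:]])
print(predictive.mean(axis=0))   # mean prediction at each query input
```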
Regarding claim 3:
In the same step (Step 2A, prong 2) of the analysis, the limitation:
wherein: obtaining the model space prior comprises receiving the model space prior from a server that computes the model space prior
is considered to be an additional element and as recited represents insignificant extra-solution activity because it is mere data gathering. See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra-solution activity.
In the last step (Step 2B) of the analysis, as discussed above, the additional element of receiving the model space prior from a server that computes the model space prior is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory"). These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. The claim is not patent eligible.
Regarding claim 4:
In the next step (Step 2A, prong 2) of the analysis, the limitation:
wherein: the model space prior is a predetermined model space prior
is considered to be an additional element and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a method wherein: the model space prior is a predetermined model space prior) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer as a tool to perform an abstract idea is not indicative of integration into a practical application.
In the same step (Step 2A, prong 2) of the analysis, the limitation:
and obtaining the model space prior comprises retrieving the predetermined model space prior from a memory of the client computing system.
is considered to be an additional element and as recited represents insignificant extra-solution activity because it is mere data gathering. See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra-solution activity.
In the last step (Step 2B) of the analysis, the additional element of wherein the model space prior is a predetermined model space prior does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A Prong Two, a method wherein the model space prior is a predetermined model space prior is at best the equivalent of merely adding the words “apply it” to the judicial exception. See MPEP 2106.05(f). Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception.
In the same step (Step 2B) of the analysis, as discussed above, the additional element of obtaining the model space prior by retrieving the predetermined model space prior from a memory of the client computing system is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory"). These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. The claim is not patent eligible.
Regarding claim 5:
In the next step (Step 2A, prong 2) of the analysis, the limitation:
wherein: aggregating the local predictive posteriors comprises: sending the local predictive posteriors of the plurality of client computing systems to a server to generate the global predictive posterior.
is considered to be an additional element and as recited represents insignificant extra-solution activity that is merely transmitting data, because it is a mere nominal or tangential addition to the claim and is therefore not indicative of integration into a practical application. See MPEP 2106.05(g).
In the last step (Step 2B) of the analysis, the recitation of "wherein: aggregating the local predictive posteriors comprises: sending the local predictive posteriors of the plurality of client computing systems to a server to generate the global predictive posterior" amounts to insignificant extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data transmission (see MPEP 2106.05(d)). The courts have similarly found limitations directed to receiving or transmitting data over a network, recited at a high level of generality, to be well-understood, routine, and conventional. See MPEP 2106.05(d)(II), "receiving or transmitting data over a network." These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. Even when considered in combination, these additional elements represent mere instructions to apply an exception and insignificant extra-solution activity, which cannot provide an inventive concept. The claim is not patent eligible.
Regarding claim 6:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
wherein: aggregating the local predictive posteriors comprises: and processing, at the first client computing system, the plurality of local predictive posteriors to generate the global predictive posterior.
under the broadest reasonable interpretation, is a process step that recites mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the limitation:
receiving, at a first client computing system of the plurality of client computing systems, the local predictive posteriors of the plurality of client computing systems;
is considered to be an additional element and as recited represents insignificant extra-solution activity because it is mere data gathering. See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra-solution activity.
In the last step (Step 2B) of the analysis, as discussed above, the additional element of receiving, at a first client computing system of the plurality of client computing systems, the local predictive posteriors of the plurality of client computing systems is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory"). These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. The claim is not patent eligible.
Regarding claim 7:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
wherein: the local predictive posterior comprises a plurality of posterior probability samples over a corresponding plurality of query inputs.
under the broadest reasonable interpretation, is a process step that recites mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that integrate the abstract idea into a practical application.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
Regarding claim 8:
In the next step (Step 2A, prong 2) of the analysis, the limitations:
wherein: the plurality of query inputs used by each client computing system are obtained from a shared data set;
and each client computing system obtains the shared data set from a server.
are considered to be additional elements and as recited represent insignificant extra-solution activity because they are mere data gathering. See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra-solution activity.
In the last step (Step 2B) of the analysis, as discussed above, the additional elements of wherein: the plurality of query inputs used by each client computing system are obtained from a shared data set; and each client computing system obtains the shared data set from a server, are recited at a high level of generality and amount to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory"). These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. The claim is not patent eligible.
Regarding claim 9:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
wherein: aggregating the local predictive posteriors comprises using Gaussian approximation to:
for each client computing system, process the respective local predictive posterior to estimate a respective sample mean and covariance;
and process the sample means and covariances for the plurality of client computing systems to estimate a mean and covariance of the global predictive posterior.
under the broadest reasonable interpretation, is a process step that recites mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that integrate the abstract idea into a practical application.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
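Purely as a hypothetical illustration of the kind of calculation this limitation describes (not the application's disclosed method), a Gaussian approximation could estimate a per-client sample mean and covariance and then combine them; the product-of-Gaussians (precision-weighted) combination used below is an assumed choice, since the claim does not specify the combination rule:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical predictive samples: 3 clients, 200 samples each, 5 query inputs.
client_samples = [rng.normal(loc=mu, scale=1.0, size=(200, 5))
                  for mu in (0.0, 0.5, 1.0)]

# Step 1: for each client, estimate a sample mean and covariance of its
# local predictive posterior.
means = [s.mean(axis=0) for s in client_samples]
covs = [np.cov(s, rowvar=False) for s in client_samples]

# Step 2: process the per-client means and covariances to estimate a mean
# and covariance of the global predictive posterior. A product-of-Gaussians
# combination adds precisions and precision-weights the means.
precisions = [np.linalg.inv(c) for c in covs]
global_cov = np.linalg.inv(sum(precisions))
global_mean = global_cov @ sum(p @ m for p, m in zip(precisions, means))
print(global_mean, global_cov.shape)
```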
Regarding claim 10:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
wherein: the global predictive posterior comprises a regression prediction;
and processing the sample means and covariances comprises: averaging the sample means using a weight based on the covariances.
under the broadest reasonable interpretation, is a process step that recites mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that integrate the abstract idea into a practical application.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
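As a hypothetical illustration of the kind of covariance-based weighting this limitation describes in the single-output regression case (the inverse-variance reading and the numbers below are assumptions, not the record):

```python
# Inverse-variance weighting for a single-output regression prediction:
# each client's sample mean is weighted by the inverse of its sample
# variance, so more confident clients contribute more (numbers hypothetical).
means = [1.8, 2.2, 2.0]        # per-client sample means of the prediction
variances = [0.4, 0.1, 0.2]    # per-client sample variances

weights = [1.0 / v for v in variances]
global_mean = sum(w * m for w, m in zip(weights, means)) / sum(weights)
print(global_mean)   # about 2.09, pulled toward the low-variance client's 2.2
```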
Regarding claim 11:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
wherein: aggregating the local predictive posteriors comprises using a Kernel Density Estimator to:
for each client computing system, process a plurality of samples of the respective local predictive posterior to estimate a density of the respective local predictive posterior;
and process the estimated densities for the plurality of client computing systems, using an optimization algorithm, to estimate the global predictive posterior.
under the broadest reasonable interpretation, is a process step that recites mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that integrate the abstract idea into a practical application.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
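As a hypothetical illustration of the kind of calculation this limitation describes (SciPy, the product-of-densities objective, and all settings below are assumptions, not the application's disclosed method):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)

# Hypothetical one-dimensional predictive samples from 3 clients.
client_samples = [rng.normal(loc=mu, scale=0.5, size=300)
                  for mu in (0.0, 0.3, 0.6)]

# Step 1: fit a Kernel Density Estimator to each client's samples to
# estimate the density of its local predictive posterior.
kdes = [gaussian_kde(s) for s in client_samples]

# Step 2: process the estimated densities with an optimization algorithm;
# here, locate the mode of their unnormalized product as a point estimate
# of the global predictive posterior.
def neg_log_product(y):
    return -sum(np.log(kde(y)[0] + 1e-12) for kde in kdes)

result = minimize_scalar(neg_log_product, bounds=(-2.0, 2.0), method="bounded")
print(result.x)   # estimated mode of the global predictive posterior
```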
Regarding claim 13:
In the next step (Step 2A, prong 2) of the analysis, the limitation of:
wherein: training the global model comprises training the global model to approximate the global predictive posterior, on a server, using knowledge distillation;
is considered to be an additional element and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a method wherein: training the global model comprises training the global model to approximate the global predictive posterior, on a server, using knowledge distillation) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer as a tool to perform an abstract idea is not indicative of integration into a practical application.
In the same step (Step 2A, prong 2) of the analysis, the limitation of:
and the method further comprises communicating the trained global model to each client computing system of the plurality of client computing systems.
is considered to be an additional element and as recited represents insignificant extra-solution activity that is merely transmitting data, because it is a mere nominal or tangential addition to the claim and is therefore not indicative of integration into a practical application. See MPEP 2106.05(g).
In the last step (Step 2B) of the analysis, the additional element does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A Prong Two, the method wherein: training the global model comprises training the global model to approximate the global predictive posterior, on a server, using knowledge distillation, is at best the equivalent of merely adding the words “apply it” to the judicial exception. See MPEP 2106.05(f). Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception.
In the same step (Step 2B) of the analysis, the recitation of "and the method further comprises communicating the trained global model to each client computing system of the plurality of client computing systems" amounts to insignificant extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data transmission (see MPEP 2106.05(d)). The courts have similarly found limitations directed to receiving or transmitting data over a network, recited at a high level of generality, to be well-understood, routine, and conventional. See MPEP 2106.05(d)(II), "receiving or transmitting data over a network." These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. Even when considered in combination, these additional elements represent mere instructions to apply an exception and insignificant extra-solution activity, which cannot provide an inventive concept. The claim is not patent eligible.
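As a hypothetical illustration of the kind of server-side distillation this limitation describes (the one-parameter student model and the teacher signal below are assumptions, not the application's disclosed procedure):

```python
import numpy as np

# Teacher signal: the global predictive posterior's mean prediction at each
# shared query input (stand-in values; all hypothetical).
x_query = np.linspace(-1.0, 1.0, 20)
teacher_mean = 2.0 * x_query

# Student: a one-parameter global model f(x; w) = w * x, distilled on the
# server by gradient descent on the squared error to the teacher predictions.
w = 0.0
for _ in range(500):
    grad = 2.0 * np.mean((w * x_query - teacher_mean) * x_query)  # d(MSE)/dw
    w -= 0.1 * grad
print(w)   # approaches 2.0; this trained global model is then sent to clients
```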
Regarding claims 14-16:
According to the first step (Step 1) of the 101 analysis, claims 14-16 are directed to a computing system (machine) and fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Regarding claim 14:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
processing a local dataset to adjust one or more of the plurality of learnable parameters of the local model;
Under the broadest reasonable interpretation, the above limitation is a process step that covers a mental process, including an observation, evaluation, judgment, or opinion, that could be performed in the mind or with the aid of pencil and paper but for the recitation of a generic computer component. If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.
In the same step (Step 2A, prong 1) of the analysis, the limitations of:
processing the model space prior and the local model to generate a local predictive posterior;
and aggregating the local predictive posteriors of the computing system and the plurality of client computing systems to generate a global predictive posterior.
under the broadest reasonable interpretation, are process steps that recite mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the limitation:
A computing system comprising:
a processing device; and
a memory storing thereon:
a local model comprising a plurality of learnable parameters;
a local dataset;
and machine-executable instructions which, when executed by the processing device, cause the computing system to perform Bayesian federated learning by:
training a global model based on the global predictive posterior to obtain a trained global model.
is considered to be an additional element and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a computing system comprising: a processing device; and a memory storing thereon: a local model comprising a plurality of learnable parameters; a local dataset; and machine-executable instructions which, when executed by the processing device, cause the computing system to perform Bayesian federated learning by: training a global model based on the global predictive posterior to obtain a trained global model) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer as a tool to perform an abstract idea is not indicative of integration into a practical application.
In the same step (Step 2A, prong 2) of the analysis, the limitation:
obtaining a model space prior comprising a prior probability distribution over a plurality of learnable parameters;
obtaining a local predictive posterior of each client computing system of a plurality of client computing systems;
are considered to be additional elements and as recited represent insignificant extra-solution activity because they are mere data gathering. See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra-solution activity.
In the last step (Step 2B) of the analysis, the additional element of “A computing system comprising: a processing device; and a memory storing thereon: a local model comprising a plurality of learnable parameters; a local dataset; and machine-executable instructions which, when executed by the processing device, cause the computing system to perform Bayesian federated learning by: training a global model based on the global predictive posterior to obtain a trained global model” does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A Prong Two, such a computing system is at best the equivalent of merely adding the words “apply it” to the judicial exception. See MPEP 2106.05(f). Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception.
In the same step (Step 2B) of the analysis, as discussed above, the additional elements of obtaining a model space prior comprising a prior probability distribution over a plurality of learnable parameters, and obtaining a local predictive posterior of each client computing system of a plurality of client computing systems, are recited at a high level of generality and amount to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory"). These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. The claim is not patent eligible.
Regarding claim 15:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
wherein: aggregating the local predictive posteriors comprises using Gaussian approximation to:
for the computing system and each client computing system, process the respective local predictive posterior to estimate a respective sample mean and covariance;
and process the sample means and covariances for the plurality of client computing systems to estimate a mode and covariance of the global predictive posterior.
under the broadest reasonable interpretation, is a process step that recites mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that integrate the abstract idea into a practical application.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
Regarding claim 16:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
wherein: aggregating the local predictive posteriors comprises using a Kernel Density Estimator to:
for each client computing system, process a plurality of samples of the respective local predictive posterior to estimate a density of the respective local predictive posterior;
and process the estimated densities for the plurality of client computing systems, using an optimization algorithm, to estimate the global predictive posterior.
under the broadest reasonable interpretation, is a process step that recites mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that integrate the abstract idea into a practical application.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
Regarding claims 17-19 and 21:
According to the first step (Step 1) of the 101 analysis, claims 17-19 and 21 are directed to a server comprising a processing device and a memory (machine) and fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Regarding claim 17:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
and aggregating the local predictive posteriors of the computing system and the plurality of client computing systems to generate a global predictive posterior.
under the broadest reasonable interpretation, is a process step that recites mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation, covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the limitation:
A server comprising:
a processing device;
and a memory storing thereon machine-executable instructions which, when executed by the processing device, cause the server to perform Bayesian federated learning by:
training global model based on the global predictive posterior to obtain a trained global model.
is considered to be an additional element and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a server comprising: a processing device; and a memory storing thereon machine-executable instructions which, when executed by the processing device, cause the server to perform Bayesian federated learning by: training global model based on the global predictive posterior to obtain a trained global model) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer as a tool to perform an abstract idea is not indicative of integration into a practical application.
In the same step (Step 2A, prong 2) of the analysis, the limitation:
obtaining a local predictive posterior of each client computing system of a plurality of client computing systems;
is considered to be an additional element and as recited represents insignificant extra-solution activity because it is mere data gathering. See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra-solution activity.
In the last step (Step 2B) of the analysis, the additional element of “A server comprising: a processing device; and a memory storing thereon machine-executable instructions which, when executed by the processing device, cause the server to perform Bayesian federated learning by: training global model based on the global predictive posterior to obtain a trained global model” does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A Prong Two, such a server is at best the equivalent of merely adding the words “apply it” to the judicial exception. See MPEP 2106.05(f). Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception.
In the same step (Step 2B) of the analysis, as discussed above, the additional element of obtaining a local predictive posterior of each client computing system of a plurality of client computing systems is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory"). These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. The claim is not patent eligible.
Regarding claim 18:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
wherein: aggregating the local predictive posteriors comprises using Gaussian approximation to:
for each client computing system, process the respective local predictive posterior to estimate a respective sample mean and covariance;
and process the sample means and covariances for the plurality of client computing systems to estimate a mean and covariance of the global predictive posterior.
under the broadest reasonable interpretation, is a process step that recites mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that integrate the abstract idea into a practical application.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
Regarding claim 19:
In the next step (Step 2A, prong 1) of the analysis, the limitation of:
wherein: aggregating the local predictive posteriors comprises using a Kernel Density Estimator to:
for each client computing system, process a plurality of samples of the respective local predictive posterior to estimate a density of the respective local predictive posterior;
and process the estimated densities for the plurality of client computing systems, using an optimization algorithm, to estimate the global predictive posterior.
under the broadest reasonable interpretation, is a process step that recites mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that integrate the abstract idea into a practical application.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
Regarding claim 20:
In the next step (Step 2A, prong 2) of the analysis, the limitation of:
A non-transitory processor-readable medium having machine-executable instructions stored thereon which, when executed by a processing device of a computing system, cause the computing system to perform the method of claim 1.
is considered to be an additional element and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a non-transitory processor-readable medium having machine-executable instructions stored thereon which, when executed by a processing device of a computing system, cause the computing system to perform the method of claim 1) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer as a tool to perform an abstract idea is not indicative of integration into a practical application.
In the last step (Step 2B) of the analysis, the additional element does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A Prong Two, a non-transitory processor-readable medium having machine-executable instructions stored thereon which, when executed by a processing device of a computing system, cause the computing system to perform the method of claim 1, is at best the equivalent of merely adding the words “apply it” to the judicial exception. See MPEP 2106.05(f). Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception. The claim is not patent eligible.
Regarding claim 21:
In the next step (Step 2A, prong 2) of the analysis, the limitation of:
wherein training the global model comprises training the global model to approximate the global predictive posterior, on a server, using knowledge distillation;
is considered to be an additional element and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a method wherein training the global model comprises training the global model to approximate the global predictive posterior, on a server, using knowledge distillation) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer as a tool to perform an abstract idea is not indicative of integration into a practical application.
In the same step (Step 2A, prong 2) of the analysis, the limitation of:
and wherein the instructions further cause the server to: communicate the trained global model to each client computing system of the plurality of client computing systems.
is considered to be an additional element and as recited represents insignificant extra-solution activity that is merely transmitting data, because it is a mere nominal or tangential addition to the claim and is therefore not indicative of integration into a practical application. See MPEP 2106.05(g).
In the last step (Step 2B) of the analysis, the additional element does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A Prong Two, training the global model to approximate the global predictive posterior, on a server, using knowledge distillation, is at best the equivalent of merely adding the words “apply it” to the judicial exception. See MPEP 2106.05(f). Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception.
In the same step (Step 2B) of the analysis, the recitation of "and wherein the instructions further cause the server to: communicate the trained global model to each client computing system of the plurality of client computing systems" amounts to insignificant extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data transmission (see MPEP 2106.05(d)). The courts have similarly found limitations directed to receiving or transmitting data over a network, recited at a high level of generality, to be well-understood, routine, and conventional. See MPEP 2106.05(d)(II), "receiving or transmitting data over a network." These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. Even when considered in combination, these additional elements represent mere instructions to apply an exception and insignificant extra-solution activity, which cannot provide an inventive concept. The claim is not patent eligible.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 17-19 and 21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kassab et al (FEDERATED GENERALIZED BAYESIAN LEARNING VIA DISTRIBUTED STEIN VARIATIONAL GRADIENT DESCENT, 2021).
Regarding claim 17:
Kassab teaches: A server comprising: a processing device; and a memory storing thereon machine-executable instructions which, when executed by the processing device, cause the server to perform Bayesian federated learning by ([Page 1, Last but one Paragraph] This paper introduces a trustworthy solution that is able to reduce the number of communication rounds via a non-parametric variational inference-based implementation of federated Bayesian learning. [Page 31, Last Paragraph, Section C.2 SOFTWARE DETAILS] We implement all experiments in PyTorch (Paszke et al., 2019) Version 10.3.1. Our experiments and code are based on the original SVGD experiments and code available at: https://github.com/DartML/Stein-Variational-Gradient-Descent. More specifically, DSVGD can be easily obtained by running SVGD twice at each scheduled agent and suitably adjusting its target distribution. Our code is attached with the supplementary materials):
obtaining a local predictive posterior of each client computing system of a plurality of client computing systems ([Abstract] DSVGD is shown to compare favorably to benchmark frequentist and Bayesian federated learning strategies in terms of accuracy and scalability with respect to the number of agents, while also providing well-calibrated, and hence trustworthy, predictions. [Page 2, Paragraph 1] Figure 1: Federated learning across K agents equipped with local datasets and assisted by a central server: (a) in DVI agents exchange the current model posterior q^(i)(θ) with the server. [Page 2, Last Paragraph] We consider the federated learning set-up in Fig. 1, where each agent k = 1. [Page 10, Paragraph 2] This result hinges on the fact that Bayesian learning provides a predictive distribution that is a more accurate estimate of the ground-truth posterior distribution. [Page 18] A.3.1 PREDICTIVE DISTRIBUTION FOR BAYESIAN LOGISTIC REGRESSION WITH SVGD AND DSVGD. A.3.2 PREDICTIVE DISTRIBUTION FOR BAYESIAN NEURAL NETWORKS WITH SVGD AND DSVGD. Note: K agents correspond to the plurality of client computing systems, and k = 1 corresponds to a first client computing system among them):
and aggregating the local predictive posteriors of the computing system and the plurality of client computing systems to generate a global predictive posterior ([Abstract] DSVGD maintains a number of non-random and interacting particles at a central server to represent the current iterate of the model global posterior. [Page 2, Last Paragraph] We consider the federated learning set-up in Fig. 1, where each agent k = 1, . . . , K has a distinct local dataset with associated training loss L_k(θ) for model parameter θ. The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ));
and training a global model based on the global predictive posterior to obtain a trained global model ([Abstract] DSVGD maintains a number of non-random and interacting particles at a central server to represent the current iterate of the model global posterior. [Page 1, Section I, Paragraph 1] Federated learning refers to the collaborative training of a machine learning model across agents with distinct data sets. [Page 1, Last Paragraph] Federated Bayesian learning has the general aim of computing the global posterior distribution in the model parameter space. [Page 2, Last Paragraph] We consider the federated learning set-up in Fig. 1, where each agent k = 1, . . . , K has a distinct local dataset with associated training loss L_k(θ) for model parameter θ. The agents communicate through a central node with the goal of computing the global posterior distribution).
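For illustration only, the overall server-side flow mapped above for claim 17 (obtain local predictive posteriors, aggregate them into a global predictive posterior, train a global model from the aggregate) can be sketched as follows. This is a minimal sketch under assumed simplifications, namely sample pooling for the aggregation step and a point-estimate fit for the training step; it is not Kassab's DSVGD algorithm and not the claimed method, and all function names and inputs are hypothetical.

import numpy as np

def aggregate(local_posterior_samples):
    # Assumed stand-in for aggregation: pool per-client posterior samples
    # into a single global sample set.
    return np.concatenate(local_posterior_samples, axis=0)

def train_global_model(global_samples):
    # Assumed stand-in for "training a global model based on the global
    # predictive posterior": fit a point estimate to the pooled samples.
    return global_samples.mean(axis=0)

# Hypothetical inputs: four clients, each contributing 100 samples from its
# local predictive posterior over three model parameters.
rng = np.random.default_rng(0)
clients = [rng.normal(loc=k, scale=1.0, size=(100, 3)) for k in range(4)]
global_posterior_samples = aggregate(clients)
trained_global_model = train_global_model(global_posterior_samples)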
Regarding claim 18:
Kassab teaches: The server of claim 17, wherein: aggregating the local predictive posteriors comprises using Gaussian approximation to: for each client computing system, process the respective local predictive posterior to estimate a respective sample mean and covariance ([Page 19, Paragraph 3] Communication load. Using DSVGD, the communication load between a scheduled agent and the central server is of the order O(Nd) since N particles of dimensions d need to be exchanged at each communication round. In contrast, the communication load of PVI depends on the selected parametrization. For instance, one can use PVI with a fully factorized Gaussian approximate posterior, which requires only 2d parameters to be shared with the server, namely mean and variance of each of the d parameters at the price of having lower accuracy);
and process the sample means and covariances for the plurality of client computing systems to estimate a mean and covariance of the global predictive posterior ([Page 17, Paragraph 1] We show here that PVI with a Gaussian variational posterior q(θ|η) = N(θ|η σ^2, σ^2 I_d) of fixed covariance σ^2 I_d and mean η σ^2 parametrized by natural parameter η can be recovered as a special case of U-DSVGD. To elaborate, consider U-DSVGD with one particle θ_1 (i.e., N = 1), an RKHS kernel that satisfies ∇k(θ, θ) = 0 and k(θ, θ) = 1 (the RBF kernel is an example of such kernel) and an isotropic Gaussian kernel K(θ, θ_1^(i)) = N(θ|θ_1^(i), γ^2 I_d) of bandwidth γ used for computing the KDE of the global posterior using the particles).
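For illustration of the Gaussian-approximation aggregation mapped above for claim 18, the following minimal sketch assumes a product-of-Gaussians combination rule, under which precisions add and the global mean is the precision-weighted average of the local means. The combination rule, function names, and inputs are assumptions of this sketch, not Kassab's U-DSVGD derivation and not the claimed algorithm.

import numpy as np

def local_moments(samples):
    # Per-client sample mean and covariance of the local predictive posterior.
    return samples.mean(axis=0), np.cov(samples, rowvar=False)

def combine_gaussians(means, covs):
    # Assumed product-of-Gaussians rule: precisions (inverse covariances)
    # add, and the global mean is the precision-weighted mean.
    precisions = [np.linalg.inv(c) for c in covs]
    global_cov = np.linalg.inv(sum(precisions))
    global_mean = global_cov @ sum(p @ m for p, m in zip(precisions, means))
    return global_mean, global_cov

# Hypothetical inputs: three clients with 200 two-dimensional samples each.
rng = np.random.default_rng(1)
client_samples = [rng.normal(size=(200, 2)) + k for k in range(3)]
moments = [local_moments(s) for s in client_samples]
mu_g, cov_g = combine_gaussians([m for m, _ in moments], [c for _, c in moments])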
Regarding claim 19:
Kassab teaches: The server of claim 17, wherein: aggregating the local predictive posteriors comprises using a Kernel Density Estimator to: for each client computing system, process a plurality of samples of the respective local predictive posterior to estimate a density of the respective local predictive posterior ([Page 4, Section 4.2, Paragraph 1] SVGD tackles the minimization of the (scaled) free energy functional D(q(θ) || ~p(θ)), for an unnormalized target distribution ~p(θ), over a non-parametric generalized posterior q(θ) defined over the model parameters θ ∈ R^d. The posterior q(θ) is represented by a set of particles {θ_n}_{n=1}^N, with θ_n ∈ R^d. In practice, an approximation of q(θ) can be obtained from the particles {θ_n}_{n=1}^N through a Kernel Density Estimator (KDE) as q(θ) = (1/N) Σ_{n=1}^N K(θ, θ_n) for some kernel function K(·, ·));
and process the estimated densities for the plurality of client computing systems, using an optimization algorithm, to estimate the global predictive posterior ([Page 6, Paragraph 2] Following SVGD, the update (9) is optimized to maximize the steepest descent decrease of the Kullback–Leibler (KL) divergence between the approximate global posterior q^[l](θ) encoded via particles {θ_n^[l]}_{n=1}^N and the tilted distribution ~p_k^(i)(θ) in (11) (see Fig. 1(b), step 2)).
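For illustration of the KDE-based aggregation mapped above for claim 19, the sketch below estimates each local predictive posterior from its samples with a Gaussian kernel density estimator, q(θ) = (1/N) Σ_{n=1}^N K(θ, θ_n), and then estimates the global posterior by locating a mode of the product of the local densities via gradient ascent on the summed log-densities. The bandwidth, the finite-difference optimizer, and all names are assumptions of this sketch, not the SVGD/DSVGD updates and not the claimed method.

import numpy as np

def kde_log_density(theta, samples, bandwidth=0.5):
    # Gaussian KDE evaluated at theta, on the log scale for stability;
    # the normalizing constant is omitted since it does not affect the mode.
    sq = ((theta - samples) ** 2).sum(axis=1) / (2.0 * bandwidth ** 2)
    return np.log(np.exp(-sq).mean() + 1e-300)

def global_mode(client_samples, steps=200, lr=0.05, eps=1e-4):
    # Maximize the summed per-client KDE log-densities with a simple
    # finite-difference gradient ascent (an assumed stand-in for
    # "using an optimization algorithm").
    def objective(t):
        return sum(kde_log_density(t, s) for s in client_samples)
    theta = np.zeros(client_samples[0].shape[1])
    for _ in range(steps):
        grad = np.empty_like(theta)
        for i in range(theta.size):
            e = np.zeros_like(theta)
            e[i] = eps
            grad[i] = (objective(theta + e) - objective(theta - e)) / (2 * eps)
        theta += lr * grad
    return theta

# Hypothetical inputs: three clients with samples around different means.
rng = np.random.default_rng(4)
client_samples = [rng.normal(loc=m, size=(150, 2)) for m in (0.0, 1.0, 2.0)]
theta_star = global_mode(client_samples)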
Regarding claim 21:
Kassab teaches: The server of claim 17, wherein training the global model comprises training the global model to approximate the global predictive posterior, on a server, using knowledge distillation ([Page 19, Paragraph 2] Furthermore, the L0 distillation iterations in the second loop can be performed by the scheduled agent after it has sent its global particles to the central server. This enables the pipelining of the second loop with the operations at the server and at other agents, which can potentially reduce the wall-clock time per communication round. [Page 23, Paragraph 1] Figure 9: KL divergence between exact and approximate global posteriors);
and wherein the instructions further cause the server to: communicate the trained global model to each client computing system of the plurality of client computing systems ([Page 1, Section 1, Paragraph 1] Federated learning refers to the collaborative training of a machine learning model across agents. Note: Agents correspond to the plurality of client computing systems).
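For illustration of the knowledge-distillation limitation mapped above for claim 21, the sketch below trains a student "global model" so that its predictive distribution approximates a teacher distribution standing in for the global predictive posterior. The logistic-regression student, the synthetic teacher probabilities, and the query inputs are hypothetical assumptions of this sketch, not the distillation loop described by Kassab and not the claimed procedure.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def distill(x, teacher_probs, steps=500, lr=0.1):
    # Fit student weights w by gradient descent on the cross-entropy
    # between the student's predictions and the teacher's soft labels.
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = sigmoid(x @ w)
        w -= lr * x.T @ (p - teacher_probs) / len(x)
    return w

# Hypothetical query inputs and a synthetic teacher standing in for the
# global predictive posterior evaluated on those inputs.
rng = np.random.default_rng(2)
x = rng.normal(size=(256, 4))
teacher = sigmoid(x @ np.array([1.0, -2.0, 0.5, 0.0]))
student_w = distill(x, teacher)
# The distilled student would then be communicated to each client.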
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-6, 9-11, 13-16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kassab et al (FEDERATED GENERALIZED BAYESIAN LEARNING VIA DISTRIBUTED STEIN VARIATIONAL GRADIENT DESCENT, 2021) in view of Rollo et al (Preliminary conclusions about federated learning applied to clinical data, 2021).
Regarding claim 1:
Kassab teaches: A method for Bayesian federated learning, comprising: at each client computing system of a plurality of client computing systems ([Page 1, Last but one Paragraph] This paper introduces a trustworthy solution that is able to reduce the number of communication rounds via a non-parametric variational inference-based implementation of federated Bayesian learning. [Page 2, Figure 1] Federated learning across K agents. Note: K agents correspond to a plurality of client computing systems):
obtaining a model space prior comprising a prior probability distribution over a plurality of learnable parameters of a local model of the client computing system ([Page 2, Section 2] We consider the federated learning set-up in Fig. 1, where each agent k = 1, . . . , K has a distinct local dataset with associated training loss L_k(θ) for model parameter θ. The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ). [Page 4, Section 4.2, Paragraph 1] the model parameters θ ∈ R^d);
and processing the model space prior and the local model to generate a local predictive posterior ([Page 2, Paragraph 1] Figure 1: Federated learning across K agents equipped with local datasets and assisted by a central server: (a) in DVI agents exchange the current model posterior q^(i)(θ) with the server, while (b) in DSVGD agents exchange particles {θ_n}_{n=1}^N providing a non-parametric estimate of the posterior. [Page 3, Paragraph 3] computing the optimal posterior q_opt(θ) in a distributed manner is that each agent k is only aware of its local loss L_k(θ));
and aggregating the local predictive posteriors of the plurality of client computing systems to generate a global predictive posterior ([Abstract] DSVGD maintains a number of non-random and interacting particles at a central server to represent the current iterate of the model global posterior. [Page 2, Last Paragraph] We consider the federated learning set-up in Fig. 1, where each agent k = 1, . . . , K has a distinct local dataset with associated training loss L_k(θ) for model parameter θ. The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ));
and training a global model based on the global predictive posterior to obtain a trained global model ([Abstract] DSVGD maintains a number of non-random and interacting particles at a central server to represent the current iterate of the model global posterior. [Page 1, Section I, Paragraph 1] Federated learning refers to the collaborative training of a machine learning model across agents with distinct data sets. [Page 1, Last Paragraph] Federated Bayesian learning has the general aim of computing the global posterior distribution in the model parameter space. [Page 2, Last Paragraph] We consider the federated learning set-up in Fig. 1, where each agent k = 1, . . . , K has a distinct local dataset with associated training loss L_k(θ) for model parameter θ. The agents communicate through a central node with the goal of computing the global posterior distribution).
However, Kassab does not explicitly disclose: processing a local dataset of the client computing system to adjust one or more of the plurality of learnable parameters of the local model.
Rollo teaches, in an analogous system: processing a local dataset of the client computing system to adjust one or more of the plurality of learnable parameters of the local model ([Page 2, Abstract] This procedure shares the fundamental approach of FL which consists of performing some local processing. [Page 39, Paragraph 5] Usually we will present in terms of the adjustment of certain θ parameters: [Page 76, Paragraph 2] where θ are the learnable parameters).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Kassab to incorporate the teachings of Rollo, processing a local dataset of the client computing system to adjust one or more of the plurality of learnable parameters of the local model. One would have been motivated to make this modification because doing so would give the benefit of posing the problem as an optimization problem, as taught by Rollo [Page 39, Paragraph 5].
Regarding claim 2:
The system of Kassab and Rollo teaches: The method of claim 1 (as shown above).
Kassab further teaches: wherein: the local predictive posterior is generated using a Markov Chain Monte Carlo algorithm to process the model space prior and the local model ([Page 2, Paragraph 2] (DSGLD), which is an MC sampling technique that maintains a number of Markov chains. [Page 30, Paragraph 3] DSGLD is implemented by splitting the N particles among the K agents. More specifically, when scheduled, each agent runs ⌈N/K⌉ Markov chains).
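For illustration of the Markov Chain Monte Carlo limitation mapped above for claim 2, the sketch below draws local posterior samples with a random-walk Metropolis-Hastings sampler that combines a model-space prior with a local likelihood. The Gaussian prior and toy likelihood are assumptions of this sketch; the DSGLD scheme cited by Kassab instead uses distributed stochastic gradient Langevin dynamics.

import numpy as np

def log_post(theta, data):
    # Unnormalized log posterior = log prior + log likelihood.
    log_prior = -0.5 * theta @ theta                  # N(0, I) model-space prior
    log_lik = -0.5 * ((data - theta[0]) ** 2).sum()   # toy local likelihood
    return log_prior + log_lik

def metropolis(data, n_samples=1000, step=0.3, seed=3):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    samples = []
    for _ in range(n_samples):
        proposal = theta + step * rng.normal(size=theta.shape)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < log_post(proposal, data) - log_post(theta, data):
            theta = proposal
        samples.append(theta.copy())
    return np.array(samples)  # draws approximating the local posterior

# Hypothetical local dataset for one client.
data = np.random.default_rng(5).normal(loc=1.5, size=50)
local_posterior_samples = metropolis(data)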
Regarding claim 3:
The system of Kassab and Rollo teaches: The method of claim 1 (as shown above).
Kassab further teaches: wherein: obtaining the model space prior comprises receiving the model space prior from a server that computes the model space prior ([Page 2, Last Paragraph] The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ). Note: Also see Figure 1 showing receiving from the server).
Regarding claim 4:
The system of Kassab and Rollo teaches: The method of claim 1 (as shown above).
Kassab further teaches: wherein: the model space prior is a predetermined model space prior ([Page 1, Section 1, Paragraph 1] In this paper, we are specifically interested in a small-scale federated learning setting consisting of mobile or embedded devices, each having a limited data set and running a small-sized model due to their constrained memory. [Page 1, Last 2 paragraphs] This paper introduces a trustworthy solution that is able to reduce the number of communication rounds via a non-parametric variational inference-based implementation of federated Bayesian learning. Federated Bayesian learning has the general aim of computing the global posterior distribution in the model parameter space. [Page 2, Section 2] The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ). Note: The small-scale federated learning setting of mobile or embedded devices, each having a limited data set and running a small-sized model due to constrained memory, corresponds to a predetermined model space);
and obtaining the model space prior comprises retrieving the predetermined model space prior from a memory of the client computing system ([Page 1, Section 1, Paragraph 1] In this paper, we are specifically interested in a small-scale federated learning setting consisting of mobile or embedded devices, each having a limited data set and running a small-sized model due to their constrained memory. [Page 2, Section 2] The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ). Note: The small-scale setting of memory-constrained devices corresponds to a predetermined model space. Also see Figure 1 showing the retrieving from an agent, corresponding to retrieving from a memory of the client computing system).
Regarding claim 5:
The system of Kassab and Rollo teaches: The method of claim 1 (as shown above).
Kassab further teaches: wherein: aggregating the local predictive posteriors comprises: sending the local predictive posteriors of the plurality of client computing systems to a server to generate the global predictive posterior ([Page 2, Last Paragraph] The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ). Note: Also see Figure 1 showing sending and receiving between the agents (clients) and the server).
Regarding claim 6:
The system of Kassab and Rollo teaches: The method of claim 1 (as shown above).
Kassab further teaches: wherein: aggregating the local predictive posteriors comprises: receiving, at a first client computing system of the plurality of client computing systems, the local predictive posteriors of the plurality of client computing systems ([Page 2, Paragraph 1] Figure 1: Federated learning across K agents equipped with local datasets and assisted by a central server: (a) in DVI agents exchange the current model posterior q^(i)(θ) with the server. [Page 2, Last Paragraph] We consider the federated learning set-up in Fig. 1, where each agent k = 1. Note: K agents correspond to the plurality of client computing systems, and k = 1 corresponds to a first client computing system among them);
and processing, at the first client computing system, the plurality of local predictive posteriors to generate the global predictive posterior ([Page 2, Last Paragraph] We consider the federated learning set-up in Fig. 1, where each agent k = 1, . . . , K has a distinct local dataset with associated training loss L_k(θ) for model parameter θ. The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ). Note: K agents correspond to the plurality of client computing systems, and k = 1 corresponds to a first client computing system among them).
Regarding claim 9:
The system of Kassab and Rollo teaches: The method of claim 1 (as shown above).
Kassab further teaches: wherein: aggregating the local predictive posteriors comprises using Gaussian approximation to: for each client computing system, process the respective local predictive posterior to estimate a respective sample mean and covariance ([Page 19, Paragraph 3] Communication load. Using DSVGD, the communication load between a scheduled agent and the central server is of the order O(Nd) since N particles of dimensions d need to be exchanged at each communication round. In contrast, the communication load of PVI depends on the selected parametrization. For instance, one can use PVI with a fully factorized Gaussian approximate posterior, which requires only 2d parameters to be shared with the server, namely mean and variance of each of the d parameters at the price of having lower accuracy);
and process the sample means and covariances for the plurality of client computing systems to estimate a mean and covariance of the global predictive posterior ([Page 17, Paragraph 1] We show here that PVI with a Gaussian variational posterior q(θ|η) = N(θ|η σ^2, σ^2 I_d) of fixed covariance σ^2 I_d and mean η σ^2 parametrized by natural parameter η can be recovered as a special case of U-DSVGD. To elaborate, consider U-DSVGD with one particle θ_1 (i.e., N = 1), an RKHS kernel that satisfies ∇k(θ, θ) = 0 and k(θ, θ) = 1 (the RBF kernel is an example of such kernel) and an isotropic Gaussian kernel K(θ, θ_1^(i)) = N(θ|θ_1^(i), γ^2 I_d) of bandwidth γ used for computing the KDE of the global posterior using the particles).
Regarding claim 10:
The system of Kassab and Rollo teaches: The method of claim 9 (as shown above).
Kassab further teaches: wherein: the global predictive posterior comprises a regression prediction ([Page 17, Section A.2, Paragraph 1] We show here that PVI with a Gaussian variational posterior q(θ|η) = N(θ|η σ^2, σ^2 I_d) of fixed covariance σ^2 I_d and mean η σ^2 parametrized by natural parameter η can be recovered as a special case of U-DSVGD. To elaborate, consider U-DSVGD with one particle θ_1 (i.e., N = 1), an RKHS kernel that satisfies ∇k(θ, θ) = 0 and k(θ, θ) = 1 (the RBF kernel is an example of such kernel) and an isotropic Gaussian kernel K(θ, θ_1^(i)) = N(θ|θ_1^(i), γ^2 I_d) of bandwidth γ used for computing the KDE of the global posterior using the particles. [Page 17, Section A.3, Paragraph 3] To compute ^p(x), we need the predictive probability p(y_t|x_t) for all samples t ∈ [1, T]. This can be obtained by marginalizing the data likelihood with respect to the weights vector w. This marginalization is generally intractable but can be approximated for both Bayesian logistic regression and Bayesian Neural Networks as detailed in Sec. A.3.1 and Sec. A.3.2);
and processing the sample means and covariances comprises: averaging the sample means using a weight based on the covariances ([Page 9, Section Bayesian logistic regression] The model parameters θ = [w, log(α)] include the regression weights w ∈ R^d along with the logarithm of a precision parameter α. The prior is given as p_0(w, α) = p_0(w|α) p_0(α), with p_0(w|α) = N(w|0, α^(-1) I_d) and p_0(α) = Gamma(α|a, b) with a = 1 and b = 0.01. The local training loss L_k(θ) at each agent k is given as L_k(θ) = Σ_{(x_k, y_k) ∈ D_k} l(x_k, y_k; w), where D_k is the dataset at agent k with covariates x_k ∈ R^d and label y_k ∈ {-1, 1}, and the loss function l(x_k, y_k; w) is the cross-entropy. Point decisions are taken based on the maximum of the average predictive distribution. [Page 17, Section A.2, Paragraph 1] We show here that PVI with a Gaussian variational posterior q(θ|η) = N(θ|η σ^2, σ^2 I_d) of fixed covariance σ^2 I_d and mean η σ^2 parametrized by natural parameter η can be recovered as a special case of U-DSVGD).
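For illustration of "averaging the sample means using a weight based on the covariances" as mapped above for claim 10, the following one-dimensional worked example uses inverse-variance (precision) weighting. That weighting rule is an assumption of this sketch (one common choice, matching the matrix-form sketch given after claim 18), not necessarily the claimed weighting.

import numpy as np

means = np.array([1.0, 3.0])      # per-client sample means
variances = np.array([0.5, 2.0])  # per-client sample variances

# Weights proportional to the inverse variances: [0.8, 0.2] here.
weights = (1.0 / variances) / (1.0 / variances).sum()
global_mean = (weights * means).sum()            # 0.8*1.0 + 0.2*3.0 = 1.4
global_variance = 1.0 / (1.0 / variances).sum()  # 1/2.5 = 0.4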
Regarding claim 11:
The system of Kassab and Rollo teaches: The method of claim 1 (as shown above).
Kassab further teaches: wherein: aggregating the local predictive posteriors comprises using a Kernel Density Estimator to: for each client computing system, process a plurality of samples of the respective local predictive posterior to estimate a density of the respective local predictive posterior ([Page 4, Section 4.2, Paragraph 1] SVGD tackles the minimization of the (scaled) free energy functional D(q(θ) || ~p(θ)), for an unnormalized target distribution ~p(θ), over a non-parametric generalized posterior q(θ) defined over the model parameters θ ∈ R^d. The posterior q(θ) is represented by a set of particles {θ_n}_{n=1}^N, with θ_n ∈ R^d. In practice, an approximation of q(θ) can be obtained from the particles {θ_n}_{n=1}^N through a Kernel Density Estimator (KDE) as q(θ) = (1/N) Σ_{n=1}^N K(θ, θ_n) for some kernel function K(·, ·));
and process the estimated densities for the plurality of client computing systems, using an optimization algorithm, to estimate the global predictive posterior ([Page 6, Paragraph 2] Following SVGD, the update (9) is optimized to maximize the steepest descent decrease of the Kullback–Leibler (KL) divergence between the approximate global posterior q^[l](θ) encoded via particles {θ_n^[l]}_{n=1}^N and the tilted distribution ~p_k^(i)(θ) in (11) (see Fig. 1(b), step 2)).
Regarding claim 13:
The system of Kassab and Rollo teaches: The method of claim 1 (as shown above).
Kassab further teaches: wherein: training the global model comprises training the global model to approximate the global predictive posterior, on a server, using knowledge distillation ([Page 19, Paragraph 2] Furthermore, the L0 distillation iterations in the second loop can be performed by the scheduled agent after it has sent its global particles to the central server. This enables the pipelining of the second loop with the operations at the server and at other agents, which can potentially reduce the wall-clock time per communication round. [Page 23, Paragraph 1] Figure 9: KL divergence between exact and approximate global posteriors);
and the method further comprises communicating the trained global model to each client computing system of the plurality of client computing systems ([Page 1, Section 1, Paragraph 1] Federated learning refers to the collaborative training of a machine learning model across agents. Note: Agents correspond to the plurality of client computing systems).
Regarding claim 14:
Kassab teaches: A computing system comprising: a processing device; and a memory storing thereon ([Page 31, Last Paragraph, Section C.2 SOFTWARE DETAILS] We implement all experiments in PyTorch. Note: This shows that the software is being run on a computer with a processor and memory):
a local model comprising a plurality of learnable parameters ([Page 2, Section 2] We consider the federated learning set-up in Fig. 1, where each agent k = 1, . . . , K has a distinct local dataset with associated training loss L_k(θ) for model parameter θ. The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ). [Page 4, Section 4.2, Paragraph 1] the model parameters θ ∈ R^d);
a local dataset ([Page 2, Section 2, Last Paragraph] has a distinct local dataset);
and machine-executable instructions which, when executed by the processing device, cause the computing system to perform Bayesian federated learning by ([Page 1, Last but one Paragraph] This paper introduces a trustworthy solution that is able to reduce the number of communication rounds via a non-parametric variational inference-based implementation of federated Bayesian learning. [Page 31, Last Paragraph, Section C.2 SOFTWARE DETAILS] We implement all experiments in PyTorch (Paszke et al., 2019) Version 10.3.1. Our experiments and code are based on the original SVGD experiments and code available at: https://github.com/DartML/Stein-Variational-Gradient-Descent. More specifically, DSVGD can be easily obtained by running SVGD twice at each scheduled agent and suitably adjusting its target distribution. Our code is attached with the supplementary materials):
obtaining a model space prior comprising a prior probability distribution over a plurality of learnable parameters of a local model of the client computing system ([Page 2, Section 2] We consider the federated learning set-up in Fig. 1, where each agent k = 1, . . . , K has a distinct local dataset with associated training loss L_k(θ) for model parameter θ. The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ). [Page 4, Section 4.2, Paragraph 1] the model parameters θ ∈ R^d);
and processing the model space prior and the local model to generate a local predictive posterior ([Page 2, Paragraph 1] Figure 1: Federated learning across K agents equipped with local datasets and assisted by a central server: (a) in DVI agents exchange the current model posterior q^(i)(θ) with the server, while (b) in DSVGD agents exchange particles {θ_n}_{n=1}^N providing a non-parametric estimate of the posterior. [Page 3, Paragraph 3] computing the optimal posterior q_opt(θ) in a distributed manner is that each agent k is only aware of its local loss L_k(θ));
obtaining a local predictive posterior of each client computing system of a plurality of client computing systems ([Page 2, Paragraph 1] Figure 1: Federated learning across K agents equipped with local datasets and assisted by a central server: (a) in DVI agents exchange the current model posterior q^(i)(θ) with the server. [Page 2, Last Paragraph] We consider the federated learning set-up in Fig. 1, where each agent k = 1. Note: K agents correspond to the plurality of client computing systems, and k = 1 corresponds to a first client computing system among them);
and aggregating the local predictive posteriors of the computing system and the plurality of client computing systems to generate a global predictive posterior ([Abstract] DSVGD maintains a number of non-random and interacting particles at a central server to represent the current iterate of the model global posterior. [Page 2, Last Paragraph] We consider the federated learning set-up in Fig. 1, where each agent k = 1, . . . , K has a distinct local dataset with associated training loss L_k(θ) for model parameter θ. The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ)).
However, Kassab does not explicitly disclose: processing a local dataset of the client computing system to adjust one or more of the plurality of learnable parameters of the local model.
Rollo teaches, in an analogous system: processing the local dataset to adjust one or more of the plurality of learnable parameters of the local model ([Page 2, Abstract] This procedure shares the fundamental approach of FL which consists of performing some local processing. [Page 39, Paragraph 5] Usually we will present in terms of the adjustment of certain θ parameters: [Page 76, Paragraph 2] where θ are the learnable parameters).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computing system of Kassab to incorporate the teachings of Rollo, processing the local dataset to adjust one or more of the plurality of learnable parameters of the local model. One would have been motivated to make this modification because doing so would give the benefit of posing the problem as an optimization problem, as taught by Rollo [Page 39, Paragraph 5].
Regarding claim 15:
The system of Kassab and Rollo teaches: The computing system of claim 14 (as shown above).
Kassab further teaches: wherein: aggregating the local predictive posteriors comprises using Gaussian approximation to: for the computing system and each client computing system, process the respective local predictive posterior to estimate a respective sample mean and covariance ([Page 19, Paragraph 3] Communication load. Using DSVGD, the communication load between a scheduled agent and the central server is of the order O(Nd) since N particles of dimensions d need to be exchanged at each communication round. In contrast, the communication load of PVI depends on the selected parametrization. For instance, one can use PVI with a fully factorized Gaussian approximate posterior, which requires only 2d parameters to be shared with the server, namely mean and variance of each of the d parameters at the price of having lower accuracy);
and process the sample means and covariances for the plurality of client computing systems to estimate a mode of the global predictive posterior ([Page 9, Last Paragraph] It is important to note that, in general, most benefits of the proposed scheme appear to be obtained when the particles cover the main modes of the posterior. [Page 17, Paragraph 1] We show here that PVI with a Gaussian variational posterior q(θ|η) = N(θ|η σ^2, σ^2 I_d) of fixed covariance σ^2 I_d and mean η σ^2 parametrized by natural parameter η can be recovered as a special case of U-DSVGD. To elaborate, consider U-DSVGD with one particle θ_1 (i.e., N = 1), an RKHS kernel that satisfies ∇k(θ, θ) = 0 and k(θ, θ) = 1 (the RBF kernel is an example of such kernel) and an isotropic Gaussian kernel K(θ, θ_1^(i)) = N(θ|θ_1^(i), γ^2 I_d) of bandwidth γ used for computing the KDE of the global posterior using the particles. [Page 24, Section B.2, Paragraph 2] We see that, as in the 1-D case and in contrast to parametric methods PVI and GVI, non-parametric methods SVGD and DSVGD are able to capture the different modes of the posterior, obtaining lower values for the KL divergence between the approximate and exact posterior).
Regarding claim 16:
The system of Kassab and Rollo teaches: The computing system of claim 14 (as shown above).
Kassab further teaches: wherein: aggregating the local predictive posteriors comprises using a Kernel Density Estimator to: for each client computing system, process a plurality of samples of the respective local predictive posterior to estimate a density of the respective local predictive posterior ([Page 4, Section 4.2, Paragraph 1] SVGD tackles the minimization of the (scaled) free energy functional D(q(θ) || ~p(θ)), for an unnormalized target distribution ~p(θ), over a non-parametric generalized posterior q(θ) defined over the model parameters θ ∈ R^d. The posterior q(θ) is represented by a set of particles {θ_n}_{n=1}^N, with θ_n ∈ R^d. In practice, an approximation of q(θ) can be obtained from the particles {θ_n}_{n=1}^N through a Kernel Density Estimator (KDE) as q(θ) = (1/N) Σ_{n=1}^N K(θ, θ_n) for some kernel function K(·, ·));
and process the estimated densities for the plurality of client computing systems, using an optimization algorithm, to estimate the global predictive posterior ([Page 6, Paragraph 2] Following SVGD, the update (9) is optimized to maximize the steepest descent decrease of the Kullback–Leibler (KL) divergence between the approximate global posterior q^[l](θ) encoded via particles {θ_n^[l]}_{n=1}^N and the tilted distribution ~p_k^(i)(θ) in (11) (see Fig. 1(b), step 2)).
Regarding claim 20:
The system of Kassab and Rollo teaches the method of claim 1 (as shown above).
Kassab further teaches: A non-transitory processor-readable medium having machine-executable instructions stored thereon which, when executed by a processing device of a computing system, cause the computing system to perform the steps of the method of claim 1 ([Page 31, Last Paragraph, Section C.2 SOFTWARE DETAILS] We implement all experiments in PyTorch (Paszke et al., 2019) Version 10.3.1. Our experiments and code are based on the original SVGD experiments and code available at: https://github.com/DartML/Stein-Variational-Gradient-Descent. More specifically, DSVGD can be easily obtained by running SVGD twice at each scheduled agent and suitably adjusting its target distribution. Our code is attached with the supplementary materials).
Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Kassab et al (FEDERATED GENERALIZED BAYESIAN LEARNING VIA DISTRIBUTED STEIN VARIATIONAL GRADIENT DESCENT, 2021) in view of Rollo et al (Preliminary conclusions about federated learning applied to clinical data, 2021) and further in view of Pastore et al (US 20220383132 A1).
Regarding claim 7:
The system of Kassab and Rollo teaches: The method of claim 1 (as shown above).
Kassab further teaches: wherein: the local predictive posterior comprises a plurality of posterior probability samples ([Page 2, Section 2] We consider the federated learning set-up in Fig. 1, where each agent k = 1, . . . , K has a distinct local dataset with associated training loss L_k(θ) for model parameter θ. The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ). Note: Also see equation (1), which shows the summation and probabilities).
However, the system of Kassab and Rollo does not explicitly disclose: over a corresponding plurality of query inputs.
Pastore teaches, in an analogous system: over a corresponding plurality of query inputs ([0032] For federated learning, the aggregator may issue a query to all available parties in the federated learning system, e.g., the aggregator may issue a query to each of the first, second, and third computers 102a, 102b, 102c in the networked computer environment 100 shown in FIGS. 1 and 5).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Kassab and Rollo to incorporate the teachings of Pastore, providing posterior probability samples over a corresponding plurality of query inputs. One would have been motivated to make this modification because doing so would give the benefit of sending a query to all available parties in the federated learning system, as taught by Pastore [0032].
Regarding claim 8:
The system of Kassab and Rollo teaches: The method of claim 7 (as shown above).
Kassab further teaches: inputs used by each client computing system are obtained from a shared data set; and each client computing system obtains the shared data set from a server ([Page 2, Section 2] We consider the federated learning set-up in Fig. 1, where each agent k = 1, . . . , K has a distinct local dataset with associated training loss L_k(θ) for model parameter θ. The agents communicate through a central node with the goal of computing the global posterior distribution q(θ) over the shared model parameter θ ∈ R^d for some prior distribution p_0(θ)).
However, the system of Kassab and Rollo does not explicitly disclose: wherein: the plurality of query inputs.
Pastore teaches, in an analogous system: wherein: the plurality of query inputs ([0032] For federated learning, the aggregator may issue a query to all available parties in the federated learning system, e.g., the aggregator may issue a query to each of the first, second, and third computers 102a, 102b, 102c in the networked computer environment 100 shown in FIGS. 1 and 5).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Kassab and Rollo to incorporate the teachings of Pastore, using the plurality of query inputs. One would have been motivated to make this modification because doing so would give the benefit of sending a query to all available parties in the federated learning system, as taught by Pastore [0032].
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Thorgeirsson et al (Probabilistic Predictions with Federated Learning, 2020) discloses capturing uncertainty in the aggregation step of the algorithm by treating the set of local weights as a posterior distribution for the weights of the global model. The authors compare their approach to state-of-the-art Bayesian and non-Bayesian probabilistic learning algorithms and, by applying proper scoring rules to evaluate the predictive distributions, show that their approach can achieve similar performance as the benchmark would achieve in a non-distributed setting.
Wang et al (FEDERATED LEARNING WITH MATCHED AVERAGING, 2020) discloses the Federated matched averaging (FedMA) algorithm designed for federated learning of modern neural network architectures, e.g., convolutional neural networks (CNNs) and LSTMs. FedMA constructs the shared global model in a layer-wise manner by matching and averaging hidden elements (i.e., channels for convolution layers; hidden states for LSTMs; neurons for fully connected layers) with similar feature extraction signatures. The experiments indicate that FedMA not only outperforms popular state-of-the-art federated learning algorithms on deep CNN and LSTM architectures trained on real-world datasets, but also reduces the overall communication burden.
Xiao et al (Averaging Is Probably Not the Optimum Way of Aggregating Parameters in Federated Learning, 2020) discloses treating each client-computed parameter as a random vector because of the stochastic properties of SGD, and estimating the mutual information between two client-computed parameters at different training phases using two methods in two learning tasks. The results confirm the correlation between different clients and show an increasing trend of mutual information with training iteration. However, when the distance between client-computed parameters is further computed, the parameters are found to be growing more correlated while not getting closer. This phenomenon suggests that averaging parameters may not be the optimum way of aggregating trained parameters.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAITANYA RAMESH JAYAKUMAR whose telephone number is (571)272-3369. The examiner can normally be reached Mon-Fri 9am-1pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached at (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.R.J./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128