DETAILED ACTION
This action is responsive to claims filed on 4 April 2023.
Claims 1-17 are pending for examination.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 4 April 2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered and attached by the examiner.
Claim Objections
Claim 8 is objected to because of the following informalities: “the main feature set” in line 6 should be “the main set of features”. Appropriate correction is required.
Claim 14 and analogous claim 16 are objected to because of the following informalities: “wherein the method further comprises, wherein the model further comprises” in lines 7-8 should be “wherein the method further comprises”. Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 1 recites the limitation “the aggregated feature weight” in line 13. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the term “the aggregated feature weight” has been construed to be “an aggregated feature weight”. Claims 2-13 and 17, which depend on claim 1, are similarly rejected.
Claim 15 recites the limitation “the aggregated feature weight” in line 17. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the term “the aggregated feature weight” has been construed to be “an aggregated feature weight”.
Claim 9 recites the limitation “the base training set” in line 3. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the term “the base training set” has been construed to be “a base training set”.
Claim 14 and analogous claim 16 recite the limitation “the other client training sets” in line 5. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the term “the other client training sets” has been construed to be “other client training sets”.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because the full scope of “A transitory or non-transitory computer readable medium” includes transitory signals or “signals per se”; see MPEP 2106.03.
Claims 3, 6, 12-13, 14, 16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception, abstract idea, without significantly more.
Step 1: This part of the eligibility analysis evaluates whether the claim(s) falls within any statutory category. MPEP 2106.03:
According to the first part of the Alice analysis, in the instant case, the claims were determined to be directed to one of the four statutory categories: an article of manufacture, a method/process (Claims 1-14), a machine/system/product (Claims 15-16), and a composition of matter. Because the claims were determined to be within one of the four categories (i.e., process, machine, manufacture, or composition of matter) (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., law of nature, natural phenomenon, or abstract idea).
Step 2A Prong One: This part of the eligibility analysis evaluates whether the claim(s) recites a judicial exception.
Regarding independent claims 14 and 16, the claims recite a judicial exception (i.e., an abstract idea enumerated in the 2019 PEG) without significantly more (Step 2A, Prong One). The applicant's claim limitations, under the broadest reasonable interpretation, cover activities classified as mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion) (see MPEP § 2106.04(a)(2), subsection III, and the 2019 PEG), as evaluated below:
Claims 14, 16:
“determining client feature weights for the aggregated model” (mental process of judgement)
If the identified limitation(s) falls within at least one of the groupings of abstract ideas, it is reasonable to conclude that the claim(s) recites an abstract idea in Step 2A Prong One.
Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claim(s) as a whole integrates the recited judicial exception into a practical application of the exception. As evaluated below:
“sending a model update representing the improved model parameters to a training system”
“receiving an aggregated model from the training system”
“sending the client feature weights to the training system”
“receiving a signal indicating a particular feature not indicated in the client training set”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions for mere data gathering or data output, see MPEP 2106.05(g).
“improving model parameters in training iterations executed on a client training set”
“a training sample in a client training set indicating values for multiple features, at least some of the other client training sets indicating values for different features”
“wherein the client training set does not indicate values for the particular feature and a relative importance of the particular feature is above a threshold”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, see MPEP 2106.05(h).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.
Step 2B: This part of the eligibility analysis evaluates whether the claim, as a whole, amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. MPEP 2106.05.
First, the additional elements considered as part of the preamble and the additional elements directed to the use of computer technology are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technological environment, see MPEP 2106.05(h).
Second, the additional elements that amount to mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply the use of a generic computer and/or process with the judicial exception, see MPEP 2106.05(f).
Lastly, the claims are directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception. The courts have found these types of limitations insufficient to transform the judicial exception to a patentable invention, see MPEP 2106.05(g).
Furthermore, the evidence has been considered in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018); see USPTO Berkheimer Memorandum (April 2018). Examiner notes Berkheimer Option 2: a citation to one or more of the court decisions discussed in MPEP § 2106.05(d)(II) as noting the well-understood, routine, conventional nature of the additional element(s) (e.g., limitations directed to mere data gathering):
The courts have recognized the following computer functions as well understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity, see MPEP 2106.05(d).
The additional limitations, as analyzed, failed to integrate a judicial exception into a practical application at Step 2A and provide an inventive concept in Step 2B, per the analysis above. Thus, considering the additional elements individually and in combination and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. This claim is not patent eligible. Therefore, in examining elements as recited by the limitations individually and as an ordered combination, as a whole, claims 14, 16 do not recite what the courts have identified as "significantly more".
Regarding dependent claims 3, 6, and 12-13, the claims recite a judicial exception (i.e., an abstract idea enumerated in the 2019 PEG) without significantly more (Step 2A, Prong One). The applicant's claim limitations, under the broadest reasonable interpretation, cover activities classified as mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion) (see MPEP § 2106.04(a)(2), subsection III, and the 2019 PEG), as evaluated below:
Claim 3:
Incorporates claim 1.
“selecting a particular feature and a client training set” (mental process of judgement)
If the identified limitation(s) falls within at least one of the groupings of abstract ideas, it is reasonable to conclude that the claim(s) recites an abstract idea in Step 2A Prong One.
Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claim(s) as a whole integrates the recited judicial exception into a practical application of the exception. As evaluated below:
“sending the signal comprises sending a signal to the client system corresponding to the selected client training set indicating the particular feature”
“receiving multiple model updates from multiple client systems”
“the aggregated model being arranged to receive multiple feature values representing multiple features”
“obtaining feature weights for the aggregated model representing a relative importance of the multiple features for the aggregated model's output”
“sending a signal to at least one of the multiple client systems in dependence on the aggregated feature weight for a feature of the aggregated model”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions for mere data gathering or data output, see MPEP 2106.05(g).
“wherein the client training set does not indicate values for the particular feature and the relative importance of the particular feature is above a threshold”
“a model update representing model parameters improved in training iterations executed by a client system on a corresponding client training set”
“a training sample in a client training set indicating values for multiple features, at least some of the multiple client training sets indicating values for different features”
“aggregating the multiple model updates to obtain an aggregated model”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, see MPEP 2106.05(h).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.
Step 2B: This part of the eligibility analysis evaluates whether the claim, as a whole, amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. MPEP 2106.05.
First, the additional elements considered as part of the preamble and the additional elements directed to the use of computer technology are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technological environment, see MPEP 2106.05(h).
Second, the additional elements that amount to mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply the use of a generic computer and/or process with the judicial exception, see MPEP 2106.05(f).
Lastly, the claims are directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception. The courts have found these types of limitations insufficient to transform the judicial exception to a patentable invention, see MPEP 2106.05(g).
Furthermore, the evidence has been considered in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018); see USPTO Berkheimer Memorandum (April 2018). Examiner notes Berkheimer Option 2: a citation to one or more of the court decisions discussed in MPEP § 2106.05(d)(II) as noting the well-understood, routine, conventional nature of the additional element(s) (e.g., limitations directed to mere data gathering):
The courts have recognized the following computer functions as well understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity, see MPEP 2106.05(d).
The additional limitations, as analyzed, failed to integrate a judicial exception into a practical application at Step 2A and provide an inventive concept in Step 2B, per the analysis above. Thus, considering the additional elements individually and in combination and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. This claim is not patent eligible. Therefore, in examining elements as recited by the limitations individually and as an ordered combination, as a whole, claim 3 does not recite what the courts have identified as "significantly more".
Claim 6:
Incorporates claim 5.
“wherein an aggregated feature weight is determined from the multiple client feature weights for which the corresponding client training set indicates values for the aggregated feature weights” (mental process of judgement)
If the identified limitation(s) falls within at least one of the groupings of abstract ideas, it is reasonable to conclude that the claim(s) recites an abstract idea in Step 2A Prong One.
Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claim(s) as a whole integrates the recited judicial exception into a practical application of the exception. As evaluated below:
“obtaining feature weights for the aggregated model comprises receiving multiple client feature weights for the aggregated model determined by multiple clients”
“receiving multiple model updates from multiple client systems”
“the aggregated model being arranged to receive multiple feature values representing multiple features”
“obtaining feature weights for the aggregated model representing a relative importance of the multiple features for the aggregated model's output”
“sending a signal to at least one of the multiple client systems in dependence on the aggregated feature weight for a feature of the aggregated model”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions for mere data gathering or data output, see MPEP 2106.05(g).
“aggregating the client feature weights”
“a model update representing model parameters improved in training iterations executed by a client system on a corresponding client training set”
“a training sample in a client training set indicating values for multiple features, at least some of the multiple client training sets indicating values for different features”
“aggregating the multiple model updates to obtain an aggregated model”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, see MPEP 2106.05(h).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.
Step 2B: This part of the eligibility analysis evaluates whether the claim, as a whole, amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. MPEP 2106.05.
First, the additional elements considered as part of the preamble and the additional elements directed to the use of computer technology are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technological environment, see MPEP 2106.05(h).
Second, the additional elements that amount to mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply the use of a generic computer and/or process with the judicial exception, see MPEP 2106.05(f).
Lastly, the claims are directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception. The courts have found these types of limitations insufficient to transform the judicial exception to a patentable invention, see MPEP 2106.05(g).
Furthermore, the evidence has been considered in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018); see USPTO Berkheimer Memorandum (April 2018). Examiner notes Berkheimer Option 2: a citation to one or more of the court decisions discussed in MPEP § 2106.05(d)(II) as noting the well-understood, routine, conventional nature of the additional element(s) (e.g., limitations directed to mere data gathering):
The courts have recognized the following computer functions as well understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity, see MPEP 2106.05(d).
The additional limitations, as analyzed, failed to integrate a judicial exception into a practical application at Step 2A and provide an inventive concept in Step 2B, per the analysis above. Thus, considering the additional elements individually and in combination and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. This claim is not patent eligible. Therefore, in examining elements as recited by the limitations individually and as an ordered combination, as a whole, claim 6 does not recite what the courts have identified as "significantly more".
Claim 12:
Incorporates claim 1.
“selecting two or more of the multiple model updates and configuring an ensemble model from the selected model updates” (mental process of judgement)
If the identified limitation(s) falls within at least one of the groupings of abstract ideas, it is reasonable to conclude that the claim(s) recites an abstract idea in Step 2A Prong One.
Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claim(s) as a whole integrates the recited judicial exception into a practical application of the exception. As evaluated below:
“receiving multiple model updates from multiple client systems”
“the aggregated model being arranged to receive multiple feature values representing multiple features”
“obtaining feature weights for the aggregated model representing a relative importance of the multiple features for the aggregated model's output”
“sending a signal to at least one of the multiple client systems in dependence on the aggregated feature weight for a feature of the aggregated model”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions for mere data gathering or data output, see MPEP 2106.05(g).
“applying an average to the multiple model updates”
“a model update representing model parameters improved in training iterations executed by a client system on a corresponding client training set”
“a training sample in a client training set indicating values for multiple features, at least some of the multiple client training sets indicating values for different features”
“aggregating the multiple model updates to obtain an aggregated model”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, see MPEP 2106.05(h).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.
Step 2B: This part of the eligibility analysis evaluates whether the claim, as a whole, amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. MPEP 2106.05.
First, the additional elements considered as part of the preamble and the additional elements directed to the use of computer technology are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technological environment, see MPEP 2106.05(h).
Second, the additional elements that amount to mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply the use of a generic computer and/or process with the judicial exception, see MPEP 2106.05(f).
Lastly, the claims are directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception. The courts have found these types of limitations insufficient to transform the judicial exception to a patentable invention, see MPEP 2106.05(g).
Furthermore, the evidence has been considered in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018); see USPTO Berkheimer Memorandum (April 2018). Examiner notes Berkheimer Option 2: a citation to one or more of the court decisions discussed in MPEP § 2106.05(d)(II) as noting the well-understood, routine, conventional nature of the additional element(s) (e.g., limitations directed to mere data gathering):
The courts have recognized the following computer functions as well understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity, see MPEP 2106.05(d).
The additional limitations, as analyzed, failed to integrate a judicial exception into a practical application at Step 2A and provide an inventive concept in Step 2B, per the analysis above. Thus, considering the additional elements individually and in combination and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. This claim is not patent eligible. Therefore, in examining elements as recited by the limitations individually and as an ordered combination, as a whole, claim 12 does not recite what the courts have identified as "significantly more".
Claim 13:
Incorporates claim 1.
“predict a medical condition” (mental process of judgement)
If the identified limitation(s) falls within at least one of the groupings of abstract ideas, it is reasonable to conclude that the claim(s) recites an abstract idea in Step 2A Prong One.
Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claim(s) as a whole integrates the recited judicial exception into a practical application of the exception. As evaluated below:
“wherein the model is a medical model arranged to receive medical feature values as input”
“receiving multiple model updates from multiple client systems”
“the aggregated model being arranged to receive multiple feature values representing multiple features”
“obtaining feature weights for the aggregated model representing a relative importance of the multiple features for the aggregated model's output”
“sending a signal to at least one of the multiple client systems in dependence on the aggregated feature weight for a feature of the aggregated model”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions for mere data gathering or data output, see MPEP 2106.05(g).
“a model update representing model parameters improved in training iterations executed by a client system on a corresponding client training set”
“a training sample in a client training set indicating values for multiple features, at least some of the multiple client training sets indicating values for different features”
“aggregating the multiple model updates to obtain an aggregated model”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, see MPEP 2106.05(h).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.
Step 2B: This part of the eligibility analysis evaluates whether the claim, as a whole, amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. MPEP 2106.05.
First, the additional elements considered as part of the preamble and the additional elements directed to the use of computer technology are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technological environment, see MPEP 2106.05(h).
Second, the additional elements that amount to mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply the use of a generic computer and/or process with the judicial exception, see MPEP 2106.05(f).
Lastly, the claims are directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception. The courts have found these types of limitations insufficient to transform the judicial exception to a patentable invention, see MPEP 2106.05(g).
Furthermore, the evidence has been considered in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018); see USPTO Berkheimer Memorandum (April 2018). Examiner notes Berkheimer Option 2: a citation to one or more of the court decisions discussed in MPEP § 2106.05(d)(II) as noting the well-understood, routine, conventional nature of the additional element(s) (e.g., limitations directed to mere data gathering):
The courts have recognized the following computer functions as well understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity, see MPEP 2106.05(d).
The additional limitations, as analyzed, failed to integrate a judicial exception into a practical application at Step 2A and provide an inventive concept in Step 2B, per the analysis above. Thus, considering the additional elements individually and in combination and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. This claim is not patent eligible. Therefore, in examining elements as recited by the limitations individually and as an ordered combination, as a whole, claim 13 does not recite what the courts have identified as "significantly more".
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3-6, 8, 10, 12-17 are rejected under 35 U.S.C. 103 as being unpatentable over De Brouwer et al. (U.S. Pre-Grant Publication No. 20200293887, hereinafter 'De Brouwer'), in view of Liang et al. (NPL: "Think Locally, Act Globally: Federated Learning with Local and Global Representations", hereinafter 'Liang').
Regarding claim 1 and analogous claim 15, De Brouwer teaches A computer-implemented server method for training a model, the method comprising receiving multiple model updates from multiple client systems, a training sample in a client training set indicating values for multiple features, at least some of the multiple client training sets indicating values for different features ([0054] In a federated workflow 915, we start with a base model 951 that may have been trained in this conventional manner. Once this base model 951 is trained, refinement can proceed without centrally collecting any further data. Instead, the base model is distributed to individual devices 953. These edge devices perform local training to generate local model updates 957, using data (not shown) that is on those devices.; [0059] Consider, with reference to FIG. 10, a data set in the form of a table 1015. This data can be visualized as is a matrix with samples across rows and features down columns. The rows of a training sample in a client training set indicating values for multiple features data may correspond to samples used with a neural network for training. They also may correspond to a SQL-returned table and may have a at least some of the multiple client training sets indicating values for different features unique identifiers, IDs, across rows and again have columns of features.),
aggregating the multiple model updates to obtain an aggregated model, the aggregated model being arranged to receive multiple feature values representing multiple features ([0050] In some embodiments, a core template of machine learning workflow comprises four steps. Step 1 is data collection, to procure raw data. Step 2 is data re-formatting, to prepare the data in the right format. Step 3 is modeling, to choose and apply a learning algorithm. Step 4 is predictive analytics, to make a prediction. Variables that are likely to influence future events are predicted. Parameters used to make the prediction are represented in multi-dimensional matrix, called tensors.; [0054] The federated workflow aggregating the multiple model updates to obtain an aggregated model aggregates the local updates into a new global model 959 which will become our next base model 951 that will be used for inference and additional rounds 915 of training a federated loop. Again, updating via the federated loop 915 does not require centrally collecting data. Instead, we're sending the model to the data for training, not bringing data to the model for training. This is a decentralized workflow instead of a centralized workflow.; [0159] The FL aggregator is coupled to a communication network and includes a federated learner. The the aggregated model being arranged to receive multiple feature values representing multiple features federated learner is configured to receive modified tensors from at least some of the edge devices, aggregate the modified tensors with a current version of the base model tensor by federated learning to produce a new version of the base model tensor, and distribute the new version of the base model tensor to the edge devices. The federator learner can be implemented in the FL aggregator as in-line code, can be implemented in a separate module or some combination of the two coding strategies.),
obtaining feature weights for the aggregated model representing a relative importance of the multiple features for the aggregated model's output, sending a signal to at least one of the multiple client systems in dependence on the aggregated feature weight for a feature of the aggregated model ([0115] In some embodiments, Flea end users communicate and collaborate with one another to build and update models of computation in vertical tensor ensembles in a one-to many manner. With federated learning a global protocol is sent from one central authority to many participants who collect information on their edge device, obtaining feature weights for the aggregated model representing a relative importance of the multiple features for the aggregated model's output label the information and compute it locally, after which they sent the tensors to the central FL aggregator of the sponsor. They sending a signal to at least one of the multiple client systems in dependence on the aggregated feature weight for a feature of the aggregated model aggregate all the tensors and then report the updated and averaged tensors back to each of the participants.).
De Brouwer fails to teach a model update representing model parameters improved in training iterations executed by a client system on a corresponding client training set.
Liang teaches a model update representing model parameters improved in training iterations executed by a client system on a corresponding client training set ([3.1 Local Representation Learning, pg. 4] For each source of data (Xm, Ym), we learn a representation Hm which should: 1) be low-dimensional as compared to raw data Xm, 2) capture important features in Xm that are useful towards the global model, and 3) not overfit to device data which may not align to the global data distribution. To be more concrete, we define features that should be captured using a good representation h. In Figure 1(a) through 1(c) we summarize these local learning methods according to the choice of z: (a) the labels y (supervised learning), (b) the data itself x (unsupervised autoencoder learning), or (c) some auxiliary labels z (self-supervised learning). For simplicity, we focus the description on supervised learning but describe extensions to local adversarial learning of fair representations (Figure 1(d)) and unsupervised learning in Appendix B.1. Each device consists of a a model update local model with representing model parameters parameters which allow us to improved in training iterations executed by a client system on a corresponding client training set infer features from local device data. These features should be useful in predicting the labels using a joint global model with parameters over the features from all devices {H1, ...,HM}. The key difference is that the global model now operates on lower-dimensional local representations Hm. Therefore, g can be a much smaller model which we will show in our experiments (§5.2).),
De Brouwer and Liang are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of De Brouwer, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Liang to De Brouwer before the effective filing date of the claimed invention in order to jointly learn compact local representations on each device and a global model across all devices (cf. Liang, [Abstract, pg. 1] Federated learning is a method of training models on private data distributed over multiple devices. To keep device data private, the global model is trained by only communicating parameters and updates which poses scalability challenges for large models. To this end, we propose a new federated learning algorithm that jointly learns compact local representations on each device and a global model across all devices. As a result, the global model can be smaller since it only operates on local representations, reducing the number of communicated parameters. Theoretically, we provide a generalization analysis which shows that a combination of local and global models reduces both variance in the data as well as variance across device distributions. Empirically, we demonstrate that local models enable communication-efficient training while retaining performance. We also evaluate on the task of personalized mood prediction from real-world mobile data where privacy is key. Finally, local models handle heterogeneous data from new devices, and learn fair representations that obfuscate protected attributes such as race, age, and gender.).
Regarding claim 3, De Brouwer, as modified by Liang, teaches The training method of claim 1.
Liang teaches comprising selecting a particular feature and a client training set,
wherein the client training set does not indicate values for the particular feature and the relative importance of the particular feature is above a threshold, sending the signal comprises sending a signal to the client system corresponding to the selected client training set indicating the particular feature ([3.2 Global Aggregation, pg.4-5] Learning this joint global model g across all devices requires the aggregation of global parameter updates from each device. At each iteration t of global model training, the sending the signal server sends a selecting a particular feature and a client training set, wherein the client training set does not indicate values for the particular feature and the relative importance of the particular feature is above a threshold copy of the global model parameters g(t) comprises sending a signal to the client system to each device which we now label as g(t)m to represent the corresponding to the selected client training set indicating the particular feature asynchronous updates made to each local copy. Each device runs their local model to obtain local features and the global model to obtain predictions. We can compute the overall loss on device m: (1)).
De Brouwer and Liang are combinable for the same rationale as set forth above with respect to claim 1.
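For illustration only, the selection recited in claim 3 can be read as the following logic. This is a minimal sketch of the claim language as quoted above, not of any cited reference; the function name `clients_to_signal`, the dictionary layout, and the example feature names are hypothetical.

```python
from typing import Dict, List, Set


def clients_to_signal(
    feature_weights: Dict[str, float],
    client_features: Dict[str, Set[str]],
    threshold: float,
) -> Dict[str, List[str]]:
    """Map each client to the important features its training set lacks.

    feature_weights: aggregated relative importance per feature (assumed layout).
    client_features: features for which each client training set indicates values.
    threshold: importance level above which a missing feature triggers a signal.
    """
    # Features whose relative importance is above the threshold.
    important = [f for f, w in feature_weights.items() if w > threshold]

    signals: Dict[str, List[str]] = {}
    for client, present in client_features.items():
        # Important features for which this client has no values.
        missing = sorted(f for f in important if f not in present)
        if missing:
            signals[client] = missing
    return signals


weights = {"age": 0.5, "bmi": 0.3, "zip": 0.05}
clients = {"a": {"age", "bmi"}, "b": {"age", "zip"}}
result = clients_to_signal(weights, clients, threshold=0.2)
```

In this example only client "b" would be signalled, and the signal would indicate the feature "bmi".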
Regarding claim 4, De Brouwer, as modified by Liang, teaches The training method of claim 1.
De Brouwer teaches comprising distributing the aggregated model to the multiple client systems ([0020] The technology disclosed includes is a method of federated learning utilizing computation capability of edge devices. The method comprises sending out tensors by multiple edge devices with federated learning models, receiving tensors by an FL aggregator including a federated learning update repository from the edge devices, distributing the aggregated model to the multiple client systems distributing updated models from the federated learning update repository to the edge devices, and the edge devices using the updated models.).
De Brouwer and Liang are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 5, De Brouwer, as modified by Liang, teaches The training method of claim 4.
De Brouwer teaches wherein obtaining feature weights for the aggregated model comprises receiving multiple client feature weights for the aggregated model determined by multiple clients and aggregating the client feature weights ([0115] In some embodiments, Flea end users communicate and collaborate with one another to build and update models of computation in vertical tensor ensembles in a one-to many manner. With federated learning a global protocol is sent from one central authority to many determined by multiple clients and aggregating the client feature weights participants who collect information on their edge device, label the information and compute it locally, after which they obtaining feature weights for the aggregated model comprises receiving multiple client feature weights for the aggregated model sent the tensors to the central FL aggregator of the sponsor. They aggregate all the tensors and then report the updated and averaged tensors back to each of the participants.).
De Brouwer and Liang are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 6, De Brouwer, as modified by Liang, teaches The training method of claim 5.
De Brouwer teaches wherein an aggregated feature weight is determined from the multiple client feature weights for which the corresponding client training set indicates values for the aggregated feature weights ([0115] In some embodiments, Flea end users communicate and collaborate with one another to build and update models of computation in vertical tensor ensembles in a one-to many manner. With federated learning a global protocol is sent from one central authority to many participants who collect information on their edge device, label the information and compute it locally, after which they sent the determined from the multiple client feature weights for which the corresponding client training set indicates values for the aggregated feature weights tensors to the central FL aggregator of the sponsor. They aggregated feature weight aggregate all the tensors and then report the updated and averaged tensors back to each of the participants.).
De Brouwer and Liang are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 8, De Brouwer, as modified by Liang, teaches The training method of claim 1.
De Brouwer teaches comprising training a base model on a base training set and distributing the trained base model to the multiple client systems, the model updates being updates of the base model, the base model being arranged to receive feature values for a main set of features, a training sample in a client training set indicating values for multiple features, said multiple features being a subset of the main feature set ([0054] In a federated workflow 915, we start with a training a base model on a base training set base model 951 that may have been trained in this conventional manner. Once this base model 951 is trained, refinement can proceed without centrally collecting any further data. Instead, the distributing the trained base model to the multiple client systems base model is distributed to individual devices 953. These edge devices perform model updates being updates of the base model local training to generate local model updates 957, using data (not shown) that is on those devices. The federated workflow aggregates the the base model being arranged to receive feature values for a main set of features local updates into a new global model 959 which will become our next base model 951 that will be used for inference and additional rounds 915 of training a federated loop.; [0060] Consider an image processing application and a tensor applied to images that are, for example, 224×224 pixels, prior to being sent to a neural network for inference and training by backward propagation. Images on different devices have the same feature space, but they're different images, belonging to different sample spaces. Each edge device can start with the same base model. An FL aggregator or federated learning repository or some other central authority or compute resource sends the base model to the edge device for update training, to produce updated models 957. 
The edge devices 953 a training sample in a client training set indicating values for multiple features, said multiple features being a subset of the main feature set train using respective partitions of the data 1015, producing the updated models 957, which are aggregated 959 into an updated model which can be distributed as a new base model 951. In this process, the base model resides locally on each device. Each device trains locally on data that is available on device. The federated loop aggregates the local updates to produce a new global model.).
De Brouwer and Liang are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 10, De Brouwer, as modified by Liang, teaches The training method of claim 1.
De Brouwer teaches comprising one or more iterations of receiving multiple model updates from multiple client systems with respect to an aggregated model received by the client system, aggregating the multiple model updates to obtain a further aggregated model, obtaining feature weights for the further aggregated model ([0065] Communications between devices 953 and server 1221 are asynchronous, over network connections, and sometimes unreliable. In some cases, an edge device or client make a request for a training task, but does not receive a response for the server. This can be represented by a upward arrow, for instance near the beginning of cycle 1223, without a responsive downward arrow. In other cases, the client might request and receive an assignment and current model version, but never upload an updated model. In other cases, a client may participate multiple times during a given training cycle. The server 1221 checks to make sure that updates received apply to a current version of the base model, that the edge device is not updating a deprecated base model version. A cycle, such as 1213, 1215 or 1217, eventually reaches a predetermined threshold. This threshold could be expressed as a number of clients that have participated in the round, as a number of training samples processed in the updated models, or as an elapsed amount of time. one or more iterations of receiving multiple model updates from multiple client systems with respect to an aggregated model received by the client system Each of the cycles corresponds to one round of the federated loop 915 that aggregating the multiple model updates to obtain a further aggregated model produces a new global model (959, which becomes 951), and to obtaining feature weights for the further aggregated model distribution to the edge devices of the updated, new model. The edge devices can use the new model for predictions and training as additional data is collected. 
Preferably, the edge devices do not repeatedly train using old data that previously was used to train an updated model that was forwarded to the server 1221 for aggregation. The process repeats, as depicted for three cycles in FIG. 12.).
De Brouwer and Liang are combinable for the same rationale as set forth above with respect to claim 1.
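For illustration only, the round handling described in De Brouwer [0065] can be sketched as follows. The sketch assumes that a round closes when any one of the three thresholds named in the passage is reached (number of participating updates, number of training samples, or elapsed time) and that updates against a deprecated base model version are rejected. The class name `Round` and all field names are hypothetical and do not appear in the reference.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Round:
    base_version: int      # version of the base model this round refines
    max_updates: int       # threshold on number of accepted updates
    max_samples: int       # threshold on total training samples processed
    max_seconds: float     # threshold on elapsed time
    started: float = field(default_factory=time.monotonic)
    updates: List[List[float]] = field(default_factory=list)
    samples_seen: int = 0

    def accept(self, client_version: int, update: List[float], n_samples: int) -> bool:
        """Accept an update only if it applies to the current base model version."""
        if client_version != self.base_version:
            return False  # client trained on a deprecated base model
        self.updates.append(update)
        self.samples_seen += n_samples
        return True

    def is_complete(self, now: Optional[float] = None) -> bool:
        """Return True once any of the assumed thresholds has been reached."""
        now = time.monotonic() if now is None else now
        return (
            len(self.updates) >= self.max_updates
            or self.samples_seen >= self.max_samples
            or now - self.started >= self.max_seconds
        )


r = Round(base_version=3, max_updates=2, max_samples=1000, max_seconds=3600.0)
stale = r.accept(2, [0.1], 10)       # deprecated version: rejected
first = r.accept(3, [0.1], 10)       # current version: accepted
open_yet = r.is_complete(now=r.started)
second = r.accept(3, [0.2], 20)
closed = r.is_complete(now=r.started)
```

The sketch counts accepted updates rather than distinct clients, since the passage notes that a client may participate multiple times during a training cycle.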
Regarding claim 12, De Brouwer, as modified by Liang, teaches The training method of claim 1.
De Brouwer teaches wherein aggregating the multiple model updates comprises applying an average to the multiple model updates ([0062] Those local updates are centrally aggregated into a new based model and the process repeats. Aggregation can be performed using a federated average algorithm, applying the averaging formula 1577 in FIG. 15. This is a aggregating the multiple model updates comprises applying an average to the multiple model updates weighted average of the updates to the model, weighted according to the number of samples used by an edge device to produce its update.), and/or
selecting two or more of the multiple model updates and configuring an ensemble model from the selected model updates ([0109] In some embodiments, selecting two or more of the multiple model updates and configuring an ensemble model from the selected model updates Flea end users communicate and collaborate with one another to build and update models, effecting a lateral tensor ensemble of user models, in a one-to-one manner. The end users could also laterally organize their own trials and choose a central FL aggregator to which to send the gradients and get the averaged gradients back in a distributed fashion.).
De Brouwer and Liang are combinable for the same rationale as set forth above with respect to claim 1.
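For illustration only, the weighted average cited from De Brouwer [0062] for claim 12 can be sketched as follows. Formula 1577 of FIG. 15 is not reproduced in this action, so the sketch assumes the common form in which each update is weighted by the number of samples the edge device used to produce it. The function name `federated_average` and the flat-list parameter layout are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    updates: Sequence[Sequence[float]],
    sample_counts: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighting each by its sample count."""
    if not updates or len(updates) != len(sample_counts):
        raise ValueError("need exactly one sample count per update")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(updates[0])
    if any(len(u) != dim for u in updates):
        raise ValueError("all updates must have the same length")
    # Weighted mean per parameter: sum_k n_k * w_k[j] / sum_k n_k
    return [
        sum(n * u[j] for u, n in zip(updates, sample_counts)) / total
        for j in range(dim)
    ]


averaged = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Here the second client contributed three times as many samples as the first, so the result `[2.5, 3.5]` lies closer to its parameters.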
Regarding claim 13, De Brouwer, as modified by Liang, teaches The training method of claim 1.
De Brouwer teaches wherein the model is a medical model arranged to receive medical feature values as input and/or to predict a medical condition ([0081] With this example in mind, we return to describing the overall approach. As show in in FIG. 2, Flea end users can communicate and collaborate with one another (potentially in tandem with one or more FL aggregator backends) to build and update models of computation in multiple ways. These configurations are described in the context of model is a medical model medical research use cases.; [0136] We can use arranged to receive medical feature values as input information that comprehensively characterizes each individual for demographics, biologic omics, physiology, anatomy, and environment, along with and to predict a medical condition treatment and outcomes for medical conditions.).
De Brouwer and Liang are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 14 and analogous claim 16, De Brouwer teaches A computer-implemented client method for training a model, the method comprising a training sample in a client training set indicating values for multiple features, at least some of the other client training sets indicating values for different features ([0054] In a federated workflow 915, we start with a base model 951 that may have been trained in this conventional manner. Once this base model 951 is trained, refinement can proceed without centrally collecting any further data. Instead, the base model is distributed to individual devices 953. These edge devices perform local training to generate local model updates 957, using data (not shown) that is on those devices.; [0059] Consider, with reference to FIG. 10, a data set in the form of a table 1015. This data can be visualized as is a matrix with samples across rows and features down columns. The rows of a training sample in a client training set indicating values for multiple features data may correspond to samples used with a neural network for training. They also may correspond to a SQL-returned table and may have a at least some of the other client training sets indicating values for different features unique identifiers, IDs, across rows and again have columns of features.),
sending a model update representing the improved model parameters to a training system ([0045] This ground truth is backward propagated through a network on the edge device, producing parameter adjustments. Occasionally, the sending a model update representing the improved model parameters to a training system updated parameters are returned to an FL aggregator. The FL aggregator periodically updates and redistributes an updated model.),
wherein the method further comprises, wherein the model further comprises receiving an aggregated model from the training system and determining client feature weights for the aggregated model and sending the client feature weights to the training system ([0115] In some embodiments, Flea end users communicate and collaborate with one another to build and update models of computation in vertical tensor ensembles in a one-to many manner. With federated learning a receiving an aggregated model from the training system global protocol is sent from one central authority to many determining client feature weights for the aggregated model participants who collect information on their edge device, label the information and compute it locally, after which they sending the client feature weights to the training system sent the tensors to the central FL aggregator of the sponsor. They aggregate all the tensors and then report the updated and averaged tensors back to each of the participants.), and/or
De Brouwer fails to teach improving model parameters in training iterations executed on a client training set, receiving a signal indicating a particular feature not indicated in the client training set, wherein the client training set does not indicate values for the particular feature and a relative importance of the particular feature is above a threshold.
Liang teaches improving model parameters in training iterations executed on a client training set ([3.1 Local Representation Learning, pg. 4] For each source of data (Xm, Ym), we learn a representation Hm which should: 1) be low-dimensional as compared to raw data Xm, 2) capture important features in Xm that are useful towards the global model, and 3) not overfit to device data which may not align to the global data distribution. To be more concrete, we define features that should be captured using a good representation h. In Figure 1(a) through 1(c) we summarize these local learning methods according to the choice of z: (a) the labels y (supervised learning), (b) the data itself x (unsupervised autoencoder learning), or (c) some auxiliary labels z (self-supervised learning). For simplicity, we focus the description on supervised learning but describe extensions to local adversarial learning of fair representations (Figure 1(d)) and unsupervised learning in Appendix B.1. Each device consists of a improving model parameters in training iterations executed on a client training set local model with parameters which allow us to infer features from local device data. These features should be useful in predicting the labels using a joint global model with parameters over the features from all devices {H1, ...,HM}. The key difference is that the global model now operates on lower-dimensional local representations Hm. Therefore, g can be a much smaller model which we will show in our experiments (§5.2).),
receiving a signal indicating a particular feature not indicated in the client training set, wherein the client training set does not indicate values for the particular feature and a relative importance of the particular feature is above a threshold ([3.2 Global Aggregation, pg.4-5] Learning this joint global model g across all devices requires the aggregation of global parameter updates from each device. At each iteration t of global model training, the server sends a wherein the client training set does not indicate values for the particular feature and a relative importance of the particular feature is above a threshold copy of the global model parameters g(t) to each device which we now label as g(t)m to represent the asynchronous receiving a signal indicating a particular feature not indicated in the client training set updates made to each local copy. Each device runs their local model to obtain local features and the global model to obtain predictions. We can compute the overall loss on device m: (1)).
De Brouwer and Liang are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 17, De Brouwer, as modified by Liang, teaches The method of claim 1.
De Brouwer teaches A transitory or non-transitory computer readable medium comprising data representing instructions, which when executed by a processor system, cause the processor system to perform the method ([0164] Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform actions of the system described above. Each of the features discussed in the particular implementation section for other implementations apply equally to this implementation. As indicated above, all the other features are not repeated here and should be considered repeated by reference.).
De Brouwer and Liang are combinable for the same rationale as set forth above with respect to claim 1.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over De Brouwer, in view of Liang, and further in view of Yin et al. (NPL: "FedLoc: Federated Learning Framework for Data-Driven Cooperative Localization and Location Data Processing", hereinafter 'Yin').
Regarding claim 2, De Brouwer, as modified by Liang, teaches The training method of claim 1.
De Brouwer, as modified by Liang, fails to teach wherein the aggregated model is arranged to receive as input feature values for a main feature set, a training sample in a client training set indicating values for multiple features, said multiple features being a subset of the main feature set, at least one client training set indicating feature values for features that are a strict subset of the main feature set.
Yin teaches wherein the aggregated model is arranged to receive as input feature values for a main feature set, a training sample in a client training set indicating values for multiple features, said multiple features being a subset of the main feature set, at least one client training set indicating feature values for features that are a strict subset of the main feature set ([A. BRIEF REVIEW OF FEDERATED LEARNING, pg. 195] The idea of federated learning exists for a long time in the context of distributed learning, and it was given the name by some researchers at Google in 2016 [2], [89]. Federated learning is a flexible and safe cooperation framework for mobile users. The idea behind the federated learning is to approximate a global model/objective as a summation of local models/objectives trained individually by mobile users. Mathematically, the above idea can be expressed as $l(X, y; \theta) \approx \sum_{k=1}^{K} l^{(k)}(X_k, y_k; \theta)$ (16), where X is the complete set of the training inputs, y is the complete set of the training outputs, and they constitute the aggregated model is arranged to receive as input feature values for a main feature set complete training set D; $l(\cdot)$ is a global objective in terms of the model hyper-parameters $\theta$; while a training sample in a client training set indicating values for multiple features $X_k$ is the k-th local set of the training inputs, $y_k$ is the k-th local set of the training outputs, and they constitute $D_k$, which is a said multiple features being a subset of the main feature set subset of D; $l^{(k)}(\cdot)$ is a at least one client training set indicating feature values for features that are a strict subset of the main feature set local objective of the k-th local dataset, $D_k$; K is the total number of collaborating mobile users, which is assumed to be large. Both $l(\cdot)$ and $l^{(k)}(\cdot)$ are composite functions of a selected learning model/regression function and a cost function.
Lastly, we note that the outputs y are mostly positions or position related measurements in our work.).
De Brouwer, Liang, and Yin are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of De Brouwer and Liang, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Yin to De Brouwer before the effective filing date of the claimed invention in order to collaboratively build accurate location services without sacrificing user privacy, in particular, sensitive information related to their geographical trajectories (cf. Yin, [ABSTRACT, pg. 187] In this overview paper, data-driven learning model-based cooperative localization and location data processing are considered, in line with the emerging machine learning and big data methods. We first review (1) state-of-the-art algorithms in the context of federated learning, (2) two widely used learning models, namely the deep neural network model and the Gaussian process model, and (3) various distributed model hyper-parameter optimization schemes. Then, we demonstrate various practical use cases that are summarized from a mixture of standard, newly published, and unpublished works, which cover a broad range of location services, including collaborative static localization/fingerprinting, indoor target tracking, outdoor navigation using low-sampling GPS, and spatio-temporal wireless traffic data modeling and prediction. Experimental results show that near centralized data fitting- and prediction performance can be achieved by a set of collaborative mobile users running distributed algorithms. All the surveyed use cases fall under our newly proposed Federated Localization (FedLoc) framework, which targets on collaboratively building accurate location services without sacrificing user privacy, in particular, sensitive information related to their geographical trajectories. Future research directions are also discussed at the end of this paper.).
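For illustration only, the decomposition in equation (16) quoted from Yin above can be checked numerically. The sketch assumes a sum-of-squared-residuals cost and a one-parameter linear model f(x; θ) = θ·x, both chosen here for simplicity and not taken from the reference; the function names and data values are hypothetical. Under these assumptions, and when the local sets partition the complete training set, the sum of local objectives equals the global objective exactly.

```python
from typing import List, Sequence, Tuple


def squared_error(xs: Sequence[float], ys: Sequence[float], theta: float) -> float:
    """Sum of squared residuals for the assumed model f(x; theta) = theta * x."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))


def federated_objective(
    partitions: Sequence[Tuple[Sequence[float], Sequence[float]]],
    theta: float,
) -> float:
    """Sum of the local objectives, one per client partition (X_k, y_k)."""
    return sum(squared_error(xs, ys, theta) for xs, ys in partitions)


# Complete training set D and a split into two local sets D_1 and D_2.
X: List[float] = [1.0, 2.0, 3.0, 4.0]
y: List[float] = [2.0, 4.0, 7.0, 8.0]
parts = [(X[:2], y[:2]), (X[2:], y[2:])]

global_loss = squared_error(X, y, theta=2.0)
summed_local_loss = federated_objective(parts, theta=2.0)
```

Both values come out to 1.0 for this data, since only the third sample has a nonzero residual.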
Claims 7, 9 are rejected under 35 U.S.C. 103 as being unpatentable over De Brouwer, in view of Liang, Yin, and further in view of Wang (NPL: "Interpret Federated Learning with Shapley Values").
Regarding claim 7, De Brouwer, as modified by Liang, teaches The training method of claim 1.
De Brouwer, as modified by Liang, fails to teach wherein obtaining feature weights comprises for multiple training samples applying the aggregated model to feature values in a training sample obtaining a model output, applying an explainability algorithm to obtain sample feature weights indicating the relative importance of the feature values for the training sample, combining the sample feature weights to obtain the feature weights.
Yin teaches wherein obtaining feature weights comprises for multiple training samples applying the aggregated model to feature values in a training sample obtaining a model output, combining the sample feature weights to obtain the feature weights ([III. LEARNING MODELS, pg. 190] This section aims to introduce wherein obtaining feature weights comprises for multiple training samples applying the aggregated model to feature values in a training sample obtaining a model output two representative learning models that can be used as the “brain” of the proposed FedLoc framework.; [A. DEEP NEURAL NETWORK, pg. 190-191] We will provide some concrete examples in Section IV, V and VI. Gradient descent type methods with back-propagation are commonly used to solve the above minimization problem in spite of its numerical instability caused by gradient vanishing or explosion. After the optimal set of weights θ̂ is obtained, one can conduct applying an explainability algorithm to obtain sample feature weights indicating the relative importance of the feature values for the training sample prediction for a novel input x∗ using f(x∗; θ̂) given in Eq.(1).; [A. BRIEF REVIEW OF FEDERATED LEARNING, pg. 195] The idea of federated learning exists for a long time in the context of distributed learning, and it was given the name by some researchers at Google in 2016 [2], [89]. Federated learning is a flexible and safe cooperation framework for mobile users. The idea behind the federated learning is to approximate a global model/objective as a summation of local models/objectives trained individually by mobile users.
Mathematically, the above idea can be expressed as l(X, y; θ) ≈ Σ_{k=1}^{K} l^{(k)}(X_k, y_k; θ), (16) where X is the complete set of the training inputs, y is the complete set of the training outputs, and they constitute the complete training set D; l(·) is a global objective in terms of the model hyper-parameters θ; while X_k is the k-th local set of the training inputs, y_k is the k-th local set of the training outputs, and they constitute D_k, which is a subset of D; l^{(k)}(·) is a local objective of the k-th local dataset, D_k; K is the total number of collaborating mobile users, which is assumed to be large.; I: [DNN model with the Least-Squares Cost., pg. 195] The combining the sample feature weights to obtain the feature weights global objective for training a DNN is given as follows: l(X, y; θ) = Σ_{i=1}^{n} (y_i − f(x_i; θ))^2, (17) where the outputs are assumed to be independent, and f(x_i; θ) is represented by a DNN with L hidden layers [57] with θ = {W_1, W_2, ..., W_{L+1}} representing the DNN weights to be tuned for all hidden layers and output layer. It is obvious that the global objective is already in form of sum-of-residual-squared.).
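For illustration only, the decomposition quoted from Yin in Eq. (16) and Eq. (17) can be checked numerically: a sum-of-squared-residuals objective evaluated over the complete training set equals the sum of the same objective evaluated over each local subset. The sketch below is not taken from Yin; the linear function standing in for f(x; θ), the sample data, and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # one (x_i, y_i) training pair


def squared_residual_loss(data: List[Sample], predict: Callable[[float], float]) -> float:
    """Sum of squared residuals over a dataset, in the form of Eq. (17)."""
    return sum((y - predict(x)) ** 2 for x, y in data)


def federated_loss(local_datasets: List[List[Sample]], predict: Callable[[float], float]) -> float:
    """Sum of the local objectives over all K local datasets, in the form of Eq. (16)."""
    return sum(squared_residual_loss(local, predict) for local in local_datasets)


def linear_model(x: float) -> float:
    """Hypothetical stand-in for f(x; theta); a DNN would take its place."""
    return 2.0 * x + 1.0


# Hypothetical local datasets D_k held by K = 3 users.
local_datasets = [
    [(0.0, 1.5), (1.0, 2.5)],
    [(2.0, 5.0)],
    [(3.0, 8.0), (4.0, 8.5)],
]
# The complete training set D is the union of the local datasets.
complete_dataset = [sample for local in local_datasets for sample in local]

global_loss = squared_residual_loss(complete_dataset, linear_model)
summed_local_loss = federated_loss(local_datasets, linear_model)
```

For the least-squares cost the two quantities coincide exactly, which is consistent with the quoted statement that the global objective is already a sum of squared residuals.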
De Brouwer, Liang, and Yin are combinable for the same rationale as set forth above with respect to claim 2.
Wang teaches applying an explainability algorithm to obtain sample feature weights indicating the relative importance of the feature values for the training sample ([4 Interpreting Federated Learning Models] Most model interpretation methods can be directly used for Horizontal Federated Learning as all parties have data for the full feature space. There is no special issue for interpreting prediction results on both training data and new data, for both specific single predictions as granular check or for batch predictions as holistic check. Vertical Federated Learning raises new issues for Model Interpretation where the feature space is divided into different parties. Directly using methods like Shapley values for each prediction will very likely reveal the protected feature value from the other parties and cause privacy issues. It is not trivial to develop a safe mechanism for vertical Federated Learning and find a balance between model interpretation and data privacy. We propose a applying an explainability algorithm variant version of SHAP [Scott et al. 2017] to use Shapley value for FML model interpretation. We take dual-party (Host and Guest) vertical Federated Learning as an example, but the idea can be extended to multiple parties. Host owns the label data yi and part of the feature space xi H. Guest owns another part of the feature space xi G. Here i=1,…,n as we suppose host and guest have n overlapped instances with the same IDs. By using vertical FML, host and guest collaborate to develop a machine learning model for predicting Y. Now host wants to obtain sample feature weights indicating the relative importance of the feature values for the training sample interpret a specific prediction the model makes for instance x by looking at the Shapley values of the features.
Instead of guest giving out the feature importance for all its feature space x G, we combine the feature space of x G as one united federated feature x fed , and compute the Shapley value for each of the host features x H and this federated feature x fed.).
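For illustration only, the grouping described in the quoted passage of Wang can be sketched as an exact Shapley computation in which each host feature is its own player and all guest features are merged into a single player x_fed. This is a simplified sketch and not Wang's SHAP variant: the linear model, the feature names, the instance values, and the choice of replacing absent features with an all-zero baseline are assumptions made here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List, Set


def shapley_values(players: List[str], value: Callable[[Set[str]], float]) -> Dict[str, float]:
    """Exact Shapley values obtained by enumerating every coalition of the other players."""
    n = len(players)
    result: Dict[str, float] = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = set(subset)
                total += weight * (value(without | {player}) - value(without))
        result[player] = total
    return result


# Hypothetical instance: host features h1, h2 and guest features g1, g2.
instance = {"h1": 2.0, "h2": 1.0, "g1": 3.0, "g2": 4.0}
baseline = {name: 0.0 for name in instance}  # assumed reference values for absent features

# Each host feature is one player; all guest features form the single player x_fed.
groups = {"h1": ["h1"], "h2": ["h2"], "x_fed": ["g1", "g2"]}


def model(features: Dict[str, float]) -> float:
    """Hypothetical jointly trained model (linear for simplicity)."""
    return 1.0 * features["h1"] + 2.0 * features["h2"] + 0.5 * features["g1"] + 0.25 * features["g2"]


def coalition_value(coalition: Set[str]) -> float:
    """Model output when only the features of the players in the coalition are present."""
    features = dict(baseline)
    for player in coalition:
        for name in groups[player]:
            features[name] = instance[name]
    return model(features)


phi = shapley_values(list(groups), coalition_value)
```

Under these assumptions the host obtains one importance value per host feature and a single combined value for x_fed, so no per-feature importance of the guest features is exposed.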
De Brouwer, Liang, Yin, and Wang are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of De Brouwer, Liang, and Yin, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Wang to De Brouwer before the effective filing date of the claimed invention in order to balance the model interpretability and data privacy in vertical Federated Learning by using Shapley values to reveal detailed feature importance for host features and a unified importance value for federated guest features (cf. Wang, [Abstract] Federated Learning is introduced to protect privacy by distributing training data into multiple parties. Each party trains its own model and a meta-model is constructed from the sub models. In this way the details of the data are not disclosed in between each party. In this paper we investigate the model interpretation methods for Federated Learning, specifically on the measurement of feature importance of vertical Federated Learning where feature space of the data is divided into two parties, namely host and guest. For host party to interpret a single prediction of vertical Federated Learning model, the interpretation results, namely the feature importance, are very likely to reveal the protected data from guest party. We propose a method to balance the model interpretability and data privacy in vertical Federated Learning by using Shapley values to reveal detailed feature importance for host features and a unified importance value for federated guest features. Our experiments indicate robust and informative results for interpreting Federated Learning models.).
Regarding claim 9, De Brouwer, as modified by Liang, Yin, and Wang, teaches The training method of claim 7.
De Brouwer teaches wherein the training sample is a training sample in the client training set, and/or the training sample is a training sample in the base training set ([0054] In a federated workflow 915, we start with a the training sample is a training sample in the base training set base model 951 that may have been trained in this conventional manner. Once this base model 951 is trained, refinement can proceed without centrally collecting any further data. Instead, the base model is distributed to individual devices 953. These edge devices perform the training sample is a training sample in the client training set local training to generate local model updates 957, using data (not shown) that is on those devices. The federated workflow aggregates the local updates into a new global model 959 which will become our next base model 951 that will be used for inference and additional rounds 915 of training a federated loop.).
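For illustration only, the federated workflow quoted from De Brouwer at [0054] (distribute a base model, train locally on each device, aggregate the local updates into a new global model) can be sketched as follows. The quoted paragraph does not specify the model or the aggregation rule; the one-parameter model y = w * x, the single gradient step per device, the unweighted averaging, and all names and data below are assumptions.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # one (x, y) pair held on a device


def local_update(base_w: float, data: List[Sample], lr: float = 0.1) -> float:
    """One gradient step on the mean squared error of y = w * x, using only local data."""
    grad = sum(2.0 * (base_w * x - y) * x for x, y in data) / len(data)
    return base_w - lr * grad


def federated_round(base_w: float, device_data: List[List[Sample]]) -> float:
    """Send the base model to every device and average the returned local models."""
    local_models = [local_update(base_w, data) for data in device_data]
    return sum(local_models) / len(local_models)  # assumed aggregation: unweighted mean


# Hypothetical device datasets, all generated by y = 2 * x.
device_data = [
    [(1.0, 2.0)],
    [(2.0, 4.0)],
    [(1.0, 2.0), (3.0, 6.0)],
]

w = 0.0  # base model before any federated refinement
for _ in range(50):
    # The aggregated model becomes the base model of the next round.
    w = federated_round(w, device_data)
```

In this sketch the raw samples never leave the per-device lists; only the scalar model parameter is exchanged in each round.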
De Brouwer, Liang, Yin, and Wang are combinable for the same rationale as set forth above with respect to claim 7.
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over De Brouwer, in view of Liang, and further in view of Li et al. (NPL: "Federated Learning-Based Ultra-Short term load forecasting in power Internet of things", hereinafter 'Li').
Regarding claim 11, De Brouwer, as modified by Liang, teaches The training method of claim 1.
De Brouwer, as modified by Liang, fails to teach wherein the model is arranged for applying the model to a training sample indicating values for multiple features and not indicating a value for at least one missing feature, wherein applying the model comprises inputting an interpolated value for the missing feature, and/or inputting a signal indicating no feature value is indicated for a feature.
Li teaches wherein the model is arranged for applying the model to a training sample indicating values for multiple features and not indicating a value for at least one missing feature, wherein applying the model comprises inputting an interpolated value for the missing feature, and/or inputting a signal indicating no feature value is indicated for a feature ([A. Data preprocessing, pg. 65] The data considered in this model are load data of many different small regions, each small area is not necessarily in the same large area, there is weather information, the collection cycle is every other time T, and its collection frequency is f. the model is arranged for applying the model to a training sample indicating values for multiple features and not indicating a value for at least one missing feature Analyze the collected data and deal with the missing data. If the missing value of the data set is not more than one hour, it is applying the model comprises inputting an interpolated value for the missing feature processed according to the mean or interpolation near the missing value. If there is a constant missing value for more than one hour, the missing value of each column is filled in according to the average value of the column, which is formally described as: if DID_{i,T} == null then Mean(DID_{i,T−1}, DID_{i,T+1}) (3).).
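For illustration only, the two-rule missing-data treatment quoted from Li can be sketched as below: a short gap is filled with the mean of the neighbouring observed values, in the manner of Eq. (3), and a longer gap is filled with the column average. The function name, the max_gap parameter (the number of consecutive samples treated as "not more than one hour"), the use of the nearest observed value on each side of a multi-sample gap, and the sample data are assumptions, not details taken from Li.

```python
from typing import List, Optional


def fill_missing(series: List[Optional[float]], max_gap: int) -> List[float]:
    """Fill None entries: short gaps by neighbour mean, long or edge gaps by column mean."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    column_mean = sum(observed) / len(observed)

    filled = list(series)
    i = 0
    while i < len(series):
        if series[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing values.
        j = i
        while j < len(series) and series[j] is None:
            j += 1
        has_both_neighbours = i > 0 and j < len(series)
        if (j - i) <= max_gap and has_both_neighbours:
            # Short gap: mean of the observed values on either side.
            fill = (series[i - 1] + series[j]) / 2.0
        else:
            # Long gap, or gap at the edge of the series: column average.
            fill = column_mean
        for k in range(i, j):
            filled[k] = fill
        i = j
    return filled


# Hypothetical load readings with one short gap and one long gap.
load = [10.0, None, 14.0, None, None, None, 20.0]
result = fill_missing(load, max_gap=2)
```

With these inputs the single missing reading is replaced by the mean of its two neighbours, and the three-sample gap is replaced by the average of the observed readings.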
De Brouwer, Liang, and Li are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of De Brouwer and Liang, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Li to De Brouwer before the effective filing date of the claimed invention in order to effectively generate accurate load forecasting and reduce the data security risk under the condition that the data of each edge node does not come out of its location (cf. Li, [Abstract, pg. 63] The stable and efficient management and dispatching of power system depend on the accurate short term load forecasting of the following few minutes to a week. With the rapid development of the power Internet of Things, the number of network edge devices and data volume has increased exponentially. However, the traditional centralized method cannot accurately grasp load variation patterns of all area, which entails storage pressure and delays of data calculation and transmission. In addition, the centralized method has potential data security risk for its transmitting and storing all data in the data center. The present research proposes an ultra-short term load forecasting method for the power Internet of Things based on federated learning, which learns the model parameters from the data distributed in multiple edge nodes. Simulation results show that the method effectively generates accurate load forecasting and reduces the data security risk under the condition that the data of each edge node does not come out of its location.).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
AbdulRahman et al. (NPL: “FedMCCS: Multicriteria Client Selection Model for Optimal IoT Federated Learning”) teaches FedMCCS, a multicriteria-based approach for client selection in FL.
Brisimi et al. (NPL: “Federated learning of predictive models from federated Electronic Health Records”) teaches a general decentralized optimization framework enabling multiple data holders to collaborate and converge to a common predictive model, without explicitly exchanging raw data.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAGGIE MAIDO whose telephone number is (703) 756-1953. The examiner can normally be reached M-Th: 6am - 4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MM/Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129