DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2/09/2026 has been entered.
Response to Arguments
Applicant's arguments filed 02/09/2026 have been fully considered but they are not persuasive.
Regarding Applicant's arguments under 35 U.S.C. 103, Applicant argues on pages 8 and 9: “Applicant respectfully submits that Bhowmick, alone or in combination with Triastcyn, fails to teach the features of claim 1 which recites: … Claim 1 features determining, by individual clients of a federated machine learning system, respective counts representing a largest number of items of respective portions of training data of a single group of the training data, where the training data includes multiple nonzero groups and the respective counts are less than a total number of items in the respective portions of the dataset for each client. Applicant respectfully submits that the claim features respective counts at clients individually less than a total count of items of training data at those clients by virtue of multiple non-zero groups of data items and that the cited references fail to disclose such a count.” The prior art of record was applied based on the understanding of the previously presented claim 1, which was rejected under 112(b) for indefiniteness as explained in the Office action dated 11/07/2025. The amended limitations in the current claim set dated 2/09/2026 have overcome the 112(b) issue. However, Applicant argues that the count must reflect non-zero groups, as recited in the argument above, based on the newly added limitation “the count less than a total number of items in the portion of the dataset”. The argument is directed towards amended limitations that were not previously examined, rendering the argument moot.
Applicant also argues on page 9, lines 8-10, “where the training data includes multiple nonzero groups and the respective counts are less than a total number of items in the respective portions of the dataset for each client”, yet claim 1 does not recite non-zero groups. Applicant continues to argue on page 9, paragraph 2: “In the rejection, the Action recognizes that Bhowmick fails to disclose determining such counts and instead cites the parallel accounting of Triastcyn. However, Triastcyn discloses "…. Triastcyn discloses using respective counts of total local dataset sizes. ... Applicant therefore respectfully submits that the cited references fail to teach or suggest training the received federated machine learning model using a portion of a dataset private to the respective client to generate one or more parameter updates, the portion of the dataset distributed among a plurality of groups individually comprising at least one item and determining a count of the largest number of items in the portion of the dataset associated with any single group of the plurality of groups of the portion of the dataset, the count less than a total number of items in the portion of the dataset, as claimed.” Applicant thus argues that Triastcyn fails to teach the newly added limitation “the count less than a total number of items in the portion of the dataset”. However, as explained above, the amended limitations were not previously examined, rendering the argument moot and not persuasive. Applicant presents the same argument for analogous claims 8 and 15 on page 10; that argument is not persuasive for the same reasons.
Regarding the applicant arguments for 101 in page 11 and 12, the applicant argues “The Office Action rejected claims 1 - 20 under 35 U.S.C. § 101 as allegedly directed to a judicial exception. Applicant respectfully traverses these rejections.
Applicant calls attention to the August 4, 2025 Memorandum on Subject Matter Eligibility which states on pages 2-3 (emphasis added): …
Applicant notes that the Action identifies the training and determining steps of the independent claims as reciting mental processes despite the 2025 Memorandum, and Example 39, making clear that these steps do not recite mental processes. Regarding Prong I analysis of Example 39, the Subject Matter Eligibility Examples states (emphasis added): …
Applicant respectfully submits that claims are similar to Example 39 in that they do not recite any mathematical relationships, formulas, or calculations and no mathematical concepts are recited in the claims. Furthermore, the claims do not recite a mental process because the various steps, including the training and determining steps, are not practically performed in the human mind. Applicant therefore respectfully submits that the claims are patent-eligible and respectfully requests that the rejection of the claims under 35 U.S.C. § 101 be withdrawn.” However, in claim 1 the limitation “determining a count of the largest number of items in the portion of the private dataset associated with any single group of the plurality of groups of the portion of the dataset, the count less than a total number of items in the portion of the dataset; and” recites a judicial exception under Step 2A Prong 1 (Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes): the limitation requires a count of the largest group in a dataset, which is considered a mathematical concept, and, at a high level, determining such a count is a mental process that requires identifying the largest count. The limitation “the count less than a total number of items in the portion of the dataset” further requires evaluating whether the count is less than a total number of items. The argument is therefore not persuasive; see the updated 101 rejection below.
Claim Objections
Claims 1, 8, and 15 are objected to because of the following informalities:
Regarding claim 1 and analogous claims 8 and 15, the limitation “the count less than a total number of items in the portion of the dataset;” should say “the count being less than a total number of items in the portion of the dataset;”. Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1, 8, and 15 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding claim 1 and analogous claims 8, and 15, the limitation “the count less than a total number of items in the portion of the dataset;” does not have written description support in the specification. The specification specifies in para 0049 the following “FIG. 3 is a block diagram illustrating an embodiment implementing a federated machine learning system providing centralized subject-level Differential Privacy (Differential Privacy), in various embodiments. Embodiments of FIG. 3 may implement the following pseudo code:
[pseudo code reproduced in the specification as media_image1.png]
” that teaches the differential privacy method in the form of the pseudo code. The count for the largest group is defined as
[expression reproduced in the specification as media_image2.png].
However, the count is just the count of the largest group; the specification does not provide a method showing how “the count less than a total number of items in the portion of the dataset” is determined, as required by the claim limitation “determining a count of the largest number of items in the portion of the dataset, the count less than a total number of items in the portion of the dataset;”.
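(Examiner Note: For illustration only, and not asserted to be Applicant's disclosed method, the following minimal Python sketch shows how a largest-group count of the kind quoted above is commonly computed; the helper name and the (item, group) pair layout are hypothetical.)

from collections import Counter

def largest_group_count(items):
    # items: iterable of (item, group_id) pairs from a client's local
    # portion of the dataset (hypothetical layout).
    group_sizes = Counter(group_id for _, group_id in items)
    return max(group_sizes.values())

# Five items spread over three groups: the largest group holds 3 items,
# which is less than the total of 5 only because multiple groups exist.
items = [("a", 1), ("b", 1), ("c", 1), ("d", 2), ("e", 3)]
assert largest_group_count(items) == 3

As the sketch shows, when all items of a portion fall into a single group the count equals the total number of items; the cited pseudo code defines only the count itself, not the claimed relationship of the count being less than the total.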
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite a mental process and a mathematical concept, as explained below. The subject matter eligibility test for products and processes is described below for claim 1, followed by the dependent claims.
Regarding claim 1:
Step 1: Is the claim to a process, machine, manufacture, or composition of matter?
Claim 1 recites a method, which is a process that falls under the statutory categories.
Step 2A Prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes – The claim recites the following:
“determining a count of the largest number of items in the portion of the private dataset associated with any single group of the plurality of groups of the portion of the dataset, the count less than a total number of items in the portion of the dataset;” - The limitation recites a mental process of determining a count (see MPEP 2106.04(a)(2), Subsection III) and a mathematical concept of counting the number of items in a single group (see MPEP 2106.04(a)(2), Subsection I).
Step 2A Prong 2: Does the claim recite additional elements that integrate the judicial exception into a particular application? No –
The claim includes the following additional elements:
“A computer-implemented method, comprising: receiving, at a plurality of clients of a federated machine learning system, a federated learning model from an aggregation server of the federated machine learning system”
The additional limitation of receiving a federated learning model is considered insignificant extra-solution activity as mere data gathering (see MPEP 2106.05(g)).
“training the received federated machine learning model using a portion of a dataset private to the respective client to generate one or more parameter updates,”
The additional element of training the received federated machine learning model falls under “apply it” as using a computer to apply the exception. See Mere Instructions to Apply an Exception (MPEP 2106.05(f)).
(Examiner Note: The limitation only recites training a machine learning model at the client and does not provide any specificity as to how the training is performed. Providing greater specificity as to how the model is trained on the client side could provide significantly more.)
“the portion of the dataset distributed among a plurality of groups individually comprising at least one item;”
This additional limitation, describing the distribution of the dataset among groups, is considered insignificant extra-solution activity (see MPEP 2106.05(g)).
“sending data representing the one or more parameter updates to the aggregation server;”
The additional limitation of sending data falls under insignificant extra-solution activity as mere data transmission (see MPEP 2106.05(g)).
“applying respective noise values, according to the determined respective counts of the largest number of items, to individual ones of the one or more parameter updates;”
The additional element of applying respective noise values falls under “apply it” as using a computer to apply the exception. See Mere Instructions to Apply an Exception (MPEP 2106.05(f)).
“receiving, at the aggregation server, the respective data representing the parameter updates from the plurality of clients;”
The additional limitation of receiving the respective data falls under insignificant extra-solution activity as mere data gathering (see MPEP 2106.05(g)).
“and revising the federated learning model according to an aggregation of the respective received data representing the parameter updates of the plurality of clients”
The additional element of revising the federated learning model at the aggregation server falls under “apply it” as using a computer to apply the exception. See Mere Instructions to Apply an Exception (MPEP 2106.05(f)).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No - The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As an ordered combination, the claim is directed to a mental and mathematical process of training a machine learning model. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of receiving, sending, applying, and revising fall under data gathering and “apply it” and do not meaningfully limit the claim. The method does not improve the functioning of a computer, does not transform an article into a different state or thing, and is not applied by a particular machine, making the claim not patent eligible.
(Examiner Note: The claim could provide a practical application by deploying the model for use after aggregation. Significantly more could also be provided by specifying how the differential privacy is performed to provide privacy.)
Regarding claim 2:
Step 2A Prong 2, Step 2B: The additional element is “wherein the applying of the respective noise values is performed by the respective clients prior to said sending”. The additional element falls under “apply it”, as applying the noise values is performed by the clients, and does not provide significantly more (MPEP 2106.05(f)). The judicial exception is not integrated into a practical application and does not provide an improvement. The claim does not provide an inventive concept or a practical application.
Regarding claim 3:
Step 2A Prong 2, Step 2B:
“further comprising sending, from the respective clients of the plurality of clients, the respective counts of the largest number of items to the aggregation server,”
The additional element falls under “apply it” by using generic computers to send the respective counts and does not provide significantly more (see MPEP 2106.05(f)). The judicial exception is not integrated into a practical application and does not provide an improvement. The claim does not provide an inventive concept or a practical application.
“and wherein the applying of the respective noise values is performed by the aggregation server subsequent to receiving the respective data representing the parameter updates and the respective counts from the plurality of clients.”
The additional element falls under “apply it” by using a generic computer to apply the respective noise values and does not provide significantly more (see MPEP 2106.05(f)). The judicial exception is not integrated into a practical application and does not provide an improvement. The claim does not provide an inventive concept or a practical application.
Regarding claim 4:
Step 2A Prong 1: The claim recites “wherein the applying of the respective noise values to the individual ones of the one or more parameter updates comprises determining the respective noise values in proportion to the respective counts of the largest number of items” - Determining the respective noise values in proportion to the respective counts falls under a mental process (see MPEP 2106.04(a)(2), Subsection III).
Step 2A Prong 2, Step 2B: No additional elements are recited in the claim. The judicial exception is not integrated into a practical application and does not provide an improvement. The claim does not provide an inventive concept or a practical application.
Regarding claim 5:
Step 2A Prong 2, Step 2B: The claim includes the following additional element: “wherein applying the respective noise values according to the respective counts to the individual ones of the one or more parameter updates provides differential privacy for the ”. The additional element is insignificant extra-solution activity (MPEP 2106.05(g)). The judicial exception is not integrated into a practical application and does not provide an improvement. The claim does not provide an inventive concept.
Regarding claim 6:
Step 2A Prong 2, Step 2B: The claim includes the additional elements:
“wherein the receiving, training, determining, sending, applying, receiving and revising are performed for a single training round of a plurality of training rounds of the federated machine learning system,” The additional element is insignificant extra-solution activity (MPEP 2106.05(g)). The additional element does not integrate the judicial exception into a practical application because it does not provide an improvement. The claim does not provide an inventive concept.
“and wherein individual ones of the plurality of training rounds use different portions of the plurality of clients” - The additional element is insignificant extra-solution activity (MPEP 2106.05(g)). The additional element does not integrate the judicial exception into a practical application because it does not provide an improvement. The claim does not provide an inventive concept.
Regarding claim 7:
Step 2A Prong 2, Step 2B: The claim includes the following additional element: “wherein training the received machine learning model comprises using a mini-batch of the dataset private to the respective client to generate the one or more parameter updates.” The additional element falls under insignificant extra-solution activity (see MPEP 2106.05(g)). The additional element does not integrate the judicial exception into a practical application because it does not provide an improvement. The claim does not provide an inventive concept.
Claims 8-14 recite a system and are analogous to method claims 1-7. Therefore, the rejections of claims 1-7 above apply to claims 8-14.
Claims 15-20 recite a computer-readable medium product and are analogous to method claims 1-7. Therefore, the rejections of claims 1-7 above apply to claims 15-20.
Allowable Subject Matter
Claim 3 would be allowable if rewritten to overcome the rejections under 35 U.S.C. 101 and 112(a) set forth in this Office action and to include all of the limitations of the base claim and any intervening claims. Furthermore, the allowable subject matter is with regard to 35 U.S.C. 102/103 only; the rejections under 101 remain and are explained above. The prior art does not teach the particular features of claim 3, but the present rejections under 101 must be overcome. Bhowmick, Kuo, and Triastcyn are used to teach all the limitations of claim 1, but claim 3 requires sending from the clients “the respective counts of the largest number of items to the aggregation server” when performing federated learning. In particular, the limitation of sending the largest count is not found in the prior art and would not be obvious to one of ordinary skill in the art absent impermissible hindsight.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 2, 4-6, 8-13, and 15-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bhowmick et al. (US20210166157A1) (“Bhowmick”) in view of Yu-Hsuan Kuo, Cho-Chun Chiu, Daniel Kifer, Michael Hay, and Ashwin Machanavajjhala, "Differentially Private Hierarchical Count-of-Counts Histograms", arXiv:1804.00370v2 [cs.DB], 13 Sep 2018 (“Kuo”), and A. Triastcyn and B. Faltings, "Federated Learning with Bayesian Differential Privacy," 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 2019, pp. 2587-2596 (“Triastcyn”).
Regarding claim 1 and analogous claims 8 and 15, as best understood based on the 112(b) issues identified above, Bhowmick teaches a computer-implemented method, comprising: receiving, at a plurality of clients of a federated machine learning system, a federated learning model from an aggregation server of the federated machine learning system (Bhowmick Para 0005, Various embodiments of a private federated learning system with protections against reconstruction attacks will be described herein.
0006, One embodiment provides for a data processing system comprising a memory to store instructions and one or more processors to execute the instructions. The instructions cause the one or more processors to receive a machine learning model from a server at a client device [A computer-implemented method, comprising: receiving, at a plurality of clients of a federated machine learning system], train the machine learning model using local data at the client device to generate a trained machine learning model, generate an update for the machine learning model, the update including a weight vector that represents a difference between the machine learning model and the trained machine learning model, privatize the update for the machine learning model, and transmit the privatized update for the machine learning model to the server [a federated learning model from an aggregation server of the federated machine learning system])
performing at respective clients of the plurality of clients: training the received federated machine learning model using a portion of a dataset private to the respective client to generate one or more parameter updates, (Bhowmick Para. 0021, Federated learning, also referred to as distributed learning, focuses on learning problems when data is distributed across many devices. Let θ ∈ Θ be model parameters for a particular model that is to be trained using user data. In federated learning, the model parameters θ are transmitted to each device and then each device trains a local model using its local data [using a portion of a dataset private to the respective client]. The locally trained model on device i becomes θ_i [to generate one or more parameter updates]; and the difference between the starting model parameters and the locally trained parameters, ∆_i = θ_i − θ, is sent to a central server [training the received machine learning model]. A collection of b model differences {∆_1, …, ∆_b} is then aggregated to obtain ∆ on a central server, which is then used to update the central model parameters θ ← θ + ∆. The update to the shared model can then be deployed to the mobile devices. This feedback loop continues as the model improves and the user data changes);
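(Examiner Note: As a non-limiting illustration of the federated loop quoted above, the following Python sketch condenses the quoted notation; the function names are hypothetical and NumPy arrays stand in for the model parameters θ.)

import numpy as np

def federated_round(theta, client_datasets, local_update):
    # Each client trains locally on its private data; only the
    # difference delta_i = theta_i - theta is reported to the server.
    deltas = [local_update(data, theta) - theta for data in client_datasets]
    # The server aggregates the differences and updates theta <- theta + delta.
    return theta + np.mean(deltas, axis=0)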
the portion of the dataset distributed among a plurality of groups individually comprising at least one item; (Bhowmick para 0040 lines 1-7, With respect to the operations (block 402) to perform on-device training, user data is leveraged to update central model parameters θ. Let user i have a dataset with n_i example-label pairs x_i = {x_{i,1}, …, x_{i,n_i}}; each i ∈ B performs θ_i ← Update(x_i, θ; H), where Update denotes the update rule on each device. Any update procedure can be used. (Examiner Note: The dataset will comprise at least one item in order to perform on-device training));
sending data representing the one or more parameter updates to the aggregation server (Bhowmick Para. 0034 lines 1-15, The local machine learning modules 136a-136n, 137a-137n, 138a-138n on each client device can generate model updates that are privatized by the client devices 110a-110n, 111a-111n, 112a-112n before transmission to the server 130. For example, client devices 110a-110n can each send privatized model updates 121, client devices 111a-111n can each send privatized model updates 122, and client devices 112a-112n can each send privatized model updates 123. The privatized model updates can be sent through the network 120 to the server 130, where the updates can be processed into an aggregated model update 135. Updates are sent to the server while satisfying separated differential privacy for the local updates and no raw data for users is transmitted to the server [sending data representing the one or more parameter updates]. Separated differential privacy is used to protect the privatized model updates 121, 122, 123 from a reconstruction breach);
receiving, at the aggregation server, the respective data representing the parameter updates from the plurality of clients (Bhowmick Para. 0036 lines 15-28, The privatized model updates 212a-212c can be stripped of their IP addresses or other information that can be used to identify the client devices 210 prior to entering an ingestor 232 on the server 130. The ingestor 232 can collect the data from the client devices 210, remove metadata, and forward the data to an aggregator 233. The aggregator takes [receiving] the privatized model updates and aggregates them to form a single update to the current server model, which in the initial round is machine learning model 131 (e.g., model M0) [the respective data representing the parameter updates from the plurality of clients]. A model updater 234 can then apply the updates to the current server model to generate an updated machine learning model 235 (e.g., model M1). The privatized model updates 212a-212c can be protected using separated differential privacy as described herein. The aggregated model updates and/or updated machine learning model 235 can be protected using the central model of differential privacy);
and revising the federated learning model according to an aggregation of the respective received data representing the parameter updates of the plurality of clients (Bhowmick Para. 0039, The individual model updates can then be privatized on the set of client devices using separated differential privacy (block 404). The privatized model updates are then sent from the set of client devices to a central learning server (block 405). A collection of b model differences {∆_1, …, ∆_b} [representing the parameter updates of the plurality of clients] is then aggregated to obtain an aggregate model update ∆ on the central server (block 406). The aggregate model update can then be used to update the central model parameters θ ← θ + ∆ [revising the federated learning model according to an aggregation of the respective received data]. The update to the shared model can then be deployed to the client devices. These operations can continue in a loop as the model improves and user data changes.).
Bhowmick does not explicitly teach determining a count of the largest number of items in the portion of the private dataset associated with any single group of one or more groups of the portion of the dataset; and applying respective noise values, according to the determined respective counts of the largest number of items, to individual ones of the one or more parameter updates;
However, Kuo teaches determining a count of the largest number of items in the portion of the dataset, the count less than a total number of items in the portion of the dataset (Kuo page 1, 1. INTRODUCTION, para 2, However, an important class of queries that has been under-studied are hierarchical count-of-counts histograms, which are used to study the skewness of a distribution. The 2010 Decennial Census published 33 tables related to such queries [9], but these tables were truncated because formal privacy methods for protecting such tables did not exist. To get a count-of-counts histogram, one first aggregates records in a table R into groups (e.g., A=SELECT groupid, COUNT(*) AS size FROM R GROUPBY groupid) and then forms a histogram on the groups (H=SELECT size, COUNT(*) FROM A GROUPBY size). Thus H can be treated as an array, where H[i] is the number of groups of size i. When there is a hierarchical attribute associated with each group, such as location, the goal is to estimate a histogram H for every element in the hierarchy and enforce consistency: the sum of the histograms at the children equals the histogram at the parent. For example, consider a table Persons(person id, group id, location), with hierarchy national/state/county on the location attribute. A hierarchical count-of-counts histogram on this table would ask: for each geographic region (national, state, county) and every j, how many households (i.e. groups) in that region have j people.
[table/figure reproduced as media_image3.png] [determining a count of the largest number of items in the portion of the dataset]
4.3 Cumulative Histogram para 5, Recall K is an upper bound on the maximum number of people in a group. This method is not very sensitive to K; in the experiments we used K = 100,000 on datasets where the largest group had around 10,000 people, an order of magnitude difference, and still the estimated size of the largest group ended up being around 10,000. Thus if we have no prior knowledge, we can estimate K as follows. Set aside a small privacy budget, since K does not need much accuracy (e.g., ϵ = 10^−4). Let X be the number of people in the largest group [determining a count of the largest number of items in the portion of the dataset].
[algorithm reproduced as media_image4.png] [the count less than a total number of items in the portion of the dataset]));
Bhowmick and Kuo are considered analogous art to the claimed invention because they are in the same field of differential privacy. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bhowmick to incorporate the teachings of Kuo and disclose counting items in the dataset to determine the largest group. Doing so scales the noise correctly and satisfies differential privacy (Theorem 1. Algorithm 1 satisfies ϵ-differential privacy. Proof. Privacy is easy to analyze because the algorithm separates the differentially private data access from the post-processing. Specifically, the only part of the algorithm that touches the sensitive data occurs in Lines 1 through 4. It uses sequential composition across levels of the hierarchy. Thus each of the L + 1 levels is assigned ϵ/(L + 1) of the privacy budget. Within each level there is parallel composition because adding or removing one person from a group only affects the node that contains that group (and none of the sibling nodes). The count-of-counts histograms produced at each node use either the method of Section 4.2 or Section 4.3 with privacy budget ϵ/(L + 1), and they scale the noise correctly to the global sensitivity, as discussed in those sections.)
applying respective noise values, according to the determined respective counts of the largest number of items, to individual ones of the one or more parameter updates (Triastcyn Page 2591, V. Federated Learning with Bayesian Differential Privacy, A. Client Privacy, Paras. 2 and 3:
Under the classic DP [10], [11], the privacy is enforced by clipping all user updates u_i to a fixed L2-norm threshold C and then adding Gaussian noise with the variance C²σ². The noise parameter σ is calibrated to bound the privacy loss in each communication round, and then the privacy loss is accumulated across the rounds using the moments accountant [7]. We use the same privacy mechanism, but employ the Bayesian accounting method instead of the moments accountant. Intuitively, our accounting method should have a significant advantage over the moments accountant in the settings where data is distributed similarly across the users, because in this case their updates would be in a strong agreement. In order to map the Bayesian differential privacy framework to this setting, let us introduce some notation.
Page 2592, V. Federated Learning with Bayesian Differential Privacy, B. Instance Privacy, Para. 3: In order to get tighter instance privacy guarantees, we apply the subsampled Gaussian noise [applying respective noise values] mechanism to gradient computation on user devices [to individual ones of the one or more parameter updates]. The accounting follows the same procedure as described above, except that the noise parameter σ and the sampling probability q may be different, depending on which of the settings described below is used.) There are two possible accounting schemes:
Triastcyn Page 2591, V. Federated Learning with Bayesian Differential Privacy, B. Instance Privacy, Para. 7:
2) Parallel accounting: In this scheme, every client computes ĉ(λ) using q = B_i/N_i, where N_i is the local dataset size of the client. Consequently, since N_i ≤ N, the privacy costs will be higher. But this is compensated by using parallel composition instead of sequential: the server aggregates the maximum of ĉ(λ) over all users. Again, using secure multiparty computation is possible to prevent the server from learning individual privacy costs);
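(Examiner Note: The clip-and-noise mechanism quoted above can be sketched in Python as follows; this is an illustrative sketch, not Triastcyn's implementation, and the function name and defaults are hypothetical. Each update is clipped to L2 norm C and Gaussian noise with variance C²σ², i.e., standard deviation Cσ, is added.)

import numpy as np

def privatize_update(update, clip_c, sigma, rng=None):
    # Clip the update to L2 norm C, then add Gaussian noise with
    # standard deviation C * sigma (variance C^2 * sigma^2).
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update if norm == 0.0 else update * min(1.0, clip_c / norm)
    return clipped + rng.normal(0.0, clip_c * sigma, size=update.shape)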
Bhowmick and Triastcyn are considered analogous art to the claimed invention because they are in the same field of federated learning with differential privacy. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bhowmick to incorporate the teachings of Triastcyn and disclose counting the size of the dataset. Doing so lowers the amount of noise used and benefits model accuracy (Triastcyn Abstract, We consider the problem of reinforcing federated learning with formal privacy guarantees. We propose to employ Bayesian differential privacy, a relaxation of differential privacy for similarly distributed data, to provide sharper privacy loss bounds. We adapt the Bayesian privacy accounting method to the federated setting and suggest multiple improvements for more efficient privacy budgeting at different levels. Our experiments show significant advantage over the state-of-the-art differential privacy bounds for federated learning on image classification tasks, including a medical application, bringing the privacy budget below ε = 1 at the client level, and below ε = 0.1 at the instance level. Lower amounts of noise also benefit the model accuracy and reduce the number of communication rounds.).
Regarding claim 2 and analogous claims 9 and 16, as best understood based on the 112(b) issues identified above, Bhowmick in view of Triastcyn and Kuo teach the computer-implemented method of claim 1.
Bhowmick teaches wherein the applying of the respective noise values is performed by the respective clients prior to said sending (Bhowmick Para. 0039 lines 24-28, The individual model updates can then be privatized on the set of client devices using separated differential privacy (block 404). The privatized model updates are then sent from the set of client devices to a central learning server (block 405) (i.e., noise is added to the updates prior to said sending)
Para. 0063 lines 1-14, FIG. 5B illustrates a method 510 of privatizing model updates, according to an embodiment. Method 510 can be performed on a client device (e.g., client device 310) of the set of client devices selected to send a model update to a model update server (e.g., server 130). In one embodiment, method 510 includes an operation (block 511) to obtain a weight vector that represents a difference between a previous and recently trained model. This difference represents the model update that is to be transmitted to the learning server to update the current server model. Method 510 additionally includes an operation (block 512) to decompose the weight vector into a unit vector (e.g., unit direction U = W/||W||_2) and a magnitude (e.g., radius R = ||W||_2).
Para. 0064, Method 510 additionally includes an operation (block 514) to separately privatize the magnitude. The magnitude can be privatized via mapping mechanism M_2 : R → Z_2 described above. The mapping mechanism can be based on relative noise (PrivMagn) and will be privatized [applying of the respective noise values] based on assumptions made about the availability of the data available to the attacker).
Regarding claim 4 and analogous claims 11 and 18, Bhowmick in view of Triastcyn and Kuo teach the computer-implemented method of claim 1.
Triastcyn teaches wherein the applying of the respective noise values to the individual ones of the one or more parameter updates comprises determining the respective noise values in proportion to the respective counts of the largest number of items (
Triastcyn Page 2591,
[equation reproduced as media_image5.png]
Page 2592,
[equation reproduced as media_image6.png]
[determining the respective noise values in proportion]
(2) Parallel accounting: In this scheme, every client computes ĉ(λ) using q = B_i/N_i, where N_i is the local dataset size of the client [the respective counts of the largest number of items]. Consequently, since N_i ≤ N, the privacy costs will be higher. But this is compensated by using parallel composition instead of sequential: the server aggregates the maximum of ĉ(λ) over all users. Again, using secure multiparty computation is possible to prevent the server from learning individual privacy costs)).
Bhowmick, Kuo, and Triastcyn are combined with the same rationale as set forth above with respect to claim 1 and analogous claims 8 and 15.
Regarding claim 5 and analogous claims 12 and 17, Bhowmick in view of Triastcyn and Kuo teach the computer-implemented method of claim 1.
Triastcyn teaches wherein applying the respective noise values according to the respective counts to the individual ones of the one or more parameter updates provides differential privacy for the (Triastcyn Page 2591, A. Client privacy
When it comes to reinforcing federated learning with differential privacy, the foremost attention is given to the client level privacy [10], [11]. The goal is to hide the presence of a single user, or to be more specific, to bound the influence of any single user on the learning outcome distribution (i.e. the distribution of the model parameters) [the respective client].
Page 2592 V. Federated Learning with Bayesian Differential Privacy A. Client Privacy
[passage reproduced as media_image7.png]
[provides differential privacy for the private dataset of the respective client]).
Bhowmick, Kuo, and Triastcyn are combined with the same rationale as set forth above with respect to claim 1 and analogous claims 8 and 15.
Regarding claim 6 and analogous claim 13, as best understood based on the 112(b) issues identified above, Bhowmick in view of Triastcyn and Kuo teach the computer-implemented method of claim 1.
Bhowmick teaches wherein the receiving, training, [determining], sending, [applying], receiving and revising are performed for a single training round of a plurality of training rounds of the federated machine learning system (Bhowmick Para. 0006 [receiving], 0021 [training], 0034 [sending], 0036 [receiving], and 0039 [revising] (see Figures 1 and 3),
Para. 0036, FIG. 2 illustrates an additional system 200 to enable private federated learning, according to embodiments described herein. In one embodiment, the system 200 includes a set of client devices 210a-210c (collectively, 210), which can be any of the client devices described above (e.g., client devices 110a-110n, 111a-111n, 112a-112n). The client devices 210, using the techniques described above, can generate privatized model updates 212a-212c (e.g., privatized model update 212a from client device 210a, privatized model update 212b from client device 210b, privatized model update 212c from client device 210c), which can be transmitted to the server 130 via the network 120. The privatized model updates 212a-212c can be stripped of their IP addresses or other information that can be used to identify the client devices 210 prior to entering an ingestor 232 on the server 130. The ingestor 232 can collect the data from the client devices 210, remove metadata, and forward the data to an aggregator 233. The aggregator takes the privatized model updates and aggregates them to form a single update to the current server model, which in the initial round is machine learning model 131 (e.g., model M0). A model updater 234 can then apply the updates to the current server model to generate an updated machine learning model 235 (e.g., model M1). The privatized model updates 212a-212c can be protected using separated differential privacy as described herein. The aggregated model updates and/or updated machine learning model 235 can be protected using the central model of differential privacy [performed for a single training round of a plurality of training rounds of the federated machine learning system]),
(Bhowmick Para. 0040, With respect to the operations (block 401) to transmit the machine learning model to the set of client devices, the server has some global model parameters θ ∈ R^d, which can be the model weights for each layer of a neural net or the model weights of just the last layer in the case of transfer learning. Each model can have a particular neural net architecture and loss function, which in one embodiment are assumed to be consistent across devices. Each model also has some set of hyperparameters H which will include parameters such as learning rate, dropout rate, mini-batch size, number of rounds for training, trainable parameters, etc. These hyperparameters are tuned on the server and sent along with the current server model θ ∈ R^d to each device. The server then sends the current model θ to a batch of devices, where each device will train the model using local data. The batch of devices will be of expected size q·N, where N is the total number of users opted in for training and q is the subsampling rate, so that user i will be selected for local training with probability q [(i.e., the model uses different portions of clients)]. The selected batch can be denoted as B ⊆ [n] [and wherein individual ones of the plurality of training rounds use different portions of the plurality of clients].)
Bhowmick does not explicitly teach [wherein the receiving, training,] determining, [sending,] applying[, receiving and revising are performed for a single training round of a plurality of training rounds of the federated machine learning system].
However, Kuo teaches [wherein the receiving, training,] determining [, sending, applying, receiving and revising are performed for a single training round of a plurality of training rounds of the federated machine learning system] (Kuo page 1, 1. INTRODUCTION, para 2, However, an important class of queries that has been under-studied are hierarchical count-of-counts histograms, which are used to study the skewness of a distribution. The 2010 Decennial Census published 33 tables related to such queries [9], but these tables were truncated because formal privacy methods for protecting such tables did not exist. To get a count-of-counts histogram, one first aggregates records in a table R into groups (e.g., A=SELECT groupid, COUNT(*) AS size FROM R GROUPBY groupid) and then forms a histogram on the groups (H=SELECT size, COUNT(*) FROM A GROUPBY size). Thus H can be treated as an array, where H[i] is the number of groups of size i. When there is a hierarchical attribute associated with each group, such as location, the goal is to estimate a histogram H for every element in the hierarchy and enforce consistency: the sum of the histograms at the children equals the histogram at the parent. For example, consider a table Persons(person id, group id, location), with hierarchy national/state/county on the location attribute. A hierarchical count-of-counts histogram on this table would ask: for each geographic region (national, state, county) and every j, how many households (i.e. groups) in that region have j people.
[table/figure reproduced as media_image3.png]
4.3 Cumulative Histogram para 5, Recall K is an upper bound on the maximum number of people in a group. This method is not very sensitive to K; in the experiments we used K = 100,000 on datasets where the largest group had around 10,000 people, an order of magnitude difference, and still the estimated size of the largest group ended up being around 10,000. Thus if we have no prior knowledge, we can estimate K as follows. Set aside a small privacy budget, since K does not need much accuracy (e.g., ϵ = 10^−4). Let X be the number of people in the largest group [determining]).
However Triastcyn teaches [wherein the receiving, training, determining, sending,] applying[, receiving and revising are performed for a single training round of a plurality of training rounds of the federated machine learning system] (Triastcyn Page 2591, V. Federated Learning with Bayesian Differential Privacy, B. Instance Privacy Para. 7
2) Parallel accounting: In this scheme, every client computes ĉ(λ) using q = B_i/N_i, where N_i is the local dataset size of the client. Consequently, since N_i ≤ N, the privacy costs will be higher. But this is compensated by using parallel composition instead of sequential: the server aggregates the maximum of ĉ(λ) over all users. Again, using secure multiparty computation is possible to prevent the server from learning individual privacy costs)
Under the classic DP [10], [11], the privacy is enforced by clipping all user updates u_i to a fixed L2-norm threshold C and then adding Gaussian noise with the variance C²σ². The noise parameter σ is calibrated to bound the privacy loss in each communication round, and then the privacy loss is accumulated across the rounds using the moments accountant [7]. We use the same privacy mechanism, but employ the Bayesian accounting method instead of the moments accountant. Intuitively, our accounting method should have a significant advantage over the moments accountant in the settings where data is distributed similarly across the users, because in this case their updates would be in a strong agreement. In order to map the Bayesian differential privacy framework to this setting, let us introduce some notation.
Page 2592, V. Federated Learning with Bayesian Differential Privacy, B. Instance Privacy Para. 3 In order to get tighter instance privacy guarantees, we apply the subsampled Gaussian noise [applying] mechanism to gradient computation on user devices. The accounting follows the same procedure as described above, except that the noise parameter σ and the sampling probability q may be different, depending on which of the settings described below is used.)).
Bhowmick, Kuo, and Triastcyn are combined with the same rationale as set forth above with respect to claim 1 and analogous claims 8 and 15.
Regarding claim 10, Bhowmick in view of Triastcyn and Kuo teach the computer-implemented method of claim 8.
Bhowmick teaches wherein the aggregation server is further configured to apply the respective noise values to individual ones of the one or more parameter updates (Bhowmick 0051, Aggregate model privatization is performed by incorporating central differential privacy into the separated differential privacy model used for model updates.
[algorithm reproduced as media_image8.png]
[apply the respective noise values to individual ones of the one or more parameter updates]).
Regarding claim 19, Bhowmick in view of Triastcyn and Kuo teach the computer-implemented method of claim 17.
Triastcyn teaches wherein the respective noise values are determined according to a Gaussian distribution (Triastcyn Page 2591, V. Federated Learning with Bayesian Differential Privacy, A. Client Privacy, Paras. 1-3,
When it comes to reinforcing federated learning with differential privacy, the foremost attention is given to the client level privacy [10], [11]. The goal is to hide the presence of a single user, or to be more specific, to bound the influence of any single user on the learning outcome distribution (i.e. the distribution of the model parameters).
Under the classic DP [10], [11], the privacy is enforced by clipping all user updates u_i to a fixed L2-norm threshold C and then adding Gaussian noise with the variance C²σ² [noise values are determined according to a Gaussian distribution]. The noise parameter σ is calibrated to bound the privacy loss in each communication round, and then the privacy loss is accumulated across the rounds using the moments accountant [7].
We use the same privacy mechanism, but employ the Bayesian accounting method instead of the moments accountant. Intuitively, our accounting method should have a significant advantage over the moments accountant in the settings where data is distributed similarly across the users because in this case their updates would be in a strong agreement. In order to map the Bayesian differential privacy framework to this setting, let us introduce some notation).
Bhowmick, Kuo, and Triastcyn are combined with the same rationale as set forth above with respect to claim 1 and analogous claims 8 and 15.
Claim(s) 7, 14, 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bhowmick in view of Triastcyn and Kuo and further in view of H. Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. 2018. "Learning Differentially Private Recurrent Language Models". In 6th International Conference on Learning Representations, ICLR 2018, pages 1-14 (“McMahan”).
Regarding claim 7 and analogous claims 14 and 20, as best understood based on the 112(b) issues identified above, Bhowmick in view of Triastcyn and Kuo teach the computer-implemented method of claim 1.
Bhowmick, Kuo, and Triastcyn are combined with the same rationale as set forth above with respect to claim 1 and analogous claims 8 and 15.
Bhowmick does not explicitly teach wherein training the received federated machine learning model comprises using a mini-batch of the dataset private to the respective client to generate the one or more parameter updates. However, McMahan teaches this limitation (McMahan Page 3, 2 Algorithms for User-Level Differentially Private Training, Para. 3,
We also evaluate the FederatedSGD algorithm, essentially large-batch SGD where each minibatch is composed of "microbatches" that include data from a single distinct user. In some datacenter applications FedSGD might be preferable to FedAvg, since fast networks make it more practical to run more iterations. However, those additional iterations come at a privacy cost. Further, the privacy benefits of federated learning are nicely complementary to those of differential privacy, and FedAvg can be applied in the datacenter as well, so we focus on this algorithm while showing that our results also extend to FedSGD
Page 4. Algorithm 1,
[Algorithm 1 reproduced as media_image9.png]
[a mini-batch of the dataset private to the respective client to generate the one or more parameter updates]).
Bhowmick and McMahan are considered analogous art to the claimed invention because they are in the same field of federated learning with differential privacy. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bhowmick to incorporate the teachings of McMahan and disclose using mini-batches. Doing so takes advantage of fast networks and makes it more practical to run more iterations (McMahan Page 3, 2 Algorithms for User-Level Differentially Private Training, Para. 3 lines 1-4, We also evaluate the FederatedSGD algorithm, essentially large-batch SGD where each minibatch is composed of "microbatches" that include data from a single distinct user. In some datacenter applications FedSGD might be preferable to FedAvg, since fast networks make it more practical to run more iterations).
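(Examiner Note: As a non-limiting illustration of mini-batch local training of the kind quoted from McMahan, the following Python sketch performs one epoch of mini-batch SGD on a client's private data; the gradient function and hyperparameters are placeholders, not McMahan's Algorithm 1.)

import numpy as np

def local_update_minibatch(theta, private_data, grad_fn, lr=0.1,
                           batch_size=32, rng=None):
    # One epoch of mini-batch SGD; the caller forms the parameter
    # update as theta_local - theta_global.
    rng = rng or np.random.default_rng()
    theta = theta.copy()
    order = rng.permutation(len(private_data))
    for start in range(0, len(order), batch_size):
        batch = [private_data[i] for i in order[start:start + batch_size]]
        theta -= lr * grad_fn(batch, theta)  # step on the mini-batch gradient
    return theta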
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALFREDO CAMPOS whose telephone number is (571)272-4504. The examiner can normally be reached 7:00 - 4:00 pm M - F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J. Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALFREDO CAMPOS/ Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129