DETAILED ACTION
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
2. Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
3. The information disclosure statement (IDS) submitted on 03/03/2026 has been received, entered into the record, and considered. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Continued Examination Under 37 CFR 1.114
4. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 03/03/2026 has been entered.
Response to Amendment
5. Receipt of Applicant’s Amendment filed on 02/05/2026 is acknowledged. The amendment amends claims 1, 9, 16, 19, 24, and 27.
Claim Rejections - 35 USC § 102
6. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
7. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
8. Claims 16, 19-24, and 27-30 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by McMahan et al. (U.S. PGPUB 2019/0340534).
9. Regarding claims 16 and 24, McMahan teaches a method and processing device comprising:
A) for each respective machine learning model of a plurality of machine learning models: for each respective remote processing device of a plurality of remote processing devices: sending, from a server to the respective remote processing device, an initial set of global model parameters for the respective machine learning model (Paragraphs 28 and 83); and
B) receiving, at the server from the respective remote processing device, a locally updated set of model parameters for the respective machine learning model (Paragraphs 28-29 and 88); and
C) updating, at the server, the respective machine learning model based on the locally updated set of model parameters received from each remote processing device of the plurality of remote processing devices and a corresponding density estimator to generate an updated set of global model parameters (Paragraphs 29-30, 46, and 89);
D) wherein the density estimator comprises a probability parameterized by weighting parameters for the respective machine learning model (Paragraphs 29-30, 46, and 89); and
E) sending, from the server to each remote processing device of the plurality of remote processing devices, the updated set of global model parameters for each machine learning model of the plurality of machine learning models (Paragraphs 28, 83, and 90).
The examiner notes that McMahan teaches “for each respective machine learning model of a plurality of machine learning models: for each respective remote processing device of a plurality of remote processing devices: sending, from a server to the respective remote processing device, an initial set of global model parameters for the respective machine learning model” as “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data” (Paragraph 28) and “At (310), method (300) can include providing the global model to each client device, and at (312), method (300) can include receiving the global model” (Paragraph 83). The examiner further notes that the initial distribution of an initial global model (which entails an initial set of parameters) from a server to multiple clients (i.e. remote processing devices) teaches the claimed sending. The examiner further notes that McMahan teaches “receiving, at the server from the respective remote processing device, a locally updated set of model parameters for the respective machine learning model” as “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data. The updated local models are W.sub.t.sup.1, W.sub.t.sup.2, . . . , W.sub.t.sup.n” (Paragraph 28), “Each client then sends the update back to the server, where the global update is computed by aggregating all the client-side updates” (Paragraph 29), and “At (318), method (300) can include receiving, by the server, the local update. 
In particular, the server can receive a plurality of local updates from a plurality of client devices” (Paragraph 88). The examiner further notes that the server obtaining subsequent transmitted client updates (i.e. updated set of model parameters that are locally updated at each client) of its local model from each client teaches the claimed receiving. The examiner further notes that McMahan teaches “updating, at the server, the respective machine learning model based on the locally updated set of model parameters received from each remote processing device of the plurality of remote processing devices and a corresponding density estimator to generate an updated set of global model parameters” as “Each client then sends the update back to the server, where the global update is computed by aggregating all the client-side updates… in some implementations, a weighted sum might be used to replace the average based on desired performance” (Paragraphs 29-30), “Another way of encoding the updates is by quantizing the weights. For example, the weights can be probabilistically quantized” (Paragraph 46), and “At (320), method (300) can include again determining the global model. In particular, the global model can be determined based at least in part on the received local update(s). For instance, the received local updates can be aggregated to determine the global model. The aggregation can be an additive aggregation and/or an averaging aggregation. In particular implementations, the aggregation of the local updates can be proportional to the partition sizes of the data examples on the client devices” (Paragraph 89). The examiner further notes that the aggregation of the received weighted client updates (which entails locally updated model parameters) results in a generation of an updated global model. Moreover, the instant specification merely mentions the claimed density estimator without defining what it constitutes (See Paragraph 78). 
Specifically, the claimed density estimator is labeled as a mathematical value (See p(x|ϕ.sub.k)) without defining what this mathematical value actually constitutes. Thus, the calculated weights that are applied to the probabilistic updates teaches the claimed density estimator in the broadest reasonable interpretation. The examiner further notes that McMahan teaches “wherein the density estimator comprises a probability parameterized by weighting parameters for the respective machine learning model” as “Each client then sends the update back to the server, where the global update is computed by aggregating all the client-side updates… in some implementations, a weighted sum might be used to replace the average based on desired performance” (Paragraphs 29-30), “Another way of encoding the updates is by quantizing the weights. For example, the weights can be probabilistically quantized” (Paragraph 46), and “At (320), method (300) can include again determining the global model. In particular, the global model can be determined based at least in part on the received local update(s). For instance, the received local updates can be aggregated to determine the global model. The aggregation can be an additive aggregation and/or an averaging aggregation. In particular implementations, the aggregation of the local updates can be proportional to the partition sizes of the data examples on the client devices” (Paragraph 89). The examiner further notes that the aggregation of the received weighted client updates (which entails locally updated model parameters) results in a generation of an updated global model. Moreover, the instant specification merely mentions the claimed density estimator without defining what it constitutes (See Paragraph 78). Specifically, the claimed density estimator is labeled as a mathematical value (See p(x|ϕ.sub.k)) without defining what this mathematical value actually constitutes. 
Thus, the calculated weights that are applied to the probabilistic updates teaches the claimed density estimator in the broadest reasonable interpretation. The examiner further notes that McMahan teaches “sending, from the server to each remote processing device of the plurality of remote processing devices, the updated set of global model parameters for each machine learning model of the plurality of machine learning models” as “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data” (Paragraph 28), “At (310), method (300) can include providing the global model to each client device, and at (312), method (300) can include receiving the global model” (Paragraph 83), and “Any number of iterations of local and global updates can be performed. That is, method (300) can be performed iteratively to update the global model based on locally stored training data over time” (Paragraph 90). The examiner further notes that the iterative distribution of a global model (which entails a set of updated parameters) from a server to multiple clients (i.e. remote processing devices) teaches the claimed sending.
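For illustration only, the round described in the passages of McMahan quoted above (distribution of the global model, independent local updates on each client, and aggregation proportional to the partition sizes of the client data) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not taken from McMahan or from the instant application: the two-parameter linear model, the single gradient step, and the names local_update, aggregate, and run_round are hypothetical and chosen only for the example.

```python
from typing import List, Sequence, Tuple

Params = List[float]                 # [w, b] for the toy model y ~ w*x + b
Dataset = Sequence[Tuple[float, float]]


def local_update(global_params: Params, data: Dataset, lr: float = 0.1) -> Params:
    """Hypothetical client step: one gradient-descent step on local data only."""
    w, b = global_params
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def aggregate(client_params: Sequence[Params], num_examples: Sequence[int]) -> Params:
    """Server step: weighted average, weights proportional to local data size."""
    total = sum(num_examples)
    dim = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, num_examples)) / total
        for i in range(dim)
    ]


def run_round(global_params: Params, client_datasets: Sequence[Dataset]) -> Params:
    """One round: send global parameters, collect local updates, aggregate."""
    updates = [local_update(global_params, d) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    return aggregate(updates, sizes)
```

Calling run_round repeatedly, each time with the parameters returned by the previous call, corresponds to the iterative performance of local and global updates described in Paragraph 90 of McMahan. The sketch does not model any density estimator.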
Regarding claims 19 and 27, McMahan further teaches a method and processing device comprising:
A) determining prior mixture weights for the respective machine learning model based on the density estimator (Paragraphs 29-30 and 46).
The examiner notes that McMahan teaches “determining prior mixture weights for the respective machine learning model based on the density estimator” as “Each client then sends the update back to the server, where the global update is computed by aggregating all the client-side updates… in some implementations, a weighted sum might be used to replace the average based on desired performance” (Paragraphs 29-30) and “Another way of encoding the updates is by quantizing the weights. For example, the weights can be probabilistically quantized” (Paragraph 46). The examiner further notes that a calculated weighted sum entails determining weights for the models. Moreover, the instant specification merely mentions the claimed density estimator without defining what it constitutes (See Paragraph 78). Specifically, the claimed density estimator is labeled as a mathematical value (See p(x|ϕ.sub.k)) without defining what this mathematical value actually constitutes. Thus, the calculated weights that are applied to the probabilistic updates teach the claimed density estimator under the broadest reasonable interpretation.
Regarding claims 20 and 28, McMahan further teaches a method and processing device comprising:
A) wherein the plurality of remote processing devices comprises a smartphone (Paragraph 69).
The examiner notes that McMahan teaches “wherein the plurality of remote processing devices comprises a smartphone” as “The server 210 can exchange data with one or more client devices 230 over the network 242. Any number of client devices 230 can be connected to the server 210 over the network 242. Each of the client devices 230 can be any suitable type of computing device, such as a general purpose computer, special purpose computer, laptop, desktop, mobile device, navigation system, smartphone, tablet, wearable computing device, gaming console, a display with one or more processors, or other suitable computing device” (Paragraph 69). The examiner further notes that the client devices of McMahan can include smartphones.
Regarding claim 21, McMahan further teaches a method comprising:
A) wherein the plurality of remote processing devices comprise an internet of things device (Paragraph 69).
The examiner notes that McMahan teaches “wherein the plurality of remote processing devices comprise an internet of things device” as “The server 210 can exchange data with one or more client devices 230 over the network 242. Any number of client devices 230 can be connected to the server 210 over the network 242. Each of the client devices 230 can be any suitable type of computing device, such as a general purpose computer, special purpose computer, laptop, desktop, mobile device, navigation system, smartphone, tablet, wearable computing device, gaming console, a display with one or more processors, or other suitable computing device” (Paragraph 69). The examiner further notes that the client devices of McMahan can include wearable computing devices (i.e., IoT devices under the broadest reasonable interpretation).
Regarding claims 22 and 29, McMahan further teaches a method and processing device comprising:
A) wherein each respective machine learning model of the plurality of machine learning models is a neural network model (Paragraph 55).
The examiner notes that McMahan teaches “wherein each respective machine learning model of the plurality of machine learning models is a neural network model” as “FIG. 1 depicts an example system 100 for training one or more global machine learning models 106 using respective training data 108 stored locally on a plurality of client devices 102. System 100 can include a server device 104. Server 104 can be configured to access machine learning model 106, and to provide model 106 to a plurality of client devices 102. Model 106 can be, for instance, a linear regression model, logistic regression model, a support vector machine model, a neural network (e.g. convolutional neural network, recurrent neural network, etc.), or other suitable model” (Paragraph 55). The examiner further notes that the models distributed to the client devices of McMahan can be neural network models.
Regarding claims 23 and 30, McMahan further teaches a method and processing device comprising:
A) wherein each respective machine learning model of the plurality of machine learning models comprises a same network structure (Paragraphs 28, 55, and 83).
The examiner notes that McMahan teaches “wherein each respective machine learning model of the plurality of machine learning models comprises a same network structure” as “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data” (Paragraph 28), “FIG. 1 depicts an example system 100 for training one or more global machine learning models 106 using respective training data 108 stored locally on a plurality of client devices 102. System 100 can include a server device 104. Server 104 can be configured to access machine learning model 106, and to provide model 106 to a plurality of client devices 102. Model 106 can be, for instance, a linear regression model, logistic regression model, a support vector machine model, a neural network (e.g. convolutional neural network, recurrent neural network, etc.), or other suitable model” (Paragraph 55), and “At (310), method (300) can include providing the global model to each client device, and at (312), method (300) can include receiving the global model” (Paragraph 83). The examiner further notes that the initial distribution of an initial global model (which entails an initial set of parameters) from a server to multiple clients (i.e. remote processing devices) entails that each local model has the same network structure.
Claim Rejections - 35 USC § 103
10. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
11. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
12. This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
13. Claims 1-6 and 9-13 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan et al. (U.S. PGPUB 2019/0340534) as applied to claims 16, 19-24, and 27-30 above, and further in view of Wood et al. (U.S. PGPUB 2018/0285759).
14. Regarding claims 1 and 9, McMahan teaches a method and processing device comprising:
A) receiving, at a processing device, a set of global parameters for each machine learning model of a plurality of machine learning models (Paragraphs 22, 27, 28, and 83);
B) for each respective machine learning model of the plurality of machine learning models: processing, at the processing device, data stored locally on the processing device with the respective machine learning model according to the set of global parameters to generate a machine learning model output (Paragraphs 22, 27, 28, and 84);
D) performing, at the processing device, an update to the respective machine learning model based on the machine learning model output to generate locally updated machine learning model parameters (Paragraphs 28 and 84); and
E) sending the locally updated machine learning model parameters to a remote processing device (Paragraphs 28-29 and 88); and
F) receiving, at the processing device from the remote processing device, a set of globally updated machine learning model parameters for each machine learning model of the plurality of machine learning models (Paragraphs 22, 27, 28-29, and 88);
G) wherein the globally updated machine learning model parameters for each respective machine learning model are based at least in part on the locally updated machine learning model parameters and a corresponding density estimator (Paragraphs 22, 27, 28-30, 46, 88, and 90); and
H) wherein the density estimator comprises a probability parameterized by weighting parameters for the respective machine learning model (Paragraphs 22, 27, 28-30, 46, 88, and 90).
The examiner notes that McMahan teaches “receiving, at a processing device, a set of global parameters for each machine learning model of a plurality of machine learning models” as “systems implementing federated learning can perform the following actions in each of a plurality of rounds of model optimization: a subset of clients are selected; each client in the subset updates the model based on their local data; the updated models or model updates are sent by each client to the server” (Paragraph 22), “the model can include one or more neural networks (e.g., deep neural networks, recurrent neural networks, convolutional neural networks, etc.) or other machine-learned models” (Paragraph 27), “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data” (Paragraph 28) and “At (310), method (300) can include providing the global model to each client device, and at (312), method (300) can include receiving the global model” (Paragraph 83). The examiner further notes that the initial distribution of an initial global model (which entails an initial set of parameters) from a server to multiple clients (i.e. remote processing devices) teaches the claimed receiving. Moreover, a client can include one or more models (each of which can house one or more models) that the global parameters are applied to from the server. 
The examiner further notes that McMahan teaches “for each respective machine learning model of the plurality of machine learning models: processing, at the processing device, data stored locally on the processing device with the respective machine learning model according to the set of global parameters to generate a machine learning model output” as “systems implementing federated learning can perform the following actions in each of a plurality of rounds of model optimization: a subset of clients are selected; each client in the subset updates the model based on their local data; the updated models or model updates are sent by each client to the server” (Paragraph 22), “the model can include one or more neural networks (e.g., deep neural networks, recurrent neural networks, convolutional neural networks, etc.) or other machine-learned models” (Paragraph 27), “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data” (Paragraph 28), and “method (300) can include determining, by the client device, a local update. In a particular implementation, the local update can be determined by retraining or otherwise updating the global model based on the locally stored training data” (Paragraph 84). The examiner further notes that a client can include one or more models (each of which can house one or more models) that the global parameters are applied to from the server which are then updated via the use of local client data. 
The examiner further notes that McMahan teaches “performing, at the processing device, an update to the respective machine learning model based on the machine learning model output to generate locally updated machine learning model parameters” as “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data” (Paragraph 28) and “method (300) can include determining, by the client device, a local update. In a particular implementation, the local update can be determined by retraining or otherwise updating the global model based on the locally stored training data” (Paragraph 84). The examiner further notes that the determined local updates are based off of retraining (i.e. an example of the undefined claimed optimization in the broadest reasonable interpretation). The examiner further notes that McMahan teaches “sending the locally updated machine learning model parameters to a remote processing device” as “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data. The updated local models are W.sub.t.sup.1, W.sub.t.sup.2, . . . , W.sub.t.sup.n” (Paragraph 28), “Each client then sends the update back to the server, where the global update is computed by aggregating all the client-side updates” (Paragraph 29), and “At (318), method (300) can include receiving, by the server, the local update. In particular, the server can receive a plurality of local updates from a plurality of client devices” (Paragraph 88). 
The examiner further notes that the transmitted local updates to the server teaches the claimed sending. The examiner further notes that McMahan teaches “receiving, at the processing device from the remote processing device, a set of globally updated machine learning model parameters for each machine learning model of the plurality of machine learning models” as “systems implementing federated learning can perform the following actions in each of a plurality of rounds of model optimization: a subset of clients are selected; each client in the subset updates the model based on their local data; the updated models or model updates are sent by each client to the server” (Paragraph 22), “the model can include one or more neural networks (e.g., deep neural networks, recurrent neural networks, convolutional neural networks, etc.) or other machine-learned models” (Paragraph 27), “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data. The updated local models are W.sub.t.sup.1, W.sub.t.sup.2, . . . , W.sub.t.sup.n” (Paragraph 28), “Each client then sends the update back to the server, where the global update is computed by aggregating all the client-side updates” (Paragraph 29), and “At (318), method (300) can include receiving, by the server, the local update. In particular, the server can receive a plurality of local updates from a plurality of client devices” (Paragraph 88). The examiner further notes that the server obtaining subsequent transmitted client updates (i.e. updated set of model parameters) of its local model(s) (each of which can include one or more models) from each client teaches the claimed receiving. 
The examiner further notes that McMahan teaches “wherein the globally updated machine learning model parameters for each respective machine learning model are based at least in part on the locally updated machine learning model parameters and a corresponding density estimator” as “systems implementing federated learning can perform the following actions in each of a plurality of rounds of model optimization: a subset of clients are selected; each client in the subset updates the model based on their local data; the updated models or model updates are sent by each client to the server” (Paragraph 22), “the model can include one or more neural networks (e.g., deep neural networks, recurrent neural networks, convolutional neural networks, etc.) or other machine-learned models” (Paragraph 27), “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data. The updated local models are W.sub.t.sup.1, W.sub.t.sup.2, . . . , W.sub.t.sup.n” (Paragraph 28), “Each client then sends the update back to the server, where the global update is computed by aggregating all the client-side updates… in some implementations, a weighted sum might be used to replace the average based on desired performance” (Paragraphs 29-30), “Another way of encoding the updates is by quantizing the weights. For example, the weights can be probabilistically quantized” (Paragraph 46), “At (318), method (300) can include receiving, by the server, the local update. In particular, the server can receive a plurality of local updates from a plurality of client devices” (Paragraph 88), and “Any number of iterations of local and global updates can be performed. 
That is, method (300) can be performed iteratively to update the global model based on locally stored training data over time” (Paragraph 90). The examiner further notes that the server obtaining subsequent transmitted client updates (i.e. updated set of model parameters) of its local model(s) results in updated the global model for that iteration. Moreover, the instant specification merely mentions the claimed density estimator without defining what it constitutes (See Paragraph 78). Specifically, the claimed density estimator is labeled as a mathematical value (See p(x|ϕ.sub.k)) without defining what this mathematical value actually constitutes. Thus, the calculated weights that are applied to the probabilistic updates teaches the claimed density estimator in the broadest reasonable interpretation. The examiner further notes that McMahan teaches “wherein the density estimator comprises a probability parameterized by weighting parameters for the respective machine learning model” as “systems implementing federated learning can perform the following actions in each of a plurality of rounds of model optimization: a subset of clients are selected; each client in the subset updates the model based on their local data; the updated models or model updates are sent by each client to the server” (Paragraph 22), “the model can include one or more neural networks (e.g., deep neural networks, recurrent neural networks, convolutional neural networks, etc.) or other machine-learned models” (Paragraph 27), “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data. The updated local models are W.sub.t.sup.1, W.sub.t.sup.2, . . . 
, W.sub.t.sup.n” (Paragraph 28), “Each client then sends the update back to the server, where the global update is computed by aggregating all the client-side updates… in some implementations, a weighted sum might be used to replace the average based on desired performance” (Paragraphs 29-30), “Another way of encoding the updates is by quantizing the weights. For example, the weights can be probabilistically quantized” (Paragraph 46), “At (318), method (300) can include receiving, by the server, the local update. In particular, the server can receive a plurality of local updates from a plurality of client devices” (Paragraph 88), and “Any number of iterations of local and global updates can be performed. That is, method (300) can be performed iteratively to update the global model based on locally stored training data over time” (Paragraph 90). The examiner further notes that the server obtaining subsequent transmitted client updates (i.e. updated set of model parameters) of its local model(s) results in updated the global model for that iteration. Moreover, the instant specification merely mentions the claimed density estimator without defining what it constitutes (See Paragraph 78). Specifically, the claimed density estimator is labeled as a mathematical value (See p(x|ϕ.sub.k)) without defining what this mathematical value actually constitutes. Thus, the calculated weights that are applied to the probabilistic updates teaches the claimed density estimator in the broadest reasonable interpretation.
McMahan does not explicitly teach:
C) receiving, at the processing device, user feedback associated with the machine learning model output;
D) performing, at the processing device, an update to the respective machine learning model based on the machine learning model output and the user feedback associated with the machine learning model output.
Wood, however, teaches “receiving, at the processing device, user feedback associated with the machine learning model output” as “statistical model 108 may be trained and/or adapted to new data received on the trainers. For example, the trainers may execute on electronic devices (e.g., personal computers, laptop computers, mobile phones, tablet computers, portable media players, digital cameras, etc.) that produce updates 114-116 to statistical model 108 based on user feedback from users of the electronic devices” (Paragraph 18), “the statistical model may have multiple local versions 202 and one or more global versions 204. Individual local versions 202 may be personalized to specific users, recommendations, job listings, advertisements, content items, and/or other types of entities 218. Output 212 from each local version may be displayed and/or otherwise presented to one or more users, and user feedback 206 and/or other input data related to output 212 may be collected and/or tracked” (Paragraph 37), and “User feedback 206 related to output 212 may additionally be collected during the user session as clicks, views, searches, likes, dislikes, comments, shares, applications to job listings, and/or other interaction with the online professional network. Each piece of user feedback 206 may be included in training data that is applied to parameters 224 of the local version to generate an update (e.g., updates 222) to the local version. Consequently, the output of the local version may be adapted to the user's real-time behavior or preferences during the user session” (Paragraph 39) and “performing, at the processing device, an update to the respective machine learning model based on the machine learning model output and the user feedback associated with the machine learning model output” as “statistical model 108 may be trained and/or adapted to new data received on the trainers. For example, the trainers may execute on electronic devices (e.g., personal computers, laptop computers, mobile phones, tablet computers, portable media players, digital cameras, etc.) that produce updates 114-116 to statistical model 108 based on user feedback from users of the electronic devices” (Paragraph 18), “the statistical model may have multiple local versions 202 and one or more global versions 204. Individual local versions 202 may be personalized to specific users, recommendations, job listings, advertisements, content items, and/or other types of entities 218. Output 212 from each local version may be displayed and/or otherwise presented to one or more users, and user feedback 206 and/or other input data related to output 212 may be collected and/or tracked” (Paragraph 37), and “User feedback 206 related to output 212 may additionally be collected during the user session as clicks, views, searches, likes, dislikes, comments, shares, applications to job listings, and/or other interaction with the online professional network. Each piece of user feedback 206 may be included in training data that is applied to parameters 224 of the local version to generate an update (e.g., updates 222) to the local version. Consequently, the output of the local version may be adapted to the user's real-time behavior or preferences during the user session” (Paragraph 39).
The examiner further notes that the secondary reference of Wood teaches the concept of the use of user feedback (which is based on the output of a local model) as a basis for training (i.e. an example of the claimed undefined optimizing in the broadest reasonable interpretation) the local model. The combination would result in using user feedback as a basis to generate the local updates to the local models of McMahan.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Wood’s teachings would have allowed McMahan’s system to provide a method for tuning based on a pre-specified amount of user feedback, as noted by Wood (Paragraph 32).
Regarding claims 2 and 10, McMahan further teaches a method and processing device comprising:
A) performing at the processing device, a number of updates before sending the locally updated machine learning model parameters to the remote processing device (Paragraphs 41, 42, 49, and 90).
The examiner notes that McMahan teaches “performing at the processing device, a number of updates before sending the locally updated machine learning model parameters to the remote processing device” as “A second type of communication efficient update provided by the present disclosure is a sketched update in which the client encodes the update H.sub.t.sup.i in a compressed form prior to sending to the server. The client device can compute the full update H.sub.t.sup.i and then encode the update or can compute the update H.sub.t.sup.i according to a structured technique and then encode such structured update” (Paragraph 41), “Many different types of encoding or compression are envisioned by the present disclosure. For example, the compression can be lossless compression or lossy compression. Two example encoding techniques are described in further detail below: a subsampling technique and a quantization technique” (Paragraph 42), “the above can be generalized to more than 1 bit for each scalar. For example, for b-bit quantization, [h.sub.min, h.sub.max] can be equally divided into 2.sup.b intervals. Suppose h.sub.i falls in the interval bounded by h′ and h″. The quantization can operate by replacing h.sub.min and h.sub.max in the above equation with h′ and h″, respectively” (Paragraph 49), and “Any number of iterations of local and global updates can be performed” (Paragraph 90). The examiner further notes that an initial update at each client (at any iteration) and a subsequent encoding of that local update (at any iteration) at each client are examples of a number of updates being performed before the sending to the remote processing device.
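For reference, the encoding step quoted from Paragraphs 46 and 49 of McMahan can be sketched as follows. This is only a minimal illustration under stated assumptions: the 1-bit rule referred to as "the above equation" is not reproduced in this Office action, so the sketch assumes the common unbiased form in which a value is rounded up to the upper bound with probability proportional to its distance from the lower bound. The function name `probabilistic_quantize` and the example values are hypothetical and do not appear in the reference.

```python
import random
from typing import List, Optional, Sequence


def probabilistic_quantize(values: Sequence[float], bits: int,
                           rng: Optional[random.Random] = None) -> List[float]:
    """Round each value to an endpoint of its interval, at random.

    Following the quoted text, [h_min, h_max] is divided into 2**bits equal
    intervals. ASSUMPTION: a value h in [lo, hi] becomes hi with probability
    (h - lo) / (hi - lo) and lo otherwise, so the expected result equals h.
    """
    rng = rng or random.Random()
    h_min, h_max = min(values), max(values)
    if h_min == h_max:
        return list(values)  # nothing to quantize
    intervals = 2 ** bits
    width = (h_max - h_min) / intervals
    quantized = []
    for h in values:
        # Index of the interval containing h (h_max goes in the last one).
        k = min(int((h - h_min) / width), intervals - 1)
        lo = h_min + k * width
        hi = lo + width
        p_hi = (h - lo) / (hi - lo)
        quantized.append(hi if rng.random() < p_hi else lo)
    return quantized


# Example: a local update is computed first, then encoded before sending.
update = [0.0, 0.3, 1.0]
encoded = probabilistic_quantize(update, bits=2, rng=random.Random(0))
```

In this sketch the client performs two operations before transmission, computing the update and then encoding it, which is the sequence the examiner relies on above.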
Regarding claims 3 and 11, McMahan further teaches a method and processing device comprising:
A) wherein the globally updated machine learning model parameters for each respective machine learning model of the plurality of machine learning models are based at least in part on locally updated machine learning model parameters of a second processing device (Paragraphs 29 and 88).
The examiner notes that McMahan teaches “wherein the globally updated machine learning model parameters for each respective machine learning model of the plurality of machine learning models are based at least in part on locally updated machine learning model parameters of a second processing device” as “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data. The updated local models are W.sub.t.sup.1, W.sub.t.sup.2, . . . , W.sub.t.sup.n” (Paragraph 28), “Each client then sends the update back to the server, where the global update is computed by aggregating all the client-side updates” (Paragraph 29), and “At (318), method (300) can include receiving, by the server, the local update. In particular, the server can receive a plurality of local updates from a plurality of client devices” (Paragraph 88). The examiner further notes that the server obtaining subsequent transmitted client updates (i.e. locally updated parameters) of its local model from each client entails a second processing device sending its local updates (i.e. multiple clients includes at least a first processing device and a second processing device).
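The server-side aggregation described in the quoted passages can be illustrated with a short sketch. It assumes that each client update is a flat list of parameter values and that the weighted sum uses each client's local example count as its weight; the weighting choice, the function name `aggregate_updates`, and the example values are assumptions for illustration and are not taken from McMahan.

```python
from typing import List, Sequence


def aggregate_updates(updates: Sequence[Sequence[float]],
                      num_examples: Sequence[int]) -> List[float]:
    """Weighted average of client updates.

    ASSUMPTION: weights are proportional to each client's number of local
    examples; a plain average is the special case of equal counts.
    """
    if not updates or len(updates) != len(num_examples):
        raise ValueError("need one example count per client update")
    total = sum(num_examples)
    if total <= 0:
        raise ValueError("total example count must be positive")
    dim = len(updates[0])
    aggregated = [0.0] * dim
    for update, n in zip(updates, num_examples):
        if len(update) != dim:
            raise ValueError("all updates must have the same length")
        for j, value in enumerate(update):
            aggregated[j] += (n / total) * value
    return aggregated


# Two clients (a first and a second processing device) report updates.
global_update = aggregate_updates([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

With these example values the second client holds three of the four examples, so its update contributes three quarters of the global update.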
Regarding claims 4 and 12, McMahan does not explicitly teach a method and processing device comprising:
A) wherein the user feedback comprises an indication of a correctness of the machine learning model output.
Wood, however, teaches “wherein the user feedback comprises an indication of a correctness of the machine learning model output” as “statistical model 108 may be trained and/or adapted to new data received on the trainers. For example, the trainers may execute on electronic devices (e.g., personal computers, laptop computers, mobile phones, tablet computers, portable media players, digital cameras, etc.) that produce updates 114-116 to statistical model 108 based on user feedback from users of the electronic devices” (Paragraph 18), “the statistical model may have multiple local versions 202 and one or more global versions 204. Individual local versions 202 may be personalized to specific users, recommendations, job listings, advertisements, content items, and/or other types of entities 218. Output 212 from each local version may be displayed and/or otherwise presented to one or more users, and user feedback 206 and/or other input data related to output 212 may be collected and/or tracked” (Paragraph 37), and “User feedback 206 related to output 212 may additionally be collected during the user session as clicks, views, searches, likes, dislikes, comments, shares, applications to job listings, and/or other interaction with the online professional network. Each piece of user feedback 206 may be included in training data that is applied to parameters 224 of the local version to generate an update (e.g., updates 222) to the local version. Consequently, the output of the local version may be adapted to the user's real-time behavior or preferences during the user session” (Paragraph 39).
The examiner further notes that the secondary reference of Wood teaches the concept of the use of user feedback (which is based on the output of a local model). Such user feedback data includes “likes” (i.e. the claimed indication of correctness in the broadest reasonable interpretation). The combination would result in using user feedback as a basis to generate the local updates to the local models of McMahan.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Wood’s teachings would have allowed McMahan’s system to provide a method for tuning based on a pre-specified amount of user feedback, as noted by Wood (Paragraph 32).
Regarding claim 5, McMahan further teaches a method comprising:
A) wherein the data stored locally on the processing device is one of: image data, audio data, or video data (Paragraph 56).
The examiner notes that McMahan teaches “wherein the data stored locally on the processing device is one of: image data, audio data, or video data” as “Client devices 102 can each be configured to determine one or more local updates associated with model 106 based at least in part on training data 108. For instance, training data 108 can be data that is respectively stored locally on the client devices 106. The training data 108 can include audio files, image files, video files, a typing history, location history, and/or various other suitable data” (Paragraph 56). The examiner further notes that the client local data includes audio, image, and video data.
Regarding claims 6 and 13, McMahan further teaches a method and processing device comprising:
A) wherein the processing device is one of a smartphone or an internet of things device (Paragraph 69).
The examiner notes that McMahan teaches “wherein the processing device is one of a smartphone or an internet of things device” as “The server 210 can exchange data with one or more client devices 230 over the network 242. Any number of client devices 230 can be connected to the server 210 over the network 242. Each of the client devices 230 can be any suitable type of computing device, such as a general purpose computer, special purpose computer, laptop, desktop, mobile device, navigation system, smartphone, tablet, wearable computing device, gaming console, a display with one or more processors, or other suitable computing device” (Paragraph 69). The examiner further notes that the client devices of McMahan can include smartphones.
10. Claims 7-8 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan et al. (U.S. PGPUB 2019/0340534) as applied to claims 16, 19-24, and 27-30 above, and further in view of Wood et al. (U.S. PGPUB 2018/0285759) as applied to claims 1-6 and 9-13 above, and further in view of Feng et al. (Article entitled “Joint Service Pricing and Cooperative Relay Communication for Federated Learning”, dated 29 November 2018).
11. Regarding claims 7 and 14, McMahan further teaches a method and processing device comprising:
A) wherein processing, at the processing device, the data stored locally on the processing device with the respective machine learning model is performed at least in part by one or more processing units (Paragraphs 5, 22, 27, 28, and 84).
The examiner notes that McMahan teaches “wherein processing, at the processing device, the data stored locally on the processing device with the respective machine learning model is performed at least in part by one or more processing units” as “The client device includes at least one processor and at least one non-transitory computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform operations” (Paragraph 5), “systems implementing federated learning can perform the following actions in each of a plurality of rounds of model optimization: a subset of clients are selected; each client in the subset updates the model based on their local data; the updated models or model updates are sent by each client to the server” (Paragraph 22), “the model can include one or more neural networks (e.g., deep neural networks, recurrent neural networks, convolutional neural networks, etc.) or other machine-learned models” (Paragraph 27), “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data” (Paragraph 28), and “method (300) can include determining, by the client device, a local update. In a particular implementation, the local update can be determined by retraining or otherwise updating the global model based on the locally stored training data” (Paragraph 84). The examiner further notes that a client (which includes one or more processors) can include one or more models (each of which can house one or more models) that the global parameters are applied to from the server which are then updated via the use of local client data.
McMahan and Wood do not explicitly teach:
A) one or more neural processing units.
Feng, however, teaches “one or more neural processing units” as “For the sake of protecting data privacy and due to the rapid development of mobile devices, e.g., powerful central processing unit (CPU) and nascent neural processing unit (NPU), collaborative machine learning on mobile devices, e.g., federated learning, has been envisioned as a new AI approach with broad application prospects” (Abstract).
The examiner further notes that although McMahan and Wood clearly teach client devices with processors for federated learning, there is no explicit teaching that such clients include a neural processor. Nevertheless, the secondary reference of Feng teaches that clients in a federated learning environment can include neural processors. The combination would result in the clients of McMahan and Wood also including neural processors.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Feng’s teachings would have allowed the combined system of McMahan and Wood to provide a method for handling massive data in a secure manner, as noted by Feng (Section 1).
Regarding claims 8 and 15, McMahan further teaches a method and processing device comprising:
A) wherein performing, at the processing device, the update of the respective machine learning model is performed at least in part by one or more processing units (Paragraphs 5, 28, and 84).
The examiner notes that McMahan teaches “wherein performing, at the processing device, the update of the respective machine learning model is performed at least in part by one or more processing units” as “The client device includes at least one processor and at least one non-transitory computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform operations” (Paragraph 5), “In round t≥0, the server distributes the current model W.sub.t to a subset S.sub.t of n.sub.t clients (for example, to a selected subset of clients whose devices are plugged into power, have access to broadband, and are idle). Some or all of these clients independently update the model based on their local data” (Paragraph 28) and “method (300) can include determining, by the client device, a local update. In a particular implementation, the local update can be determined by retraining or otherwise updating the global model based on the locally stored training data” (Paragraph 84). The examiner further notes that the determined local updates at the clients (which have one or more processors) are based off of retraining (i.e. an example of the undefined claimed optimization in the broadest reasonable interpretation).
McMahan and Wood do not explicitly teach:
A) one or more neural processing units.
Feng, however, teaches “one or more neural processing units” as “For the sake of protecting data privacy and due to the rapid development of mobile devices, e.g., powerful central processing unit (CPU) and nascent neural processing unit (NPU), collaborative machine learning on mobile devices, e.g., federated learning, has been envisioned as a new AI approach with broad application prospects” (Abstract).
The examiner further notes that although McMahan and Wood clearly teach client devices with processors for federated learning, there is no explicit teaching that such clients include a neural processor. Nevertheless, the secondary reference of Feng teaches that clients in a federated learning environment can include neural processors. The combination would result in the clients of McMahan and Wood also including neural processors.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Feng’s teachings would have allowed the combined system of McMahan and Wood to provide a method for handling massive data in a secure manner, as noted by Feng (Section 1).
12. Claims 17 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan et al. (U.S. PGPUB 2019/0340534) as applied to claims 16, 19-24, and 27-30 above, and further in view of McMahan et al. (U.S. PGPUB 2017/0109322) (herein referred to as “Konecny”).
13. Regarding claims 17 and 25, McMahan does not explicitly teach a method and processing device comprising:
A) wherein updating, at the server, the respective machine learning model based on the locally updated set of model parameters comprises computing an effective gradient for each model parameter of the initial set of global model parameters for the respective machine learning model.
Konecny, however, teaches “wherein updating, at the server, the respective machine learning model based on the locally updated set of model parameters comprises computing an effective gradient for each model parameter of the initial set of global model parameters for the respective machine learning model” as “stochastic gradient descent techniques can be naively applied to the optimization problem, wherein one or more “minibatch” gradient calculations (e.g. using one or more randomly selected use devices) are performed per round of communication. For instance, the minibatch can include at least a subset of the training data stored locally on the user devices. In such implementations, one or more user devices can be configured to determine the average gradient associated with the local training data respectively stored on the user devices for a current version of a model. The user devices can be configured to provide the determined gradients to the server, as part of the local updates. The server can then aggregate the gradients to determine a global model update” (Paragraph 29).
The examiner further notes that the secondary reference of Konecny teaches the concept of computing a gradient of a set of parameters for optimization at a server. The combination would result in performing a gradient computation for the optimization of McMahan.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Konecny’s teachings would have allowed McMahan’s system to provide a method for solving optimization issues in distributed systems, as noted by Konecny (Paragraph 5).
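The gradient exchange quoted from Paragraph 29 of Konecny can be sketched as follows. The sketch is illustrative only and rests on several assumptions that are not in the reference: a linear model with a squared-error loss, an unweighted mean of the client gradients at the server, a single gradient-descent step with a fixed learning rate, and the hypothetical function names `average_gradient` and `server_step`.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, target)


def average_gradient(weights: Sequence[float],
                     examples: Sequence[Example]) -> List[float]:
    """Client side: mean gradient of 0.5 * (w . x - y)**2 over local data.

    ASSUMPTION: the linear model and squared-error loss are chosen only to
    keep the example small.
    """
    grad = [0.0] * len(weights)
    for x, y in examples:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += error * xi / len(examples)
    return grad


def server_step(weights: Sequence[float],
                client_grads: Sequence[Sequence[float]],
                learning_rate: float) -> List[float]:
    """Server side: average the client gradients, one value per model
    parameter, then take a single gradient-descent step."""
    n = len(client_grads)
    mean_grad = [sum(g[j] for g in client_grads) / n
                 for j in range(len(weights))]
    return [w - learning_rate * m for w, m in zip(weights, mean_grad)]


# Two user devices compute gradients for the current model version.
w0 = [0.0]
g_a = average_gradient(w0, [([1.0], 2.0)])
g_b = average_gradient(w0, [([1.0], 4.0)])
w1 = server_step(w0, [g_a, g_b], learning_rate=0.5)
```

Here the server forms one aggregated gradient value for each parameter of the current model before updating it, which is the per-parameter computation the combination relies on.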
Response to Arguments
14. Applicant's arguments filed on 02/05/2026 have been fully considered but they are not persuasive.
Applicants argue on Page 10 that “McMahan describes "a weighted sum" of client-side updates, "probabilistically quantiz[ing]" weights, and "averaging aggravation" of weights. McMahan at paras. [0030], [0046], and [0089]. However, as discussed with the Examiner during the interview, McMahan is silent with respect to a density estimator, and especially to a density estimator comprising a probability parameterized by weighting parameters for the respective machine learning model”. However, the mathematical expression of the claimed density estimator in the instant specification is labeled as p(x|ϕ.sub.k) (See Paragraph 78) without defining what this mathematical value actually constitutes. Moreover, the mathematical “probability” expression in Paragraph 47 is labeled as p(Y,Z|X,w) and is completely different from the claimed density expression (See also how the expression in Paragraph 47 has additional variables of Y & w). Simply put, the instant specification merely states a mathematical expression of the claimed density estimator as p(x|ϕ.sub.k) without ever explaining what this expression actually constitutes.
The examiner further wishes to refer to McMahan, which states “Each client then sends the update back to the server, where the global update is computed by aggregating all the client-side updates… in some implementations, a weighted sum might be used to replace the average based on desired performance” (Paragraphs 29-30), “Another way of encoding the updates is by quantizing the weights. For example, the weights can be probabilistically quantized” (Paragraph 46), and “At (320), method (300) can include again determining the global model. In particular, the global model can be determined based at least in part on the received local update(s). For instance, the received local updates can be aggregated to determine the global model. The aggregation can be an additive aggregation and/or an averaging aggregation. In particular implementations, the aggregation of the local updates can be proportional to the partition sizes of the data examples on the client devices” (Paragraph 89). The examiner further notes that the aggregation of the received weighted client updates (which entails locally updated model parameters) results in the generation of an updated global model. Moreover, the instant specification merely mentions the claimed density estimator without defining what it constitutes (See Paragraph 78). Specifically, as explained above, the claimed density estimator is labeled as a mathematical value (See p(x|ϕ.sub.k)) without defining what this mathematical value actually constitutes. Thus, the calculated weights that are applied to the probabilistic updates teach the claimed density estimator in the broadest reasonable interpretation.
Conclusion
15. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. PGPUB 2020/0285980 issued to Sharad et al. on 10 September 2020. The subject matter disclosed therein is pertinent to that of claims 1-17, 19-25, and 27-30 (e.g., methods to perform federated training).
U.S. PGPUB 2021/0287080 issued to Moloney et al. on 16 September 2021. The subject matter disclosed therein is pertinent to that of claims 1-17, 19-25, and 27-30 (e.g., methods to perform federated training).
Contact Information
16. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mahesh Dwivedi whose telephone number is (571) 272-2731. The examiner can normally be reached on Monday to Friday 8:20 am – 4:40 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached at (571) 272-4085. The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see 20. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Mahesh Dwivedi
Primary Examiner
Art Unit 2168
April 22, 2026
/MAHESH H DWIVEDI/Primary Examiner, Art Unit 2168