Last updated: May 29, 2026
Application No. 18/306,513
TRAINING METHOD AND APPARATUS

Non-Final OA: §103, §112
Filed: Apr 25, 2023
Priority: May 12, 2022 (EU 22173100.3)
Examiner: JAYAKUMAR, CHAITANYA R
Art Unit: 2128
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Nokia Technologies Oy
OA Round: 1 (Non-Final)
This examiner grants 24% of cases after interview (+20.5% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 53 resolved cases, 2023–2026
Examiner Intelligence

JAYAKUMAR, CHAITANYA R
Career Allowance Rate: 24% (13 granted / 53 resolved), -30.5% vs TC avg
Interview Lift: +20.5%
Avg Prosecution: 5y 3m
9 currently pending
Statute-Specific Performance

§101: 7.0% (-33.0% vs TC avg)
§103: 90.8% (+50.8% vs TC avg)
§102: 1.8% (-38.2% vs TC avg)
Tech Center average is an estimate. Based on career data from 53 resolved cases.
Office Action

§103 §112
DETAILED ACTION

This action is in response to the submission filed 25 April 2023 for application 18/306,513. Currently claims 1-13 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
An information disclosure statement (IDS) was submitted on 24 May 2023. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner. 

Drawings
The drawings are objected to because Figures 9 and 10 (especially 10a and 10b) are not clear enough to read. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.

The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 5, 6, and 13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 5 recites the limitation "… the respective sample …" in line 3.  There is insufficient antecedent basis for this limitation in the claim. Claim 6 depends on claim 5 and therefore inherits the same rejection.
Claim 6 recites the limitation "… the other of …" in line 2.  There is insufficient antecedent basis for this limitation in the claim.
Claim 13 recites the limitation "… the received set of the generated local centroids …" on page 3 (last but 3rd line). There is insufficient antecedent basis for this limitation in the claim.
Claim 13 recites the limitation "… the received one or more global clusters …" on page 4 (line 7). There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 4-9, and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Rieger et al. (Client Adaptation improves Federated Learning with Simulated Non-IID Clients, 2020) in view of Brandao et al. (Efficient Privacy Preserving Distributed K-Means for Non-IID Data, 2021).
Regarding claim 1:
Rieger teaches: An apparatus, comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to ([Page 1, Column 2, Last Paragraph] The current paper presents two contributions. Firstly, we propose the conditional gated activation unit (CGAU), an enhancement for current neural network architectures suited for federated learning that captures features in client dependent non-IID data. Note: This shows that it uses at least one processor and memory):
obtain local data comprising one or more samples at a user device ([Page 1, Column 2, Paragraph 1] An example of such highly sensitive data is audio collected by personal mobile devices. Note: Mobile device corresponds to user device. Audio corresponds to sample);
compute representations of at least some of said samples by passing said one or more samples through a local feature extractor, wherein the local feature extractor is implemented by a neural network having trainable parameters ([Page 1, Column 2, Last Paragraph] The current paper presents two contributions. Firstly, we propose the conditional gated activation unit (CGAU), an enhancement for current neural network architectures suited for federated learning that captures features in client dependent non-IID data. [Page 4, Column 2, Section 4, Paragraph 1] We use two data sets to evaluate our proposed scheme, one based on audio data and one on image data. We chose the two data sets to cover imbalanced and balanced data as well as binary and multi-class tasks. In both settings we use a pre-trained network that was not trained on the data set at hand. The pre-trained networks are used as feature extractors to provide the features for a classifier);
cluster the computed representations, by using a clustering algorithm, to generate one or more local centroids ([Page 4, Column 1, Paragraph 2] In order to simulate such non-IID client feature distributions, we investigate simulated clients where the samples of a particular class are clustered based on the feature embedding from a pre-trained network. [Page 4, Column 1, Paragraph 5] For each class, we then cluster the training data using a K-means clustering, thereby obtaining a set of C times K cluster centroids);
provide at least some of said generated local centroids and at least some of said parameters of the local feature extractor to a server ([Page 2, Section 2, Paragraph 2] the client sends the local model to the server. [Page 3, Column 1, Paragraph 1] as the feature extractor network can be trained centrally using any available but task-relevant data. [Page 4, Column 1, Paragraph 5] Each of the K clients are then assigned a set of centroids);
receive global feature extractor parameters from said server, wherein the global feature extractor parameters are generated by combining multiple local feature extractor parameters of said one or more user devices ([Page 2, Column 2, Last Paragraph] We investigate the use of pre-trained networks for classifications tasks in a federated learning scheme, where a pretrained network is used as a “frozen” feature extractor (that is, the pre-trained network is not further trained using federated learning). By using a pre-trained network, we off-load [Page 3, Column 1, Paragraph 1] the needed computational power and data required from the federated learning process as the feature extractor network can be trained centrally using any available but task-relevant data, or taken from already trained networks from a relevant domain. Additionally, the communication costs are reduced, as the feature extractor does not need to be sent back and forth between rounds of federated learning. The federated learning then only has to learn to solve the problem of interest by learning a (much smaller) classifier on top of the embedding that the pre-trained network produces);
update the parameters of the local feature extractor based on the received global feature extractor parameters ([Page 2, Section 2, Paragraph 2] The number of weight updates before global consolidation E is a hyperparameter. After E weight updates on the local model);
assign selected samples of one or more samples and one or more augmentations of said selected samples to global clusters ([Page 4, Column 1, Paragraph 2] In order to simulate such non-IID client feature distributions, we investigate simulated clients where the samples of a particular class are clustered based on the feature embedding from a pre-trained network. [Page 10, Column 1, Paragraph 1] The augmentation to the XOR-problem is that we consider the clusters to be from two different clients: client one (“up client”) has only positive x2, and client two has only negative x2 (“down client”));
and further update the updated parameters of the local feature extractor such that a cross-entropy between cluster assignments of the selected samples and the augmentation of said selected samples is minimised, thereby generating a trained local feature extractor ([Page 5, Column 2, Paragraph 1] While training, we monitor the loss of a held-out sub-partition of 5 % the training set (a validation partition) by centrally collecting the outcomes on the validation data points at each client. The best model (model weights) with the lowest cross-entropy loss on the validation partition across clients after a total of a 1000 rounds of federated learning is then used in the final evaluation on the test set).
However, Rieger does not explicitly disclose: receive one or more global centroids from said server, wherein said one or more global centroids are generated by clustering multiple local centroids of one or more user devices;
Brandao teaches, in an analogous system: receive one or more global centroids from said server, wherein said one or more global centroids are generated by clustering multiple local centroids of one or more user devices ([Page 2, Paragraph 3] To reduce the data that is shared with the server and for robustness against non-IID data, clients compute the K-means locally, with a variable number of clusters, and only the centroids are sent to the server. To preserve privacy, the centroids are encrypted homomorphically, which still allows the server to compute the distance from the local centroids to the global centroids, over encrypted data. The distances are then sent to the clients who, after decryption, assign each local centroid to a global centroid).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Rieger to incorporate the teachings of Brandao to receive one or more global centroids from said server, wherein said one or more global centroids are generated by clustering multiple local centroids of one or more user devices. One would have been motivated to do this modification because doing so would give the benefit of preserving privacy as taught by Brandao [Page 2, Paragraph 3].
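For orientation, the final limitation of claim 1 (minimising a cross-entropy between the cluster assignments of a sample and of its augmentation) can be sketched as below. This is a minimal illustration under stated assumptions, not the applicant's or Rieger's implementation: the claim as quoted does not say how assignments are computed, so the softmax over negative squared distances, the `temperature` value, and the names `soft_assign` and `cross_entropy` are all hypothetical.

```python
import math

def soft_assign(z, centroids, temperature=0.1):
    """Assumed rule: softmax over negative squared distances to each centroid."""
    logits = [-sum((a - b) ** 2 for a, b in zip(z, c)) / temperature
              for c in centroids]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_k p_k * log(q_k)."""
    return -sum(pk * math.log(qk + eps) for pk, qk in zip(p, q))

# Hypothetical global centroids and 2-D representations.
global_centroids = [[0.0, 0.0], [1.0, 1.0]]
z_sample = [0.1, 0.0]     # representation of a selected sample
z_aug_near = [0.0, 0.1]   # augmentation landing in the same cluster
z_aug_far = [0.9, 1.0]    # augmentation landing in the other cluster

p = soft_assign(z_sample, global_centroids)
loss_near = cross_entropy(p, soft_assign(z_aug_near, global_centroids))
loss_far = cross_entropy(p, soft_assign(z_aug_far, global_centroids))
```

Under these assumptions the loss is small when the sample and its augmentation fall in the same global cluster and large when they do not, which is the quantity a feature-extractor update would drive down.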

Regarding claim 4:
The system of Rieger and Brandao teaches: An apparatus as claimed in claim 1 (as shown above).
Rieger further teaches: wherein the providing of the at least some of said generated local centroids and the at least some of said parameters of the local feature extractor to the server is further caused to provide parameters of the trained local feature extractor and local centroids generated by clustering representations of said samples computed by passing said one or more samples through the trained local feature extractor to the server ([Page 4, Column 2, Section 4, Paragraph 1] In both settings we use a pre-trained network that was not trained on the data set at hand. The pre-trained networks are used as feature extractors to provide the features for a classifier. [Page 4, Column 2, Last Paragraph] In this case, we consider the simulated clients to be maximally non-IID (all samples that are the closest to a particular centroid are collected)).

Regarding claim 5:
The system of Rieger and Brandao teaches: An apparatus as claimed in claim 1 (as shown above).
Rieger further teaches: wherein the trained local feature extractor used to compute representations of a sample of said local data or an augmentation of the respective sample is updated using stochastic gradient descent ([Page 5, Column 2, Paragraph 1] Each client completed 10 steps of gradient descent with batch sizes of 32 samples, or until all client data had been seen once—each client had a variable size of data set, seeing as samples are assigned based on proximity to the cluster centroids. The clients utilized a stochastic gradient descent optimizer).
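The stopping rule in the cited Rieger passage (10 steps at batch size 32, or one pass over the client's data, whichever comes first) can be sketched as follows. Only the step limit and batch size come from the quotation; the one-parameter model `y = w * x`, the squared-error loss, the learning rate, and the name `local_sgd` are assumptions made purely for illustration.

```python
def local_sgd(w, data, lr=0.1, max_steps=10, batch_size=32):
    """Run up to max_steps minibatch steps, stopping early after one pass."""
    steps = 0
    for start in range(0, len(data), batch_size):
        if steps == max_steps:
            break
        batch = data[start:start + batch_size]
        # Gradient of the mean squared error for the toy model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
        steps += 1
    return w, steps

# A small client runs out of data first; a large one hits the step limit.
w_small, steps_small = local_sgd(0.0, [(1.0, 3.0)] * 40)
w_large, steps_large = local_sgd(0.0, [(1.0, 3.0)] * 1000)
```

With 40 samples the loop stops after two batches (32 and 8 samples); with 1000 samples it stops at the tenth step, which matches the variable client data-set sizes the passage mentions.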

Regarding claim 6:
The system of Rieger and Brandao teaches: An apparatus as claimed in claim 5 (as shown above).
Rieger further teaches: wherein the trained local feature extractor used to compute representations of the other of said sample or said augmentation is generated using an averaging process ([Page 5, Column 2, Paragraph 2] We determine the average Fréchet distances from any given client’s features to all others (as described in Section 3.4). The results averaged across replicates of the shuffling proportion are shown in Table 1).


Regarding claim 7:
The system of Rieger and Brandao teaches: An apparatus as claimed in claim 1 (as shown above).
Rieger further teaches: wherein the augmentation comprises one or more of: transformations of image data; and transformations of audio data ([Page 1, Column 2, Paragraph 3] two classification tasks, one from the audio domain and one from the image domain. [Page 4, Section 4, Paragraph 1] We use two data sets to evaluate our proposed scheme, one based on audio data and one on image data. We chose the two data sets to cover imbalanced and balanced data as well as binary and multi-class tasks).

Regarding claim 8:
The system of Rieger and Brandao teaches: An apparatus as claimed in claim 1 (as shown above).
Rieger further teaches: wherein said local data comprises unlabelled data from one or more devices ([Page 1, Section 1, Paragraph 2] Clustering algorithms belong to the unsupervised learning class, i.e., they can learn from unlabeled data).

Regarding claim 9:
The system of Rieger and Brandao teaches: An apparatus as claimed in claim 1 (as shown above).
Rieger further teaches: wherein the at least one memory storing instructions that, when executed by the at least one processor, further cause the apparatus at least to initialise the parameters of said local feature extractor ([Page 2, Column 2, Last Paragraph] We investigate the use of pre-trained networks for classifications tasks in a federated learning scheme, where a pretrained network is used as a “frozen” feature extractor (that is, the pre-trained network is not further trained using federated learning). Note: Pre-trained network is used as a frozen feature extractor corresponds to initialising the parameters of said local feature extractor).

Regarding claim 11:
Rieger teaches: An apparatus, comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to ([Page 1, Column 2, Last Paragraph] The current paper presents two contributions. Firstly, we propose the conditional gated activation unit (CGAU), an enhancement for current neural network architectures suited for federated learning that captures features in client dependent non-IID data. Note: This shows that it uses at least one processor and memory):
receive trained local parameters of local neural network feature extractors from a plurality of user devices ([Page 1, Column 2, Last Paragraph] The current paper presents two contributions. Firstly, we propose the conditional gated activation unit (CGAU), an enhancement for current neural network architectures suited for federated learning that captures features in client dependent non-IID data. [Page 4, Column 2, Section 4, Paragraph 1] In both settings we use a pre-trained network that was not trained on the data set at hand. The pre-trained networks are used as feature extractors to provide the features for a classifier);
receive a set of generated local centroids from the plurality of user devices, wherein the set of the local centroids describe clustering representations of local data at the respective user devices ([Page 4, Column 1, Paragraph 2] In order to simulate such non-IID client feature distributions, we investigate simulated clients where the samples of a particular class are clustered based on the feature embedding from a pre-trained network. [Page 4, Column 1, Paragraph 5] For each class, we then cluster the training data using a K-means clustering, thereby obtaining a set of C times K cluster centroids);
generate global feature extractor parameters by combining local feature extractor parameters from some or all of said user devices ([Page 2, Column 2, Last Paragraph] We investigate the use of pre-trained networks for classifications tasks in a federated learning scheme, where a pretrained network is used as a “frozen” feature extractor (that is, the pre-trained network is not further trained using federated learning). By using a pre-trained network, we off-load [Page 3, Column 1, Paragraph 1] the needed computational power and data required from the federated learning process as the feature extractor network can be trained centrally using any available but task-relevant data, or taken from already trained networks from a relevant domain. Additionally, the communication costs are reduced, as the feature extractor does not need to be sent back and forth between rounds of federated learning. The federated learning then only has to learn to solve the problem of interest by learning a (much smaller) classifier on top of the embedding that the pre-trained network produces);
and provide the generated global feature extractor parameters ... to said plurality of user devices, for use in training respective local neural network feature extractors ([Page 1, Column 2, Paragraph 3] Motivated by this, we propose to learn a local embedding for each client along with the global model as shown in Fig. 1. [Page 2, Column 2, Last Paragraph] We investigate the use of pre-trained networks for classifications tasks in a federated learning scheme, where a pretrained network is used as a “frozen” feature extractor (that is, the pre-trained network is not further trained using federated learning). By using a pre-trained network, we off-load [Page 3, Column 1, Paragraph 1] the needed computational power and data required from the federated learning process as the feature extractor network can be trained centrally using any available but task-relevant data, or taken from already trained networks from a relevant domain. Additionally, the communication costs are reduced, as the feature extractor does not need to be sent back and forth between rounds of federated learning. The federated learning then only has to learn to solve the problem of interest by learning a (much smaller) classifier on top of the embedding that the pre-trained network produces).
However, Rieger does not explicitly disclose: generate global centroids by clustering the received set of the generated local centroids using a clustering algorithm; and the generated global centroids.
Brandao teaches, in an analogous system: generate global centroids by clustering the received set of the generated local centroids using a clustering algorithm ([Page 2, Paragraph 3] To reduce the data that is shared with the server and for robustness against non-IID data, clients compute the K-means locally, with a variable number of clusters, and only the centroids are sent to the server. To preserve privacy, the centroids are encrypted homomorphically, which still allows the server to compute the distance from the local centroids to the global centroids, over encrypted data. The distances are then sent to the clients who, after decryption, assign each local centroid to a global centroid);
and the generated global centroids ([Page 2, Paragraph 3] the global centroids).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Rieger to incorporate the teachings of Brandao to generate global centroids by clustering the received set of the generated local centroids using a clustering algorithm; and the generated global centroids. One would have been motivated to do this modification because doing so would give the benefit of preserving privacy as taught by Brandao [Page 2, Paragraph 3].
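The server-side limitation at issue for claim 11 (clustering the received local centroids to obtain global centroids) can be sketched as below. This is an illustrative reading only: it uses plain Lloyd-style k-means on unencrypted centroids, whereas the cited Brandao passage describes homomorphically encrypted centroids and server-computed distances. The device names, data values, farthest-point initialisation, and function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def kmeans(points, k, iters=20):
    # Farthest-point initialisation: deterministic, chosen for illustration.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the previous centroid if a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

# Hypothetical local centroids uploaded by three user devices.
local_centroids = {
    "device_a": [[0.0, 0.1], [5.0, 5.1]],
    "device_b": [[0.2, -0.1], [4.9, 5.0]],
    "device_c": [[-0.2, 0.0]],
}
pooled = [c for cs in local_centroids.values() for c in cs]
global_centroids = kmeans(pooled, k=2)

# Map each local centroid to its nearest global centroid.
assignment = {}
for device, cs in local_centroids.items():
    assignment[device] = [nearest(c, global_centroids) for c in cs]
```

Devices may contribute different numbers of local centroids, as `device_c` does here, which is consistent with the "variable number of clusters" in the quoted passage.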

Regarding claim 12:
The system of Rieger and Brandao teaches: An apparatus as claimed in claim 11 (as shown above).
Rieger further teaches: wherein said global feature extractor parameters are an average of the received local parameters ([Page 2, Section 2, Paragraph 2] After E weight updates on the local model, the client sends the local model to the server. The server averages the model weights from all clients into the global model).
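The averaging step quoted from Rieger ("the server averages the model weights from all clients into the global model") amounts to an element-wise mean, sketched below. The flat parameter lists and the name `average_parameters` are assumptions for illustration; the quotation describes an unweighted average, and this sketch does not weight clients by data-set size.

```python
def average_parameters(client_params):
    """Element-wise mean of equally shaped flat parameter lists."""
    n = len(client_params)
    return [sum(vals) / n for vals in zip(*client_params)]

# Hypothetical parameters received from three clients.
client_params = [
    [0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0],
    [4.0, 1.0, 4.0],
]
global_params = average_parameters(client_params)  # [2.0, 1.0, 2.0]
```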

Regarding claim 13:
Rieger teaches: A system comprising: a user device, comprising, at least one processor, at least one memory storing instructions that, when executed by the at least one processor, cause the user device at least to ([Page 1, Column 2, Last Paragraph] The current paper presents two contributions. Firstly, we propose the conditional gated activation unit (CGAU), an enhancement for current neural network architectures suited for federated learning that captures features in client dependent non-IID data. Note: This shows that it uses at least one processor and memory):
obtain local data comprising one or more samples at the user device ([Page 1, Column 2, Paragraph 1] An example of such highly sensitive data is audio collected by personal mobile devices. Note: Mobile device corresponds to user device. Audio corresponds to sample. Sensitive data corresponds to local data);
compute representations of at least some of said samples by passing said one or more samples through a local feature extractor, wherein the local feature extractor is implemented by a neural network having trainable parameters ([Page 1, Column 2, Last Paragraph] The current paper presents two contributions. Firstly, we propose the conditional gated activation unit (CGAU), an enhancement for current neural network architectures suited for federated learning that captures features in client dependent non-IID data. [Page 4, Column 2, Section 4, Paragraph 1] We use two data sets to evaluate our proposed scheme, one based on audio data and one on image data. We chose the two data sets to cover imbalanced and balanced data as well as binary and multi-class tasks. In both settings we use a pre-trained network that was not trained on the data set at hand. The pre-trained networks are used as feature extractors to provide the features for a classifier);
cluster the computed representations, by using a clustering algorithm, to generate one or more local centroids ([Page 4, Column 1, Paragraph 2] In order to simulate such non-IID client feature distributions, we investigate simulated clients where the samples of a particular class are clustered based on the feature embedding from a pre-trained network. [Page 4, Column 1, Paragraph 5] For each class, we then cluster the training data using a K-means clustering, thereby obtaining a set of C times K cluster centroids);
provide at least some of said generated local centroids and at least some of said parameters of the local feature extractor to a second apparatus ([Page 2, Section 2, Paragraph 2] the client sends the local model to the server. [Page 3, Column 1, Paragraph 1] as the feature extractor network can be trained centrally using any available but task-relevant data. [Page 4, Column 1, Paragraph 5] Each of the K clients are then assigned a set of centroids).
the second apparatus, comprising, at least one processor, at least one memory storing instructions that, when executed by the at least one processor, cause the user device at least to ([Page 1, Column 2, Last Paragraph] The current paper presents two contributions. Firstly, we propose the conditional gated activation unit (CGAU), an enhancement for current neural network architectures suited for federated learning that captures features in client dependent non-IID data. Note: This shows that it uses at least one processor and memory):
generate global feature extractor parameters by combining local feature extractor parameters from the user device and from some other user devices ([Page 2, Column 2, Last Paragraph] We investigate the use of pre-trained networks for classifications tasks in a federated learning scheme, where a pretrained network is used as a “frozen” feature extractor (that is, the pre-trained network is not further trained using federated learning). By using a pre-trained network, we off-load [Page 3, Column 1, Paragraph 1] the needed computational power and data required from the federated learning process as the feature extractor network can be trained centrally using any available but task-relevant data, or taken from already trained networks from a relevant domain. Additionally, the communication costs are reduced, as the feature extractor does not need to be sent back and forth between rounds of federated learning. The federated learning then only has to learn to solve the problem of interest by learning a (much smaller) classifier on top of the embedding that the pre-trained network produces);
and provide the global feature extractor parameters to the user device ([Page 2, Column 2, Last Paragraph] We investigate the use of pre-trained networks for classifications tasks in a federated learning scheme, where a pretrained network is used as a “frozen” feature extractor (that is, the pre-trained network is not further trained using federated learning). By using a pre-trained network, we off-load [Page 3, Column 1, Paragraph 1] the needed computational power and data required from the federated learning process as the feature extractor network can be trained centrally using any available but task-relevant data, or taken from already trained networks from a relevant domain. Additionally, the communication costs are reduced, as the feature extractor does not need to be sent back and forth between rounds of federated learning. The federated learning then only has to learn to solve the problem of interest by learning a (much smaller) classifier on top of the embedding that the pre-trained network produces);
the user device, further caused to update the parameters of the local feature extractor based on the received global feature extractor parameters ([Page 2, Section 2, Paragraph 2] The number of weight updates before global consolidation E is a hyperparameter. After E weight updates on the local model);
assign selected samples of one or more samples and one or more augmentations of said selected samples to the received one or more global clusters ([Page 4, Column 1, Paragraph 2] In order to simulate such non-IID client feature distributions, we investigate simulated clients where the samples of a particular class are clustered based on the feature embedding from a pre-trained network. [Page 10, Column 1, Paragraph 1] The augmentation to the XOR-problem is that we consider the clusters to be from two different clients: client one (“up client”) has only positive x2, and client two has only negative x2 (“down client”));
and further update the updated parameters of the local feature extractor such that a cross-entropy between cluster assignments of the selected samples and the augmentation of said selected samples is minimised, thereby generating a trained local feature extractor ([Page 5, Column 2, Paragraph 1] While training, we monitor the loss of a held-out sub-partition of 5 % the training set (a validation partition) by centrally collecting the outcomes on the validation data points at each client. The best model (model weights) with the lowest cross-entropy loss on the validation partition across clients after a total of a 1000 rounds of federated learning is then used in the final evaluation on the test set).
However, Rieger does not explicitly disclose: generate one or more global centroids by clustering the received set of the generated local centroids from the user device and from some other user devices by using a clustering algorithm; provide the one or more global centroids to the user device.
Brandao teaches, in an analogous system: generate one or more global centroids by clustering the received set of the generated local centroids from the user device and from some other user devices by using a clustering algorithm ([Page 2, Paragraph 3] To reduce the data that is shared with the server and for robustness against non-IID data, clients compute the K-means locally, with a variable number of clusters, and only the centroids are sent to the server. To preserve privacy, the centroids are encrypted homomorphically, which still allows the server to compute the distance from the local centroids to the global centroids, over encrypted data. The distances are then sent to the clients who, after decryption, assign each local centroid to a global centroid);
provide the one or more global centroids to the user device ([Page 2, Paragraph 3] To reduce the data that is shared with the server and for robustness against non-IID data, clients compute the K-means locally, with a variable number of clusters, and only the centroids are sent to the server. To preserve privacy, the centroids are encrypted homomorphically, which still allows the server to compute the distance from the local centroids to the global centroids, over encrypted data. The distances are then sent to the clients who, after decryption, assign each local centroid to a global centroid).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Rieger to incorporate the teachings of Brandao to generate one or more global centroids by clustering the received set of the generated local centroids from the user device and from some other user devices by using a clustering algorithm, and to provide the one or more global centroids to the user device. One would have been motivated to make this modification because doing so would give the benefit of preserving privacy, as taught by Brandao [Page 2, Paragraph 3].
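For readers following the claim mapping above, the data flow recited in claim 1 can be sketched roughly as follows. This is an illustrative sketch only, not the applicant's implementation nor that of Rieger or Brandao: the function names, the toy two-dimensional embeddings, the softmax-over-distance assignment, and the deterministic initialisation are all assumptions made for the example, and the homomorphic encryption described by Brandao is omitted. Each device clusters its own embeddings into local centroids, the server clusters the pooled local centroids into global centroids, and a cross-entropy term compares the cluster assignment of a sample with that of its augmentation.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means over a list of equal-length tuples; returns k centroids."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: init from the first k points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids

def soft_assign(embedding, centroids, temperature=1.0):
    """Soft cluster assignment: softmax over negative distances to the centroids."""
    logits = [-math.dist(embedding, c) / temperature for c in centroids]
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between two assignment distributions."""
    return -sum(t * math.log(q + eps) for t, q in zip(target, predicted))

# Hypothetical feature embeddings held privately by two user devices.
device_a = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
device_b = [(0.1, 0.2), (4.9, 5.0), (5.1, 5.2), (0.0, 0.0)]

# Each device sends only its local centroids; the server clusters the pooled set.
local_centroids = kmeans(device_a, 2) + kmeans(device_b, 2)
global_centroids = kmeans(local_centroids, 2)

# Assignment consistency between a sample and two hypothetical augmentations.
sample = (0.1, 0.1)
near_aug = (0.15, 0.05)   # augmentation whose embedding stays close to the sample
far_aug = (4.0, 4.0)      # augmentation whose embedding drifts to the other cluster
target = soft_assign(sample, global_centroids)
loss_near = cross_entropy(target, soft_assign(near_aug, global_centroids))
loss_far = cross_entropy(target, soft_assign(far_aug, global_centroids))
```

In this toy setting the loss is small when the augmentation lands in the same global cluster as the sample and large when it does not, which is the quantity a feature extractor update would seek to reduce under the claimed arrangement.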

Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Rieger et al (Client Adaptation improves Federated Learning with Simulated Non-IID Clients, 2020) in view of Brandao et al (Efficient Privacy Preserving Distributed K-Means for Non-IID Data, 2021) and further in view of Neumann (US 20210057100 A1).
Regarding claim 2:
The system of Rieger and Brandao teaches: An apparatus as claimed in claim 1 (as shown above).
However, the system of Rieger and Brandao does not explicitly disclose: wherein the updating of the parameters of the feature extractor is further caused to minimise a loss function, wherein the loss function includes a clustering parameter.
Neumann teaches, in an analogous system: wherein the updating of the parameters of the feature extractor is further caused to minimise a loss function, wherein the loss function includes a clustering parameter ([Page 31, [0146], Column 2] An element of machine-learning data may include variables used to minimize a loss function. An element of machine-learning data may include a graphical representation of clusters generated by a clustering algorithm).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined system of Rieger and Brandao to incorporate the teachings of Neumann, wherein the updating of the parameters of the feature extractor is further caused to minimise a loss function, wherein the loss function includes a clustering parameter. One would have been motivated to make this modification because doing so would give the benefit of machine learning that includes any algorithm and/or statistical model that a processor connected to memory utilizes to perform any task without using explicit instructions, relying instead on patterns and inferences, as taught by Neumann [0146].

Regarding claim 3:
The system of Rieger and Brandao teaches: An apparatus as claimed in claim 2 (as shown above).
However, the system of Rieger and Brandao does not explicitly disclose: wherein the loss function includes a degeneracy parameter incorporating a prediction of an augmentation applied to respective samples.
Neumann teaches, in an analogous system: wherein the loss function includes a degeneracy parameter incorporating a prediction of an augmentation applied to respective samples ([Page 22, [0110], Column 2] Scoring function may be expressed as a risk function representing an “expected loss” of an algorithm relating inputs to outputs, where loss is computed as an error function representing a degree to which a prediction generated by the relation is incorrect when compared to a given input-output pair provided in first training set. Note: Scoring function corresponds to a degeneracy parameter).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined system of Rieger and Brandao to incorporate the teachings of Neumann, wherein the loss function includes a degeneracy parameter incorporating a prediction of an augmentation applied to respective samples. One would have been motivated to make this modification because doing so would give the benefit of machine learning that includes any algorithm and/or statistical model that a processor connected to memory utilizes to perform any task without using explicit instructions, relying instead on patterns and inferences, as taught by Neumann [0146].

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Rieger et al (Client Adaptation improves Federated Learning with Simulated Non-IID Clients, 2020) in view of Brandao et al (Efficient Privacy Preserving Distributed K-Means for Non-IID Data, 2021) and further in view of Baker et al (Decentralized dynamic functional network connectivity: State analysis in collaborative settings, 2019).
Regarding claim 10:
The system of Rieger and Brandao teaches: An apparatus as claimed in claim 1 (as shown above).
However, the system of Rieger and Brandao does not explicitly disclose: wherein the clustering algorithm used to generate one or more local centroids generates equally sized local clusters.
Baker teaches, in an analogous system: wherein the clustering algorithm used to generate one or more local centroids generates equally sized local clusters ([Page 2913, Figure 1] with cluster size C = 2. [Page 2913, Column 1, Last Paragraph] where each processor broadcasts an updated set of local centroids. Note: cluster size =2 corresponds to equally sized local clusters).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined system of Rieger and Brandao to incorporate the teachings of Baker, wherein the clustering algorithm used to generate one or more local centroids generates equally sized local clusters. One would have been motivated to make this modification because doing so would give the benefit that each cluster performs the standard GlobalPCA and, as the recursion steps back from this base case, the result from GlobalPCA is passed between subclusters and GlobalPCA is performed again until the recursion ends, as taught by Baker.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Ribeiro et al (Prediction of Privacy Preferences with User Profiles: A Federated Learning Approach, 2021) discloses a system capable of learning the users’ privacy preferences according to the context, and the learning mechanism must to be private, i.e. the users’ data about their privacy preferences and context should not be disclosed to anyone.
Peng et al (FEDERATED ADVERSARIAL DOMAIN ADAPTATION, 2019) discloses a principled approach to the problem of federated domain adaptation, which aims to align the representations learned among the different nodes with the data distribution of the target node. Our approach extends adversarial adaptation techniques to the constraints of the federated setting. In addition, we devise a dynamic attention mechanism and leverage feature disentanglement to enhance knowledge transfer. Empirically, we perform extensive experiments on several image and text classification tasks and show promising results under unsupervised federated domain adaptation setting.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAITANYA RAMESH JAYAKUMAR whose telephone number is (571)272-3369. The examiner can normally be reached Mon-Fri 9am-1pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached at (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.R.J./ Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128
Prosecution Timeline

Apr 25, 2023: Application Filed
Apr 28, 2026: Non-Final Rejection mailed, §103, §112 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

15/884,279, Patent 12293260: GENERATING AND DEPLOYING PACKAGES FOR MACHINE LEARNING AT EDGE DEVICES (7y 3m to grant; granted May 06, 2025)
16/547,380, Patent 12147915: SYSTEMS AND METHODS FOR MODELLING PREDICTION ERRORS IN PATH-LEARNING OF AN AUTONOMOUS LEARNING AGENT (5y 3m to grant; granted Nov 19, 2024)
15/866,225, Patent 11770571: Matrix Completion and Recommendation Provision with Deep Learning (5y 8m to grant; granted Sep 26, 2023)
16/507,025, Patent 11769074: COLLECTING OBSERVATIONS FOR MACHINE LEARNING (4y 2m to grant; granted Sep 26, 2023)
15/826,613, Patent 11741693: SYSTEM AND METHOD FOR SEMI-SUPERVISED CONDITIONAL GENERATIVE MODELING USING ADVERSARIAL NETWORKS (5y 9m to grant; granted Aug 29, 2023)
Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 24%
With Interview (+20.5%): 45%
Median Time to Grant: 5y 3m (~2y 1m remaining)
PTA Risk: Low
Based on 53 resolved cases by this examiner. Grant probability derived from career allowance rate.