Prosecution Insights
Last updated: April 19, 2026
Application No. 17/540,660

PRIVACY-PRESERVING CLASS LABEL STANDARDIZATION IN FEDERATED LEARNING SETTINGS

Non-Final OA §103
Filed: Dec 02, 2021
Examiner: GALVIN-SIEBENALER, PAUL MICHAEL
Art Unit: 2147
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 3 (Non-Final)
Grant Probability: 25% (At Risk)
Expected OA Rounds: 3-4
Time to Grant: 3y 3m
Grant Probability with Interview: 0%

Examiner Intelligence

Career Allow Rate: 25% (grants only 25% of cases; 1 granted / 4 resolved; -30.0% vs TC avg)
Interview Lift: -25.0% (minimal; across resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline)
Total Applications: 43 across all art units (39 currently pending)

Statute-Specific Performance

§101: 29.8% (-10.2% vs TC avg)
§103: 36.8% (-3.2% vs TC avg)
§102: 19.0% (-21.0% vs TC avg)
§112: 14.5% (-25.5% vs TC avg)

Tech Center averages are estimates • Based on career data from 4 resolved cases

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is in response to the amendment filed on Dec. 22, 2025. The amendments are linked to the original application filed on Dec. 2, 2021.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on Dec. 22, 2025 has been entered.

Response to Amendment

The Examiner thanks the applicant for the remarks, edits, and arguments.

Regarding Claim Rejections – 35 USC 103

Applicant Remarks: The applicant has amended claims 1, 13, and 20 to further define the claims. With the amendments, the applicant believes that both Romanini and Gao fail to teach the claims as currently presented. The applicant argues that Romanini and Gao fail to teach each and every element of the independent claims, and further that Romanini and Gao are unable to teach the amendments. For example, the applicant states that these references do not teach the use of graphs, subgraphs, or embeddings, or the use of embeddings to perform the actions listed in the independent claims; therefore, these references fail to teach each and every element of the independent claims. Next, the applicant states that the art cited against the dependent claims cannot cure the deficiencies of the art proposed against the independent claims. Therefore, by means of dependency, the art proposed against the dependent claims fails to teach every element of those claims.
Finally, for the reasons above, the applicant believes the rejection under 35 U.S.C. 103 should be withdrawn.

Examiner Response: The applicant argues that both Romanini and Gao fail to teach the amended elements of the independent claims and that, as a result, the art cited against the dependent claims would also fail to cure the deficiencies of Romanini and Gao with respect to the independent claims. The examiner agrees that Romanini and Gao fail to teach the amended claims. On further review, Romanini in particular fails to mention graphs or embeddings and is therefore no longer relied upon as prior art for this application. Since the examiner is no longer relying on Romanini, Gao is no longer needed to support the withdrawn reference, and Gao has also been removed as prior art. After each amendment the examiner must consider the previous Office actions, the amended claims, the applicant's remarks, and the previously proposed art. Further, the examiner must perform a complete search to ensure the proposed invention meets the conditions for allowance or a possible rejection. Upon completion of this evaluation and further search, the examiner believes that new art has been discovered which teaches the amended claims and some of the original claims without reinterpretation. The examiner believes that the newly proposed art, in combination with the previously proposed art, teaches each element of the amended claims. Therefore, with the addition of new art, the rejection under 35 U.S.C. 103 is maintained; see the 103 rejection below.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 4, 6, 12, 13, 15, 16, 20, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al., “FedGL: Federated Graph Learning Framework with Global Self-Supervision”, 2021 (hereinafter “Chen”) in view of Liu et al., “Client-Edge-Cloud Hierarchical Federated Learning”, Oct 2019, pp. 1-6 (hereinafter “Liu”). Regarding claim 1, Chen discloses, “A computer-implemented method comprising:” (Introduction, pp. 3; “In this paper, we propose a general Federated Graph Learning framework FedGL, which is capable of learning a high-quality graph model by discovering and exploiting the global self-supervision information to effectively deal with the heterogeneity and complementarity. The general framework is shown in Fig. 2.” This article discloses a method that combines graph neural networks and federated learning. The framework is shown in Fig. 2.) “determining, by a cloud server using one or more data privacy-preserving techniques, a signature for each of multiple classes of data transmitted to the cloud server by multiple client devices within a federated learning environment, wherein the cloud server is coupled to each of the multiple client devices within the federated learning environment, and” (The Framework of FedGL, pp. 8; “Clients: local model training.
Each client uses its local graph data to train several rounds of GCN model, obtaining model parameters 𝑊𝑘, node embeddings 𝐻𝑘, and prediction results 𝑃𝑘, then upload them to the server. Note that 𝐾 clients train their local models in parallel.” This model performs federated learning tasks while maintaining privacy-preserving techniques, and it follows the standard federated learning architecture.) And (Global Self-supervision Discovery, pp. 11; “Based on P̄, we try to discover pseudo labels for self-supervised learning, which has been proven to be effective in the learning of image and graph data [16, 24, 43, 49]. Concretely, we unearth these high-confidence prediction results from P̄ and take out the predicted labels, thus obtaining the global pseudo labels. For the prediction result vector P̄ᵢ of the 𝑖-th node in P̄, if its predicted probability of a certain class is higher than a certain threshold, then it is selected as a pseudo label: [See Equation (12)] where Ȳ is the one-hot matrix of the global pseudo label, and 𝜆 ∈ [0, 1) is the confidence threshold for determining the pseudo label.” Further, this model is able to use and generate labels from the different data and predictions provided by the clients in the system. The central server will evaluate the data, generate labels for the given data, and send the labels back to the clients for further training.) “wherein determining the signature for each of the multiple classes of data comprises (i) constructing multiple graphs for the multiple classes of data, wherein constructing each of the multiple graphs comprises depicting one or more patterns among data points belonging to the respective class of data, and (ii) generating embeddings of at least portions of the multiple graphs;” (The Framework of FedGL, pp. 8; “Server: global self-supervision discovery. Except aggregating local model parameters to obtain a global model, we propose to discover the global self-supervision information on the server, including global pseudo label and global pseudo graph, to deal with the heterogeneity and complementarity. Specifically, server firstly performs a weighted average fusion on the prediction results 𝑃1, ..., 𝑃𝐾 to obtain the global prediction result P̄. Then, server selects the result with higher probability from the predicted probability vector of each row in P̄ as the pseudo label of each node, which constitutes the one-hot matrix Ȳ of the global pseudo label. Similarly, server performs weighted average fusion on the node embeddings 𝐻1, ..., 𝐻𝐾 to obtain the global node embedding H̄.” The model generates pseudo labels and a global similarity graph from the data sent by the clients. The graph is used to help determine labels in the data. This model is able to label data types from the clients and construct a graph using that knowledge for further training.) And (Global Self-supervision Discovery, pp. 11; “Based on P̄, we try to discover pseudo labels for self-supervised learning, which has been proven to be effective in the learning of image and graph data [16, 24, 43, 49]. Concretely, we unearth these high-confidence prediction results from P̄ and take out the predicted labels, thus obtaining the global pseudo labels. For the prediction result vector P̄ᵢ of the 𝑖-th node in P̄, if its predicted probability of a certain class is higher than a certain threshold, then it is selected as a pseudo label: [See Equation (12)] where Ȳ is the one-hot matrix of the global pseudo label, and 𝜆 ∈ [0, 1) is the confidence threshold for determining the pseudo label.” This teaches how the model uses the pseudo labels. The clients in the federated system are able to use the global graphs and the label data to execute self-supervised learning.)
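The thresholding rule quoted above (Chen's Equation (12)) can be sketched as follows. This is a minimal illustrative sketch, not code from the reference: the function name, array shapes, and the default value of the confidence threshold are assumptions.

```python
import numpy as np

def global_pseudo_labels(P_bar: np.ndarray, lam: float = 0.8) -> np.ndarray:
    """Select high-confidence global pseudo labels from the fused
    prediction matrix P_bar (n_nodes x n_classes): a node receives a
    one-hot pseudo label Y_bar[i] only if its top class probability
    exceeds the confidence threshold lam in [0, 1)."""
    n, c = P_bar.shape
    Y_bar = np.zeros((n, c))
    top = P_bar.argmax(axis=1)                    # predicted class per node
    confident = P_bar[np.arange(n), top] > lam    # nodes above the threshold
    Y_bar[np.arange(n)[confident], top[confident]] = 1.0
    return Y_bar
```

With `lam = 0.8`, a node predicted at probability 0.9 gets a one-hot pseudo label, while a 0.5/0.5 node is left unlabeled, matching the "higher than a certain threshold" selection the examiner quotes.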
“identifying, by the cloud server, one or more signature matches across at least a portion of the multiple client devices based at least in part on the generated embeddings;” (The Framework of FedGL, pp. 8; “By multiplying H̄ and its transpose, server can reconstruct the whole adjacency matrix, obtaining the weighted adjacency matrix Ā of the global pseudo graph. Server distributes the discovered global pseudo label Ȳ and global pseudo graph Ā to each client to start the next round of training.” The server is able to identify pseudo labels from the data provided by the clients. The pseudo labels are saved by the server and distributed to the clients for training.) “generating, by the cloud server, one or more class labels for at least one or more of the multiple classes of data associated with the one or more signature matches;” (The Framework of FedGL, pp. 9; “Clients: global self-supervision utilization. The global pseudo label is regarded as the "real" label to enrich the relatively rare real training labels by constructing a self-supervised learning loss 𝐿𝑆𝑆𝐿 and adding it to the main task loss 𝐿𝐺𝐶𝑁 for joint optimization. For example in Fig. 3, edge (3, 4) in client 1 and edge (2, 4) in client 𝐾 have been well complemented. By exploiting the global pseudo label and global pseudo graph, the quality of each local model can be effectively improved, thereby leading to a high-quality global model.” The clients in this system will use the generated labels from all of the other clients to train a local model and produce predictions. This model will evaluate its own gathered data and use the pseudo labels to help identify and classify the data.) “labeling, by the cloud server, the at least one or more of the multiple classes of data associated with the one or more signature matches with the one or more generated class labels;” (Figure 2, pp. 8; Figure 2 discloses the general framework of the proposed model. The server will take data from the clients in the system. The central server will: “Aggregate the global weights … Discover global pseudo labels ... Construct global pseudo graph.”, during regular training. As stated, it will generate labels from the data sent to the server by the clients.) “transmitting, by the cloud server to the at least a portion of the multiple client devices, the at least one or more labeled classes of data; and” (The Framework of FedGL, pp. 8; “Server: global self-supervision discovery. Except aggregating local model parameters to obtain a global model, we propose to discover the global self-supervision information on the server, including global pseudo label and global pseudo graph, to deal with the heterogeneity and complementarity. Specifically, server firstly performs a weighted average fusion on the prediction results 𝑃1, ..., 𝑃𝐾 to obtain the global prediction result P̄. Then, server selects the result with higher probability from the predicted probability vector of each row in P̄ as the pseudo label of each node, which constitutes the one-hot matrix Ȳ of the global pseudo label. Similarly, server performs weighted average fusion on the node embeddings 𝐻1, ..., 𝐻𝐾 to obtain the global node embedding H̄.” The global server will evaluate the data sent from the clients and perform the steps listed in Figure 2. After this is complete, the central server will send an aggregated global model or parameters, updated labels, and a graph to the clients in the system. The clients will then use this data to perform self-supervised learning until a training threshold is met.)
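The server-side steps quoted above (weighted average fusion of client node embeddings into H̄, then reconstructing the weighted adjacency matrix Ā = H̄H̄ᵀ of the global pseudo graph) can be sketched as below. The function name and the use of data-volume weights are illustrative assumptions, not details from the reference.

```python
import numpy as np

def global_pseudo_graph(H_list, weights):
    """Fuse client node embeddings H_k by a weighted average to obtain
    the global embedding H_bar, then reconstruct the weighted adjacency
    matrix A_bar = H_bar @ H_bar.T of the global pseudo graph."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # normalize client weights
    H_bar = sum(wk * Hk for wk, Hk in zip(w, H_list))
    A_bar = H_bar @ H_bar.T                       # n_nodes x n_nodes, symmetric
    return H_bar, A_bar
```

Because Ā is a Gram matrix of the fused embeddings, it is symmetric by construction, which is what lets the server treat it as an (undirected, weighted) pseudo graph to distribute back to the clients.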
“performing, by the cloud server, one or more automated actions based at least in part on the at least one or more labeled classes of data, wherein performing one or more automated actions comprises training one or more machine learning models to perform at least one federated learning task using distributed data from across the multiple client devices, wherein the distributed data comprises portions of data within the at least one or more labeled classes of data;” (Figure 3, pp. 8; The client models will perform the actions listed in Figure 3. Once the clients receive the global model, graph, and labels, the client will automatically perform training steps using the data provided from the server.) And (Global Model, pp. 10; “Following FedAvg [34], we employ the weighted average aggregation method to aggregate the model parameters of 𝐾 clients to obtain the global model: [See Equation (10)] where 𝑁𝑘 is the number of nodes in the graph on the client 𝑘, and 𝑀 is the sum of the number of nodes in the graphs of the 𝐾 clients, and 𝑊𝑘 is the model parameters of the client 𝑘. 𝑁𝑘/𝑀 denotes the proportion of the data volume of each client, which is used to measure the importance of its model parameters in aggregation.” The server will receive data from the clients and aggregate it to generate a global model. This model, or its parameters, is sent to the clients in response to the clients sending their data. The server automatically performs these actions in response to the clients sending their data.) Chen fails to explicitly disclose, “wherein the method is carried out by at least one computing device comprising at least the cloud server.”. However, Liu discloses, “wherein the method is carried out by at least one computing device comprising at least the cloud server.” (Client-Edge-Cloud Hierarchical FL, pp. 2; “To combine their advantages, we consider a hierarchical FL system, which has one cloud server, L edge servers indexed by l, with disjoint client sets {C_l}_{l=1}^{L}, and N clients indexed by i and l, with distributed datasets {D_i^l}_{i=1}^{N}. Denote D_l as the aggregated dataset under edge l. Each edge server aggregates models from its clients.” The model in this article discloses a commonly used federated learning architecture, which uses a cloud server to hold a global model and communicate training steps with multiple client models in the network.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Chen and Liu. Chen teaches a federated learning system that is able to label and train machine learning models using client data while maintaining client privacy. Liu teaches a federated system that is able to perform common federated learning tasks using a cloud server and separate, connected client devices. One of ordinary skill would have motivation to combine a machine learning system that uses federated learning to maintain client privacy and uses client data to train and refine machine learning models with a system that discloses the use of a federated architecture in which cloud servers perform the machine learning actions: “Next, we investigate two critical quantities in collaborative training systems, namely, the training time and energy consumption of mobile devices. We compare cloud-based FL (K2=1) and hierarchical FL in Table II, assuming fixed K1K2. A close observation of the table shows that the training time to reach a certain test accuracy decreases monotonically as we increase the communication frequency (i.e., K2) with the edge server for both the MNIST and CIFAR-10 datasets. This demonstrates the great advantage in training time of the hierarchical FL over cloud-based FL in training an FL model.” (Liu, Results, pp. 6).
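The FedAvg-style aggregation quoted from Chen (Equation (10), global parameters as the data-volume-weighted average of client parameters with weights 𝑁𝑘/𝑀) can be sketched as follows; the function name is an illustrative assumption.

```python
import numpy as np

def fedavg_aggregate(W_list, n_nodes):
    """Aggregate client model parameters W_k into the global model by a
    weighted average, where each client's weight is N_k / M and M is
    the total number of nodes across all K clients."""
    M = float(sum(n_nodes))
    return sum((n_k / M) * W_k for n_k, W_k in zip(n_nodes, W_list))
```

A client holding more graph nodes thus contributes proportionally more to the global model, which is the "importance" weighting the quoted passage describes.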
Regarding claim 3, Chen discloses, “wherein performing one or more automated actions comprises performing at least one machine learning based operation within the federated learning environment using at least a portion of the one or more trained machine learning models.” (Figure 3, pp. 8: As seen in Figure 3, the clients contain their own local machine learning models. These models are used and trained by the client with information from the central server.) And (The Framework of FedGL, pp. 7; “Clients: local model training. Each client uses its local graph data to train several rounds of GCN model, obtaining model parameters 𝑊𝑘, node embeddings 𝐻𝑘, and prediction results 𝑃𝑘, then upload them to the server. Note that 𝐾 clients train their local models in parallel.” The client models will receive a global model or parameters, a pseudo graph, and pseudo labels from the server during training. The clients will use their own local models and the data from the central server to execute machine learning tasks of prediction or classification.) Regarding claim 4, Chen discloses, “wherein generating the one or more class labels comprises assigning, across the multiple client devices within the federated learning environment, a unique label for each respective one of the one or more classes of data associated with the one or more signature matches.” (The Framework of FedGL, pp. 9; “Clients: global self-supervision utilization. The global pseudo label is regarded as the "real" label to enrich the relatively rare real training labels by constructing a self-supervised learning loss 𝐿𝑆𝑆𝐿 and adding it to the main task loss 𝐿𝐺𝐶𝑁 for joint optimization. For example in Fig. 3, edge (3, 4) in client 1 and edge (2, 4) in client 𝐾 have been well complemented. By exploiting the global pseudo label and global pseudo graph, the quality of each local model can be effectively improved, thereby leading to a high-quality global model.” The central server will take in data from the client devices and use a global model to generate labels from the data. The labels are generated from each of the clients for every client to use, all while preserving the privacy of each client.) Regarding claim 6, Chen discloses, “wherein determining comprises computing a total number of different class labels associated with the data across the multiple client devices.” (Global Self-supervision Discovery, pp. 11; “Based on P̄, we try to discover pseudo labels for self-supervised learning, which has been proven to be effective in the learning of image and graph data [16, 24, 43, 49]. Concretely, we unearth these high-confidence prediction results from P̄ and take out the predicted labels, thus obtaining the global pseudo labels.” The server will store the labels and pseudo labels. While training and adding more labels to the data structure, the model will ensure there are no repeat labels. This teaches that the data is stored in a data structure which contains n labels.) Regarding claim 12, Liu discloses, “wherein software implementing the method is provided as a service in a cloud environment.” (Client-Edge-Cloud Hierarchical FL, pp. 2; “To combine their advantages, we consider a hierarchical FL system, which has one cloud server, L edge servers indexed by l, with disjoint client sets {C_l}_{l=1}^{L}, and N clients indexed by i and l, with distributed datasets {D_i^l}_{i=1}^{N}. Denote D_l as the aggregated dataset under edge l. Each edge server aggregates models from its clients.” The model in this article discloses a commonly used federated learning architecture, which uses a cloud server to hold a global model and communicate training steps with multiple client models in the network.)
Regarding claim 13, Chen discloses, “determine, by the cloud server using one or more data privacy-preserving techniques, a signature for each of multiple classes of data transmitted to the cloud server by multiple client devices within a federated learning environment, wherein the cloud server is coupled to each of the multiple client devices within the federated learning environment, and” (The Framework of FedGL, pp. 8; “Clients: local model training. Each client uses its local graph data to train several rounds of GCN model, obtaining model parameters 𝑊𝑘, node embeddings 𝐻𝑘, and prediction results 𝑃𝑘, then upload them to the server. Note that 𝐾 clients train their local models in parallel.” This model performs federated learning tasks while maintaining privacy-preserving techniques, and it follows the standard federated learning architecture.) And (Global Self-supervision Discovery, pp. 11; “Based on P̄, we try to discover pseudo labels for self-supervised learning, which has been proven to be effective in the learning of image and graph data [16, 24, 43, 49]. Concretely, we unearth these high-confidence prediction results from P̄ and take out the predicted labels, thus obtaining the global pseudo labels. For the prediction result vector P̄ᵢ of the 𝑖-th node in P̄, if its predicted probability of a certain class is higher than a certain threshold, then it is selected as a pseudo label: [See Equation (12)] where Ȳ is the one-hot matrix of the global pseudo label, and 𝜆 ∈ [0, 1) is the confidence threshold for determining the pseudo label.” Further, this model is able to use and generate labels from the different data and predictions provided by the clients in the system. The central server will evaluate the data, generate labels for the given data, and send the labels back to the clients for further training.) “wherein determining the signature for each of the multiple classes of data comprises (i) constructing multiple graphs for the multiple classes of data, wherein constructing each of the multiple graphs comprises depicting one or more patterns among data points belonging to the respective class of data, and (ii) generating embeddings of at least portions of the multiple graphs;” (The Framework of FedGL, pp. 8; “Server: global self-supervision discovery. Except aggregating local model parameters to obtain a global model, we propose to discover the global self-supervision information on the server, including global pseudo label and global pseudo graph, to deal with the heterogeneity and complementarity. Specifically, server firstly performs a weighted average fusion on the prediction results 𝑃1, ..., 𝑃𝐾 to obtain the global prediction result P̄. Then, server selects the result with higher probability from the predicted probability vector of each row in P̄ as the pseudo label of each node, which constitutes the one-hot matrix Ȳ of the global pseudo label. Similarly, server performs weighted average fusion on the node embeddings 𝐻1, ..., 𝐻𝐾 to obtain the global node embedding H̄.” The model generates pseudo labels and a global similarity graph from the data sent by the clients. The graph is used to help determine labels in the data. This model is able to label data types from the clients and construct a graph using that knowledge for further training.) And (Global Self-supervision Discovery, pp. 11; “Based on P̄, we try to discover pseudo labels for self-supervised learning, which has been proven to be effective in the learning of image and graph data [16, 24, 43, 49]. Concretely, we unearth these high-confidence prediction results from P̄ and take out the predicted labels, thus obtaining the global pseudo labels. For the prediction result vector P̄ᵢ of the 𝑖-th node in P̄, if its predicted probability of a certain class is higher than a certain threshold, then it is selected as a pseudo label: [See Equation (12)] where Ȳ is the one-hot matrix of the global pseudo label, and 𝜆 ∈ [0, 1) is the confidence threshold for determining the pseudo label.” This teaches how the model uses the pseudo labels. The clients in the federated system are able to use the global graphs and the label data to execute self-supervised learning.) “identify, by the cloud server, one or more signature matches across at least a portion of the multiple client devices based at least in part on the generated embeddings;” (The Framework of FedGL, pp. 8; “By multiplying H̄ and its transpose, server can reconstruct the whole adjacency matrix, obtaining the weighted adjacency matrix Ā of the global pseudo graph. Server distributes the discovered global pseudo label Ȳ and global pseudo graph Ā to each client to start the next round of training.” The server is able to identify pseudo labels from the data provided by the clients. The pseudo labels are saved by the server and distributed to the clients for training.) “generate, by the cloud server, one or more class labels for at least one or more of the multiple classes of data associated with the one or more signature matches;” (The Framework of FedGL, pp. 9; “Clients: global self-supervision utilization. The global pseudo label is regarded as the "real" label to enrich the relatively rare real training labels by constructing a self-supervised learning loss 𝐿𝑆𝑆𝐿 and adding it to the main task loss 𝐿𝐺𝐶𝑁 for joint optimization. For example in Fig. 3, edge (3, 4) in client 1 and edge (2, 4) in client 𝐾 have been well complemented.
By exploiting the global pseudo label and global pseudo graph, the quality of each local model can be effectively improved, thereby leading to a high-quality global model.” The clients in this system will use the generated labels from all of the other clients to train a local model and produce predictions. This model will evaluate its own gathered data and use the pseudo labels to help identify and classify the data.) “label, by the cloud server, the at least one or more of the multiple classes of data associated with the one or more signature matches with the one or more generated class labels;” (Figure 2, pp. 8; Figure 2 discloses the general framework of the proposed model. The server will take data from the clients in the system. The central server will: “Aggregate the global weights … Discover global pseudo labels ... Construct global pseudo graph.”, during regular training. As stated, it will generate labels from the data sent to the server by the clients.) “transmit, by the cloud server to the at least a portion of the multiple client devices, the at least one or more labeled classes of data; and” (The Framework of FedGL, pp. 8; “Server: global self-supervision discovery. Except aggregating local model parameters to obtain a global model, we propose to discover the global self-supervision information on the server, including global pseudo label and global pseudo graph, to deal with the heterogeneity and complementarity. Specifically, server firstly performs a weighted average fusion on the prediction results 𝑃1, ..., 𝑃𝐾 to obtain the global prediction result P̄. Then, server selects the result with higher probability from the predicted probability vector of each row in P̄ as the pseudo label of each node, which constitutes the one-hot matrix Ȳ of the global pseudo label.
Similarly, server performs weighted average fusion on the node embeddings 𝐻1, ..., 𝐻𝐾 to obtain the global node embedding H̄.” The global server will evaluate the data sent from the clients and perform the steps listed in Figure 2. After this is complete, the central server will send an aggregated global model or parameters, updated labels, and a graph to the clients in the system. The clients will then use this data to perform self-supervised learning until a training threshold is met.) “perform, by the cloud server, one or more automated actions based at least in part on the at least one or more labeled classes of data, wherein performing one or more automated actions comprises training one or more machine learning models to perform at least one federated learning task using distributed data from across the multiple client devices, wherein the distributed data comprises portions of data within the at least one or more labeled classes of data.” (Figure 3, pp. 8; The client models will perform the actions listed in Figure 3. Once the clients receive the global model, graph, and labels, the client will automatically perform training steps using the data provided from the server.) And (Global Model, pp. 10; “Following FedAvg [34], we employ the weighted average aggregation method to aggregate the model parameters of 𝐾 clients to obtain the global model: [See Equation (10)] where 𝑁𝑘 is the number of nodes in the graph on the client 𝑘, and 𝑀 is the sum of the number of nodes in the graphs of the 𝐾 clients, and 𝑊𝑘 is the model parameters of the client 𝑘. 𝑁𝑘/𝑀 denotes the proportion of the data volume of each client, which is used to measure the importance of its model parameters in aggregation.” The server will receive data from the clients and aggregate it to generate a global model. This model, or its parameters, is sent to the clients in response to the clients sending their data. The server automatically performs these actions in response to the clients sending their data.) Chen fails to explicitly disclose, “A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computing device, comprising a cloud server, to cause the computing device to:”. However, Liu discloses, “A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computing device, comprising a cloud server, to cause the computing device to:” (Settings, pp. 5; “We consider a hierarchical FL system with 50 clients, 5 edge servers and a cloud server, assuming each edge server authorizes the same number of clients with the same amount of training data. For the ML tasks, image classification tasks are considered and standard datasets MNIST and CIFAR-10 are used.” The model in this article experiments with multiple servers and connected clients. The method is executed on generic computing systems containing processors connected to memory which contain the instructions of the program.) Regarding claim 15, Chen discloses, “wherein performing one or more automated actions comprises performing at least one machine learning based operation within the federated learning environment using at least a portion of the one or more trained machine learning models.” (Figure 3, pp. 8: As seen in Figure 3, the clients contain their own local machine learning models. These models are used and trained by the client with information from the central server.) And (The Framework of FedGL, pp. 7; “Clients: local model training. Each client uses its local graph data to train several rounds of GCN model, obtaining model parameters 𝑊𝑘, node embeddings 𝐻𝑘, and prediction results 𝑃𝑘, then upload them to the server.
Note that 𝐾 clients train their local models in parallel.” The client models will receive a global model or parameters, a pseudo graph, and pseudo labels from the server during training. The clients will use their own local models and the data from the central server to execute machine learning tasks of prediction or classification.) Regarding claim 16, Chen discloses, “wherein generating the one or more class labels comprises assigning, across the multiple client devices within the federated learning environment, a unique label for each respective one of the one or more classes of data associated with the one or more signature matches.” (The Framework of FedGL, pp. 9; “Clients: global self-supervision utilization. The global pseudo label is regarded as the "real" label to enrich the relatively rare real training labels by constructing a self-supervised learning loss 𝐿𝑆𝑆𝐿 and adding it to the main task loss 𝐿𝐺𝐶𝑁 for joint optimization. For example in Fig. 3, edge (3, 4) in client 1 and edge (2, 4) in client 𝐾 have been well complemented. By exploiting the global pseudo label and global pseudo graph, the quality of each local model can be effectively improved, thereby leading to a high-quality global model.” The central server will take in data from the client devices and use a global model to generate labels from the data. The labels are generated from each of the clients for every client to use, all while preserving the privacy of each client.) Regarding claim 20, Chen discloses, “determine, by the cloud server using one or more data privacy-preserving techniques, a signature for each of multiple classes of data transmitted to the cloud server by multiple client devices within a federated learning environment, wherein the cloud server is coupled to each of the multiple client devices within the federated learning environment, and” (The Framework of FedGL, pp. 8; “Clients: local model training.
Each client uses its local graph data to train several rounds of GCN model, obtaining model parameters 𝑊𝑘, node embeddings 𝐻𝑘, and prediction results 𝑃𝑘, then upload them to the server. Note that 𝐾 clients train their local models in parallel.” This model will perform federated learning tasks and maintain privacy-preserving techniques. This model follows the standard federated learning architecture.) And (Global Self-supervision Discovery, pp. 11; “Based on P̄, we try to discover pseudo labels for self-supervised learning, which has been proven to be effective in the learning of image and graph data [16, 24, 43, 49]. Concretely, we unearth these high-confidence prediction results from P̄ and take out the predicted labels, thus obtaining the global pseudo labels. For the prediction result vector P̄ᵢ of the 𝑖-th node in P̄, if its predicted probability of a certain class is higher than a certain threshold, then it is selected as a pseudo label: [See Equation (12)] where Ȳ is the one-hot matrix of the global pseudo label, and 𝜆 ∈ [0, 1) is the confidence threshold for determining the pseudo label.” Further, this model is able to use and generate labels from different data and predictions provided by the clients in the system. The central server will evaluate the data, generate labels for the given data, and send the labels back to the clients for further training.) “wherein determining the signature for each of the multiple classes of data comprises (i) constructing multiple graphs for the multiple classes of data, wherein constructing each of the multiple graphs comprises depicting one or more patterns among data points belonging to the respective class of data, and (ii) generating embeddings of at least portions of the multiple graphs;” (The Framework of FedGL, pp. 8; “Server: global self-supervision discovery.
Except aggregating local model parameters to obtain a global model, we propose to discover the global self-supervision information on the server, including global pseudo label and global pseudo graph, to deal with the heterogeneity and complementarity. Specifically, server firstly performs a weighted average fusion on the prediction results 𝑃1, ..., 𝑃𝐾 to obtain the global prediction result P̄. Then, server selects the result with higher probability from the predicted probability vector of each row in P̄ as the pseudo label of each node, which constitutes the one-hot matrix Ȳ of the global pseudo label. Similarly, server performs weighted average fusion on the node embeddings 𝐻1, ..., 𝐻𝐾 to obtain the global node embedding H̄.” The model generates pseudo labels and a global similarity graph from data sent by the clients. The graph is used to help determine labels in the data. This model is able to label data types from the clients and construct a graph using that knowledge for further training.) And (Global Self-supervision Discovery, pp. 11; “Based on P̄, we try to discover pseudo labels for self-supervised learning, which has been proven to be effective in the learning of image and graph data [16, 24, 43, 49]. Concretely, we unearth these high-confidence prediction results from P̄ and take out the predicted labels, thus obtaining the global pseudo labels. For the prediction result vector P̄ᵢ of the 𝑖-th node in P̄, if its predicted probability of a certain class is higher than a certain threshold, then it is selected as a pseudo label: [See Equation (12)] where Ȳ is the one-hot matrix of the global pseudo label, and 𝜆 ∈ [0, 1) is the confidence threshold for determining the pseudo label.” This teaches how the model uses the pseudo labels. The clients in the federated system are able to use the global graphs and the label data to execute self-supervised learning.)
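The confidence-threshold rule recited in the quoted passage (a node receives a one-hot pseudo label only when its top predicted class probability exceeds 𝜆) can be sketched in a few lines. This is an illustrative reconstruction of the quoted description, not the authors' code; the function and variable names are assumptions:

```python
import numpy as np

def global_pseudo_labels(P_bar: np.ndarray, lam: float = 0.9):
    """Select high-confidence pseudo labels from the fused prediction
    matrix P_bar (n_nodes x n_classes): a node gets a one-hot pseudo
    label only if its top class probability exceeds the threshold lam."""
    n, c = P_bar.shape
    Y_bar = np.zeros((n, c))                    # one-hot pseudo-label matrix
    top = P_bar.argmax(axis=1)                  # most likely class per node
    confident = P_bar[np.arange(n), top] > lam  # which nodes pass the threshold
    Y_bar[np.nonzero(confident)[0], top[confident]] = 1.0
    return Y_bar, confident

# Two confident nodes and one ambiguous node:
P = np.array([[0.95, 0.03, 0.02],
              [0.10, 0.85, 0.05],
              [0.40, 0.35, 0.25]])
Y, mask = global_pseudo_labels(P, lam=0.8)
```

Only the first two rows clear the 0.8 threshold, so the ambiguous third node contributes no pseudo label, which matches the role of 𝜆 ∈ [0, 1) in the quoted equation.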
“identify, by the cloud server, one or more signature matches across at least a portion of the multiple client devices based at least in part on the generated embeddings;” (The Framework of FedGL, pp. 8; “By multiplying H̄ and its transpose, server can reconstruct the whole adjacency matrix, obtaining the weighted adjacency matrix Ā of the global pseudo graph. Server distributes the discovered global pseudo label Ȳ and global pseudo graph Ā to each client to start the next round of training.” The server is able to identify pseudo labels from the data provided by the clients. The pseudo labels are saved by the server and distributed to the clients for training.) “generate, by the cloud server, one or more class labels for at least one or more of the multiple classes of data associated with the one or more signature matches;” (The Framework of FedGL, pp. 9; “Clients: global self-supervision utilization. The global pseudo label is regarded as the "real" label to enrich the relatively rare real training labels by constructing a self-supervised learning loss 𝐿𝑆𝑆𝐿 and adding it to the main task loss 𝐿𝐺𝐶𝑁 for joint optimization. For example in Fig. 3, edge (3, 4) in client 1 and edge (2, 4) in client 𝐾 have been well complemented. By exploiting the global pseudo label and global pseudo graph, the quality of each local model can be effectively improved, thereby leading to a high-quality global model.” The clients in this system will use the generated labels from all of the other clients to train a local model and produce predictions. This model will evaluate its own gathered data and use the pseudo labels to help identify and classify the data.) “label, by the cloud server, the at least one or more of the multiple classes of data associated with the one or more signature matches with the one or more generated class labels” (Figure 2, pp. 8; Figure 2 discloses the general framework of the model proposed. The server will take data from the clients in the system.
The central server will: “Aggregate the global weights … Discover global pseudo labels ... Construct global pseudo graph.”, during regular training. As stated, it will generate labels from the data sent to the server by the clients.) “transmit, by the cloud server to the at least a portion of the multiple client devices, the at least one or more labeled classes of data; and” (The Framework of FedGL, pp. 8; “Server: global self-supervision discovery. Except aggregating local model parameters to obtain a global model, we propose to discover the global self-supervision information on the server, including global pseudo label and global pseudo graph, to deal with the heterogeneity and complementarity. Specifically, server firstly performs a weighted average fusion on the prediction results 𝑃1, ..., 𝑃𝐾 to obtain the global prediction result P̄. Then, server selects the result with higher probability from the predicted probability vector of each row in P̄ as the pseudo label of each node, which constitutes the one-hot matrix Ȳ of the global pseudo label. Similarly, server performs weighted average fusion on the node embeddings 𝐻1, ..., 𝐻𝐾 to obtain the global node embedding H̄.” The global server will evaluate the data sent from the clients and perform the steps listed in figure 2. After this is complete, the central server will send an aggregated global model or parameters, updated labels, and a graph to the clients in the system. The clients will then use this data to perform self-supervised learning until a training threshold is met.)
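The two server-side fusion steps quoted above (weighted averaging of client outputs in proportion to each client's data volume, then reconstructing the pseudo-graph adjacency by multiplying the fused embedding with its transpose) can be sketched as follows. A minimal numpy illustration of the quoted operations; the helper names and toy inputs are assumptions:

```python
import numpy as np

def weighted_fuse(client_mats, client_sizes):
    """Fuse per-client matrices (predictions, embeddings, or parameters)
    with weights N_k / M, where N_k is client k's node count and M is
    the total node count, per the quoted FedAvg-style aggregation."""
    M = float(sum(client_sizes))
    return sum((n / M) * X for n, X in zip(client_sizes, client_mats))

def pseudo_graph_adjacency(H_bar):
    """Reconstruct the weighted adjacency of the global pseudo graph by
    multiplying the fused node embedding with its transpose."""
    return H_bar @ H_bar.T

# Two clients with equal node counts, so each gets weight 0.5:
H1 = np.array([[1.0, 0.0], [0.0, 1.0]])
H2 = np.array([[0.0, 1.0], [1.0, 0.0]])
H_bar = weighted_fuse([H1, H2], client_sizes=[2, 2])
A_bar = pseudo_graph_adjacency(H_bar)
```

The resulting adjacency is symmetric by construction, which is what allows the server to distribute it as a graph for the next training round.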
“perform, by the cloud server, one or more automated actions based at least in part on the at least one or more labeled classes of data, wherein performing one or more automated actions comprises training one or more machine learning models to perform at least one federated learning task using distributed data from across the multiple client devices, wherein the distributed data comprises portions of data within the at least one or more labeled classes of data.” (Figure 3, pp. 8; The client models will perform the actions listed in figure 3. Once the clients receive the global model, graph, and labels, the clients will automatically perform training steps using the data provided by the server.) And (Global Model, pp. 10; “Following FedAvg [34], we employ the weighted average aggregation method to aggregate the model parameters of 𝐾 clients to obtain the global model: [See Equation (10)] where 𝑁𝑘 is the number of nodes in the graph on the client 𝑘, 𝑀 is the sum of the number of nodes in the graphs of the 𝐾 clients, and 𝑊𝑘 is the model parameters of the client 𝑘. 𝑁𝑘/𝑀 denotes the proportion of the data volume of each client, which is used to measure the importance of its model parameters in aggregation.” The server will receive data from the clients and aggregate it to generate a global model. This model, or its parameters, is sent to the clients in response to the clients sending their data. The server will automatically perform these actions in response to the clients sending their data.) Chen fails to explicitly disclose, “A system comprising: a cloud server which comprises: a memory configured to store program instructions; and a processor operatively coupled to the memory to execute the program instructions to:”. However, Liu discloses, “A system comprising: a cloud server which comprises: a memory configured to store program instructions; and a processor operatively coupled to the memory to execute the program instructions to:” (Settings, pp.
5; “We consider a hierarchical FL system with 50 clients, 5 edge servers and a cloud server, assuming each edge server authorizes the same number of clients with the same amount of training data. For the ML tasks, image classification tasks are considered and standard datasets MNIST and CIFAR-10 are used.” The model in this article experiments with multiple servers and connected clients. The method is executed on generic computing systems containing processors connected to memory, which contain the instructions of the program.) Regarding claim 21, Chen discloses, “wherein determining comprises computing a total number of different class labels associated with the data across the multiple client devices.” (Global Self-supervision Discovery, pp. 11; “Based on P̄, we try to discover pseudo labels for self-supervised learning, which has been proven to be effective in the learning of image and graph data [16, 24, 43, 49]. Concretely, we unearth these high-confidence prediction results from P̄ and take out the predicted labels, thus obtaining the global pseudo labels.” The server will store the labels and pseudo labels. While training and adding more labels to the data structure, the model will ensure there are no repeat labels. This teaches that the data is stored in a data structure which contains n labels.) Claims 7-10, 18, and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Chen and Liu in view of Briggs et al., (Briggs et al., "Federated learning with hierarchical clustering of local updates to improve training on non-IID data", May 2020, pp. 1-9, hereinafter "Briggs"). Regarding claim 7, Chen and Liu fail to explicitly disclose the elements of this claim. However, Briggs discloses, “clustering the data of each of the multiple client devices, according to class label, in a number of groups equal to the total number of different class labels.” (Hierarchical clustering, pp.
2; "Hierarchical clustering [7] is a natural choice for the purpose of clustering where the number of clusters is unknown and where all examples are assigned to the most relevant cluster. Another benefit of using hierarchical clustering is its ability to scale to large numbers of samples and clusters as well as being reasonably interpretable." This article discloses that the number of clusters is initially undefined and that more clusters can be added as needed. Under the broadest reasonable interpretation, this can disclose the claimed number of clusters.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Chen, Liu and Briggs. Chen teaches a federated learning system that is able to label and train machine learning models using client data while maintaining client privacy. Liu teaches a federated system that is able to perform common federated learning tasks using a cloud server and separate, connected client devices. Briggs teaches a federated learning model which is able to improve classification of non-independent and identically distributed (non-IID) data. One of ordinary skill would have had motivation to combine a machine learning system that uses federated learning to maintain client privacy and use client data to train and refine machine learning models, with a system that discloses the use of a federated system which uses cloud servers to perform the machine learning actions, and with a system that is also able to use federated learning to evaluate and train models on non-IID data distributions: “In 2 of 3 of our non-iid settings, FL+HC allows learning to converge more quickly and allows for more clients (up to 2x) to reach a target accuracy at the end of training. A second range of experiments tested the effect of varying the hyperparameters of the hierarchical clustering algorithm.
Results among the non-iid settings show that FL+HC can result in a reduction in communication rounds by >5x when using the Manhattan distance metric. Different distance metrics result in better performance depending on the non-iid nature of the data." (Briggs, Conclusion, pp. 8). Regarding claim 8, Chen and Liu fail to explicitly disclose the elements of this claim. However, Briggs discloses, “computing embedding vectors of multiple items of data derived from the data of each of the multiple client devices.” (Client Statistical heterogeneity, pp. 2; "There should be no assumption that clients have access to data drawn independently from the same underlying distribution - Pᵢ ≠ Pⱼ for all pairs of clients i and j." Each of the clients has a separate data pool. Data is drawn from these clients, and no client will have access to another client's data.) Regarding claim 9, Chen and Liu fail to explicitly disclose the elements of this claim. However, Briggs discloses, “wherein identifying one or more signature matches comprises, for each pair of embedding vectors, computing a similarity value based at least in part on at least one distance value associated with the two embedding vectors.” (Hierarchical clustering, pp. 2; "In this work we opt to use an agglomerative hierarchical clustering method which begins with all samples belonging to their own singleton cluster. Each sample is simply a vectorized local model update (the parameters of the local model). At each step of the clustering, the pairwise distance between all clusters is calculated to judge their similarity." Each of the samples taken is initially placed into its own cluster. Then the clusters are compared and merged. This discloses that the pairs of vectors are compared and the distance between the pairs is measured.) Regarding claim 10, Chen and Liu fail to explicitly disclose the elements of this claim.
However, Briggs discloses, “determining a given number of the embedding vector pairs having a similarity value above a given value.” (Hierarchical clustering, pp. 2; "In this work we opt to use an agglomerative hierarchical clustering method which begins with all samples belonging to their own singleton cluster. Each sample is simply a vectorized local model update (the parameters of the local model). At each step of the clustering, the pairwise distance between all clusters is calculated to judge their similarity." Each of the samples taken is initially placed into its own cluster. Then the clusters are compared and merged. This discloses that the pairs of vectors are compared and can be paired based on similar values.) Regarding claim 18, Chen discloses, “computing a total number of different class labels associated with the data across the multiple client devices;” (Global Self-supervision Discovery, pp. 11; “Based on P̄, we try to discover pseudo labels for self-supervised learning, which has been proven to be effective in the learning of image and graph data [16, 24, 43, 49]. Concretely, we unearth these high-confidence prediction results from P̄ and take out the predicted labels, thus obtaining the global pseudo labels.” The server will store the labels and pseudo labels. While training and adding more labels to the data structure, the model will ensure there are no repeat labels. This teaches that the data is stored in a data structure which contains n labels.) Chen and Liu fail to explicitly disclose the remaining elements of this claim. However, Briggs discloses, “clustering the data of each of the multiple client devices, according to class label, in a number of groups equal to the total number of different class labels; and” (Hierarchical clustering, pp.
2; "Hierarchical clustering [7] is a natural choice for the purpose of clustering where the number of clusters is unknown and where all examples are assigned to the most relevant cluster. Another benefit of using hierarchical clustering is its ability to scale to large numbers of samples and clusters as well as being reasonably interpretable." This article discloses that the number of clusters is initially undefined and that more clusters can be added as needed. Under the broadest reasonable interpretation, this can disclose the claimed number of clusters.) “computing embedding vectors of multiple items of data derived from the data of each of the multiple client devices.” (Client Statistical heterogeneity, pp. 2; "There should be no assumption that clients have access to data drawn independently from the same underlying distribution - Pᵢ ≠ Pⱼ for all pairs of clients i and j." Each of the clients has a separate data pool. Data is drawn from these clients, and no client will have access to another client's data.) Regarding claim 22, Chen and Liu fail to explicitly disclose the elements of this claim. However, Briggs discloses, “wherein the processor is operatively coupled to the memory to further execute the program instructions to: cluster the data of each of the multiple client devices, according to class label, in a number of groups equal to the total number of different class labels.” (Hierarchical clustering, pp. 2; "Hierarchical clustering [7] is a natural choice for the purpose of clustering where the number of clusters is unknown and where all examples are assigned to the most relevant cluster. Another benefit of using hierarchical clustering is its ability to scale to large numbers of samples and clusters as well as being reasonably interpretable." This article discloses that the number of clusters is initially undefined and that more clusters can be added as needed. Under the broadest reasonable interpretation, this can disclose the claimed number of clusters.)
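The agglomerative procedure Briggs describes (each vectorized local model update starts as its own singleton cluster, and the closest pair of clusters is merged at every step) can be sketched as follows. This is a minimal average-linkage sketch with assumed toy update vectors, not Briggs's FL+HC implementation:

```python
import numpy as np

def agglomerative(updates, threshold):
    """Minimal agglomerative clustering: every vectorized client update
    begins in its own singleton cluster; at each step the two clusters
    with the smallest average-linkage Manhattan distance are merged,
    stopping once no pair is closer than the threshold."""
    clusters = [[i] for i in range(len(updates))]

    def dist(a, b):
        # Average Manhattan distance between all cross-cluster members.
        return float(np.mean([np.abs(updates[i] - updates[j]).sum()
                              for i in a for j in b]))

    while len(clusters) > 1:
        pairs = [(dist(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)
        if d > threshold:
            break                      # remaining clusters are too far apart
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy "local model updates": two groups of clients drifting apart.
updates = np.array([[0.0, 0.0], [0.1, 0.0],
                    [5.0, 5.0], [5.1, 5.0]])
groups = agglomerative(updates, threshold=1.0)
```

With a distance cutoff of 1.0, the two nearby update pairs merge while the distant groups stay separate, mirroring how the number of clusters is not fixed in advance.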
Regarding claim 23, Chen and Liu fail to explicitly disclose the elements of this claim. However, Briggs discloses, “wherein the processor is operatively coupled to the memory to further execute the program instructions to: compute embedding vectors of multiple items of data derived from the data of each of the multiple client devices.” (Client Statistical heterogeneity, pp. 2; "There should be no assumption that clients have access to data drawn independently from the same underlying distribution - Pᵢ ≠ Pⱼ for all pairs of clients i and j." Each of the clients has a separate data pool. Data is drawn from these clients, and no client will have access to another client's data.) Regarding claim 24, Chen and Liu fail to explicitly disclose the elements of this claim. However, Briggs discloses, “wherein identifying one or more signature matches comprises, for each pair of embedding vectors, computing a similarity value based at least in part on at least one distance value associated with the two embedding vectors.” (Hierarchical clustering, pp. 2; "In this work we opt to use an agglomerative hierarchical clustering method which begins with all samples belonging to their own singleton cluster. Each sample is simply a vectorized local model update (the parameters of the local model). At each step of the clustering, the pairwise distance between all clusters is calculated to judge their similarity." Each of the samples taken is initially placed into its own cluster. Then the clusters are compared and merged. This discloses that the pairs of vectors are compared and the distance between the pairs is measured.) Claims 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, Liu and Briggs in view of Kushmerick et al., (Kushmerick et al., "Automated Email Activity Management: An Unsupervised Learning Approach", January 2005, pp. 67-74, hereinafter "Kushmerick").
Regarding claim 11, Chen, Liu and Briggs fail to explicitly disclose the elements of this claim. However, Kushmerick discloses, “identifying at least a portion of the one or more signature matches by unwrapping the given number of the embedding vector pairs, wherein unwrapping comprises determining which of the class labels of at least a first client device are mapped to which of the class labels of at least a second client device.” (Approach, pp. 70; "Given this revised distance metric, we merge the G most similar pairs of clusters, where G is a user-specified parameter." Each of the pairs is evaluated for similarity and can be merged depending on this evaluation. Under the broadest reasonable interpretation, this evaluation discloses matches between the pairs of clusters.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Chen, Liu, Briggs and Kushmerick. Chen teaches a federated learning system that is able to label and train machine learning models using client data while maintaining client privacy. Liu teaches a federated system that is able to perform common federated learning tasks using a cloud server and separate, connected client devices. Briggs teaches a federated learning model which is able to improve classification of non-independent and identically distributed (non-IID) data. Kushmerick teaches the use of distributed clients with unsupervised learning to help improve a global model.
One of ordinary skill would have had motivation to combine a machine learning system that uses federated learning to maintain client privacy and use client data to train and refine machine learning models, with a system that discloses the use of a federated system which uses cloud servers to perform the machine learning actions, further with a system that is also able to use federated learning to evaluate and train models on non-independent and identically distributed (non-IID) data, and finally with a system that uses unsupervised learning over distributed clients' data to improve data classification accuracy: “Specifically, we make the following contributions: (1) We formalize email-based activities as finite state automata, where messages represent state transitions; (2) We specify and describe solutions to several unsupervised learning tasks in this context: activity identification, transition identification, automaton induction, and message classification; and (3) We provide empirical evidence demonstrating that our algorithms can learn process models given a small amount of unlabeled training data, and accurately update a user's state in the model as new messages arrive." (Kushmerick, Conclusions, pp. 73). Regarding claim 19, Briggs discloses, “for each pair of embedding vectors, computing a similarity value based at least in part on at least one distance value associated with the two embedding vectors;” (Hierarchical clustering, pp. 2; "In this work we opt to use an agglomerative hierarchical clustering method which begins with all samples belonging to their own singleton cluster. Each sample is simply a vectorized local model update (the parameters of the local model). At each step of the clustering, the pairwise distance between all clusters is calculated to judge their similarity." Each of the samples taken is initially placed into its own cluster. Then the clusters are compared and merged.
This discloses that the pairs of vectors are compared and the distance between the pairs is measured.) “determining a given number of the embedding vector pairs having a similarity value above a given value; and” (Hierarchical clustering, pp. 2; "In this work we opt to use an agglomerative hierarchical clustering method which begins with all samples belonging to their own singleton cluster. Each sample is simply a vectorized local model update (the parameters of the local model). At each step of the clustering, the pairwise distance between all clusters is calculated to judge their similarity." Each of the samples taken is initially placed into its own cluster. Then the clusters are compared and merged. This discloses that the pairs of vectors are compared and can be paired based on similar values.) Chen, Liu and Briggs fail to explicitly disclose, “identifying at least a portion of the one or more signature matches by unwrapping the given number of the embedding vector pairs, wherein unwrapping comprises determining which of the class labels of at least a first client device are mapped to which of the class labels of at least a second client device.”. However, Kushmerick discloses, “identifying at least a portion of the one or more signature matches by unwrapping the given number of the embedding vector pairs, wherein unwrapping comprises determining which of the class labels of at least a first client device are mapped to which of the class labels of at least a second client device.” (Approach, pp. 70; "Given this revised distance metric, we merge the G most similar pairs of clusters, where G is a user-specified parameter." Each of the pairs is evaluated for similarity and can be merged depending on this evaluation. Under the broadest reasonable interpretation, this evaluation discloses matches between the pairs of clusters.)
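The pairwise step recited in claims 9, 19, and 24 (derive a similarity value from the distance between two embedding vectors, then keep the pairs whose similarity exceeds a given value) can be sketched as follows. The 1/(1+d) distance-to-similarity transform and the toy vectors are illustrative assumptions, not taken from the cited references:

```python
import numpy as np
from itertools import combinations

def matching_pairs(embeddings, min_similarity):
    """For each pair of embedding vectors, compute a similarity value
    from the Euclidean distance between them, and keep the pairs whose
    similarity exceeds the given value (candidate signature matches)."""
    matches = []
    for i, j in combinations(range(len(embeddings)), 2):
        d = float(np.linalg.norm(embeddings[i] - embeddings[j]))
        sim = 1.0 / (1.0 + d)          # maps distance into (0, 1]
        if sim > min_similarity:
            matches.append((i, j, sim))
    return matches

# Two near-identical embeddings and one distant outlier:
emb = np.array([[1.0, 0.0], [1.0, 0.1], [9.0, 9.0]])
pairs = matching_pairs(emb, min_similarity=0.5)
```

Only the close pair survives the similarity cutoff, which is the "given number of embedding vector pairs having a similarity value above a given value" pattern the claims describe.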
Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL MICHAEL GALVIN-SIEBENALER whose telephone number is (571)272-1257. The examiner can normally be reached Monday - Friday 8AM to 5PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /PAUL M GALVIN-SIEBENALER/Examiner, Art Unit 2147 /VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147

Prosecution Timeline

Dec 02, 2021
Application Filed
Apr 03, 2025
Non-Final Rejection — §103
Jun 26, 2025
Interview Requested
Jul 09, 2025
Applicant Interview (Telephonic)
Jul 09, 2025
Examiner Interview Summary
Jul 11, 2025
Response Filed
Sep 25, 2025
Final Rejection — §103
Nov 11, 2025
Interview Requested
Nov 21, 2025
Examiner Interview Summary
Nov 21, 2025
Response after Non-Final Action
Nov 21, 2025
Applicant Interview (Telephonic)
Dec 22, 2025
Request for Continued Examination
Jan 15, 2026
Response after Non-Final Action
Mar 07, 2026
Non-Final Rejection — §103 (current)


Prosecution Projections

3-4
Expected OA Rounds
25%
Grant Probability
0%
With Interview (-25.0%)
3y 3m
Median Time to Grant
High
PTA Risk
Based on 4 resolved cases by this examiner. Grant probability derived from career allow rate.
