Prosecution Insights
Last updated: April 19, 2026
Application No. 17/796,895

A Central Node and Method Therein for Enabling an Aggregated Machine Learning Model from Local Machine Learning Models in a Wireless Communications Network

Final Rejection §103
Filed: Aug 02, 2022
Examiner: DIEP, DUY T
Art Unit: 2123
Tech Center: 2100 — Computer Architecture & Software
Assignee: Telefonaktiebolaget LM Ericsson (publ)
OA Round: 2 (Final)
Grant Probability: 25% (At Risk)
Expected OA Rounds: 3-4
Time to Grant: 4y 2m
With Interview: 30%

Examiner Intelligence

Career Allow Rate: 25% (5 granted / 20 resolved; -30.0% vs TC avg)
Interview Lift: +5.5% (moderate) among resolved cases with interview
Typical Timeline: 4y 2m average prosecution; 39 applications currently pending
Career History: 59 total applications across all art units

Statute-Specific Performance

§101: 34.1% (-5.9% vs TC avg)
§103: 54.0% (+14.0% vs TC avg)
§102: 2.3% (-37.7% vs TC avg)
§112: 9.6% (-30.4% vs TC avg)
Tech Center averages are estimates, based on career data from 20 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

The arguments and amendments filed 11/06/2025 have been entered. Claims 26-40 remain pending in the application. Applicant’s amendments and arguments, with respect to the rejections of claims 26-40 under 35 U.S.C. 103, filed 08/07/2025, have been considered but are not persuasive. Therefore, the rejections set forth in the previous Office action are maintained.

The Applicant argues that the cited combination of Hardy in view of Wu fails to disclose or suggest the claimed “receiving” step of claim 26, particularly the requirement that a central node receives, from each of at least two local nodes, a parametrized function of a local machine learning model. According to the Applicant, Hardy does not disclose that worker nodes transmit a parametrized function of a local model to a central server, and Wu likewise does not disclose a distributed system involving local nodes that each have local generator and discriminator functions distinct from a central node. The Applicant further contends that Wu’s disclosure of a data augmentation unit and discriminator training does not address or remedy this alleged deficiency, nor does the Office action explain how or why a person of ordinary skill in the art would have modified Hardy to include such functionality.

The Applicant further asserts that neither Hardy nor Wu discloses or suggests the claimed “determining” step, in which the central node determines first and second cross-discrimination values by applying a discriminator function received from one local node to samples generated using the generator function of another local node in a pair-wise manner. The Applicant argues that the Office action relies on Ferdowsi to supply this limitation, but that Ferdowsi describes a fundamentally different architecture involving a fully distributed, multi-agent learning system without a central controller. According to the Applicant, Ferdowsi’s system does not perform cross-discrimination in a pair-wise manner corresponding to pairs of local nodes, nor does it produce first and second cross-discrimination values as recited in the claims.

Finally, the Applicant argues that the proposed combination would require fundamentally changing the principle of operation of Hardy’s centralized system. The Applicant contends that Ferdowsi teaches away from centralized aggregation by emphasizing a fully distributed architecture intended to reduce communication overhead, and that incorporating Ferdowsi’s teachings into Hardy would eliminate Hardy’s centralized operation. The Applicant therefore asserts that the Office action fails to explain how the references could be combined to result in the claimed invention without impermissibly reconstructing the prior art using hindsight, and concludes that the rejections are based on combining disparate systems that could not reasonably be combined to achieve the claimed subject matter.

The examiner respectfully disagrees with the Applicant’s assertion that the cited references fail to disclose or suggest the claimed receiving of a parametrized function from each local node.
Under the broadest reasonable interpretation, a “parametrized function” encompasses a function whose output depends on one or more parameters, and the claim does not require that such parameters be independently trained or that the parametrized function be limited to a particular model architecture. Hardy teaches a federated GAN framework in which each worker maintains local model components and transmits the parameters defining those components to a central server for aggregation, thereby teaching receipt, at a central node, of parametrized functions of local machine learning models. Wu further teaches a data augmentation unit that determines whether generated or unlabeled samples are added to a labeled dataset based on a decision function comparing a posterior probability against a threshold confidence level. Such a decision function is parametrized by the threshold value and therefore corresponds to a parametrized function associated with the local machine learning process. Because Wu teaches that this augmentation function directly influences the training data used by the generator and discriminator, and Hardy teaches transmitting parameters of learned or configured training components that influence local training from local workers to a central parameter server for coordinated training and aggregation, a person of ordinary skill in the art would have understood that the parameters defining Wu’s augmentation function constitute model-related parameters that may likewise be transmitted within Hardy’s federated learning framework. Accordingly, the combination of Hardy and Wu teaches or at least suggests the claimed receiving of a parametrized function at the central node.

The examiner respectfully disagrees that the references fail to disclose or suggest the claimed determining of first and second cross-discrimination values. While Hardy and Wu do not explicitly describe applying a discriminator associated with one local node to samples generated by another local node, Ferdowsi teaches that generated samples produced by one agent may be evaluated by the discriminators associated with other agents, producing discriminator output values indicating whether the received samples are real or fake. Such evaluation inherently corresponds to applying a discriminator function to samples generated by another generator and obtaining corresponding evaluation values, which reasonably corresponds to the claimed cross-discrimination values under the broadest reasonable interpretation. Hardy teaches that the generators and discriminators of multiple local workers are received at a central server for coordinated training and aggregation (Figure 1b). Because the central server in Hardy already receives the generator and discriminator of each worker, a person of ordinary skill in the art would have found it obvious to perform the cross-agent discriminator evaluation taught by Ferdowsi at the central node as part of the aggregation or evaluation process, representing a predictable use of known GAN evaluation techniques within Hardy’s centralized training framework.
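As an editor’s illustration (not part of the Office action), the determining step as characterized above reduces to a pairwise evaluation loop at the central node. A minimal Python sketch, assuming each received generator and discriminator is a plain callable over NumPy arrays; the names generators, discriminators, and the toy models at the end are hypothetical:

import numpy as np
from itertools import combinations

def cross_discrimination_values(generators, discriminators,
                                n_samples=256, latent_dim=8, seed=0):
    """For each pair of local nodes (i, j): the first value applies node i's
    discriminator to samples from node j's generator, and the second value
    applies node j's discriminator to samples from node i's generator."""
    rng = np.random.default_rng(seed)
    values = {}
    for i, j in combinations(range(len(generators)), 2):
        z = rng.standard_normal((n_samples, latent_dim))
        first = float(np.mean(discriminators[i](generators[j](z))))
        second = float(np.mean(discriminators[j](generators[i](z))))
        values[(i, j)] = (first, second)
    return values

# Toy usage: generators map latent vectors to samples, discriminators map
# samples to scores in [0, 1].
gens = [lambda z: z + 1.0, lambda z: z - 1.0]
discs = [lambda x: 1.0 / (1.0 + np.exp(-x.mean(axis=1))),
         lambda x: 1.0 / (1.0 + np.exp(x.mean(axis=1)))]
print(cross_discrimination_values(gens, discs))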
The examiner respectfully disagrees with the Applicant’s argument that the proposed combination would change the principle of operation of Hardy or that Ferdowsi teaches away from the proposed modification. Ferdowsi’s disclosure of a distributed multi-agent learning system describes an alternative implementation intended to improve training performance and data utilization; it does not criticize, discredit, or otherwise discourage centralized aggregation frameworks such as Hardy’s centralized parameter-server architecture, but rather provides the teaching of cross-agent discriminator evaluation as a known functional technique for improving GAN training. Incorporating this known evaluation operation into Hardy’s framework preserves Hardy’s centralized aggregation and overall operation while merely augmenting the information used during model aggregation and evaluation. Accordingly, the proposed combination represents the application of known techniques within an existing framework to achieve predictable improvements in model training, and does not constitute a change in the principle of operation or rely on impermissible hindsight reconstruction.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 26-33 and 35-40 are rejected under 35 U.S.C. 103 as being unpatentable over Hardy et al. (NPL: MD-GAN: Multi-Discriminator Generative Adversarial Networks for Distributed Datasets) in view of Wu et al. (US 20190122120 A1), further in view of Ferdowsi et al. (NPL: Brainstorming Generative Adversarial Networks (BGANs): Towards Multi-Agent Generative Models with Distributed Private Datasets).

Regarding claim 26, Hardy teaches the 1st limitation, “A method performed by a central node for enabling a machine learning model to be aggregated from local machine learning models comprised in at least two local nodes, whereby the central node and the at least two local nodes form parts of a wireless communications network” (Page 1 column 2 section I: “a novel method to train a GAN in a distributed fashion, that is to say over the data of a set of participating workers (e.g., datacenters connected through WAN [8], or devices at the edge of the Internet)”, and Page 3 columns 1-2 section III: “Federated learning [27] proposes to train a machine learning model, and in particular a deep neural network, on a set of workers. It follows the parameter server framework, with the particularity that workers perform numerous local iterations between each communication to the server (i.e., a round), instead of sending small updates ... we propose an adapted version of federated learning to GANs. This adaptation considers the discriminator D and generator G on each worker as one computational object to be treated atomically. Workers perform iterations locally on their data and every E epochs ... they send the resulting parameters to the server. The server in turn averages the G and D parameters of all workers, in order to send updates to those workers at the next iteration. We name this adapted version FL-GAN; it is depicted by Figure 1 b)”.
Hardy discloses a novel method of multi-discriminator generative adversarial networks for distributed datasets. Within the disclosure, Hardy proposes an adapted version of federated learning for GANs (FL-GAN) as a means of comparison to MD-GAN. FL-GAN comprises a set of workers connected to a central server, wherein the central server obtains data from each worker, performs an aggregation process of averaging the data to produce updates, and sends these updates back to the workers. The plurality of workers suggests the local nodes, and the central server suggests the central node, within the claim. The central server may be connected with the workers through an Internet communication configuration, which suggests the wireless communications network within the claim.)

Hardy teaches a part of the 2nd limitation, “receiving, from each of the at least two local nodes, ..., a generator function of a local generative model, and a discriminator function of a local discriminative model, ...” (Page 3 column 2 section III: “we propose an adapted version of federated learning to GANs. This adaptation considers the discriminator D and generator G on each worker as one computational object to be treated atomically. Workers perform iterations locally on their data and every E epochs (i.e., each worker passes E times the data in their GAN) they send the resulting parameters to the server”. Hardy discloses that each worker, which suggests each local node within the claim, comprises a discriminator D and a generator G, and may send the data of its generator and discriminator to the central server for an aggregation update.)

Hardy teaches the 4th limitation, “obtaining an aggregated machine learning model based on the determined first and second cross-discrimination values” (Page 2 column 1 section 1.1: “an adaptation of FL called FLGAN in which every agent trains a global GAN on its own data using a single per-agent discriminator and a per-agent generator and communicates the training updates to a central aggregator that learns a global GAN model”, and Page 3 columns 1-2 section III, as quoted above for the 1st limitation: “... The server in turn averages the G and D parameters of all workers, in order to send updates to those workers at the next iteration. We name this adapted version FL-GAN; it is depicted by Figure 1 b)”. Hardy discloses the FL-GAN process in which information is aggregated at the central server, updated, and transmitted back to each worker. The process may be performed in combination with the teaching of Ferdowsi below, such that the result of the discriminator of each worker node may be aggregated at the central server for an update process such as averaging, wherein each worker node may utilize samples obtained from other worker nodes as disclosed by the technique of Ferdowsi.)
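As an editor’s illustration (not part of the Office action), the FL-GAN aggregation described above is a FedAvg-style element-wise average over each worker’s generator and discriminator parameters. A minimal NumPy sketch, assuming each worker reports its parameters as a dict of arrays; the names worker_params, "g", and "d" are hypothetical:

import numpy as np

def fl_gan_average(worker_params):
    """Average the generator ("g") and discriminator ("d") parameters
    reported by all workers, as in an FL-GAN-style central server."""
    return {
        component: np.mean([w[component] for w in worker_params], axis=0)
        for component in ("g", "d")
    }

# Toy usage: two workers with flat parameter vectors.
workers = [
    {"g": np.array([1.0, 2.0]), "d": np.array([0.5, 0.5])},
    {"g": np.array([3.0, 4.0]), "d": np.array([1.5, 2.5])},
]
print(fl_gan_average(workers))  # {'g': array([2., 3.]), 'd': array([1., 1.5])}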
Hardy teaches the 5th limitation, “transmitting information indicating the obtained aggregated machine learning model to one or more of the at least two local nodes in the wireless communications network” (Page 3 columns 1-2 section III, as quoted above: “... The server in turn averages the G and D parameters of all workers, in order to send updates to those workers at the next iteration. We name this adapted version FL-GAN; it is depicted by Figure 1 b)”. Hardy discloses that after the central server aggregates and updates the data from each worker, it sends these updates back to the workers at the next iteration, wherein the update may comprise an averaging of the generator and discriminator data of each worker; thus the central server may obtain a new GAN model based on the update.)

Hardy does not teach a part of the 2nd limitation, “... a parametrized function of a local machine learning model ... wherein the generator function and the discriminator function are trained on the same data as the parametrized function”. However, Wu teaches this part of the limitation (paragraph 28: “a data augmentation unit is used to compare the posterior probability of a label for each unlabelled sample and each generated sample. The data augmentation unit can be implemented in the GAN or can be a separate module coupled to the GAN. When the posterior probability for the label for a given sample (e.g. the unlabelled sample or the generated sample) exceeds a threshold confidence level, the given sample is assigned the label and converted to a labelled training sample. The newly labelled sample is merged into the labeled training dataset, thereby augmenting the labelled training dataset and expanding the size of the labelled training dataset”, and paragraph 33: “In example embodiments, data augmentation unit 106 is initialized and trained”. Wu discloses a self-training method and system for semi-supervised learning with generative adversarial networks which comprises the training of a data augmentation unit to assign labels. Within the disclosure, the data augmentation unit is configured to compare the posterior probability for a given sample in order to assign a label and convert the sample into a labelled training sample, wherein this data augmentation unit corresponds to the parametrized function within the claim because the unit performs its labelling function based on a threshold parameter. The labelled training samples may be further used by the generator and discriminator within the GAN model, suggesting that the generator and the discriminator are trained on the same data as the data augmentation unit.)
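As an editor’s illustration (not part of the Office action), the decision function the examiner reads onto the “parametrized function” is a single threshold comparison on class posteriors. A minimal sketch; posteriors and threshold are hypothetical names:

import numpy as np

def augment_labels(posteriors, threshold=0.9):
    """Wu-style augmentation decision: a sample is assigned the label whose
    posterior probability exceeds the threshold confidence level, so the
    function is parametrized by that threshold. posteriors is an
    (n_samples, n_classes) array; samples left unlabelled are returned as -1."""
    posteriors = np.asarray(posteriors)
    best = posteriors.argmax(axis=1)
    confident = posteriors.max(axis=1) > threshold
    return np.where(confident, best, -1)

# Toy usage: three samples over two classes, threshold 0.9.
print(augment_labels([[0.95, 0.05], [0.6, 0.4], [0.08, 0.92]]))  # [ 0 -1  1]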
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the teaching of an adapted version of federated learning for GANs (FL-GAN) by Hardy with the teaching, by Wu, of a self-training method and system for semi-supervised learning with generative adversarial networks comprising the training of a data augmentation unit to assign labels. The motivation to do so is found in Wu’s disclosure (paragraph 28: “The newly labelled sample is merged into the labeled training dataset, thereby augmenting the labelled training dataset and expanding the size of the labelled training dataset. Progressively, the growing labeled training data set is used with newly generated data samples to further train the semi-supervised GAN. Augmenting the training dataset by adding labelled samples to the training dataset using the method and system described herein improves the performance of the GAN”. Wu discloses the benefit of providing more labelled samples to the training dataset to further train the GAN model, which improves the performance of the GAN model via the implementation of the data augmentation unit and its data augmentation function. Because Wu teaches that this augmentation function directly influences the training data used by the generator and discriminator, and Hardy teaches transmitting parameters of learned or configured training components that influence local training from local workers to a central parameter server for coordinated training and aggregation, a person of ordinary skill in the art would have understood that the threshold parameter defining Wu’s augmentation function constitutes a model-related parameter that may likewise be transmitted within Hardy’s federated learning framework to improve the training at the central parameter server with the corresponding threshold parameter.)

Hardy and Wu do not teach the 3rd limitation, “determining, for each pair of the at least two local nodes, a first cross-discrimination value by applying the received discriminator function from a first local node of the pair to samples generated using the received generator function from the second local node of the pair, and a second cross-discrimination value by applying the received discriminator function from the second local node of the pair to samples generated using the received generator function from the first local node of the pair”. However, Ferdowsi teaches this limitation (Page 2 column 2 section 1.2: “In the BGAN architecture, every agent contains a single generator and a single discriminator and owns a private dataset. At each step of training, the agents share their idea, i.e., their generated data samples with their neighbors in order to communicate some information about their dataset without sharing the actual data samples with other agents. As such, the proposed approach enables the GANs to collaboratively brainstorm in order to generate high quality real-like data samples”, Page 3 column 1 section 2: “we also define another DNN called discriminator ... that gets a data sample x as an input and outputs a value between 0 and 1. When the output of the discriminator is closer to 1, then the received data sample is deemed to be real and when the output is closer to 0 it means the received data is fake”, and Page 3 column 2 section 3: “let Oi be the neighboring agents to whom agent i sends ideas, and let G be the directed graph of connections between the agents as shown in Figure 1.
Here, a neighboring agent for agent i is defined as an agent that is connected to agent i in the connection graph G via a direct link. For our BGAN architecture, we propose to modify the classical GAN value function in (1) into a brainstorming value function which integrates the received generated data samples (ideas) from other agents”. Ferdowsi discloses a novel brainstorming GAN (BGAN) architecture in which multiple agents interact and share data with each other. Within the disclosure, Ferdowsi discloses one or more agents, wherein each agent may send the data samples generated by its generator to other agents, and each receiving agent comprises a discriminator to discriminate its own data as well as the generated data samples received from other agents. When the output of the discriminator is closer to 1, the received data sample is deemed to be real, and when the output is closer to 0, the received data is deemed fake, which suggests the cross-discrimination values within the claim.)

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the teaching of an adapted version of federated learning for GANs (FL-GAN) by Hardy, and the self-training method and system for semi-supervised learning with generative adversarial networks comprising the training of a data augmentation unit to assign labels by Wu, with the teaching of a novel brainstorming GAN (BGAN) architecture in which multiple agents share data with each other by Ferdowsi. The motivation to do so is found in Ferdowsi’s disclosure (Page 2 column 2 section 1.2: “It preserves data privacy for the agents, since they do not share their owned data with one ... It significantly reduces the communication overhead compared to previous distributed GAN models such as MDGAN, FLGAN ... It allows defining different DNN architectures for different agents depending on their computational and storage capabilities”, and Page 6 column 2 section 4: “We empirically evaluate the proposed BGAN architecture on data samples drawn from multidimensional distributions as well as image datasets. Our goal here is to show how our BGAN architecture can improve the performance of agents by integrating a brainstorming mechanism compared to standalone GAN scenarios.” Ferdowsi provides the benefits of the BGAN: it preserves data privacy for the agents, significantly reduces the communication overhead compared to previous distributed GAN models such as MDGAN and FLGAN, and allows defining different DNN architectures for different agents depending on their computational and storage capabilities. Ferdowsi also provides experiments demonstrating the improvement in comparison with other conventional GAN methods. Ferdowsi’s disclosure of a distributed multi-agent learning system describes an alternative implementation intended to improve training performance and data utilization; it does not criticize, discredit, or otherwise discourage centralized aggregation frameworks such as Hardy’s centralized parameter-server architecture FLGAN, but rather provides the teaching of cross-agent discriminator evaluation as a known functional technique for improving GAN training.
A person of ordinary skill in the art would have recognized that such cross-agent discriminator evaluation represents a known functional technique for improving GAN training, independent of whether the underlying system is fully distributed or centrally coordinated. Applying Ferdowsi’s cross-agent evaluation within Hardy’s centralized parameter server, once the generators and discriminators of multiple workers are available at the central server, would have represented a predictable use of prior-art elements according to their established functions to improve model evaluation and training efficiency, while preserving Hardy’s centralized aggregation and overall principle of operation. Such a modification merely augments the information used during aggregation and training, and would have been within the ordinary creativity of a person of ordinary skill in the art seeking to improve GAN training performance and data utilization.)

Regarding claim 27, which depends on claim 26, the rejection of claim 26 is incorporated. Hardy teaches a part of the limitation, “obtaining, ... the aggregated machine learning model by averaging neural network weights of the local machine learning model of the at least two local nodes” (Page 3 columns 1-2 section III, Figure 1b: “we propose an adapted version of federated learning to GANs. This adaptation considers the discriminator D and generator G on each worker as one computational object to be treated atomically. Workers perform iterations locally on their data and every E epochs (i.e., each worker passes E times the data in their GAN) they send the resulting parameters to the server. The server in turn averages the G and D parameters of all workers, in order to send updates to those workers at the next iteration”. Hardy discloses that the central server aggregates generator and discriminator data from each worker and in turn averages the G and D parameters of all workers, thereby creating an updated generator and discriminator at the central server, which suggests the aggregated machine learning model within the claim, wherein the averaging of the G and D parameters suggests the averaging of neural network weights within the claim. The aggregating and averaging process may be performed in combination with the teaching of BGAN by Ferdowsi below, as configured by a person of ordinary skill in the art.)

Ferdowsi teaches a part of the limitation, “..., in case the determined first and second cross-discrimination values indicate that the local machine learning models of the at least two local nodes originate from data having a determined level of corresponding or overlapping distribution, ...” (Page 9 column 1 section 4.5: “Next, we prove that the BGAN agents can learn nonoverlapping portions of the other agents’ data distributions”, Pages 3-4 columns 1-2 section 3: “For our BGAN architecture, we propose to modify the classical GAN value function in (1) into a brainstorming value function which integrates the received generated data samples (ideas) from other agents ... pbi is a mixture distribution of agent i’s owned data and the data that agent i received from all neighboring agents”, and Pages 4-5 columns 1-2 section 3: “However, one key goal of our proposed BGAN is to show that using the brainstorming approach each generator can integrate the data distribution of the other agents into its generator distribution”.
Ferdowsi discloses, as mentioned above, that the output of the discriminator of a first agent toward the received generated data samples from a second agent being a value closer to 1 or 0, and vice versa, suggests the first and second cross-discrimination values, wherein each agent may learn the distributions of the other agents’ data based on the exchanged information and discrimination and obtain a mixture distribution, which suggests an overlapping distribution. A level of overlapping distribution may thus be obtained between agents, such that an averaging process may be performed at the central server, as disclosed by Hardy, based on this mixture distribution between two or more agents, wherein each agent represents a worker node, to obtain an aggregated GAN model.)

Hardy teaches a part of the limitation, “obtaining, ..., the aggregated machine learning model by using samples generated by the received generator functions of the at least two local nodes” (Page 3 columns 1-2 section III, Figure 1b: “we propose an adapted version of federated learning to GANs. This adaptation considers the discriminator D and generator G on each worker as one computational object to be treated atomically. Workers perform iterations locally on their data and every E epochs (i.e., each worker passes E times the data in their GAN) they send the resulting parameters to the server”. Hardy discloses the process in which the generator and discriminator at each worker transmit their data to the central server for aggregation and further update, wherein the data may include samples generated by the generator of each worker, as configured by a person of ordinary skill in the art. The aggregating process may be performed in combination with the teaching of BGAN by Ferdowsi below, as configured by a person of ordinary skill in the art.)

Ferdowsi teaches a part of the limitation, “... in case the determined first and second cross-discrimination values indicate that the local machine learning models of the at least two local nodes originate from data having a determined level of non-corresponding or non-overlapping distribution ...” (Page 9 column 1 section 4.5: “Next, we prove that the BGAN agents can learn nonoverlapping portions of the other agents’ data distributions”. Ferdowsi discloses, as mentioned above, that the output of the discriminator of a first agent toward the received generated data samples from a second agent being a value closer to 1 or 0, and vice versa, suggests the first and second cross-discrimination values, wherein each agent has a nonoverlapping distribution, thus suggesting a level of nonoverlapping distribution of data between the agents. The central server may perform its GAN training based on the samples from each agent, wherein each agent represents a worker node, such that the central server may aggregate data from each generator, including the generated sample data of each agent, and perform an update at the central server based on the aggregated samples to obtain an aggregated GAN model.)
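As an editor’s illustration (not part of the Office action), claim 27’s two alternatives can be read as a dispatch on a pair’s cross-discrimination values: average neural-network weights when the values indicate overlapping distributions, otherwise aggregate from generated samples. A hedged sketch; the cutoff and the two strategy callables are hypothetical:

def aggregate_pair(first, second, average_weights, use_generated_samples,
                   overlap_cutoff=0.5):
    """Dispatch between claim 27's aggregation branches for one pair of
    local nodes. average_weights and use_generated_samples are zero-argument
    callables implementing the two strategies."""
    if first < overlap_cutoff and second < overlap_cutoff:
        # Both values low: read as corresponding/overlapping distributions,
        # so average neural-network weights (Federated Learning style).
        return average_weights()
    # Otherwise read as non-corresponding/non-overlapping distributions, so
    # build the aggregated model from generator-produced samples.
    return use_generated_samples()

# Toy usage with stand-in strategies.
print(aggregate_pair(0.2, 0.3, lambda: "averaged weights",
                     lambda: "sample-based model"))  # averaged weights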
Regarding claim 28, which depends on claim 27, the rejection of claim 27 is incorporated. Hardy teaches the limitation, “The method of claim 27, wherein obtaining the aggregated machine learning model by averaging neural network weights of the local machine learning models of the at least two local nodes uses one or more Federated Learning techniques” (Page 3 columns 1-2 section III: “Federated learning [27] proposes to train a machine learning model, and in particular a deep neural network, on a set of workers. It follows the parameter server framework, with the particularity that workers perform numerous local iterations between each communication to the server (i.e., a round), instead of sending small updates ... we propose an adapted version of federated learning to GANs. ... We name this adapted version FL-GAN”. Hardy discloses federated learning with a set of workers and a central server communicating with each other, wherein the federated learning is configured with a GAN model, thus creating the FL-GAN, which performs learning based on federated learning techniques.)

Regarding claim 29, which depends on claim 27, the rejection of claim 27 is incorporated. Hardy teaches the limitation, “training an existing aggregated machine learning model, or composing a separate aggregated machine learning model, by using the samples generated by the received generator functions and labels generated by applying the parametrized functions to the samples generated by the received generator functions” (Page 3 columns 1-2 section III, Figure 1b: “Federated learning [27] proposes to train a machine learning model, and in particular a deep neural network, on a set of workers ... we propose an adapted version of federated learning to GANs ... Workers perform iterations locally on their data and every E epochs (i.e., each worker passes E times the data in their GAN) they send the resulting parameters to the server. The server in turn averages the G and D parameters of all workers”. Hardy discloses the deep neural network architecture of federated learning incorporated with a GAN model to obtain the FL-GAN, wherein Figure 1b demonstrates that the central server comprises an updated discriminator and generator based on the generator and discriminator from each worker, thus suggesting an existing GAN model at the central server based on data aggregated from each worker.)

Regarding claim 30, which depends on claim 29, the rejection of claim 29 is incorporated. Hardy teaches the limitation, “The method of claim 29, wherein the composed separate aggregated machine learning model has a different machine learning model architecture than the local machine learning models of the at least two local nodes” (Page 3 columns 1-2 section III: “Federated learning [27] proposes to train a machine learning model, and in particular a deep neural network, on a set of workers. It follows the parameter server framework, with the particularity that workers perform numerous local iterations between each communication to the server (i.e., a round), instead of sending small updates ... The server in turn averages the G and D parameters of all workers, in order to send updates to those workers at the next iteration”. Hardy discloses that the central server aggregates data from the generator and discriminator of each worker and in turn averages the generator and discriminator parameters of all workers; thus a new generator and discriminator configuration is obtained at the central server based on the averaging process, which suggests a machine learning model architecture, i.e., a GAN architecture, different from the worker models, as understood by a person of ordinary skill in the art.)
Regarding claim 31, which depends on claim 27, the rejection of claim 27 is incorporated. Wu teaches the limitation, “training a parametrized function of an aggregated local machine learning model by using the samples generated by the received generator functions and labels generated by applying the parametrized functions to the samples generated by the received generator functions” (paragraph 28: “a data augmentation unit is used to compare the posterior probability of a label for each unlabelled sample and each generated sample. The data augmentation unit can be implemented in the GAN or can be a separate module coupled to the GAN. When the posterior probability for the label for a given sample (e.g. the unlabelled sample or the generated sample) exceeds a threshold confidence level, the given sample is assigned the label and converted to a labelled training sample. The newly labelled sample is merged into the labeled training dataset, thereby augmenting the labelled training dataset and expanding the size of the labelled training dataset”, and paragraph 33: “In example embodiments, data augmentation unit 106 is initialized and trained”. Wu discloses the training of the data augmentation unit, which is configured to compare the posterior probability for given samples (samples generated by the generator) in order to assign the label and convert these samples into labelled training samples, wherein this data augmentation unit suggests the parametrized function within the claim because the unit performs its labelling function based on a threshold parameter.)

Regarding claim 32, which depends on claim 31, the rejection of claim 31 is incorporated. Ferdowsi teaches the limitation, “training a generator function of an aggregated generative model and a discriminator function of an aggregated discriminative model by using samples generated by the received generator functions” (Page 2 column 2 section 1.2: “In the BGAN architecture, every agent contains a single generator and a single discriminator and owns a private dataset. At each step of training, the agents share their idea, i.e., their generated data samples with their neighbors in order to communicate some information about their dataset...”. Ferdowsi discloses that the agents share their ideas with their neighbors to communicate information about their datasets, such as their generated samples, wherein an agent that receives these generated samples may perform learning and discrimination using them to discriminate between its own samples and the generated samples, and wherein each agent corresponds to a worker within the FL-GAN.)

Regarding claim 33, which depends on claim 26, the rejection of claim 26 is incorporated. Ferdowsi teaches the limitation, “The method of claim 26, wherein the determined first and second cross-discrimination values are normalized based on the data from which the local machine learning models of the at least two local nodes originate” (Page 3 column 1 section 2 and Figure 2: “For every agent i ∈ N, we also define another DNN called discriminator Di ... that gets a data sample x as an input and outputs a value between 0 and 1”. Ferdowsi discloses that the discriminator outputs a value between 0 and 1, wherein a person of ordinary skill in the art would recognize a normalization process at the discriminator such that its output falls within the range between 0 and 1. Figure 2 further suggests this normalization process through a sigmoid activation function at the output layer of the discriminator, which limits the range of the output to between 0 and 1.)
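As an editor’s illustration (not part of the Office action), the normalization the examiner points to is the standard sigmoid at a discriminator’s output layer, which maps any real-valued logit into (0, 1):

import numpy as np

def sigmoid(logits):
    """Sigmoid output activation: squashes real-valued discriminator logits
    into (0, 1), giving cross-discrimination values a common normalized
    range regardless of the data each local model was trained on."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))

print(sigmoid([-4.0, 0.0, 4.0]))  # approximately [0.018 0.5 0.982]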
Regarding claim 35, which depends on claim 26, the rejection of claim 26 is incorporated. Ferdowsi teaches the limitation, “The method of claim 26, wherein the generator function and the discriminator function are the result of a training a generative adversarial network” (Page 1 columns 1-2 section 1: “In GANs, a DNN called generator generates data samples while another DNN called discriminator tries to discriminate between the generator’s data and the actual data. The interaction between the generator and discriminator results in optimizing the DNN weights such that the generator’s generated samples look similar to the realistic data”. Ferdowsi discloses that the training of a GAN comprises the training of a generator and a discriminator, wherein each worker/agent is configured with a generator and a discriminator, and all are connected with the central server, which also has a generator and a discriminator of its own.)

Regarding claim 36, which depends on claim 26, the rejection of claim 26 is incorporated. Hardy teaches the limitation, “The method of claim 26, wherein the central node is a single central node in the wireless communications network, or implemented in a number of cooperative nodes in the wireless communications network” (Page 1 column 2 section I: “a novel method to train a GAN in a distributed fashion, that is to say over the data of a set of participating workers (e.g., datacenters connected through WAN [8], or devices at the edge of the Internet)”. Hardy discloses the FL-GAN, which may be configured with a central server and several worker nodes connected with each other to provide communication rounds, wherein the communication may be implemented via the Internet, thus suggesting a wireless communications network between the worker nodes and the central server.)

Regarding claim 37, it is rejected under the same rationale as claim 26. The Applicant is directed to the rejection of claim 26 above, because the claim recites similar limitations and processing steps.

Regarding claim 38, which depends on claim 37, the rejection of claim 37 is incorporated. Claim 38 is further rejected under the same rationale as claim 27, because the claim recites similar limitations and processing steps.

Regarding claim 39, which depends on claim 38, the rejection of claim 38 is incorporated. Claim 39 is further rejected under the same rationale as claim 28, because the claim recites similar limitations and processing steps.

Regarding claim 40, which depends on claim 26, the rejection of claim 26 is incorporated.
Wu teaches the limitation, “A non-transitory computer-readable medium comprising, stored thereupon, a computer program comprising instructions configured so that, when executed in a processing circuitry, the computer program causes the processing circuitry to carry out the method of claim 26” (paragraph 15: “the computer program product includes a computer readable medium storing program code, wherein the program code, when run on a computer, causes the computer”, paragraph 64: “The processing system 600 may include one or more processing devices 602, such as a graphics processing unit, a processor, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a dedicated logic circuitry, or combinations thereof”, and paragraph 65: “The non-transitory memory(ies) 610 may store instructions for execution by the processing device(s) 602, such as to carry out examples described in the present disclosure ... one or more data sets and/or modules may be provided ... by non-transitory computer-readable medium”. Wu discloses that the method may be performed by a computer program product including a computer-readable medium storing program code with instructions for execution by the processing device(s), wherein the processing devices may include an application-specific integrated circuit or logic circuitry to perform the method, and wherein a person of ordinary skill in the art may configure the method of the teaching combination above to be performed through this configuration of physical devices.)

Claim 34 is rejected under 35 U.S.C. 103 as being unpatentable over Hardy et al. (NPL: MD-GAN: Multi-Discriminator Generative Adversarial Networks for Distributed Datasets) in view of Wu et al. (US 20190122120 A1), further in view of Ferdowsi et al. (NPL: Brainstorming Generative Adversarial Networks (BGANs): Towards Multi-Agent Generative Models with Distributed Private Datasets), further in view of Chen et al. (NPL: Efficient GAN-based method for cyber-intrusion detection).

Regarding claim 34, which depends on claim 33, the rejection of claim 33 is incorporated. Ferdowsi teaches a part of the limitation, “The method of claim 33, wherein the normalized first and second cross-discrimination values ..., and wherein the normalized first and second cross-discrimination values ...” (Page 3 column 1 section 2 and Figure 2: “For every agent i ∈ N, we also define another DNN called discriminator Di ... that gets a data sample x as an input and outputs a value between 0 and 1”. Ferdowsi discloses that each agent comprises a discriminator providing output values, wherein a person of ordinary skill in the art would recognize a normalization process at the discriminator such that its output falls within the range between 0 and 1. Figure 2 further suggests this normalization process through a sigmoid activation function at the output layer of the discriminator, which limits the range of the output to between 0 and 1.)

Hardy, Wu, and Ferdowsi do not teach a part of the limitation, “... indicate that the local machine learning models of the at least two local nodes originate from data having the determined level of non-corresponding or non-overlapping distribution when the normalized first and second cross-discrimination values both are above a first threshold value ...
indicate that the local machine learning models of the at least two local nodes originate from data having the determined level of corresponding or overlapping distribution when the normalized first and second cross-discrimination values both are below a second threshold value”. However, Chen teaches this part of the limitation (Pages 4-5 section 3.3: “Since GAN is able to learn the distribution of data, it can naturally be used to learn the distribution of normal data, especially where anomalies are scarce in the training set ... The first one is more practical in real life. In simple words, we need to add abundant already-known intrusion samples into a well-pretrained model to procure their anomaly score. Empirically, we can find a threshold to determine the intrusion, .... This method is proper to online detection, for there is no need to make sense of the proportion of normal samples and abnormal samples, all we need is a threshold obtained from experience”. Chen discloses an efficient GAN-based method for cyber-intrusion detection, wherein a GAN is used to learn the distribution of data to determine the intrusion of sample data. Within the disclosure, the method comprises an assessment process, which compares the output of the discriminator to a threshold based on the process of adding intrusion samples into a well-pretrained model, to indicate a level of intrusion of the data samples, wherein samples may be added as abundant already-known intrusion samples. A person of ordinary skill in the art would have been able to configure a threshold technique similar to the teaching by Chen such that, in an architecture comprising two agents that transmit data to each other, the value discriminated by each agent may both be compared to a threshold; if both values are greater than the threshold, it may indicate that the original samples of the two agents do not intrude upon, i.e., do not overlap with, each other. Similarly, another threshold may be configured such that, if both values are less than this threshold, it may indicate that the original samples of the two agents intrude upon, i.e., overlap with, each other.)
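As an editor’s illustration (not part of the Office action), the two-threshold reading of claim 34 amounts to a three-way classification of a pair’s normalized cross-discrimination values; the names upper and lower and their default values are hypothetical:

def classify_pair(first, second, upper=0.7, lower=0.3):
    """Claim-34-style decision on normalized cross-discrimination values:
    both above the first (upper) threshold -> non-overlapping distributions;
    both below the second (lower) threshold -> overlapping distributions;
    anything else is left undetermined."""
    if first > upper and second > upper:
        return "non-overlapping"
    if first < lower and second < lower:
        return "overlapping"
    return "undetermined"

print(classify_pair(0.9, 0.8))  # non-overlapping
print(classify_pair(0.1, 0.2))  # overlapping
print(classify_pair(0.9, 0.1))  # undetermined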
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the teaching of an adapted version of federated learning for GANs (FL-GAN) by Hardy, the self-training method and system for semi-supervised learning with generative adversarial networks comprising the training of a data augmentation unit to assign labels by Wu, and the teaching of a novel brainstorming GAN (BGAN) architecture in which multiple agents share data with each other by Ferdowsi, with the teaching by Chen of an efficient GAN-based method for cyber-intrusion detection, wherein a GAN is used to learn the distribution of data to determine the intrusion of sample data. The motivation to do so is found in Chen’s disclosure (Page 1 column 1, Abstract: “With its strong generative ability, it only needs to learn the distribution of normal status, and identify the abnormal status through the gap between it and the learned distribution. Nevertheless, existing GAN-based models are not suitable to process data with discrete values, leading to immense degradation of detection performance ... we propose an efficient GAN-based model with specifically-designed loss function. Experiment results show that our model outperforms state-of-the-art models on discrete dataset and remarkably reduce the overhead”, and Page 2 column 1 section 1: “To overcome the above hurdles, in this paper, we proposed a GAN-based model with refined loss function to obtain an outstanding performance on the imbalanced dataset with discrete features. Furthermore, we used the multiple intermediate layers to soften the decision of discriminator to obtain a more moderate result.” Chen discloses the benefit of the method, which is to overcome the hurdle of accurately finding the ’normal version’ of a testing sample so as to enable the effective recognition of anomalies in unknown data, and presents the method as an improvement to GAN-based models, reducing overhead and outperforming state-of-the-art models. Therefore, the teaching by Chen may be incorporated with the teaching combination for further improvement. A person of ordinary skill in the art may combine Chen’s teaching with the teaching combination such that the value discriminated by each agent from samples generated by another agent, as disclosed by Ferdowsi, may be compared to a threshold. In an architecture comprising two agents that transmit data to each other, the values discriminated by the two agents may both be compared to the threshold; if both values are greater than the threshold, it may indicate that the original samples of the two agents do not intrude upon or overlap with each other. Similarly, if both values are less than the threshold, it may indicate that the original samples of the two agents intrude upon or overlap with each other.)

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DUY TU DIEP whose telephone number is (703) 756-1738. The examiner can normally be reached M-F 8-4:30. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DUY T DIEP/
Examiner, Art Unit 2123

/ALEXEY SHMATOV/
Supervisory Patent Examiner, Art Unit 2123

Prosecution Timeline

Aug 02, 2022: Application Filed
Aug 05, 2025: Non-Final Rejection — §103
Nov 06, 2025: Response Filed
Feb 10, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579428
METHOD FOR INJECTING HUMAN KNOWLEDGE INTO AI MODELS
2y 5m to grant • Granted Mar 17, 2026
Patent 12488223
FEDERATED LEARNING FOR TRAINING MACHINE LEARNING MODELS
2y 5m to grant • Granted Dec 02, 2025
Patent 12412129
DISTRIBUTED SUPPORT VECTOR MACHINE PRIVACY-PRESERVING METHOD, SYSTEM, STORAGE MEDIUM AND APPLICATION
2y 5m to grant • Granted Sep 09, 2025
Study what changed to get past this examiner. Based on 3 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 25%
With Interview: 30% (+5.5%)
Median Time to Grant: 4y 2m
PTA Risk: Moderate
Based on 20 resolved cases by this examiner. Grant probability derived from career allow rate.
