DETAILED ACTION
This action is responsive to the communication filed on 08/22/2025. Claims 1, 2, and 4-30 are pending in the case. Claims 1, 8, 15, and 23 are independent claims. Claims 1, 8, 15, and 23 have been amended.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 08/22/2025 has been entered.
Response to Arguments
Applicant's arguments filed 08/22/2025 have been fully considered but they are not persuasive.
With respect to claims being patentable over the cited reference:
Applicant appears to argue that the claim requires that “the first predictions and the second predictions are both made with the one or more neural networks using at least one of the one or more neural network weights.”
Examiner notes that the cited art appears to teach the claimed limitation. As highlighted in the rejection, the student and teacher networks (i.e., one or more networks) together make the first and second predictions using at least one of the one or more weights, which include the perturbed weights used in the art. Therefore, both predictions are made with the one or more neural networks.
Applicant appears to point out that the first predictions are made with augmented versions of unlabeled training data, while the second predictions are made with the unlabeled training data, and that the cited art uses both perturbed inputs and perturbed weights.
Examiner highlights that first and second predictions made with augmented versions of unlabeled training data correspond to the claim. Examiner notes that the claim requires that the second predictions are “of unlabeled training data”. The claim does not preclude perturbed or “augmented” training data from being included in the claimed “unlabeled training data”.
Therefore, the rejection is maintained.
With respect to the rejection under 35 U.S.C. 101:
Applicant argues that the claims as a whole integrate the exception into a practical application because the claims improve the functioning of a computer.
Examiner notes that any such improvement should be reflected in the additional elements. As evidenced by the rejection, the improvement is not so reflected. Applicant provides no reason or suggestion as to how the features of the claim reflect or integrate an improvement, beyond a reference to the specification stating that semi-supervised learning improves neural network generalizability. This amounts to claiming the idea of a solution. The disclosure does not describe how semi-supervision improves the technology. While the specification explains that augmentation is performed via addition of Gaussian noise, it is not clear from the disclosure or the arguments provided how the addition of Gaussian noise affects the supposed improvement to neural network generalizability.
Even if it were explained how the addition of Gaussian noise affects the improvement, such data augmentation is not reflected in the claims. Presently, the augmentation process is not claimed. At most, the claims describe updating neural networks based on a difference between predictions of neural networks, which is itself the recited abstract idea and as such cannot be what reflects the improvement.
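For context, a minimal sketch of the kind of Gaussian-noise data augmentation the specification describes (illustrative only; the function name and parameter values are hypothetical and, as noted above, this operation does not appear in the claims):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x, sigma=0.1):
    """Produce an augmented version of an unlabeled sample by adding
    zero-mean Gaussian noise with standard deviation sigma."""
    return x + rng.normal(0.0, sigma, size=x.shape)
```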
Applicant argues further that the claims are not well-understood, routine, and conventional (WURC).
Examiner notes that this analysis overlaps with the consideration in Step 2A Prong Two with respect to additional elements, which are identified as insignificant extra-solution activity. The particular additional elements so identified are WURC because obtaining data in an unspecified manner, i.e., mere data gathering, is recognized in the MPEP as WURC.
The claim (at least claim 1) does not include any other additional elements beyond those which apply the judicial exception (see MPEP 2106.05(f)), and as such the limitations are not indicative of significantly more.
Therefore, the rejection is maintained.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 2, and 4-30 are rejected under 35 U.S.C. 101 because the claims are directed to an abstract idea without significantly more.
Regarding Claim 1/8/15/23
Each of these claims is an independent claim.
Claim 1 is directed to a machine. Claim 8 is directed to a process. Claim 15 is directed to a machine. Claim 23 is directed to a process. Each of these claims is therefore directed to a statutory category. The claims recite the following limitations or similar limitations: “to update one or more first neural networks, wherein to update the one or more first neural networks…and update the one or more first neural networks according to a difference between first predictions of augmented versions of unlabeled training data and second predictions of the unlabeled training data, wherein the first and second predictions are generated by the one or more neural networks using at least one of the one or more second neural network weights”. Under Step 2A Prong 1, these limitations correspond to a mental evaluation. Updating neural network weights via computation of a difference between values describes a mental evaluation. While the update is based on predictions generated by the one or more neural networks, the update nevertheless amounts to a mental evaluation.
The claims therefore recite an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claims recite additional element(s) that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea (“A computer system …one or more processors to cause one or more client systems… the one or more client systems… from one or more server systems… one or more processors… neural networks”). See MPEP 2106.05(f). Additionally, the claims recite the following limitations: (“obtain… one or more neural network weights from one or more second neural networks previously updated using previous neural network weights provided by the one or more client systems”), which are insignificant extra-solution activity (see MPEP 2106.05(g)). Such limitations amount to mere data gathering because “the limitation amounts to necessary data gathering and outputting, (i.e., all uses of the recited judicial exception require such data gathering or data output)”; every instance of updating neural network weights requires first obtaining the information used for the update.
The claim therefore is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Further, the additional elements cited are insignificant extra-solution activities that are considered well-understood, routine, and conventional activities. Examiner notes that communicating/transmitting weights from a source system to a target system amounts to receiving or transmitting data over a network (MPEP 2106.05(d)(II)(i)). According to MPEP 2106.05(d)(II), “The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner”. As such, the insignificant extra-solution activities are considered well-understood, routine, and conventional activities and do not render the claim significantly more than the judicial exception. Therefore, the claim is not patent eligible.
Regarding Claim 2/9/11/24
Claim 2 is directed to a system. Claim 9 is directed to a process. Claims 11 and 24 are likewise directed to processes.
The claim does not recite any additional abstract ideas beyond those recited in the parent claim.
Under step 2A prong 2 and 2B:
The claim recites additional elements (“the one or more client systems communicate, to the one or more server systems, one or more second neural network weights from the updated one or more first neural networks; the one or more server systems”) which amount to insignificant extra-solution activity and mere data gathering as identified in the parent claim (MPEP 2106.05(g)).
The claim recites the additional element (“further train the one or more second neural networks based, at least in part, on the one or more second neural network weights”) which amounts to applying the judicial exception of updating weights to generic computer machinery (MPEP 2106.05(f)). The claim does not provide any details of the training beyond being based on the abstract idea.
Similarly, claim 9 describes transmitting and training, similar to the limitations described in claim 2. These limitations are analyzed in the same way.
Similarly, claim 11 describes transmitting, similar to the limitations described in claim 2. These limitations are analyzed in the same way.
Similarly, claim 24 describes transmitting, similar to the limitations described in claim 2. These limitations are analyzed in the same way.
The claim therefore does not recite a practical application or significantly more.
Regarding claim 4/5/7/10/13/14/17/21/25/27
Claims 4, 5, 7, 17, and 21 are directed to a system.
Claims 10, 13, 14, 25, and 27 are directed to a process.
The claims recite descriptive details of the weights or other data analyzed/updated via the abstract idea without reciting any additional elements beyond those identified in the parent claim.
Therefore, the claims do not recite a practical application or significantly more.
Regarding claim 6
The claim is directed to a system.
The claim does not recite any additional abstract ideas beyond those recited in the parent claim.
Under step 2A prong 2 and 2B:
The claim recites additional elements (“by transmitting, to the one or more client systems, the one or more neural network weights comprising combined neural network weights received from the one or more client systems including the previous neural network weights”) which amount to insignificant extra-solution activity and mere data gathering as identified in the parent claim (MPEP 2106.05(g)).
The claim recites the additional element (“causes the one or more client systems to use the one or more first neural networks”) which amounts to applying the judicial exception of updating weights to generic computer machinery (MPEP 2106.05(f)). The claim does not provide any details of the functioning of the processor beyond its application.
The claim therefore does not recite a practical application or significantly more.
Regarding Claim 12
Claim 12 is directed to a process.
The claim recites “wherein the one or more second neural network weights are usable to train the one or more second neural networks by combining each of the one or more second neural network weights into one or more combined neural network weights” which further describes the weights updated as part of the abstract idea identified in the parent claim.
Under step 2A prong 2 and 2B:
The claim recites additional elements (“the one or more client systems communicate, to the one or more server systems, one or more second neural network weights from the updated one or more first neural networks; the one or more server systems”) which amount to insignificant extra-solution activity and mere data gathering as identified in the parent claim (MPEP 2106.05(g)).
The claim recites the additional element (“and training the one or more second neural networks based, at least in part, on the one or more combined neural network weights.”) which amounts to applying the judicial exception of updating weights to generic computer machinery (MPEP 2106.05(f)). The claim does not provide any details of the training beyond being based on the abstract idea.
The claim therefore does not recite a practical application or significantly more.
Regarding Claim 18
Claim 18 is directed to a system.
The claim does not recite any additional abstract ideas beyond those recited in the parent claim.
Under step 2A prong 2 and 2B:
The claim recites the additional element (“wherein the other client computer systems implement one or more other neural networks trained based, at least in part, on the one or more neural network weights and supervised data, including the labeled data”) which amounts to applying the judicial exception of updating weights to generic computer machinery (MPEP 2106.05(f)). The claim does not provide any details of the functioning of the neural network.
The claim therefore does not recite a practical application or significantly more.
Regarding Claim 19/20/22/26/28/29/30
The claims are directed to a statutory category.
The claim does not recite any additional abstract ideas beyond those recited in the parent claim.
Under step 2A prong 2 and 2B:
The claim recites the additional element (“wherein the one or more first neural networks and are trained to perform segmentation of one or more images.”) which amounts to generally linking the use of the judicial exception to a particular technological environment (MPEP 2106.05(h)).
Similarly, claims 20, 22, 26, 28, 29, and 30 recite limitations which describe the particular field of use. No details about the functioning of the computer technology are claimed.
The claims therefore do not recite a practical application or significantly more.
Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 4-10, 12-18, 21, 23-27, and 30 are rejected under 35 U.S.C. § 103 as being unpatentable over McMahan et al., “Federated Learning of Deep Networks using Model Averaging” (hereinafter McMahan), further in view of Athiwaratkun et al., “Improving Consistency-Based Semi-Supervised Learning with Weight Averaging” (hereinafter Athiwaratkun).
Regarding claim 1
McMahan teaches, A computer system comprising: one or more processors to cause one or more client systems to update one or more first neural networks, wherein to update the one or more first neural networks, the one or more client systems: obtain, from one or more server systems, one or more neural network weights from one or more second neural networks previously updated using previous neural network weights provided by the one or more client systems; (Introduction ¶02 “our approach Federated Learning, since the learning task is solved by a loose federation of participating devices (which we refer to as clients) which are coordinated by a central server. Each client has a local training dataset which is never uploaded to the server. Instead, each client computes an update to the current global model maintained by the server, and only this update is communicated” pg 5 “That is, each client locally takes one step of gradient descent on the current model using its local data [using its copy of the model, i.e., the first neural network], and the server then takes a weighted average of the resulting models [to get the second neural network]” Supervised tasks, like those described in the art, involve labeling or inferring labels for data on a local device using a local copy of the model; the training data includes labels such as digit labels. A server orchestrates training of a global model with local clients, each of which has its own local training data set; thus each client has different training data and a local copy of the model, which is used to train the final global neural network, resulting in the second neural network.)
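For illustration, a minimal sketch of the client-side local update the cited passage describes (a toy linear model with squared loss; all identifiers are hypothetical, and this is a sketch of the technique rather than code from McMahan):

```python
import numpy as np

def client_update(w_global, X, y, lr=0.1, epochs=1):
    """One client's local step in federated learning: start from the
    server's current global weights and run gradient descent using only
    the client's local data (here, a linear model with squared loss)."""
    w = w_global.copy()                          # local copy of the global model
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad                           # one gradient-descent step
    return w                                     # sent back to the server

# Hypothetical usage: a client with 8 local samples and 3 features
rng = np.random.default_rng(0)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)
w_local = client_update(np.zeros(3), X, y)
```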
McMahan does not teach, update the one or more first neural networks according to a difference between first predictions of augmented versions of unlabeled training data and second predictions of the unlabeled training data, wherein the first and second predictions are generated by the one or more neural networks using at least one of the one or more second neural network weights.
Athiwaratkun, however, when addressing the use of consistency loss for training in a semi-supervised setting teaches, and update the one or more first neural networks according to a difference between first predictions of augmented versions of unlabeled training data and second predictions of the unlabeled training data, wherein the first and second predictions are generated by the one or more neural networks using at least one of the one or more second neural network weights. (pg 2-3 Section 2.1 “the consistency loss penalizes the difference between the student’s predicted probabilities f(x’; w’f) and the teacher g(x’; w’g)” pg 3 “The consistency of a model (student) can be measured against its own predictions (e.g. Π model) or predictions of a different teacher network… In the semi-supervised setting, we have access to labeled data… and unlabeled data… Given two perturbed inputs x’, x’’ of x and the perturbed weights w_f and w_g, the consistency loss penalizes the difference between the student’s predicted probabilities:

ℓ_cons(w, x) = ‖ f(x’; w’_f) − g(x’’; w’_g) ‖²

” The loss is used to update the student network using augmented or perturbed training input, including labeled and unlabeled data. As shown, the loss is a difference between first and second predictions using first and second weights. The predictions, denoted by the functions f() and g(), are the predictions generated by the one or more neural networks.)
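For illustration, a minimal sketch of a consistency loss of this form (the model and perturbations are hypothetical stand-ins, not Athiwaratkun’s code; note that no label is required, which is why the loss applies to unlabeled data):

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(x, w):
    """Hypothetical stand-in model: softmax over a linear map."""
    z = x @ w
    e = np.exp(z - z.max())
    return e / e.sum()

def consistency_loss(x, w_student, w_teacher, noise=0.1):
    """Penalize the difference between the student's prediction on one
    perturbed input x' and the teacher's prediction on another perturbed
    input x'' of the same unlabeled sample x."""
    x1 = x + rng.normal(0.0, noise, size=x.shape)    # perturbed input x'
    x2 = x + rng.normal(0.0, noise, size=x.shape)    # perturbed input x''
    p_student = predict(x1, w_student)               # f(x'; w_f)
    p_teacher = predict(x2, w_teacher)               # g(x''; w_g)
    return np.sum((p_student - p_teacher) ** 2)      # squared difference

# Hypothetical usage: one unlabeled sample, student and teacher weights
x = rng.normal(size=4)
loss = consistency_loss(x, rng.normal(size=(4, 3)), rng.normal(size=(4, 3)))
```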
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network training system of McMahan with the training system described by Athiwaratkun, which utilizes unlabeled data to make predictions and is trained based on a difference. One would have been motivated to make such a combination because, as noted by Athiwaratkun, “We observe that increasing the number of unlabeled examples improves the generalization performance of the Π model” (pg 5) and “Semi-supervised learning is crucial for reducing the dependency of deep learning on large labeled data…with consistency regularization models achieving the best known results” (pg 10).
Regarding Claim 2
McMahan/Athiwaratkun teaches claim 1
Further McMahan teaches, the one or more client systems communicate, to the one or more server systems, one or more second neural network weights from the updated one or more first neural networks; the one or more server systems further train the one or more second neural networks based, at least in part, on the one or more second neural network weights. (pg 4 ¶03 “That is, each client locally takes one step of gradient descent on the current model using its local data, and the server then takes a weighted average of the resulting models. Once the algorithm is written this way, we can add more computation to each client by iterating the local update wk ← wk − η∇Fk(wk) multiple times before the averaging step. We term this approach FederatedAveraging (or FedAvg)” and Algorithm 1 pg 5 [Algorithm 1: FederatedAveraging]. As shown in the federated averaging algorithm, each client runs the ClientUpdate function, which receives wt from the server and returns updated weights wk, i.e., a second set of weights, to the server. The server then performs a weighted average of the results to train or update the weights.)
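For illustration, a minimal sketch of the server-side weighted-averaging step of Algorithm 1 (identifiers are hypothetical; a sketch of the technique, not McMahan’s code):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Server step of FedAvg: average the clients' returned weights w_k,
    weighting each by its share n_k/n of the total training data."""
    n = sum(client_sizes)
    return sum((n_k / n) * w_k
               for w_k, n_k in zip(client_weights, client_sizes))

# Hypothetical usage: three clients with different local data set sizes
w_next = federated_average(
    [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])],
    client_sizes=[100, 50, 50],
)
```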
Regarding Claim 4
McMahan/Athiwaratkun teaches claim 2
Further McMahan teaches, wherein the one or more second neural network weights comprise information generated based, at least in part, on one or more of neural network weights and the … training data. (pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset. At the beginning of each round, a random fraction C of clients is selected, and the server sends the current global algorithm state to each of these clients (e.g., the current model parameters)… Each selected client then performs local computation based on the global state and its local dataset, and sends an update to the server.” Initial weights, i.e., the first set of weights, for a round are sent to the client. The client then creates new weights, which correspond to information generated based on the received first weights.)
Further Athiwaratkun teaches, based, at least in part, on one or more of neural network weights and the unlabeled training data (pg 4 “we visualize how test accuracy changes along random rays starting at the solution found by the Π model trained with 4k labeled points and different amounts of additional unlabeled data” the training is based on weights as described above and also based on unlabeled training data.)
Regarding Claim 5
McMahan/Athiwaratkun teaches claim 2
Further McMahan teaches, wherein the server systems updates the one or more neural network weights based, at least in part, utilizing one or more third neural network weights, wherein the one or more third neural network weights are generated based, at least in part, by combining the one or more neural network weights and the one or more second neural network weights (pg 3 column 2 ¶04 “and these models are averaged to produce the final global model.” pg 4 ¶03 “the server then takes a weighted average of the resulting models.” and Algorithm 1:

w_{t+1} ← Σ_k (n_k / n) w_{t+1}^k

As shown in the algorithm, first weights, wt, are sent to the clients; then multiple sets wk are returned to the server. Finally, wt+1 is a combination of the second sets of weights, forming a third set of weights. The server network is updated according to these weights, and training progresses based on these weights.)
Regarding Claim 6
McMahan/Athiwaratkun teaches claim 1
Further McMahan teaches, wherein the computer system causes the one or more client systems to use the one or more first neural networks by transmitting, to the one or more client systems, the one or more neural network weights comprising combined neural network weights received from the one or more client systems including the previous neural network weights (pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset. At the beginning of each round, a random fraction C of clients is selected, and the server sends the current global algorithm state to each of these clients (e.g., the current model parameters)… Each selected client then performs local computation based on the global state and its local dataset, and sends an update to the server.” For each new round, the weights sent to the clients are the combined weights from the previous round.)
Regarding Claim 7
McMahan/Athiwaratkun teaches claim 1
Further Athiwaratkun teaches, wherein the unlabeled training data further includes the unlabeled training data and at least labeled training data (pg 4 “we visualize how test accuracy changes along random rays starting at the solution found by the Π model trained with 4k labeled points and different amounts of additional unlabeled data” the training uses both labeled training data, i.e., the 4k labeled points, and unlabeled training data.)
Regarding claim 8
Claim 8 is rejected in view of McMahan/Athiwaratkun for the reasons set forth in the rejection of claim 1
Regarding claim 9
McMahan/Athiwaratkun teaches claim 8
Further McMahan teaches, transmitting, by the one or more client systems to the one or more server systems, one or more second neural network weights from the updated one or more first neural networks; further training, by the one or more server systems, the one or more second neural networks based, at least in part, on the one or more second neural network weights. (pg 4 ¶03 “That is, each client locally takes one step of gradient descent on the current model using its local data, and the server then takes a weighted average of the resulting models. Once the algorithm is written this way, we can add more computation to each client by iterating the local update wk ← wk − η∇Fk(wk) multiple times before the averaging step. We term this approach FederatedAveraging (or FedAvg)” and Algorithm 1 pg 5 [Algorithm 1: FederatedAveraging]. As shown in the federated averaging algorithm, each client runs the ClientUpdate function, which receives wt from the server and returns updated weights wk, i.e., a second set of weights, to the server. The server then performs a weighted average of the results to train or update the weights.)
Regarding claim 10
McMahan/Athiwaratkun teaches claim 8
Further Athiwaratkun teaches, wherein the one or more second neural networks are initialized using pre-trained neural network weights from one or more image classification models, wherein the pre-trained neural network weights include the previous neural network weights. (Section 4 pg 6 “For the first … epochs the network is pre-trained using the cosine annealing schedule where the learning rate” the neural network is pre-trained. The corresponding weights associated with the pre-trained model, determined in the first epochs, are the previous neural network weights as claimed. pg 6 “We train a 13 layer CNN on CIFAR-10 on 50000 images”)
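For illustration, a minimal sketch of a cosine annealing learning-rate schedule of the kind the quotation references (parameter values are hypothetical, not taken from the reference):

```python
import math

def cosine_annealed_lr(epoch, total_epochs, lr_max=0.1, lr_min=0.0):
    """Cosine annealing schedule: decay the learning rate from lr_max
    toward lr_min over total_epochs along half a cosine period."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)

# Hypothetical usage: learning rate halfway through a 180-epoch run
lr = cosine_annealed_lr(90, 180)
```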
Regarding claim 12
McMahan/Athiwaratkun teaches claim 9
McMahan teaches, wherein the one or more second neural network weights are usable to train the one or more second neural networks by combining each of the one or more second neural network weights into one or more combined neural network weights and training the one or more second neural networks based, at least in part, on the one or more combined neural network weights. ( pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset. At the beginning of each round, a random fraction C of clients is selected, and the server sends the current global algorithm state to each of these clients (e.g., the current model parameters)… Each selected client then performs local computation based on the global state and its local dataset, and sends an update to the server.” For each new round the weights sent to the clients are the combined weights from the previous round.)
Regarding claim 13
McMahan/Athiwaratkun teaches claim 12
McMahan teaches, wherein the second one or more neural network weights combined to generate the one or more combined neural networks weights includes at least one first neural network weight ( pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset. At the beginning of each round, a random fraction C of clients is selected, and the server sends the current global algorithm state to each of these clients (e.g., the current model parameters)… Each selected client then performs local computation based on the global state and its local dataset, and sends an update to the server.” For each new round the weights sent to the clients are the combined weights from the previous round.)
Athiwaratkun teaches, information from unsupervised training data and at least one second neural network weight comprising information from supervised training data (pg 5 figure 2 caption “Figure 2: CIFAR-10 with 4k labeled examples (and 46k unlabeled examples)”)
Regarding claim 14
McMahan/Athiwaratkun teaches claim 8
Athiwaratkun teaches, wherein the unlabeled training data includes a first set of unlabeled training data having an amount of supervision and a second set of training data having an amount of supervision. (pg 5 figure 2 caption “Figure 2: CIFAR-10 with 4k labeled examples (and 46k unlabeled examples)”)
Regarding claim 15
Claim 15 is rejected in view of McMahan/Athiwaratkun for the reasons set forth in the rejection of claim 1
Regarding Claim 17
McMahan/Athiwaratkun teaches claim 15
McMahan teaches, wherein a subset of other client computer systems include labeled data comprising one or more labels not available to the client computer system including the one or more first neural networks (pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset.” Each client has a different data set that is local to that client and thus unavailable to other clients. The data necessarily has at least one level of supervision. pg 6 “where we first sort the data by digit label, divide it into 200 shards of size 300, and assign each of 100 clients 2 shards”)
Regarding claim 18
McMahan/Athiwaratkun teaches claim 17
McMahan teaches, wherein the other client computer systems implement one or more other neural networks trained based, at least in part, on the one or more neural network weights (see algorithm 1 pg 5) and supervised data, including the labeled data. (pg 6 “, where we first sort the data by digit label, divide it into 200 shards of size 300, and assign each of 100 clients 2 shards” )
Regarding Claim 21
McMahan/Athiwaratkun teaches claim 18
McMahan teaches, wherein the supervised data includes one or more types of supervision to facilitate training of the one or more other neural networks. (Examiner notes that McMahan, as noted in the above rejection, teaches using training data to train neural networks; all data used for training can be considered supervised, unsupervised, or a blend of both. Therefore, any and all training data has at least one type or level of supervision.)
Regarding Claim 23
Claim 23 is rejected in view of McMahan/Athiwaratkun for the reasons set forth in the rejection of claim 1
Regarding Claim 24
McMahan/Athiwaratkun teaches claim 23
McMahan teaches, and transmitting, by the client computer system to the server computer system, the results including differences between the one or more neural network weights and one or more second neural network weights from the trained one or more first neural networks. (pg 4 ¶03 “That is, each client locally takes one step of gradient descent on the current model using its local data, and the server then takes a weighted average of the resulting models. Once the algorithm is written this way, we can add more computation to each client by iterating the local update wk ← wk − η∇Fk(wk) multiple times before the averaging step. We term this approach FederatedAveraging (or FedAvg)” and Algorithm 1 pg 5 [Algorithm 1: FederatedAveraging]. As shown in the federated averaging algorithm, each client runs the ClientUpdate function, which receives wt from the server and returns updated weights wk, i.e., a second set of weights, to the server. The server then performs a weighted average of the results to train or update the weights.)
Regarding Claim 25
McMahan/Athiwaratkun teaches claim 24
McMahan teaches, wherein the one or more second neural network weights comprise information combined from the trained one or more first neural networks and other client computer systems. ( pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset. At the beginning of each round, a random fraction C of clients is selected, and the server sends the current global algorithm state to each of these clients (e.g., the current model parameters)… Each selected client then performs local computation based on the global state and its local dataset, and sends an update to the server.” For each new round the weights sent to the clients are the combined weights from the previous round.)
Regarding Claim 26
McMahan/Athiwaratkun teaches claim 23
McMahan teaches, other client computer systems of the client computer system include access to the labeled data comprising one or more levels of supervision unavailable to the client computer system including the one or more first neural networks. (pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset.” Each client has a different data set that is local to that client and thus unavailable to other clients. The data necessarily has at least one level of supervision. pg 6 “where we first sort the data by digit label, divide it into 200 shards of size 300, and assign each of 100 clients 2 shards”)
Regarding Claim 27
McMahan/Athiwaratkun teaches claim 24
McMahan teaches, wherein the results further comprise one or more data values indicating information about the trained one or more first neural networks. (Introduction ¶02 “our approach Federated Learning, since the learning task is solved by a loose federation of participating devices (which we refer to as clients) which are coordinated by a central server. Each client has a local training dataset which is never uploaded to the server. Instead, each client computes an update to the current global model maintained by the server, and only this update is communicated” a server orchestrates training of a global model with local clients, each of which has its own local training data set; thus each client has different training data. Further, each client shares its results with the server.)
Regarding Claim 30
McMahan/Athiwaratkun teaches claim 23
McMahan teaches, wherein the one or more first neural networks are convolutional neural networks (pg 6 “We study three model families on two datasets. The first two are for the MNIST digit recognition task… A CNN for MNIST with two 5x5 convolution layers” a CNN is a convolutional neural network.)
Claim 11 is rejected under 35 U.S.C. § 103 as being unpatentable over McMahan/Athiwaratkun, further in view of Wu et al., “Personalized Federated Learning for Intelligent IoT Applications: A Cloud-Edge Based Framework”, hereinafter Wu.
Regarding Claim 11
McMahan/Athiwaratkun teaches claim 9
McMahan teaches, wherein the computer system transmits the one or more second neural network weights (as noted in the rejection of the independent claim, the weights are transmitted between the server and the client bi-directionally.)
McMahan does not explicitly teach, using a communication medium to facilitate network-based communications between one or more geographically diverse locations.
Wu, however, when addressing the use of federated learning across geographical locations, teaches, using a communication medium to facilitate network-based communications between one or more geographically diverse locations. (abstract “Recently, federated learning is proposed to train a globally shared model by exploiting a massive amount of user-generated data samples on IoT devices while preventing data leakage” pg 3 column 2 ¶01 “Therefore, each IoT device can choose to offload its intensive computing tasks to the edge (i.e., edge gateway at home, edge server at office, or 5G MEC server outdoors) via the wireless connections” IoT devices are connected through wireless connections, which facilitate communications between a variety of locations such as the home, the office, and outdoors.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network training system of McMahan/Athiwaratkun with the training system of Wu. One would have been motivated to make such a combination because both McMahan/Athiwaratkun and Wu use federated learning techniques, which, as noted by Wu, allow communication among devices in different locations.
Claims 19, 20, 22, 28, and 29 are rejected under 35 U.S.C. § 103 as being unpatentable over McMahan/Athiwaratkun, further in view of Muller et al., “Automated Chest CT Image Segmentation of COVID-19 Lung Infection based on 3D U-Net”, hereinafter Muller.
Regarding Claim 19
McMahan/Athiwaratkun teaches claim 15
McMahan/Athiwaratkun does not explicitly teach, wherein the one or more first neural networks and are trained to perform segmentation of one or more images.
Muller, however, when addressing the use of neural networks for detection of lung infection, teaches, wherein the one or more first neural networks and are trained to perform segmentation of one or more images. (Figure 6 and conclusion “In this paper, we developed and evaluated an approach for automated segmentation of COVID-19 infected regions in CT volumes…Our method focuses on on-the-fly generation of unique and random image patches for training…we utilized the standard 3D U-Net. We proved that our medical image segmentation pipeline is able to successfully train accurate and robust models” infected regions of the lungs are identified or segmented by the trained 3D U-Net. Training results in multiple versions of the neural network model, thus including a first and second neural network.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network training system of McMahan/Athiwaratkun with the application of a neural network as described by Muller. One would have been motivated to make such a combination because Muller notes that “In recent studies, medical image segmentation models based on neural networks proved powerful prediction capabilities and achieved similar results as radiologists regarding the performance” (Muller pg 2), and further the system of McMahan is designed to work with private user data, which includes medical data.
Regarding Claim 20
McMahan/Athiwaratkun/Muller teaches claim 19
Further Muller teaches, wherein the one or more images are medical images from a computed tomography (CT) scan and the one or more first neural networks are trained to segment one or more image regions into a foreground and a background to facilitate detection of disease. (Figure 6 and conclusion “In this paper, we developed and evaluated an approach for automated segmentation of COVID-19 infected regions in CT volumes…Our method focuses on on-the-fly generation of unique and random image patches for training…we utilized the standard 3D U-Net. We proved that our medical image segmentation pipeline is able to successfully train accurate and robust models” infected regions of the lungs are identified or segmented by the trained 3D U-Net. Segmentation, as shown in the figure, separates an identified foreground region from the background. Each updated version of the model at each iteration constitutes a first and second neural network.)
McMahan/Athiwaratkun and Muller are combined for the reasons set forth in the rejection of claim 19
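For illustration, a minimal sketch of what segmenting image regions into foreground and background produces (a toy thresholding stand-in, not Muller’s 3D U-Net; identifiers are hypothetical):

```python
import numpy as np

def segment_foreground(volume, threshold=0.5):
    """Toy stand-in for segmentation: mark voxels whose predicted score
    exceeds a threshold as foreground (e.g., candidate infected regions)
    and the rest as background. A trained 3D U-Net would instead predict
    the score volume from learned features rather than raw intensities."""
    return (volume > threshold).astype(np.uint8)  # 1 = foreground, 0 = background

# Hypothetical usage: a random 3D score volume
mask = segment_foreground(np.random.default_rng(0).random((4, 4, 4)))
```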
Regarding Claim 22
McMahan/Athiwaratkun teaches claim 15
Muller, however, when addressing the use of neural networks for detection of lung infection, teaches, wherein the one or more first neural networks comprise a 3D U-Net (Figure 6 and conclusion “In this paper, we developed and evaluated an approach for automated segmentation of COVID-19 infected regions in CT volumes…Our method focuses on on-the-fly generation of unique and random image patches for training…we utilized the standard 3D U-Net. We proved that our medical image segmentation pipeline is able to successfully train accurate and robust models” infected regions of the lungs are identified or segmented by the trained 3D U-Net.)
McMahan/Athiwaratkun and Muller are combined for the reasons set forth in the rejection of claim 19
Regarding Claim 28
McMahan/Athiwaratkun teaches claim 24
Muller, when combined with McMahan/Athiwaratkun, teaches, wherein the unlabeled data includes 3D image data and the one or more first neural networks are trained to perform segmentation of the 3D image data (Figure 6 and conclusion “In this paper, we developed and evaluated an approach for automated segmentation of COVID-19 infected regions in CT volumes…Our method focuses on on-the-fly generation of unique and random image patches for training…we utilized the standard 3D U-Net. We proved that our medical image segmentation pipeline is able to successfully train accurate and robust models” infected regions of the lungs are identified or segmented by the trained 3D U-Net. Examiner notes, as demonstrated in the preceding rejections, that Athiwaratkun describes a semi-supervised image processing neural network using unlabeled image data. Under BRI, the unlabeled data is data which includes both labeled and unlabeled data. This is at least supported by claim 14, which describes the unlabeled data as including both unlabeled data and semi-supervised data.)
McMahan/Athiwaratkun and Muller are combined for the reasons set forth in the rejection of claim 19
Regarding Claim 29
McMahan/Athiwaratkun/Muller teaches claim 28
Further Muller teaches, wherein the one or more first neural networks are trained to segment one or more image regions into a foreground and a background, the one or more foreground regions indicating information usable to facilitate detection of one or more diseases. (Figure 6 and conclusion “In this paper, we developed and evaluated an approach for automated segmentation of COVID-19 infected regions in CT volumes…Our method focuses on on-the-fly generation of unique and random image patches for training…we utilized the standard 3D U-Net. We proved that our medical image segmentation pipeline is able to successfully train accurate and robust models” infected regions of the lungs are identified or segmented by the trained 3D U-Net.)
McMahan/Athiwaratkun and Muller are combined for the reasons set forth in the rejection of claim 19
Conclusion
Prior art: Bonawitz et al., “Practical Secure Aggregation for Federated Learning on User-Held Data”, describes a method for sharing not only weight gradients between client and server, but also data set size information.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.R.G./
Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122