DETAILED ACTION
This action is responsive to the communication filed on 08/22/2025. Claims 1, 2, and 4-30 are pending in the case. Claims 1, 8, 15, and 23 are independent claims. Claims 1, 8, 15, and 23 have been amended.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 08/22/2025 has been entered.
Response to Arguments
Applicant's arguments filed 08/22/2025 have been fully considered but they are not persuasive.
With respect to claims being patentable over the cited reference:
Applicant appears to argue that the claim requires that “the first predictions and the second predictions are both made with the one or more neural networks using at least one of the one or more neural network weights.”
Examiner notes that the cited art appears to teach the claimed limitation. As highlighted in the rejection, the student and teacher networks (i.e., one or more networks) together make the first and second predictions using at least one of the one or more weights, which include the perturbed weights used in the art. Therefore, both predictions are made with the one or more neural networks.
Applicant appears to point out that the first predictions are made with augmented versions of unlabeled training data, while the second predictions are made with the unlabeled training data, and that the cited art uses both perturbed inputs and perturbed weights.
Examiner highlights that first and second predictions made with augmented versions of unlabeled training data correspond to the claim. Examiner notes that the claim requires that the second predictions are “of unlabeled training data”. The claim does not preclude perturbed or “augmented” training data from being included in the claimed “unlabeled training data”.
Therefore, the rejection is maintained.
With respect to the rejection under 35 U.S.C. 101:
Applicant argues that the claims as a whole integrate the exception into a practical application because the claims improve the functioning of a computer.
Examiner notes that any such improvement should be reflected in the additional elements. As evidenced by the rejection, the improvement is not so reflected. Applicant provides no reason or suggestion as to how the features of the claim reflect or integrate an improvement, beyond a reference to the specification stating that semi-supervised learning improves neural network generalizability. This amounts to claiming the idea of a solution. The disclosure does not describe how semi-supervision improves the technology. While the specification explains that augmentation is performed via addition of Gaussian noise, it is not clear from the disclosure or the arguments provided how the addition of Gaussian noise affects the supposed improvement to neural network generalizability.
Even if it were explained how the addition of Gaussian noise affects the improvement, such data augmentation is not reflected in the claims. Presently, the augmentation process is not claimed. At most, the claims describe updating neural networks based on a difference between predictions of neural networks, which is itself the recited abstract idea and as such cannot be what reflects the improvement.
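For context, a minimal sketch of the kind of Gaussian-noise data augmentation the specification describes (illustrative only; the function name and parameter values are hypothetical and, as noted above, this operation does not appear in the claims):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x, sigma=0.1):
    """Produce an augmented version of an unlabeled sample by adding
    zero-mean Gaussian noise with standard deviation sigma."""
    return x + rng.normal(0.0, sigma, size=x.shape)
```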
Applicant argues further that the claims are not well-understood, routine, and conventional (WURC).
Examiner notes that this analysis overlaps with the consideration in Step 2A Prong Two with respect to additional elements, which are identified as insignificant extra-solution activity. The particular additional elements so identified are WURC because obtaining data in an unspecified manner, i.e., mere data gathering, is recognized in the MPEP as WURC.
The claim (at least claim 1) does not include any other additional elements beyond those which apply the judicial exception (see MPEP 2106.05(f)), and as such the limitations are not indicative of significantly more.
Therefore, the rejection is maintained.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 2, and 4-30 are rejected under 35 U.S.C. 101 because the claims are directed to an abstract idea without significantly more.
Regarding Claim 1/8/15/23
Each of these claims is an independent claim.
Claim 1 is directed to a machine. Claim 8 is directed to a process. Claim 15 is directed to a machine. Claim 23 is directed to a process. Each of these claims is therefore directed to a statutory category. The claims recite the following limitations or similar limitations: “to update one or more first neural networks, wherein to update the one or more first neural networks…and update the one or more first neural networks according to a difference between first predictions of augmented versions of unlabeled training data and second predictions of the unlabeled training data, wherein the first and second predictions are generated by the one or more neural networks using at least one of the one or more second neural network weights”. Under Step 2A Prong 1, these limitations correspond to a mental evaluation. Updating neural network weights via computation of a difference between values describes a mental evaluation. While the update is based on predictions generated by the one or more neural networks, the update nevertheless amounts to a mental evaluation.
The claims therefore recite an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claims recite additional element(s) that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea (“A computer system …one or more processors to cause one or more client systems… the one or more client systems… from one or more server systems… one or more processors… neural networks”). See MPEP 2106.05(f). Additionally, the claims recite the following limitations: (“obtain… one or more neural network weights from one or more second neural networks previously updated using previous neural network weights provided by the one or more client systems”), which are insignificant extra-solution activity (see MPEP 2106.05(g)). Such limitations amount to mere data gathering because “the limitation amounts to necessary data gathering and outputting, (i.e., all uses of the recited judicial exception require such data gathering or data output)”; every instance of updating neural network weights requires first obtaining the information used for the update.
The claim therefore is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Further, the additional elements cited are insignificant extra-solution activities that are considered well-understood, routine, and conventional activities. Examiner notes that communicating/transmitting weights from a source system to a target system amounts to receiving or transmitting data over a network (MPEP 2106.05(d)(II)(i)). According to MPEP 2106.05(d)(II), “The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner”. As such, the insignificant extra-solution activities are considered well-understood, routine, and conventional activities and do not render the claim significantly more than the judicial exception. Therefore, the claim is not patent eligible.
Regarding Claim 2/9/11/24
Claim 2 is directed to a system. Claim 9 is directed to a process. Claims 11 and 24 are likewise directed to processes.
The claim does not recite any additional abstract ideas beyond those recited in the parent claim.
Under step 2A prong 2 and 2B:
The claim recites additional elements (“the one or more client systems communicate, to the one or more server systems, one or more second neural network weights from the updated one or more first neural networks; the one or more server systems”) which amount to insignificant extra-solution activity and mere data gathering as identified in the parent claim (MPEP 2106.05(g)).
The claim recites the additional element (“further train the one or more second neural networks based, at least in part, on the one or more second neural network weights”) which amounts to applying the judicial exception of updating weights to generic computer machinery (MPEP 2106.05(f)). The claim does not provide any details of the training beyond being based on the abstract idea.
Similarly, claim 9 describes transmitting and training, similar to the limitations described in claim 2. These limitations are analyzed in the same way.
Similarly, claim 11 describes transmitting, similar to the limitations described in claim 2. These limitations are analyzed in the same way.
Similarly, claim 24 describes transmitting, similar to the limitations described in claim 2. These limitations are analyzed in the same way.
The claim therefore does not recite a practical application or significantly more.
Regarding claim 4/5/7/10/13/14/17/21/25/27
Claims 4, 5, 7, 17, and 21 are directed to a system.
Claims 10, 13, 14, 25, and 27 are directed to a process.
The claims recite descriptive details of the weights or other data analyzed/updated via the abstract idea without reciting any additional elements beyond those identified in the parent claim.
Therefore, the claims do not recite a practical application or significantly more.
Regarding claim 6
The claim is directed to a system.
The claim does not recite any additional abstract ideas beyond those recited in the parent claim.
Under step 2A prong 2 and 2B:
The claim recites additional elements (“by transmitting, to the one or more client systems, the one or more neural network weights comprising combined neural network weights received from the one or more client systems including the previous neural network weights”) which amount to insignificant extra-solution activity and mere data gathering as identified in the parent claim (MPEP 2106.05(g)).
The claim recites the additional element (“causes the one or more client systems to use the one or more first neural networks”) which amounts to applying the judicial exception of updating weights to generic computer machinery (MPEP 2106.05(f)). The claim does not provide any details of the functioning of the processor beyond its application.
The claim therefore does not recite a practical application or significantly more.
Regarding Claim 12
Claim 12 is directed to a process.
The claim recites “wherein the one or more second neural network weights are usable to train the one or more second neural networks by combining each of the one or more second neural network weights into one or more combined neural network weights” which further describes the weights updated as part of the abstract idea identified in the parent claim.
Under step 2A prong 2 and 2B:
The claim recites additional elements (“the one or more client systems communicate, to the one or more server systems, one or more second neural network weights from the updated one or more first neural networks; the one or more server systems”) which amount to insignificant extra-solution activity and mere data gathering as identified in the parent claim (MPEP 2106.05(g)).
The claim recites the additional element (“and training the one or more second neural networks based, at least in part, on the one or more combined neural network weights.”) which amounts to applying the judicial exception of updating weights to generic computer machinery (MPEP 2106.05(f)). The claim does not provide any details of the training beyond being based on the abstract idea.
The claim therefore does not recite a practical application or significantly more.
Regarding Claim 18
Claim 18 is directed to a system.
The claim does not recite any additional abstract ideas beyond those recited in the parent claim.
Under step 2A prong 2 and 2B:
The claim recites the additional element (“wherein the other client computer systems implement one or more other neural networks trained based, at least in part, on the one or more neural network weights and supervised data, including the labeled data”) which amounts to applying the judicial exception of updating weights to generic computer machinery (MPEP 2106.05(f)). The claim does not provide any details of the functioning of the neural network.
The claim therefore does not recite a practical application or significantly more.
Regarding Claim 19/20/22/26/28/29/30
The claims are directed to a statutory category.
The claim does not recite any additional abstract ideas beyond those recited in the parent claim.
Under step 2A prong 2 and 2B:
The claim recites the additional element (“wherein the one or more first neural networks and are trained to perform segmentation of one or more images.”) which amounts to generally linking the use of the judicial exception to a particular technological environment (MPEP 2106.05(h)).
Similarly, claims 20, 22, 26, 28, 29, and 30 recite limitations which describe the particular field of use. No details about the functioning of the computer technology are claimed.
The claims therefore do not recite a practical application or significantly more.
Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 4-10, 12-18, 21, 23-27, and 30 are rejected under 35 U.S.C. § 103 as being unpatentable over McMahan et al., “Federated Learning of Deep Networks using Model Averaging” (hereinafter McMahan), further in view of Athiwaratkun et al., “Improving Consistency-Based Semi-Supervised Learning with Weight Averaging” (hereinafter Athiwaratkun).
Regarding claim 1
McMahan teaches, A computer system comprising: one or more processors to cause one or more client systems to update one or more first neural networks, wherein to update the one or more first neural networks, the one or more client systems: obtain, from one or more server systems, one or more neural network weights from one or more second neural networks previously updated using previous neural network weights provided by the one or more client systems; (Introduction ¶02 “our approach Federated Learning, since the learning task is solved by a loose federation of participating devices (which we refer to as clients) which are coordinated by a central server. Each client has a local training dataset which is never uploaded to the server. Instead, each client computes an update to the current global model maintained by the server, and only this update is communicated” pg 5 “That is, each client locally takes one step of gradient descent on the current model using its local data [using its copy of the model, i.e., the first neural network], and the server then takes a weighted average of the resulting models [to get the second neural network]” Supervised tasks, like those described in the art, involve labeling or inferring labels for data on a local device using a local copy of the model; the training data includes labels such as digit labels. A server orchestrates training of a global model with local clients, each of which has its own local training data set; thus each client has different training data and a local copy of the model, which is used to train the final global neural network, resulting in the second neural network.)
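For illustration, a minimal sketch of the client-side local update the cited passage describes (a toy linear model with squared loss; all identifiers are hypothetical, and this is a sketch of the technique rather than code from McMahan):

```python
import numpy as np

def client_update(w_global, X, y, lr=0.1, epochs=1):
    """One client's local step in federated learning: start from the
    server's current global weights and run gradient descent using only
    the client's local data (here, a linear model with squared loss)."""
    w = w_global.copy()                          # local copy of the global model
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad                           # one gradient-descent step
    return w                                     # sent back to the server

# Hypothetical usage: a client with 8 local samples and 3 features
rng = np.random.default_rng(0)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)
w_local = client_update(np.zeros(3), X, y)
```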
McMahan does not teach, update the one or more first neural networks according to a difference between first predictions of augmented versions of unlabeled training data and second predictions of the unlabeled training data, wherein the first and second predictions are generated by the one or more neural networks using at least one of the one or more second neural network weights.
Athiwaratkun, however, when addressing the use of consistency loss for training in a semi-supervised setting teaches, and update the one or more first neural networks according to a difference between first predictions of augmented versions of unlabeled training data and second predictions of the unlabeled training data, wherein the first and second predictions are generated by the one or more neural networks using at least one of the one or more second neural network weights. (pg 2-3 Section 2.1 “the consistency loss penalizes the difference between the student’s predicted probabilities f(x’; w’f) and the teacher g(x’; w’g)” pg 3 “The consistency of a model (student) can be measured against its own predictions (e.g. Π model) or predictions of a different teacher network… In the semi-supervised setting, we have access to labeled data… and unlabeled data… Given two perturbed inputs x’, x’’ of x and the perturbed weights w_f and w_g, the consistency loss penalizes the difference between the student’s predicted probabilities:

ℓ_cons(w, x) = ‖ f(x’; w’_f) − g(x’’; w’_g) ‖²

” The loss is used to update the student network using augmented or perturbed training input, including labeled and unlabeled data. As shown, the loss is a difference between first and second predictions using first and second weights. The predictions, denoted by the functions f() and g(), are the predictions generated by the one or more neural networks.)
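For illustration, a minimal sketch of a consistency loss of this form (the model and perturbations are hypothetical stand-ins, not Athiwaratkun’s code; note that no label is required, which is why the loss applies to unlabeled data):

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(x, w):
    """Hypothetical stand-in model: softmax over a linear map."""
    z = x @ w
    e = np.exp(z - z.max())
    return e / e.sum()

def consistency_loss(x, w_student, w_teacher, noise=0.1):
    """Penalize the difference between the student's prediction on one
    perturbed input x' and the teacher's prediction on another perturbed
    input x'' of the same unlabeled sample x."""
    x1 = x + rng.normal(0.0, noise, size=x.shape)    # perturbed input x'
    x2 = x + rng.normal(0.0, noise, size=x.shape)    # perturbed input x''
    p_student = predict(x1, w_student)               # f(x'; w_f)
    p_teacher = predict(x2, w_teacher)               # g(x''; w_g)
    return np.sum((p_student - p_teacher) ** 2)      # squared difference

# Hypothetical usage: one unlabeled sample, student and teacher weights
x = rng.normal(size=4)
loss = consistency_loss(x, rng.normal(size=(4, 3)), rng.normal(size=(4, 3)))
```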
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network training system of McMahan with the training system described by Athiwaratkun, which utilizes unlabeled data to make predictions and is trained based on a difference. One would have been motivated to make such a combination because, as noted by Athiwaratkun, “We observe that increasing the number of unlabeled examples improves the generalization performance of the Π model” (pg 5) and “Semi-supervised learning is crucial for reducing the dependency of deep learning on large labeled data…with consistency regularization models achieving the best known results” (pg 10).
Regarding Claim 2
McMahan/Athiwaratkun teaches claim 1
Further McMahan teaches, the one or more client systems communicate, to the one or more server systems, one or more second neural network weights from the updated one or more first neural networks; the one or more server systems further train the one or more second neural networks based, at least in part, on the one or more second neural network weights. (pg 4 ¶03 “That is, each client locally takes one step of gradient descent on the current model using its local data, and the server then takes a weighted average of the resulting models. Once the algorithm is written this way, we can add more computation to each client by iterating the local update wk ← wk − η∇Fk(wk) multiple times before the averaging step. We term this approach FederatedAveraging (or FedAvg)” and Algorithm 1 pg 5 [Algorithm 1: FederatedAveraging]. As shown in the federated averaging algorithm, each client runs the ClientUpdate function, which receives wt from the server and returns updated weights wk, i.e., a second set of weights, to the server. The server then performs a weighted average of the results to train or update the weights.)
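For illustration, a minimal sketch of the server-side weighted-averaging step of Algorithm 1 (identifiers are hypothetical; a sketch of the technique, not McMahan’s code):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Server step of FedAvg: average the clients' returned weights w_k,
    weighting each by its share n_k/n of the total training data."""
    n = sum(client_sizes)
    return sum((n_k / n) * w_k
               for w_k, n_k in zip(client_weights, client_sizes))

# Hypothetical usage: three clients with different local data set sizes
w_next = federated_average(
    [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])],
    client_sizes=[100, 50, 50],
)
```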
Regarding Claim 4
McMahan/Athiwaratkun teaches claim 2
Further McMahan teaches, wherein the one or more second neural network weights comprise information generated based, at least in part, on one or more of neural network weights and the … training data. (pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset. At the beginning of each round, a random fraction C of clients is selected, and the server sends the current global algorithm state to each of these clients (e.g., the current model parameters)… Each selected client then performs local computation based on the global state and its local dataset, and sends an update to the server.” Initial weights, i.e., the first set of weights, for a round are sent to the client. The client then creates new weights, which correspond to information generated based on the received first weights.)
Further Athiwaratkun teaches, based, at least in part, on one or more of neural network weights and the unlabeled training data (pg 4 “we visualize how test accuracy changes along random rays starting at the solution found by the Π model trained with 4k labeled points and different amounts of additional unlabeled data” the training is based on weights as described above and also based on unlabeled training data.)
Regarding Claim 5
McMahan/Athiwaratkun teaches claim 2
Further McMahan teaches, wherein the server systems updates the one or more neural network weights based, at least in part, utilizing one or more third neural network weights, wherein the one or more third neural network weights are generated based, at least in part, by combining the one or more neural network weights and the one or more second neural network weights (pg 3 column 2 ¶04 “and these models are averaged to produce the final global model.” pg 4 ¶03 “the server then takes a weighted average of the resulting models.” and Algorithm 1:

w_{t+1} ← Σ_k (n_k / n) w_{t+1}^k

As shown in the algorithm, first weights, wt, are sent to the clients; then multiple sets wk are returned to the server. Finally, wt+1 is a combination of the second sets of weights, forming a third set of weights. The server network is updated according to these weights, and training progresses based on these weights.)
Regarding Claim 6
McMahan/Athiwaratkun teaches claim 1
Further McMahan teaches, wherein the computer system causes the one or more client systems to use the one or more first neural networks by transmitting, to the one or more client systems, the one or more neural network weights comprising combined neural network weights received from the one or more client systems including the previous neural network weights (pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset. At the beginning of each round, a random fraction C of clients is selected, and the server sends the current global algorithm state to each of these clients (e.g., the current model parameters)… Each selected client then performs local computation based on the global state and its local dataset, and sends an update to the server.” For each new round, the weights sent to the clients are the combined weights from the previous round.)
Regarding Claim 7
McMahan/Athiwaratkun teaches claim 1
Further Athiwaratkun teaches, wherein the unlabeled training data further includes the unlabeled training data and at least labeled training data (pg 4 “we visualize how test accuracy changes along random rays starting at the solution found by the Π model trained with 4k labeled points and different amounts of additional unlabeled data” the training uses both labeled training data, i.e., the 4k labeled points, and unlabeled training data.)
Regarding claim 8
Claim 8 is rejected in view of McMahan/Athiwaratkun for the reasons set forth in the rejection of claim 1
Regarding claim 9
McMahan/Athiwaratkun teaches claim 8
Further McMahan teaches, transmitting, by the one or more client systems to the one or more server systems, one or more second neural network weights from the updated one or more first neural networks; further training, by the one or more server systems, the one or more second neural networks based, at least in part, on the one or more second neural network weights. (pg 4 ¶03 “That is, each client locally takes one step of gradient descent on the current model using its local data, and the server then takes a weighted average of the resulting models. Once the algorithm is written this way, we can add more computation to each client by iterating the local update wk ← wk − η∇Fk(wk) multiple times before the averaging step. We term this approach FederatedAveraging (or FedAvg)” and Algorithm 1 pg 5 [Algorithm 1: FederatedAveraging]. As shown in the federated averaging algorithm, each client runs the ClientUpdate function, which receives wt from the server and returns updated weights wk, i.e., a second set of weights, to the server. The server then performs a weighted average of the results to train or update the weights.)
Regarding claim 10
McMahan/Athiwaratkun teaches claim 8
Further Athiwaratkun teaches, wherein the one or more second neural networks are initialized using pre-trained neural network weights from one or more image classification models, wherein the pre-trained neural network weights include the previous neural network weights. (Section 4 pg 6 “For the first … epochs the network is pre-trained using the cosine annealing schedule where the learning rate” the neural network is pre-trained. The corresponding weights associated with the pre-trained model, determined in the first epochs, are the previous neural network weights as claimed. pg 6 “We train a 13 layer CNN on CIFAR-10 on 50000 images”)
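For illustration, a minimal sketch of a cosine annealing learning-rate schedule of the kind the quotation references (parameter values are hypothetical, not taken from the reference):

```python
import math

def cosine_annealed_lr(epoch, total_epochs, lr_max=0.1, lr_min=0.0):
    """Cosine annealing schedule: decay the learning rate from lr_max
    toward lr_min over total_epochs along half a cosine period."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)

# Hypothetical usage: learning rate halfway through a 180-epoch run
lr = cosine_annealed_lr(90, 180)
```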
Regarding claim 12
McMahan/Athiwaratkun teaches claim 9
McMahan teaches, wherein the one or more second neural network weights are usable to train the one or more second neural networks by combining each of the one or more second neural network weights into one or more combined neural network weights and training the one or more second neural networks based, at least in part, on the one or more combined neural network weights. ( pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset. At the beginning of each round, a random fraction C of clients is selected, and the server sends the current global algorithm state to each of these clients (e.g., the current model parameters)… Each selected client then performs local computation based on the global state and its local dataset, and sends an update to the server.” For each new round the weights sent to the clients are the combined weights from the previous round.)
Regarding claim 13
McMahan/Athiwaratkun teaches claim 12
McMahan teaches, wherein the second one or more neural network weights combined to generate the one or more combined neural networks weights includes at least one first neural network weight ( pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset. At the beginning of each round, a random fraction C of clients is selected, and the server sends the current global algorithm state to each of these clients (e.g., the current model parameters)… Each selected client then performs local computation based on the global state and its local dataset, and sends an update to the server.” For each new round the weights sent to the clients are the combined weights from the previous round.)
Athiwaratkun teaches, information from unsupervised training data and at least one second neural network weight comprising information from supervised training data (pg 5 figure 2 caption “Figure 2: CIFAR-10 with 4k labeled examples (and 46k unlabeled examples)”)
Regarding claim 14
McMahan/Athiwaratkun teaches claim 8
Athiwaratkun teaches, wherein the unlabeled training data includes a first set of unlabeled training data having an amount of supervision and a second set of training data having an amount of supervision. (pg 5 figure 2 caption “Figure 2: CIFAR-10 with 4k labeled examples (and 46k unlabeled examples)”)
Regarding claim 15
Claim 15 is rejected in view of McMahan/Athiwaratkun for the reasons set forth in the rejection of claim 1
Regarding Claim 17
McMahan/Athiwaratkun teaches claim 15
McMahan teaches, wherein a subset of other client computer systems include labeled data comprising one or more labels not available to the client computer system including the one or more first neural networks (pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset.” Each client has a different data set that is local to that client and thus unavailable to other clients. The data necessarily has at least one level of supervision. pg 6 “where we first sort the data by digit label, divide it into 200 shards of size 300, and assign each of 100 clients 2 shards”)
Regarding claim 18
McMahan/Athiwaratkun teaches claim 17
McMahan teaches, wherein the other client computer systems implement one or more other neural networks trained based, at least in part, on the one or more neural network weights (see algorithm 1 pg 5) and supervised data, including the labeled data. (pg 6 “, where we first sort the data by digit label, divide it into 200 shards of size 300, and assign each of 100 clients 2 shards” )
Regarding Claim 21
McMahan/Athiwaratkun teaches claim 18
McMahan teaches, wherein the supervised data includes one or more types of supervision to facilitate training of the one or more other neural networks. (Examiner notes that McMahan, as noted in the above rejection, teaches using training data to train neural networks; all data used for training can be considered supervised, unsupervised, or a blend of both. Therefore, any and all training data has at least one type or level of supervision.)
Regarding Claim 23
Claim 23 is rejected in view of McMahan/Athiwaratkun for the reasons set forth in the rejection of claim 1
Regarding Claim 24
McMahan/Athiwaratkun teaches claim 23
McMahan teaches, and transmitting, by the client computer system to the server computer system, the results including differences between the one or more neural network weights and one or more second neural network weights from the trained one or more first neural networks. (pg 4 ¶03 “That is, each client locally takes one step of gradient descent on the current model using its local data, and the server then takes a weighted average of the resulting models. Once the algorithm is written this way, we can add more computation to each client by iterating the local update wk ← wk − η∇Fk(wk) multiple times before the averaging step. We term this approach FederatedAveraging (or FedAvg)” and Algorithm 1 pg 5 [Algorithm 1: FederatedAveraging]. As shown in the federated averaging algorithm, each client runs the ClientUpdate function, which receives wt from the server and returns updated weights wk, i.e., a second set of weights, to the server. The server then performs a weighted average of the results to train or update the weights.)
Regarding Claim 25
McMahan/Athiwaratkun teaches claim 24
McMahan teaches, wherein the one or more second neural network weights comprise information combined from the trained one or more first neural networks and other client computer systems. ( pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset. At the beginning of each round, a random fraction C of clients is selected, and the server sends the current global algorithm state to each of these clients (e.g., the current model parameters)… Each selected client then performs local computation based on the global state and its local dataset, and sends an update to the server.” For each new round the weights sent to the clients are the combined weights from the previous round.)
Regarding Claim 26
McMahan/Athiwaratkun teaches claim 23
McMahan teaches, other client computer systems of the client computer system include access to the labeled data comprising one or more levels of supervision unavailable to the client computer system including the one or more first neural networks. (pg 2 column 2 ¶05 “We assume a synchronous update scheme that proceeds in rounds of communication. There is a fixed set of K clients, each with a fixed local dataset.” Each client has a different data set that is local to that client and thus unavailable to other clients. The data necessarily has at least one level of supervision. pg 6 “where we first sort the data by digit label, divide it into 200 shards of size 300, and assign each of 100 clients 2 shards”)
Regarding Claim 27
McMahan/Athiwaratkun teaches claim 24
McMahan teaches, wherein the results further comprise one or more data values indicating information about the trained one or more first neural networks. (Introduction ¶02 “our approach Federated Learning, since the learning task is solved by a loose federation of participating devices (which we refer to as clients) which are coordinated by a central server. Each client has a local training dataset which is never uploaded to the server. Instead, each client computes an update to the current global model maintained by the server, and only this update is communicated” a server orchestrates training of a global model with local clients, each of which has its own local training data set; thus each client has different training data. Further, each client shares its results with the server.)
Regarding Claim 30
McMahan/Athiwaratkun teaches claim 23
McMahan teaches, wherein the one or more first neural networks are convolutional neural networks (pg 6 “We study three model families on two datasets. The first two are for the MNIST digit recognition task… A CNN for MNIST with two 5x5 convolution layers” a CNN is a convolutional neural network.)
Claim 11 is rejected under 35 U.S.C. § 103 as being unpatentable over McMahan/Athiwaratkun, further in view of Wu et al., “Personalized Federated Learning for Intelligent IoT Applications: A Cloud-Edge Based Framework”, hereinafter Wu.
Regarding Claim 11
McMahan/Athiwaratkun teaches claim 9
McMahan teaches, wherein the computer system transmits the one or more second neural network weights (as noted in the rejection of the independent claim, the weights are transmitted between the server and the client bi-directionally.)
McMahan does not explicitly teach, using a communication medium to facilitate network-based communications between one or more geographically diverse locations.
Wu, however, when addressing the use of federated learning across geographical locations, teaches, using a communication medium to facilitate network-based communications between one or more geographically diverse locations. (abstract “Recently, federated learning is proposed to train a globally shared model by exploiting a massive amount of user-generated data samples on IoT devices while preventing data leakage” pg 3 column 2 ¶01 “Therefore, each IoT device can choose to offload its intensive computing tasks to the edge (i.e., edge gateway at home, edge server at office, or 5G MEC server outdoors) via the wireless connections” IoT devices are connected through wireless connections, which facilitate communications between a variety of locations such as the home, the office, and outdoors.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network training system of McMahan/Athiwaratkun with the training system of Wu. One would have been motivated to make such a combination because both McMahan/Athiwaratkun and Wu use federated learning techniques, which, as noted by Wu, allow communication among devices in different locations.
Claims 19, 20, 22, 28, and 29 are rejected under 35 U.S.C. § 103 as being unpatentable over McMahan/Athiwaratkun, further in view of Muller et al., “Automated Chest CT Image Segmentation of COVID-19 Lung Infection based on 3D U-Net”, hereinafter Muller.
Regarding Claim 19
McMahan/Athiwaratkun teaches claim 15
McMahan/Athiwaratkun does not explicitly teach, wherein the one or more first neural networks and are trained to perform segmentation of one or more images.
Muller, however, when addressing the use of neural networks for detection of lung infection, teaches, wherein the one or more first neural networks and are trained to perform segmentation of one or more images. (Figure 6 and conclusion “In this paper, we developed and evaluated an approach for automated segmentation of COVID-19 infected regions in CT volumes…Our method focuses on on-the-fly generation of unique and random image patches for training…we utilized the standard 3D U-Net. We proved that our medical image segmentation pipeline is able to successfully train accurate and robust models” infected regions of the lungs are identified or segmented by the trained 3D U-Net. Training results in multiple versions of the neural network model, thus including a first and second neural network.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network training system of McMahan/Athiwaratkun with the application of a neural network as described by Muller. One would have been motivated to make such a combination because Muller notes that “In recent studies, medical image segmentation models based on neural networks proved powerful prediction capabilities and achieved similar results as radiologists regarding the performance” (Muller pg 2), and further the system of McMahan is designed to work with private user data, which includes medical data.
Regarding Claim 20
McMahan/Athiwaratkun/Muller teaches claim 19
Further Muller teaches, wherein the one or more images are medical images from a computed tomography (CT) scan and the one or more first neural networks are trained to segment one or more image regions into a foreground and a background to facilitate detection of disease. (Figure 6 and conclusion “In this paper, we developed and evaluated an approach for automated segmentation of COVID-19 infected regions in CT volumes…Our method focuses on on-the-fly generation of unique and random image patches for training…we utilized the standard 3D U-Net. We proved that our medical image segmentation pipeline is able to successfully train accurate and robust models” infected regions of the lungs are identified or segmented by the trained 3D U-Net. Segmentation, as shown in the figure, separates an identified foreground region from the background. Each updated version of the model at each iteration constitutes a first and second neural network.)
McMahan/Athiwaratkun and Muller are combined for the reasons set forth in the rejection of claim 19
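For illustration, a minimal sketch of what segmenting image regions into foreground and background produces (a toy thresholding stand-in, not Muller’s 3D U-Net; identifiers are hypothetical):

```python
import numpy as np

def segment_foreground(volume, threshold=0.5):
    """Toy stand-in for segmentation: mark voxels whose predicted score
    exceeds a threshold as foreground (e.g., candidate infected regions)
    and the rest as background. A trained 3D U-Net would instead predict
    the score volume from learned features rather than raw intensities."""
    return (volume > threshold).astype(np.uint8)  # 1 = foreground, 0 = background

# Hypothetical usage: a random 3D score volume
mask = segment_foreground(np.random.default_rng(0).random((4, 4, 4)))
```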
Regarding Claim 22
McMahan/Athiwaratkun teaches claim 15
Muller, however, when addressing the use of neural networks for detection of lung infection, teaches, wherein the one or more first neural networks comprise a 3D U-Net (Figure 6 and conclusion “In this paper, we developed and evaluated an approach for automated segmentation of COVID-19 infected regions in CT volumes…Our method focuses on on-the-fly generation of unique and random image patches for training…we utilized the standard 3D U-Net. We proved that our medical image segmentation pipeline is able to successfully train accurate and robust models” infected regions of the lungs are identified or segmented by the trained 3D U-Net.)
McMahan/Athiwaratkun and Muller are combined for the reasons set forth in the rejection of claim 19
Regarding Claim 28
McMahan/Athiwaratkun teaches claim 24
Muller, when combined with McMahan/Athiwaratkun, teaches, wherein the unlabeled data includes 3D image data and the one or more first neural networks are trained to perform segmentation of the 3D image data (Figure 6 and conclusion “In this paper, we developed and evaluated an approach for automated segmentation of COVID-19 infected regions in CT volumes…Our method focuses on on-the-fly generation of unique and random image patches for training…we utilized the standard 3D U-Net. We proved that our medical image segmentation pipeline is able to successfully train accurate and robust models” infected regions of the lungs are identified or segmented by the trained 3D U-Net. Examiner notes, as demonstrated in the preceding rejections, that Athiwaratkun describes a semi-supervised image processing neural network using unlabeled image data. Under BRI, the unlabeled data is data which includes both labeled and unlabeled data. This is at least supported by claim 14, which describes the unlabeled data as including both unlabeled data and semi-supervised data.)
McMahan/Athiwaratkun and Muller are combined for the reasons set forth in the rejection of claim 19
Regarding Claim 29
McMahan/Athiwaratkun/Muller teaches claim 28
Further Muller teaches, wherein the one or more first neural networks are trained to segment one or more image regions into a foreground and a background, the one or more foreground regions indicating information usable to facilitate detection of one or more diseases. (Figure 6 and conclusion “In this paper, we developed and evaluated an approach for automated segmentation of COVID-19 infected regions in CT volumes…Our method focuses on on-the-fly generation of unique and random image patches for training…we utilized the standard 3D U-Net. We proved that our medical image segmentation pipeline is able to successfully train accurate and robust models” infected regions of the lungs are identified or segmented by the trained 3D U-Net.)
McMahan/Athiwaratkun and Muller are combined for the reasons set forth in the rejection of claim 19
Conclusion
Prior art: Bonawitz et al., “Practical Secure Aggregation for Federated Learning on User-Held Data”, describes a method for sharing not only weight gradients between client and server, but also data set size information.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.R.G./
Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122