Prosecution Insights
Last updated: April 19, 2026
Application No. 17/449,165

ADAPTIVE AGGREGATION FOR FEDERATED LEARNING

Status: Non-Final OA (§103)
Filed: Sep 28, 2021
Examiner: ALGHAZZY, SHAMCY
Art Unit: 2128
Tech Center: 2100 — Computer Architecture & Software
Assignee: Siemens Healthineers AG
OA Round: 3 (Non-Final)
Grant Probability: 48% (Moderate)
Projected OA Rounds: 3-4
Projected Time to Grant: 3y 11m
Grant Probability With Interview: 49%

Examiner Intelligence

Career Allow Rate: 48% (grants 48% of resolved cases; 30 granted / 62 resolved; -6.6% vs TC avg)
Interview Lift: +0.7% (minimal, roughly +1% with vs. without an interview; based on resolved cases with interview)
Avg Prosecution: 3y 11m (typical timeline)
Currently Pending: 25
Career History: 87 total applications across all art units

Statute-Specific Performance

§101: 34.9% (-5.1% vs TC avg)
§103: 39.3% (-0.7% vs TC avg)
§102: 11.1% (-28.9% vs TC avg)
§112: 10.0% (-30.0% vs TC avg)
Based on career data from 62 resolved cases; deltas are vs. the Tech Center average estimate.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submissions filed on 09/08/2025 (amendment) and 09/22/2025 (RCE) have been entered.

Information Disclosure Statement

The information disclosure statement (IDS) was submitted on 12/02/2025. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Examiner's Note

The Examiner respectfully requests that the Applicant, in preparing responses, fully consider the entirety of the reference(s) as potentially teaching all or part of the claimed invention. It is noted that REFERENCES ARE RELEVANT AS PRIOR ART FOR ALL THEY CONTAIN. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). A reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including non-preferred embodiments (see MPEP 2123). The Examiner has cited particular locations in the reference(s) as applied to the claim(s) above for the convenience of the Applicant.
Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claim(s), other passages and figures will typically apply as well.

Response to Argument

Applicant’s arguments, see REMARKS pages 6-13 filed 09/08/2025, regarding the rejection of claims 1-2, 4-11, and 13-19 under 35 U.S.C. §101 have been considered and are persuasive. Therefore, the rejection of claims 1-2, 4-11, and 13-19 under 35 U.S.C. §101 has been withdrawn. Applicant’s arguments, see REMARKS pages 13-17 filed 09/08/2025, regarding the rejection of claims 1-2, 4-11, and 13-19 under 35 U.S.C. §103 have been considered and are moot in light of the new ground of rejection.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 7-11, 15, and 18-19 are rejected under 35 U.S.C.
103 as being unpatentable over XIAO (A Novel Server-side Aggregation Strategy for Federated Learning in Non-IID situations), in view of ROTH (US20220366220A1), further in view of DAWOUD (US20210350931A1), further in view of BROWN (US20210256707A1).

Regarding claim 1, XIAO teaches “A method for aggregating parameters from a plurality of collaborator devices in a federated learning system that trains a model over multiple iterations of training” (page 19, section 3, paragraphs 1-2, XIAO: “the process of federated learning by ABAVG… In communicational epoch t-1… All the selected clients separately train the global model… The server separately copy the parameters uploaded by different clients… The server aggregate all of the parameters”; (EN): t-1 iterations is encompassed by the BRI of multiple rounds of training. Thus, an iterative process outlined at communicational epoch t-1 which trains the models at each iteration is encompassed by the BRI of a system that trains a model over multiple rounds of training. Clients are encompassed by the BRI of collaborator devices).

XIAO further teaches “for each round of training the method comprising: receiving updated model parameters from two or more collaborator devices of the plurality of collaborator devices” (page 20, algorithm 1, XIAO: “procedure ServerAggregation… select n clients from the total N clients. for k=1,2,…,n do [equation image] … function ClientUpdate(k,w)… return wk to server”; (EN): By defining the clients as “for k=1,2,…,n”, drawn from the total collection of N clients as discussed above in Algorithm 1, XIAO demonstrates that n must be greater than or equal to two. Thus, the n collaborator devices which provide model parameters to the server are encompassed by the BRI of two or more collaborator devices from the plurality of collaborator devices).
XIAO further teaches “calculating for each of the two or more collaborator devices a model divergence value that approximates how much a respective trained collaborator model for a respective collaborator device of the two or more collaborator devices deviates from a prior aggregated model” (page 19, section 3, paragraph 2, XIAO: “step 5 The server tests temp model’s accuracy by the validate dataset to get client k’s model accuracy (ak, k=1,2,…n)”; (EN): The validate dataset is defined as “dataset from a part of clients to form a validate dataset” (page 19, section 3.B, paragraph 3, XIAO), which is encompassed by the BRI of a prior aggregated model, and XIAO clarifies that the algorithm performs this step for all n clients (“test accuracies of all clients”, page 20, algorithm 1, XIAO), which, as discussed above, is encompassed by the BRI of two or more models).

XIAO further teaches “aggregating model parameters for the model from the received updated model parameters based at least on the respective model divergence value for each collaborator device” (page 20, algorithm 1, XIAO: “[equation image]”; (EN): XIAO utilizes the determined accuracy (corresponding to the claimed divergence value) for each client (corresponding to the collaborator device) to weigh the client parameters before aggregating said client parameters to “form new global parameters [equation image]” (page 19, section 3, paragraph 2, XIAO), where these new global parameters are the aggregated parameters).

XIAO further teaches “and transmitting the aggregated model parameters to the plurality of collaborator devices” (page 19, section 3, paragraph 2, XIAO: “all the selected clients download same global model parameters [equation image] from the server”; (EN): The BRI of transmitting parameters to a device encompasses the devices downloading the parameters).
XIAO further teaches “wherein the aggregated model parameters are used by the plurality of collaborator devices for a subsequent iteration of training” (page 19, section 3, paragraph 2, XIAO: “all the selected clients download same global model parameters [equation image] from the server… selected clients separately train the global model [equation image]… step 6 The server aggregate all the parameters based on the accuracy normalization and form new global parameters [equation image]”; (EN): At the end of each communicational epoch of federated learning, XIAO’s new global parameters update or become the global parameters relied upon by the client devices (corresponding to the collaborator devices) at the start of the next communicational epoch, which corresponds to a new round of training).

However, XIAO is not relied upon to explicitly teach:

transmitting global model weights to the plurality of collaborator devices that do not share locally acquired training data;

training, independently by each of the plurality of collaborator devices, respective collaborator models initialized with the global model weights using the locally acquired training data, wherein training results in updated model parameters for each of the respective collaborator models;

wherein the model divergence value is calculated based on a sum of absolute values of a vector for the respective model parameters or calculated based on a square root of a sum of squared vector values for the respective model parameters; and

repeating training, receiving, calculating, aggregating, and transmitting for at least ten iterations.
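For orientation, the claim-1 style round mapped above (broadcast global weights, independent local training, per-collaborator divergence, divergence-based aggregation, repeat) can be sketched as follows. This is a minimal illustration only; the function and parameter names are hypothetical and are not drawn from the application or any cited reference, and the inverse-divergence weighting is one possible choice, not XIAO's accuracy normalization.

```python
import numpy as np

def federated_rounds(global_w, client_datasets, local_train, divergence, num_rounds=10):
    """Illustrative federated loop: broadcast, train locally, weight updates
    by divergence from the prior aggregated model, then aggregate."""
    for _ in range(num_rounds):
        # Each collaborator trains independently, initialized with the global weights.
        client_ws = [local_train(global_w.copy(), ds) for ds in client_datasets]
        # Divergence of each collaborator's update from the prior aggregated model.
        divs = np.array([divergence(w, global_w) for w in client_ws])
        # Turn divergences into normalized aggregation weights
        # (here: smaller divergence -> larger weight; epsilon avoids division by zero).
        coef = 1.0 / (divs + 1e-12)
        coef /= coef.sum()
        # Aggregate and transmit: the new global weights seed the next round.
        global_w = sum(c * w for c, w in zip(coef, client_ws))
    return global_w
```

With a toy `local_train` that steps each client halfway toward its local data mean and an L2 `divergence`, the aggregated model converges toward the shared optimum over the ten rounds the claim recites.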
On the other hand, ROTH teaches transmitting global model weights to the plurality of collaborator devices that do not share locally acquired training data ([0100] sending 604 initial neural network weight values, such as a global model and global aggregation weight data, to all edge devices or clients of said global federated training architecture, as described above in conjunction with FIGS. 3-5. The examiner notes that ROTH teaches transmitting global model weights to the plurality of collaborator devices that use their own local data as shown in figure 3. The examiner further notes that XIAO and ROTH are both directed to machine learning and are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified XIAO’s aggregation strategy for federated learning to incorporate transmitting global model weights to the plurality of collaborator devices that do not share locally acquired training data, as taught by ROTH [0100], to perform learnable federated averaging in a global federated training architecture [0100]).

Furthermore, ROTH teaches training, independently by each of the plurality of collaborator devices, respective collaborator models initialized with the global model weights using the locally acquired training data, wherein training results in updated model parameters for each of the respective collaborator models ([0100] In at least one embodiment, each edge device or client of a global federated training architecture updates 606 its local models and aggregation weights in parallel with other edge devices or clients of said global federated training architecture, as described above in conjunction with FIGS. 3 and 5. The examiner notes that XIAO and ROTH are both directed to machine learning and are reasonably analogous to each other.
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified XIAO’s aggregation strategy for federated learning to incorporate training, independently by each of the plurality of collaborator devices, respective collaborator models initialized with the global model weights using the locally acquired training data, wherein training results in updated model parameters for each of the respective collaborator models, as taught by ROTH [0100], to perform learnable federated averaging in a global federated training architecture [0100]).

Furthermore, DAWOUD teaches wherein the model divergence value is calculated based on a sum of absolute values of a vector for the respective model parameters or calculated based on a square root of a sum of squared vector values for the respective model parameters ([0144] The parameters ‖A‖ and ‖N‖ represent the square root of the sum of squared vector components Ax² + Ay² + Az² and Nx² + Ny² + Nz². The examiner notes that XIAO and DAWOUD are both directed to data analysis and are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified XIAO’s training strategy to incorporate wherein the model divergence value is calculated based on a sum of absolute values of a vector for the respective model parameters or calculated based on a square root of a sum of squared vector values for the respective model parameters, as taught by DAWOUD [0144], to account for the extent of correlation between the candidate posture and the baseline standing posture (e.g. in a normalized range 0 to 1) [0144]).
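The two recited divergence formulations correspond to the L1 norm (sum of absolute values) and L2 norm (square root of the sum of squares) of the parameter-difference vector. A minimal sketch of both, with a hypothetical helper name not taken from DAWOUD or the application:

```python
import numpy as np

def model_divergence(client_params, prior_params, norm="l2"):
    """Divergence of a collaborator's parameters from the prior aggregated model.

    "l1": sum of absolute values of the difference vector (first recited option).
    "l2": square root of the sum of squared vector values (second recited option).
    """
    diff = np.asarray(client_params, dtype=float) - np.asarray(prior_params, dtype=float)
    if norm == "l1":
        return float(np.abs(diff).sum())
    return float(np.sqrt((diff ** 2).sum()))
```

For example, a difference vector of (3, 4) gives an L2 divergence of 5 and an L1 divergence of 7.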
Furthermore, BROWN teaches repeating training, receiving, calculating, aggregating, and transmitting for at least ten iterations ([0068] The Adam optimizer was used with a learning rate of 5×10⁻⁵, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻¹⁰. One million training iterations were performed. The examiner notes that XIAO/ROTH in view of BROWN teaches performing one million iterations of training and transferring of the model weights to the federated server. The examiner further notes that XIAO/ROTH and BROWN are both directed to machine learning and are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified XIAO’s training strategy to incorporate repeating training, receiving, calculating, aggregating, and transmitting for at least ten iterations, as taught by BROWN [0068], to alternate optimization equally between generator and discriminator [0068]).

Claim 2

XIAO teaches “The method of claim 1”, as discussed above. XIAO further teaches “further comprising: storing the aggregated model parameters as a preserved test dataset for calculating the model divergence value for the subsequent iteration of training” (page 19, section 3.b, paragraph 3, XIAO: “In practical applications, the server can firstly obtain a small amount of dataset from a part of clients to form a validate dataset… In this way, the server-side validate dataset will be well prepared”; (EN): The dataset of a client is reasonably understood to include the client model’s parameters).

Claim 7

XIAO teaches “The method of claim 1”, as discussed above.
XIAO further teaches “wherein the plurality of collaborator devices train the model using non-independently and identically distributed datasets” (page 17, section 1, paragraphs 2-3, XIAO: “Non-Independent and Identical (Non-IID)… Non-IID problem can be divided into three types, feature distribution skew, concept shift, and label distribution skew [17], [19], [22]… In this paper, we focus on solving label distribution skew”; (EN): By focusing on a specific feature of learning with Non-IID data, XIAO’s teaching is encompassed by the BRI of collaborator devices which are trained using non-independently and identically distributed datasets).

Claim 8

Regarding claim 8, XIAO teaches the method of claim 1. However, XIAO is not relied upon to explicitly teach wherein the multiple iterations of training comprise more than one hundred rounds of training. On the other hand, BROWN teaches wherein the multiple iterations of training comprise more than one hundred rounds of training ([0068] The Adam optimizer was used with a learning rate of 5×10⁻⁵, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻¹⁰. One million training iterations were performed. The examiner notes that XIAO/ROTH in view of BROWN teaches performing one million iterations of training and transferring of the model weights to the federated server. The examiner further notes that XIAO/ROTH and BROWN are both directed to machine learning and are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified XIAO’s training strategy to incorporate wherein the multiple iterations of training comprise more than one hundred rounds of training, as taught by BROWN [0068], to alternate optimization equally between generator and discriminator [0068]).

Claim 9

XIAO teaches “The method of claim 1”, as discussed above.
XIAO further teaches “wherein the model parameters comprise parameter vectors” (page 19, section 3, paragraph 2, XIAO: “step 4 The server separately copy the parameters uploaded by different clients [equation image] to temp model”; (EN): Paragraph [0024] of the instant specification states “A parameter vector may be a collection (e.g., set) of parameters from the model or a representation of the set of parameters”, but does not appear to explicitly define a vector. XIAO thus utilizes a vector [equation image] for k between 1 and n, in other words, a vector of size n, to encode the weight parameters uploaded by the client (collaborator) devices).

Claim 10

XIAO teaches “A system for federated learning” (page 18, section 1, paragraph 6, XIAO: “we proposed a new federated learning approach… server-side aggregation”; (EN): The BRI of a system includes a server). XIAO further teaches “the system comprising: a plurality of collaborators, each collaborator of the plurality of collaborators configured to train a local machine learned model using locally acquired training data” (page 19, section 3, paragraph 2, XIAO: “n clients from all the N clients… step 2 All the selected clients separately train the global model [equation image] by their own datasets. They update the global model parameters [equation image] to their local model parameters”; (EN): The BRI of a plurality of collaborators includes n clients. The BRI of locally acquired training data encompasses clients’ own datasets). XIAO further teaches “update local model weights for the local machine learned model” (page 19, section 3, paragraph 2, XIAO: “step 2 All the selected clients separately train the global model [equation image] by their own datasets.
They update the global model parameters [equation image] to their local model parameters”).

XIAO further teaches “and send the updated local model weights to an aggregation server” (page 19, section 3, paragraph 2, XIAO: “step 3 All the selected clients upload their local model parameters… back to the server… The server aggregate all parameters”; (EN): A server which aggregates weights is encompassed by the BRI of an aggregation server).

XIAO further teaches “and the aggregation server configured to receive the updated model weights from the plurality of collaborators” (page 19, section 3, paragraph 2, XIAO: “step 4 The server separately copy the parameters uploaded by different clients”; (EN): The BRI of receiving data encompasses XIAO’s copying of uploaded data).

XIAO further teaches “calculate a model divergence value for each collaborator from respective updated model weights and a prior model” (page 19, section 3, paragraph 2, XIAO: “step 5 The server tests temp model’s accuracy by the validate dataset to get client k’s model accuracy (ak, k=1,2,…n)”; (EN): XIAO clarifies that the algorithm performs this step for all n clients (“test accuracies of all clients”, page 20, algorithm 1, XIAO), which, as discussed above, is encompassed by the BRI of two or more models. A measure of accuracy calculated based on a validation dataset is encompassed by a divergence value).
XIAO further teaches “calculate aggregated model weights based at least in part on the model divergence values” (page 20, algorithm 1, XIAO: “[equation image]”; (EN): XIAO utilizes the determined accuracy (corresponding to the claimed divergence value) for each client’s model to weigh the client parameters before aggregating said client parameters to “form new global parameters [equation image]” (page 19, section 3, paragraph 2, XIAO), where these new global parameters are the aggregated parameters).

XIAO further teaches “and transmit the aggregated model weights to the plurality of collaborators to update the local machine learned model” (page 19, section 3, paragraph 2, XIAO: “all the selected clients download same global model parameters [equation image] from the server… selected clients separately train the global model [equation image]… They update the global model parameters… step 6 The server aggregate all the parameters based on the accuracy normalization and form new global parameters [equation image]”; (EN): At the end of each communicational epoch of federated learning, XIAO’s new global parameters update or become the global parameters relied upon by the client devices (corresponding to the collaborator devices) at the start of the next communicational epoch, which corresponds to a next round of training, which, by definition, includes an update to the local machine learned model).

However, XIAO is not relied upon to explicitly teach:

initialized with global model weights;
wherein the model divergence value is calculated based on a sum of absolute values of a vector for the respective model parameters or calculated based on a square root of a sum of squared vector values for the respective model parameters; and

wherein ten or more iterations are performed where the plurality of collaborators send updated local model weights after receiving aggregated model weights.

On the other hand, ROTH teaches initialized with global model weights ([0100] sending 604 initial neural network weight values, such as a global model and global aggregation weight data, to all edge devices or clients of said global federated training architecture, as described above in conjunction with FIGS. 3-5. In at least one embodiment, each edge device or client of a global federated training architecture updates 606 its local models and aggregation weights in parallel with other edge devices or clients of said global federated training architecture, as described above in conjunction with FIGS. 3 and 5. The examiner notes that ROTH teaches transmitting global model weights to the plurality of collaborator devices that use their own local data, as shown in figure 3, to train their local models. The examiner further notes that XIAO and ROTH are both directed to machine learning and are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified XIAO’s aggregation strategy for federated learning to incorporate initialized with global model weights, as taught by ROTH [0100], to perform learnable federated averaging in a global federated training architecture [0100]).
Furthermore, DAWOUD teaches wherein the model divergence value is calculated based on a sum of absolute values of a vector for the respective model parameters or calculated based on a square root of a sum of squared vector values for the respective model parameters ([0144] The parameters ‖A‖ and ‖N‖ represent the square root of the sum of squared vector components Ax² + Ay² + Az² and Nx² + Ny² + Nz². The examiner notes that XIAO and DAWOUD are both directed to data analysis and are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified XIAO’s training strategy to incorporate wherein the model divergence value is calculated based on a sum of absolute values of a vector for the respective model parameters or calculated based on a square root of a sum of squared vector values for the respective model parameters, as taught by DAWOUD [0144], to account for the extent of correlation between the candidate posture and the baseline standing posture (e.g. in a normalized range 0 to 1) [0144]).

Furthermore, BROWN teaches wherein ten or more iterations are performed where the plurality of collaborators send updated local model weights after receiving aggregated model weights ([0068] The Adam optimizer was used with a learning rate of 5×10⁻⁵, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻¹⁰. One million training iterations were performed. The examiner notes that XIAO/ROTH in view of BROWN teaches performing one million iterations of training and transferring of the model weights to the federated server. The examiner further notes that XIAO/ROTH and BROWN are both directed to machine learning and are reasonably analogous to each other.
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified XIAO’s training strategy to incorporate wherein ten or more iterations are performed where the plurality of collaborators send updated local model weights after receiving aggregated model weights, as taught by BROWN [0068], to alternate optimization equally between generator and discriminator [0068]).

Claim 11

Claim 11 is rejected under the same rationale as claim 2 for being substantially similar, mutatis mutandis.

Claim 15

Claim 15 is rejected under the same rationale as claim 7 for being substantially similar, mutatis mutandis.

Claim 18

XIAO teaches “An aggregation server for federated learning of a model” (page 18, section 1, paragraph 6, XIAO: “we proposed a new federated learning approach, Accuracy Based Averaging (ABAVG)… server-side aggregation method based on models’ validated accuracies”; (EN): Server-side aggregation is encompassed by the BRI of an aggregation server). XIAO further teaches “the aggregation server comprising: a transceiver configured to communicate with a plurality of collaborator devices” (page 19, section 3, paragraph 2, XIAO: “step 1 In communicational epoch t-1, the server randomly selects n clients from all the N clients, all the selected clients download same global model parameters [equation image] from the server”; (EN): Permitting specific clients to download data is encompassed in the BRI of communication. XIAO’s clients correspond to the collaborator devices. As discussed above, XIAO teaches an aggregation server).
XIAO further teaches “a memory configured to store model parameters for the model” (page 19, section 3, paragraph 2, XIAO: “step 1… all the selected clients download same global model parameters [equation image] from the server”; (EN): In order for a client to download model parameters from a server, the server must first store the model parameters).

XIAO further teaches “calculate for each collaborator device of the plurality of collaborator devices a model divergence value” (page 19, section 3, paragraph 2, XIAO: “step 5 The server tests temp model’s accuracy by the validate dataset to get client k’s model accuracy (ak, k=1,2,…n)”; (EN): XIAO clarifies that the algorithm performs this step for all n clients (“test accuracies of all clients”, page 20, algorithm 1, XIAO), which, as discussed above, is encompassed by the BRI of two or more models. A measure of accuracy calculated based on a validation dataset is encompassed by a divergence value).

XIAO further teaches “aggregate the model parameters from the plurality of collaborator devices at least in part based on the model divergence values” (page 20, algorithm 1, XIAO: “[equation image]”; (EN): XIAO utilizes the determined accuracy (corresponding to the claimed divergence value) for each client’s model to weigh the client parameters before aggregating said client parameters to “form new global parameters [equation image]” (page 19, section 3, paragraph 2, XIAO), where these new global parameters are the aggregated parameters).
XIAO further teaches “and transmit the aggregated model parameters to the plurality of collaborator devices” (page 19, section 3, paragraph 2, XIAO: “all the selected clients download same global model parameters [equation image] from the server… selected clients separately train the global model [equation image]… They update the global model parameters… step 6 The server aggregate all the parameters based on the accuracy normalization and form new global parameters [equation image]”; (EN): At the end of each communicational epoch of federated learning, XIAO’s new global parameters update or become the global parameters relied upon by the client devices (corresponding to the collaborator devices) at the start of the next communicational epoch, which corresponds to a next round of training, which, by definition, includes an update to the local machine learned model).

XIAO does not appear to explicitly disclose “and a processor configured to receive model parameters from the plurality of collaborator devices”. However, in the same field, analogous art LI provides this additional functionality by teaching “and a processor configured to receive model parameters from the plurality of collaborator devices” (page 8, Table 1, LI: “Desktop; 11; Intel i7 CPU”; (EN): LI’s teachings pertain to a setting with “one server and 50 clients… We used one desktop worked as the server and let other 10 desktops work as clients” (page 8, section 5.A, paragraph 1, LI) in the setting of “federated learning” (page 1, ABSTRACT, LI), where, as discussed above, federated learning includes the transmission of model parameters). XIAO and LI are analogous art because they are from the same field of endeavor as the claimed invention, namely federated learning.
XIAO teaches a method for federated learning with divergence-aware parameter aggregation, but does not appear to distinctly disclose a processor. LI provides the additional functionality by disclosing a processor in a federated learning context. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of XIAO with LI’s processor because “FL enables multiple local sites to collaboratively train a machine learning model… meanwhile keeping their datasets private” (page 1, section 1, paragraph 1, LI), as suggested by LI.

However, XIAO is not relied upon to explicitly teach:

based on a sum of absolute values of a vector for the respective updated local model weights or a square root of a sum of squared vector values for the respective model parameters; and

wherein ten or more iterations are performed where the processor calculates the aggregated model parameters and receives updated model parameters from the collaborator devices.

On the other hand, DAWOUD teaches based on a sum of absolute values of a vector for the respective updated local model weights or a square root of a sum of squared vector values for the respective model parameters ([0144] The parameters ‖A‖ and ‖N‖ represent the square root of the sum of squared vector components Ax² + Ay² + Az² and Nx² + Ny² + Nz². The examiner notes that XIAO and DAWOUD are both directed to data analysis and are reasonably analogous to each other.
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified XIAO’s training strategy to incorporate “based on a sum of absolute values of a vector for the respective updated local model weights or a square root of a sum of squared vector values for the respective model parameters” as taught by DAWOUD [0144] to account for the extent of correlation between the candidate posture and the baseline standing posture (e.g. in a normalized range 0 to 1) [0144]. Furthermore, BROWN teaches “wherein ten or more iterations are performed where the processor calculates the aggregated model parameters and receives updated model parameters from the collaborator devices” ([0068]: “The Adam optimizer was used with learning rate of 5·10⁻⁵, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻¹⁰. One million training iterations were performed.”). The examiner notes that XIAO/ROTH in view of BROWN teaches performing one million iterations of training and transferring of the model weights to the federated server. The examiner further notes that XIAO/ROTH and BROWN are both directed to machine learning and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified XIAO’s training strategy to incorporate “wherein ten or more iterations are performed where the processor calculates the aggregated model parameters and receives updated model parameters from the collaborator devices” as taught by BROWN [0068] to alternate optimization equally between generator and discriminator [0068]. Claim 19 The combination of XIAO and LI teaches “The aggregation server of claim 18”, as discussed above.
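The iteration limitation mapped to BROWN above (ten or more rounds in which the server aggregates parameters and receives updated parameters back) follows the usual federated-learning round structure. A hedged sketch under that reading, with a plain mean standing in for the aggregation step (the cited art uses accuracy-normalized weights) and stand-in client functions:

```python
def run_federated_rounds(global_params, clients, num_rounds=10):
    """Sketch of the claimed loop: each round, the server distributes the
    global parameters, receives each collaborator's updated parameters, and
    aggregates them. A simple mean stands in for the weighted aggregation."""
    for _ in range(num_rounds):
        updates = [client(global_params) for client in clients]
        global_params = [sum(u[i] for u in updates) / len(updates)
                         for i in range(len(global_params))]
    return global_params

# Two stand-in collaborators that always return fixed local parameters
clients = [lambda w: [2.0], lambda w: [4.0]]
final = run_federated_rounds([0.0], clients, num_rounds=10)
# final = [3.0]
```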
XIAO further teaches “wherein the model divergence value approximates how much an updated collaborator model deviates from a previous aggregated model” (page 19, section 3, paragraph 2, XIAO: “step 5 The server tests temp model’s accuracy by the validate dataset to get client k’s model accuracy (ak, k=1,2,…n)”; (EN): The validate dataset is defined as “dataset from a part of clients to form a validate dataset” (page 19, section 3.B, paragraph 3, XIAO), which is encompassed by the BRI of a previous aggregated model, and XIAO clarifies that the algorithm performs this step for all n clients (“test accuracies of all clients”, page 20, algorithm 1, XIAO), which is encompassed by the BRI of two or more models). Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over XIAO (A Novel Server-side Aggregation Strategy for Federated Learning in Non-IID situations), in view of ROTH (US20220366220A1), further in view of DAWOUD (US20210350931A1), further in view of BROWN (US20210256707A1), further in view of APPALARAJU (US10467526). Claims 4 and 13 Regarding claim 4, XIAO teaches “The method of claim 1”, as discussed above. XIAO further teaches “further comprising: calculating a class imbalance ratio……for each of the plurality of collaborator devices” (page 19, section 3.B, paragraphs 4-5, XIAO: “Then the server set weights by the following equation: [equation image] pk denotes the weight of the kth client. ai denotes the ith client model’s validated accuracy, ak denotes the kth client model’s validated accuracy. n is the number of selected clients”; (EN): XIAO’s defined weight pk corresponds to the kth client’s class imbalance ratio).
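XIAO’s weight equation survives in the record only as an image. On the quoted description alone — pk is the kth client’s validated accuracy ak normalized over the n selected clients — the computation would read as follows. This is my reading of the quoted text, not XIAO’s equation or code:

```python
def accuracy_normalized_weights(accuracies):
    """Sketch of accuracy normalization as described in the quoted passage:
    client k's aggregation weight pk is its validated accuracy ak divided by
    the sum of all selected clients' validated accuracies."""
    total = sum(accuracies)
    return [a / total for a in accuracies]

# Three selected clients with validated accuracies 0.9, 0.6, 0.5
weights = accuracy_normalized_weights([0.9, 0.6, 0.5])
# weights = [0.45, 0.3, 0.25] -- they sum to 1, favoring the most accurate client
```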
XIAO further teaches “wherein the model parameters are aggregated based further on the class imbalance ratios” (page 20, algorithm 1, XIAO: “[equation image]”; (EN): By including the weight term pk for each client k in the aggregation of global parameters, XIAO teaches model parameters which are aggregated further based on the class imbalance ratio). However, XIAO is not relied upon to explicitly teach a class imbalance ratio comprising a ratio between positive and negative samples. On the other hand, APPALARAJU teaches a class imbalance ratio comprising a ratio between positive and negative samples ([Col. 9, Line]: “In one embodiment, the sorted subset of candidate pairs may be further shortened based on class imbalance criteria, e.g., if the sorted subset comprises X% positive and (100-X)% negative pairs, and the expected ratio of positive and negative pairs in real-world scenarios is Y% positive and (100-Y)% negative, some number of positive or negative pairs may be removed to more closely match the expected ratio. In some embodiments, such class imbalance adjustments may not be performed.”). The examiner notes that XIAO and APPALARAJU are both directed to machine learning and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified XIAO’s aggregation strategy to incorporate a class imbalance ratio comprising a ratio between positive and negative samples as taught by APPALARAJU [Col. 9, Line] to perform class imbalance adjustments to more closely match the expected ratio [Col. 9, Line 25-26]. Claim 13 Claim 13 is rejected under the same rationale as claim 4 for being substantially similar, mutatis mutandis. Claims 5, 14, and 17 are rejected under 35 U.S.C.
103 as being unpatentable over XIAO (A Novel Server-side Aggregation Strategy for Federated Learning in Non-IID situations), in view of ROTH (US20220366220A1), further in view of DAWOUD (US20210350931A1), further in view of BROWN (US20210256707A1), in view of “Sample-level Data Selection for Federated Learning” by Anran Li et al., referenced herein as LI. Claim 5 XIAO teaches “The method of claim 1”, as discussed above. XIAO further teaches “wherein the model parameters are aggregated based further on the number of data samples” (page 19, section 3, paragraph 2, XIAO: “step 2 All the selected clients separately train the global model [global-parameters symbol] by their own datasets. They update [global-parameters symbol] to their local parameters… step 4 the server separately copy the parameters uploaded by different clients… to a temp model. step 5 The server tests temp model’s accuracy by the validate dataset to get client k’s model accuracy… step 6 The server aggregate all the parameters based on the accuracy normalization”; (EN): By teaching a method for aggregating model parameters based on weights updated by the selected data sample, XIAO teaches a method where the model parameters are aggregated based further on the number of data samples). XIAO does not appear to distinctly disclose “further comprising: determining a number of data samples of each of the plurality of collaborator devices”.
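For reference, aggregation “based further on the number of data samples” is conventionally implemented in federated learning by weighting each client by its share of the total sample count (the FedAvg-style weighting). The sketch below shows that conventional technique only; it is not quoted from XIAO, LI, or the application:

```python
def sample_weighted_aggregate(client_params, sample_counts):
    """Hypothetical FedAvg-style sketch: aggregate each parameter as a
    weighted average, weighting client k by n_k / sum(n)."""
    total = sum(sample_counts)
    agg = [0.0] * len(client_params[0])
    for params, n in zip(client_params, sample_counts):
        for i, p in enumerate(params):
            agg[i] += (n / total) * p
    return agg

# Client B holds 3x the data of client A, so its parameters dominate
agg = sample_weighted_aggregate([[1.0, 0.0], [3.0, 2.0]], [1, 3])
# agg = [2.5, 1.5]
```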
However, in the same field, analogous art LI provides this additional functionality by teaching “further comprising: determining a number of data samples of each of the plurality of collaborator devices” (page 4, section 3.A.2, paragraph 1, LI: “The server dynamically selects collections of samples from selected clients to compose training batches for each epoch by prioritizing error-free important samples”; (EN): Determining the number of data samples of each collaborator device includes determining the collection of data samples from the clients to compose training batches because the magnitude of the training batches is encompassed by the BRI of a determined number of data samples). XIAO and LI are analogous art because they are from the same field of endeavor as the claimed invention, namely federated learning. XIAO teaches a method for federated learning with divergence aware parameter aggregation, but does not appear to distinctly disclose determining a number of data samples for each of the collaborator devices. LI provides the additional functionality by disclosing a sample level method for data selection in a federated learning context. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of XIAO with LI’s method of data sample selection because traditional approaches to dynamic data selection “however, cannot be directly applied to FL systems for the same reasons that they require access to all training samples, which violates the privacy of participants in FL systems” (page 1, section 1, paragraph 2, LI), as suggested by LI. Claim 14 Claim 14 is rejected under the same rationale as claim 5 for being substantially similar, mutatis mutandis. Claim 17 XIAO teaches “The system of claim 10”, as discussed above. XIAO does not appear to explicitly disclose “wherein the plurality of collaborators each comprise a hospital or medical center”. 
However, in the same field, analogous art LI provides this additional functionality by teaching “wherein the plurality of collaborators each comprise a hospital or medical center” (page 1, section 1, paragraph 1, LI: “For example, using FL, three hospitals can collaboratively build a deep learning model for analyzing tumor images while preserving the privacy of their patients”; (EN): Three hospitals collaboratively using FL is encompassed by the BRI of the plurality of collaborators comprising a hospital or medical center). XIAO and LI are analogous art because they are from the same field of endeavor as the claimed invention, namely federated learning. XIAO teaches a method for federated learning with divergence aware parameter aggregation, but does not appear to distinctly disclose collaborators which comprise a hospital or medical center. LI provides the additional functionality by disclosing an example of federated learning between hospitals. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of XIAO with LI’s medical setting because “FL enables multiple local sites to collaboratively train a machine learning model… meanwhile keeping their datasets private” (page 1, section 1, paragraph 1, LI) for the purpose of “preserving the privacy of their patients” (page 1, section 1, paragraph 1, LI), as suggested by LI. Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over XIAO (A Novel Server-side Aggregation Strategy for Federated Learning in Non-IID situations), in view of ROTH (US20220366220A1), further in view of DAWOUD (US20210350931A1), further in view of BROWN (US20210256707A1), in view of “Multi-institutional Deep Learning Modeling Without Sharing Patient Data: A Feasibility Study on Brain Tumor Segmentation” by Micah J. Sheller et al., referenced herein as SHELLER. 
Claim 6 XIAO teaches “The method of claim 1”, as discussed above. XIAO does not appear to explicitly disclose “wherein the model comprises a segmentation network configured to automatically quantify abnormal computed tomography patterns”. However, in the same field, analogous art SHELLER provides this additional functionality by teaching “wherein the model comprises a segmentation network” (page 93, section 1, paragraph 3, SHELLER: “federated learning (FL)… we apply FL… to build an effective segmentation model”). SHELLER further teaches “configured to automatically quantify abnormal computed tomography patterns” (page 95, section 2.4, paragraph 1, SHELLER: “standard deep learning topologies for image segmentation and has been instrumental in creating prediction models for segmenting… lungs in CT scans”; (EN): With reference to types of data, SHELLER’s method utilizes “radiographically abnormal regions of each brain scan” (page 96, section 2.5, paragraph 1, SHELLER)). XIAO and SHELLER are analogous art because they are from the same field of endeavor as the claimed invention, namely federated learning. XIAO teaches a method for divergence aware federated learning, but does not appear to distinctly disclose utilizing a segmentation model configured to automatically quantify abnormal tomography patterns. SHELLER provides the additional functionality by disclosing a federated learning approach to brain tumor segmentation. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the machine learning system of XIAO with SHELLER’s segmentation model “considering the difficulty of creating public centralized medical imaging datasets” (page 93, section 1, paragraph 3, SHELLER), as suggested by SHELLER. Claim 16 Claim 16 is rejected under the same rationale as claim 6 for being substantially similar, mutatis mutandis. 
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. MIMASSI (US 11,238,849 B1) teaches a method for federated context-sensitive language models comprising a federated language model server and a plurality of edge devices. ALIZADEH (US 2022/0383142 A1) teaches a method for identifying content on social media related to one or more coordinated influence efforts. “SCAFFOLD: Stochastic Controlled Averaging for Federated Learning” by Sai Praneeth Karimireddy et al. teaches control variables for variance reduction between the global and local models. “Federated Optimization in Heterogeneous Networks” by Tian Li et al., referenced herein as TIAN, teaches a proximal term, defined as the L2-norm of the local model and global model, which is utilized to regulate the influence of local models and reduce model divergence. “Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization” by Jianyu Wang et al. utilizes the method outlined by TIAN with additional normalization processing to analyze convergence of heterogeneous federated optimization algorithms.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAMCY ALGHAZZY, whose telephone number is (571) 272-8824. The examiner can normally be reached M-F 7:30am-5:00pm EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS, can be reached at (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /SHAMCY ALGHAZZY/Examiner, Art Unit 2128 /OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128

Prosecution Timeline

Sep 28, 2021
Application Filed
Feb 20, 2025
Non-Final Rejection — §103
May 20, 2025
Response Filed
Jul 10, 2025
Final Rejection — §103
Sep 08, 2025
Response after Non-Final Action
Sep 22, 2025
Request for Continued Examination
Sep 30, 2025
Response after Non-Final Action
Feb 07, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596925
SINGLE-STAGE MODEL TRAINING FOR NEURAL ARCHITECTURE SEARCH
2y 5m to grant Granted Apr 07, 2026
Patent 12596922
ACCELERATING NEURAL NETWORKS IN HARDWARE USING INTERCONNECTED CROSSBARS
2y 5m to grant Granted Apr 07, 2026
Patent 12579408
ADAPTIVELY TRAINING OF NEURAL NETWORKS VIA AN INTELLIGENT LEARNING MANAGEMENT SYSTEM
2y 5m to grant Granted Mar 17, 2026
Patent 12572847
SYSTEMS AND METHODS FOR RESOURCE-AWARE MODEL RECALIBRATION
2y 5m to grant Granted Mar 10, 2026
Patent 12566966
TRAINING ADAPTABLE NEURAL NETWORKS BASED ON EVOLVABILITY SEARCH
2y 5m to grant Granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
48%
Grant Probability
49%
With Interview (+0.7%)
3y 11m
Median Time to Grant
High
PTA Risk
Based on 62 resolved cases by this examiner. Grant probability derived from career allow rate.
