Prosecution Insights
Last updated: April 19, 2026
Application No. 18/327,952

MODEL TRAINING METHOD, DATA PROCESSING METHOD, AND APPARATUS

Non-Final OA: §101, §102, §103
Filed
Jun 02, 2023
Examiner
DETERDING, GWYNEVERE AMELIA
Art Unit
2125
Tech Center
2100 — Computer Architecture & Software
Assignee
Huawei Technologies Co., Ltd.
OA Round
1 (Non-Final)
Grant Probability: 100% (Favorable)
Predicted OA Rounds: 1-2
Time to Grant: 3y 3m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 100% (2 granted / 2 resolved; +45.0% vs TC avg; above average)
Interview Lift: +100.0% among resolved cases with interview
Avg Prosecution (typical timeline): 3y 3m
Total Applications: 16 across all art units (14 currently pending)

Statute-Specific Performance

§101: 21.3% (-18.7% vs TC avg)
§103: 32.0% (-8.0% vs TC avg)
§102: 8.0% (-32.0% vs TC avg)
§112: 20.0% (-20.0% vs TC avg)
Tech Center averages are estimates. Based on career data from 2 resolved cases.

Office Action

§101, §102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-24 are presented for examination.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on September 25, 2024 and December 30, 2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Priority

Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Claim Objections

Claim 8 is objected to because of the following informality:
Claim 8: “the performing weighting combination” should read “the performing weighted combination”
Appropriate correction is required.

Specification

The disclosure is objected to because of the following informalities:
[0003]: "are used" should read "is used"
[0007]: "non-independent and identically" should read "non-independent and identical"
[0077]: "all of embodiments" should read "all of the embodiments"
[00100]: "are package" should read "are a package"
[00106]: "input into the I/O interface 112" should read "input into the I/O interface 212", and "output from the I/O interface 112" should read "output from the I/O interface 212"
[00134]: "non-independent and identically" should read "non-independent and identical"
[00135]: "non-independent and identically" should read "non-independent and identically distributed"
[00169-00170]: there are two equations labeled (2-3); the second equation (2-3) should be labeled (2-4), and the third equation (2-4) should be labeled (2-5)
[00171]: the equations in this paragraph should be re-labeled (2-6), (2-7), and (2-8)
[00172]: (2-8) should be re-labeled (2-9)
[00173]: "pc c indicates" should read "pc indicates"
[00174-00175]: (2-9) should be re-labeled (2-10)
[00176]: (2-10) should be re-labeled (2-11)
[00178]: (2-11) should be re-labeled (2-12)
[00180]: (2-11) should be re-labeled (2-13)
[00182]: (2-12) should be re-labeled (2-14)
[00197, 00200]: "first group of sample data set" should read "first group of the sample data set", and "second group of sample data set" should read "second group of the sample data set"
[00199]: "second group of sample data set" should read "second group of the sample data set"
[00241]: "step 1011" is repeated twice; it appears that the first occurrence should read “step 1001”
Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance (“2019 PEG”).

Claim 1

Step 1: The claim is directed to a model training method, and is therefore directed to the statutory category of processes.
Step 2A Prong 1: The claim recites:

“obtaining a first loss value based on the first prediction result”: This limitation encompasses mentally obtaining a first loss value by mentally determining a difference between the first prediction result and an expected output.

“obtaining a second loss value based on the second prediction result”: This limitation encompasses mentally obtaining a second loss value by mentally determining a difference between the second prediction result and an expected output.

“performing combination processing on the first loss value and the second loss value to obtain a third loss value, for updating the first private model”: This limitation encompasses mentally combining the first and second loss values to obtain a third loss value, and using the third loss value to mentally update parameters of the first private model.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “receiving a first shared model from the server,” “outputting a first prediction result for a data set through the first shared model,” and “outputting a second prediction result for the data set through a first private model of the first client side device.” However, these limitations recite insignificant extra-solution activity, as they amount to mere data gathering and outputting (MPEP § 2106.05(g)).

Step 2B: The claim does not contain significantly more than the judicial exception. The receiving limitation, in addition to being insignificant extra-solution activity, is also directed to the well-understood, routine, and conventional activity of receiving or transmitting data over a network (MPEP § 2106.05(d)(II); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network)).

The outputting limitations, in addition to being insignificant extra-solution activity, are also directed to the well-understood, routine, and conventional activity of storing and retrieving information in memory (MPEP § 2106.05(d)(II); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93).

As an ordered whole, the claim is directed to a mentally performable process of obtaining loss values for updating a model. Nothing in the claim provides significantly more than this. As such, the claim is not patent eligible.

Claim 2

Step 1: A process, as above.

Step 2A Prong 1: The claim recites the same judicial exception as claim 1.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “wherein the first shared model and the first private model share a feature extraction model.” However, this limitation amounts to mere instructions to apply an exception using a generic class of computer algorithms (MPEP § 2106.05(f)), as it merely further limits the generically recited first shared and first private models to share a generically recited feature extraction model.

Step 2B: The claim does not contain significantly more than the judicial exception. The limitation amounts to mere instructions to apply an exception using a generic class of computer algorithms (MPEP § 2106.05(f)) for the same reasons stated above.

Claim 3

Step 1: A process, as above.
Step 2A Prong 1: The claim recites:

“updating a feature extraction model of the second private model based on the feature extraction model of the first shared model, to obtain the first private model”: This limitation encompasses mentally updating parameters of a feature extraction model of the second private model to match the feature extraction model of the first shared model, to obtain the first private model.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “wherein before the first shared model is received, the first client side device stores a second private model.” However, this limitation is directed to insignificant extra-solution activity as it amounts to mere data gathering (MPEP § 2106.05(g)), since the first client side device is storing data corresponding to a second private model, for use in performing the judicial exception.

Step 2B: The claim does not contain significantly more than the judicial exception. The “stores a second private model” limitation, in addition to being insignificant extra-solution activity, is also directed to the well-understood, routine, and conventional activity of storing and retrieving information in memory (MPEP § 2106.05(d)(II); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93).

Claim 4

Step 1: A process, as above.
Step 2A Prong 1: The claim recites:

“wherein the third loss value is used to update the first shared model to obtain a second shared model”: This limitation encompasses mentally updating parameters of the first shared model using the third loss value, to obtain a second shared model.

“wherein the second shared model is used…to update the first shared model”: This limitation encompasses mentally updating parameters of the first shared model to match the second shared model.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “sending the second shared model to the server.” However, this limitation is directed to insignificant extra-solution activity, as it amounts to mere data gathering and outputting (MPEP § 2106.05(g)). The claim also recites that the “server” performs the updating of the first shared model. However, this limitation amounts to mere instructions to apply an exception using a generic computer (MPEP § 2106.05(f)).

Step 2B: The claim does not contain significantly more than the judicial exception. The sending limitation, in addition to being insignificant extra-solution activity, is also directed to the well-understood, routine, and conventional activity of receiving or transmitting data over a network (MPEP § 2106.05(d)(II); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network)). The server limitation is mere instructions to apply an exception using a generic computer (MPEP § 2106.05(f)) as stated above.

Claim 5

Step 1: A process, as above.
Step 2A Prong 1: The claim recites:

“wherein the first loss value comprises at least one of a cross-entropy loss value and/or a mutual distillation loss value, and the second loss value comprises at least one of a cross-entropy loss value and/or a mutual distillation loss value”: This limitation further limits the obtained first and second loss values to be at least one of a cross-entropy loss value and/or a mutual distillation loss value, which are both mathematical calculations.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. See analysis of claim 1.

Step 2B: The claim does not contain significantly more than the judicial exception. See analysis of claim 1.

Claim 6

Step 1: A process, as above.

Step 2A Prong 1: The claim recites the same judicial exception as claim 1.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “wherein the data set comprises a first data set requiring privacy protection and a second data set not requiring privacy protection,” “outputting the first prediction result for the second data set through the first shared model,” and “outputting the second prediction result for the first data set and the second data set through the first private model.” However, these limitations are directed to insignificant extra-solution activity, as they amount to mere data gathering and outputting (MPEP § 2106.05(g)).

Step 2B: The claim does not contain significantly more than the judicial exception. The data set and outputting limitations, in addition to being insignificant extra-solution activity, are also directed to the well-understood, routine, and conventional activity of storing and retrieving information in memory (MPEP § 2106.05(d)(II); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93).

Claim 7

Step 1: A process, as above.
Step 2A Prong 1: The claim recites:

“performing weighted combination processing on the first loss value and the second loss value to obtain the third loss value”: This limitation encompasses the mathematical concept of calculating a weighted combination of two values to obtain a third value, and is also mentally performable.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. See analysis of claim 1.

Step 2B: The claim does not contain significantly more than the judicial exception. See analysis of claim 1.

Claim 8

Step 1: A process, as above.

Step 2A Prong 1: The claim recites:

“obtaining an average value of the first loss value and the second loss value as the third loss value”: This limitation encompasses the mathematical concept of averaging two values to obtain a third value, and is also mentally performable.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. See analysis of claim 1.

Step 2B: The claim does not contain significantly more than the judicial exception. See analysis of claim 1.

Claim 9

Step 1: The claim is directed to a data processing method, and is therefore directed to the statutory category of processes.
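The combination limitations the examiner characterizes as mathematical concepts in claims 7 and 8 (a weighted combination, and averaging as a special case) reduce to a one-line calculation. A minimal sketch for orientation; the function name, signature, and equal default weights are illustrative assumptions, not taken from the application:

```python
def combine_losses(first_loss: float, second_loss: float, weight: float = 0.5) -> float:
    """Weighted combination of two loss values (the claim 7 limitation).

    With weight = 0.5 this reduces to the plain average of the two values
    (the claim 8 limitation). Names and defaults are illustrative only.
    """
    return weight * first_loss + (1.0 - weight) * second_loss
```

For example, combine_losses(2.0, 4.0) yields 3.0, the average that the examiner treats as a special case of weighted combination.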
Step 2A Prong 1: The claim recites:

“wherein the target model is obtained by updating a first private model based on a third loss value”: This limitation encompasses mentally updating parameters of a first private model based on a third loss value to obtain a target model.

“the third loss value is obtained by performing combination processing on a first loss value and a second loss value”: This limitation encompasses mentally combining a first loss value and a second loss value to obtain the third loss value.

“the first loss value is obtained based on a first prediction result”: This limitation encompasses mentally obtaining a first loss value by determining a difference between a first prediction result and an expected outcome.

“the second loss value is obtained based on a second prediction result”: This limitation encompasses mentally obtaining a second loss value by determining a difference between a second prediction result and an expected outcome.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites “obtaining to-be-processed data,” “the first prediction result is output by a first shared model for a data set,” “the first shared model is obtained from the server,” and “the second prediction result is output by the first private model for the data set.” However, these limitations are directed to insignificant extra-solution activity, as they amount to mere data gathering and outputting (MPEP § 2106.05(g)). The claim also further recites “processing the to-be-processed data based on a target model stored in the first client side device, to obtain a prediction result.” However, this limitation amounts to mere instructions to apply an exception on a generic computer (MPEP § 2106.05(f)), as it is merely applying the model created by the judicial exception on a generically recited client side device.

Step 2B: The claim does not contain significantly more than the judicial exception.
The “obtaining to-be-processed data” and “the first shared model is obtained from the server” limitations, in addition to being insignificant extra-solution activity, are also directed to the well-understood, routine, and conventional activity of receiving or transmitting data over a network (MPEP § 2106.05(d)(II); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network)).

The output limitations, in addition to being insignificant extra-solution activity, are also directed to the well-understood, routine, and conventional activity of storing and retrieving information in memory (MPEP § 2106.05(d)(II); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93).

The processing data limitation amounts to mere instructions to apply an exception on a generic computer (MPEP § 2106.05(f)) for the same reasons stated above.

As an ordered whole, the claim is directed to a mentally performable process of obtaining a target model by updating a model using obtained loss values. Nothing in the claim provides significantly more than this. As such, the claim is not patent eligible.

Claims 10-16

Step 1: A process, as above.

Step 2A Prong 1: Claims 10-16 recite the same judicial exception as claims 2-8, respectively, except insofar as they inherit the abstract ideas from claim 9 rather than claim 1.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. Claims 10-16 recite the same additional elements as claims 2-8, respectively, except insofar as they inherit the additional elements recited in claim 9 rather than claim 1.

Step 2B: The claims do not contain significantly more than the judicial exception.
The analysis at this step mirrors that of claims 2-8, respectively, except insofar as the claims inherit the additional elements recited in claim 9 rather than claim 1.

Claims 17-24

Step 1: The claims are directed to a client side device, and are therefore directed to the statutory category of machines.

Step 2A Prong 1: Claims 17-24 recite the same judicial exception as claims 1-8, respectively.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. Claims 17-24 recite the same additional elements as claims 1-8, respectively, except insofar as they are directed to a client side device that “comprises a transceiver and a trainer.” However, the client side device limitation amounts to mere instructions to apply a judicial exception using a generic computer (MPEP § 2106.05(f)).

Step 2B: The claims do not contain significantly more than the judicial exception. The analysis at this step mirrors that of claims 1-8, respectively, except insofar as the claims are directed to a client side device, which amounts to mere instructions to apply a judicial exception using a generic computer (MPEP § 2106.05(f)) as stated above.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 4-5, 7-9, 12-13, and 15-16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lin et al. (NPL: “LINDT: Tackling Negative Federated Learning with Local Adaptation”) (“Lin”).

Regarding claim 1, Lin discloses “A model training method, wherein the method is applicable to a machine learning system, the machine learning system comprises a server (Figure 2: box labeled “Server”) and at least two client side devices (3.1: “Given a system of N clients”), including a first client side device (Figure 2: box labeled “Client”), and the method comprises:

receiving a first shared model from the server (Figure 2, (1): “broadcast global model” with arrow pointing from “Server” box to “Client” box);

outputting a first prediction result for a data set through the first shared model (Figure 2: “prediction” output from “Layer M” of “Layers in Global Model” in “Client” box, and “Local Data” input into “Layer 1”);

obtaining a first loss value based on the first prediction result (4.2: “For updating the dual-model, we denote by GM(x, w_i^r) and L_iM(x, v_i) the respective pre-softmax logit outputs from the last layer of the dual-model (one global and one local), and let l_cross be cross entropy”; equation (10): l_cross(GM(x, w_i^r), y); the examiner notes that l_cross(GM(x, w_i^r), y) corresponds to a first loss value based on the first prediction result, as it is the cross entropy loss of the output from the global model);

outputting a second prediction result for the data set through a first private model of the first client side device (Figure 2: “prediction” output from “Layer M” of “Layers in Local Model” in “Client” box, and “Local Data” input into “Layer 1”);

obtaining a second loss value based on the second prediction result (4.2, as quoted above; equation (10): l_cross(L_iM(x, v_i), y); the examiner notes that l_cross(L_iM(x, v_i), y) corresponds to a second loss value based on the second prediction result, as it is the cross entropy loss of the output from the local model); and

performing combination processing on the first loss value and the second loss value to obtain a third loss value, for updating the first private model (4.2: “For updating the dual-model… Each client i objects to minimize l(w_i^r, v_i) = …”; see equation (10); the examiner notes that a third loss value l(w_i^r, v_i) is obtained by finding the expected value E of the combination of the first and second loss values, and that updating the dual-model requires updating the local model).”

Regarding claim 4, the rejection of claim 1 is incorporated.
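The examiner's mapping of Lin's equation (10), reading the "third loss value" as the expectation over the local data of the sum of the global-model and local-model cross-entropy losses, can be illustrated with a short sketch. This is a sketch of the examiner's characterization, not Lin's actual implementation; all names are illustrative:

```python
import numpy as np

def cross_entropy(logits: np.ndarray, label: int) -> float:
    """Softmax cross-entropy of one pre-softmax logit vector against a label."""
    shifted = logits - logits.max()  # shift for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def third_loss(global_logits, local_logits, labels) -> float:
    """Sketch of the examiner's reading of Lin eq. (10): the 'third loss value'
    is the expectation over the data of the 'first' (global-model) plus
    'second' (local-model) cross-entropy losses."""
    per_example = [cross_entropy(g, y) + cross_entropy(l, y)
                   for g, l, y in zip(global_logits, local_logits, labels)]
    return float(np.mean(per_example))
```

Minimizing this combined quantity with respect to both parameter sets is what the examiner equates with "updating the dual-model", which necessarily updates the local (private) model.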
Lin further discloses “wherein the third loss value is used to update the first shared model (4.2: “For updating the dual-model… Each client i objects to minimize l(w_i^r, v_i) = …”; see equation (10); the examiner notes that updating the dual-model requires updating the global model) to obtain a second shared model (Figure 2: “new global model”); and the method further comprises: sending the second shared model to the server (Figure 2, (3): “upload new global model” with arrow pointing from “Client” box to “Server” box), wherein the second shared model is used by the server to update the first shared model (Figure 2, (4): “aggregation” with arrow pointing from “Server” where the “new global model” was received to the original global model; 4.2: “At the end of each round r of dual-model training, clients are required to return the updated w_i^r to the server for aggregation (same as line c5 in Algorithm 1). The server, upon receiving the first K updates from the active clients (Cr), aggregates the parameters in the same way as conventional FL.”).”

Regarding claim 5, the rejection of claim 1 is incorporated.

Lin further discloses “wherein the first loss value comprises at least one of a cross-entropy loss value and/or a mutual distillation loss value, and the second loss value comprises at least one of a cross-entropy loss value and/or a mutual distillation loss value (4.2, equation (10); the examiner notes that the first loss value l_cross(GM(x, w_i^r), y) and the second loss value l_cross(L_iM(x, v_i), y) are both cross-entropy loss values).”

Regarding claim 7, the rejection of claim 1 is incorporated.
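The server-side step Lin is cited for ("aggregates the parameters in the same way as conventional FL") is, in conventional federated learning, an element-wise average of the parameters uploaded by the clients. A minimal sketch under that assumption; the unweighted mean is a simplification, since conventional FL often weights each client by its data size:

```python
import numpy as np

def aggregate(client_updates: list) -> list:
    """Average the updated shared-model parameters uploaded by each client,
    layer by layer, to produce the server's next shared model.

    client_updates: one list of per-layer np.ndarray parameters per client.
    Unweighted FedAvg-style mean, used here purely for illustration.
    """
    return [np.mean(np.stack(layer_versions), axis=0)
            for layer_versions in zip(*client_updates)]
```

In the claim-4 mapping, each client's uploaded "new global model" plays the role of the "second shared model", and this averaging is the server's update of the "first shared model".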
Lin further discloses “wherein the performing combination processing on the first loss value and the second loss value to obtain a third loss value comprises: performing weighted combination processing on the first loss value and the second loss value to obtain the third loss value (4.2, equation (10); the examiner notes that the third loss value is obtained by finding the expected value E of the combination of the first and second loss values, and expected value is a weighted average calculation).”

Regarding claim 8, the rejection of claim 7 is incorporated.

Lin further discloses “wherein the performing weighting combination processing on the first loss value and the second loss value to obtain the third loss value comprises: obtaining an average value of the first loss value and the second loss value as the third loss value (4.2, equation (10); the examiner notes that the third loss value is obtained by finding the expected value E of the combination of the first and second loss values, and expected value is a weighted average calculation).”

Regarding claim 9, Lin discloses “A data processing method, wherein the method is applicable to a machine learning system, the machine learning system comprises a server (Figure 2: box labeled “Server”) and at least two client side devices (3.1: “Given a system of N clients”), including a first client side device (Figure 2: box labeled “Client”), and the method comprises:

obtaining to-be-processed data (Figure 2: “Local Data”); and

processing the to-be-processed data based on a target model stored in the first client side device, to obtain a prediction result (Figure 2 and Figure 2 caption: “But when testing, each client only considers the prediction from its local model”; the examiner notes that the “Prediction” output from “Layers in Local Model” during testing (as opposed to training) corresponds to a “prediction result”, and the “Layers in Local Model” during testing (which were updated during training) correspond to “a target model”),

wherein the target model is obtained by updating a first private model based on a third loss value (4.2: “For updating the dual-model… Each client i objects to minimize l(w_i^r, v_i) = …”; see equation (10); the examiner notes that updating the dual-model by minimizing a third loss value l(w_i^r, v_i) requires updating the local model, and the updated local model corresponds to the target model),

the third loss value is obtained by performing combination processing on a first loss value and a second loss value (4.2; see equation (10); the examiner notes that the third loss value l(w_i^r, v_i) is obtained by finding the expected value E of the combination of a first loss value l_cross(GM(x, w_i^r), y) and a second loss value l_cross(L_iM(x, v_i), y)),

the first loss value is obtained based on a first prediction result (4.2: “For updating the dual-model, we denote by GM(x, w_i^r) and L_iM(x, v_i) the respective pre-softmax logit outputs from the last layer of the dual-model (one global and one local), and let l_cross be cross entropy”; equation (10): l_cross(GM(x, w_i^r), y); the examiner notes that l_cross(GM(x, w_i^r), y) corresponds to a first loss value obtained based on a first prediction result, as it is the cross entropy loss of the output prediction from the global model),

the first prediction result is output by a first shared model for a data set (Figure 2: “prediction” output from “Layer M” of “Layers in Global Model” in “Client” box, and “Local Data” input into “Layer 1”),

the first shared model is obtained from the server (Figure 2, (1): “broadcast global model” with arrow pointing from “Server” box to “Client” box),

the second loss value is obtained based on a second prediction result (4.2, as quoted above; equation (10): l_cross(L_iM(x, v_i), y); the examiner notes that l_cross(L_iM(x, v_i), y) corresponds to a second loss value based on a second prediction result, as it is the cross entropy loss of the output prediction from the local model), and

the second prediction result is output by the first private model for the data set (Figure 2: “prediction” output from “Layer M” of “Layers in Local Model” in “Client” box, and “Local Data” input into “Layer 1”).”

Regarding claim 12, the rejection of claim 9 is incorporated. The further limitations of the claim correspond to those of claim 4, and the remainder of the rejection follows the same rationale as the rejection of claim 4 above.

Regarding claim 13, the rejection of claim 9 is incorporated. The further limitations of the claim correspond to those of claim 5, and the remainder of the rejection follows the same rationale as the rejection of claim 5 above.

Regarding claim 15, the rejection of claim 9 is incorporated. The further limitations of the claim correspond to those of claim 7, and the remainder of the rejection follows the same rationale as the rejection of claim 7 above.

Regarding claim 16, the rejection of claim 15 is incorporated. The further limitations of the claim correspond to those of claim 8, and the remainder of the rejection follows the same rationale as the rejection of claim 8 above.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 2-3 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Yang et al. (“Heterogeneous Data-Aware Federated Learning”) (“Yang”).

Regarding claim 2, the rejection of claim 1 is incorporated. Lin does not appear to explicitly disclose the further limitations of the claim.
However, Yang discloses “wherein…[a] first shared model and…[a] first private model share a feature extraction model (Yang, Section 4, paragraph 2: “Following this split of generic and specific layers, we propose to share the generic feature extraction, like the convolutional layers, between the servers and the clients and more precisely to keep the specific layer as the classification layer only on the clients side”; paragraph 4: “The server then broadcasts the initialized parameters to all the clients. At each round t of communications, the clients update the local parameters by copying the generic parameters w_b^t. For the first round of communication, the specific parameters are copied as well”; the examiner notes that the generic feature extraction convolutional layers that are shared between the servers and the clients correspond to “a feature extraction model”, the “initialized parameters” that are broadcast to all the clients correspond to “a first shared model”, and the client’s “local parameters” correspond to “a first private model”).”

Yang and the instant application both relate to federated learning and are analogous. It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the first shared model and the first private model disclosed by Lin to share a feature extraction model as disclosed by Yang, and one would have been motivated to do so for the purposes of allowing the feature extraction part of the model to benefit from all the data and therefore increase its robustness, and decreasing the amount of bytes exchanged between the client and the server compared to sharing the whole model (Yang, Section 4, paragraph 2).

Regarding claim 3, the rejection of claim 2 is incorporated.
Lin in view of Yang further discloses “wherein before the first shared model is received, the first client side device stores a second private model (Yang, Section 4, paragraph 4: “The server then broadcasts the initialized parameters to all the clients. At each round t of communications, the clients update the local parameters by copying the generic parameters w_b^t.”; the examiner notes that the client’s “local parameters” prior to the first round of communications correspond to a second private model); and the method further comprises: updating a feature extraction model of the second private model based on the feature extraction model of the first shared model, to obtain the first private model (Yang, Section 4, paragraph 4: “The server then broadcasts the initialized parameters to all the clients. At each round t of communications, the clients update the local parameters by copying the generic parameters w_b^t. For the first round of communication, the specific parameters are copied as well. The clients update the received model on their local data with fixed epochs E and send back the updated generic parameters w_b^(t,k) to server”; the examiner notes that updating the generic local parameters by copying the generic shared parameters corresponds to “updating a feature extraction model of the second private model based on the feature extraction model of the first shared model,” and the updated local parameters correspond to “the first private model”).”

Yang and the instant application both relate to federated learning and are analogous.
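As context for the parameter-copying scheme quoted above from Yang, a minimal sketch follows. It assumes a model represented as a flat dictionary of named parameters; the function and parameter names are hypothetical placeholders, not taken from Yang or the prosecution record:

```python
# Illustrative sketch of Yang's generic/specific split (Section 4):
# clients copy the broadcast generic (feature-extraction) parameters
# every round, and copy the specific (classifier) parameters only on
# the first round of communication. All names are hypothetical.

def client_copy_step(local_params, broadcast_params, generic_keys, first_round=False):
    """Return the client's parameters after copying from the broadcast model."""
    updated = dict(local_params)
    for key in generic_keys:
        # Generic parameters are overwritten from the server every round.
        updated[key] = broadcast_params[key]
    if first_round:
        # On the first round the specific parameters are copied as well.
        for key in broadcast_params:
            if key not in generic_keys:
                updated[key] = broadcast_params[key]
    return updated
```

After this copy step, the client would train on its local data and send back only the updated generic parameters, which is the behavior the examiner maps to sharing "a feature extraction model."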
It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the disclosure of Lin to include storing a second private model before the first shared model is received, and updating a feature extraction model of the second private model based on the feature extraction model of the first shared model to obtain the first private model, as disclosed by Yang, and one would have been motivated to do so for the purposes of allowing the feature extraction part of the model to benefit from all the data and therefore increase its robustness, and decreasing the amount of bytes exchanged between the client and the server compared to sharing the whole model (Yang, Section 4, paragraph 2).

Regarding claim 10, the rejection of claim 9 is incorporated. The further limitations of the claim correspond to those of claim 2, and the remainder of the rejection follows the same rationale as the rejection of claim 2 above.

Regarding claim 11, the rejection of claim 10 is incorporated. The further limitations of the claim correspond to those of claim 3, and the remainder of the rejection follows the same rationale as the rejection of claim 3 above.

Claims 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Zhao et al. (“Federated Learning with Non-IID Data”) (“Zhao”).

Regarding claim 6, the rejection of claim 1 is incorporated. Lin does not appear to disclose the further limitations of the claim.
However, Zhao discloses “wherein the data set comprises a first data set requiring privacy protection and a second data set not requiring privacy protection (Zhao, Section 4.1, Figure 6: “Private Data” and “Shared Data”); …outputting a first prediction result for a data set through… [a] first shared model comprises: outputting the first prediction result for the second data set through the first shared model (Zhao, Section 4.2, Paragraph 1: “Herein, we propose a data-sharing strategy in the federated learning setting as illustrated in Figure 6. A globally shared dataset G that consists of a uniform distribution over classes is centralized in the cloud. At the initialization stage of FedAvg, the warm-up model trained on G and a random α portion of G are distributed to each client”; the examiner notes that the “warm-up model” that is shared to each client corresponds to “a first shared model,” and that training the warm-up model on the shared dataset (corresponding to the second data set) inherently involves outputting a first prediction result for the second data set, as the output is needed to train the model); … outputting a second prediction result for the data set through a first private model of… [a] first client side device comprises: outputting the second prediction result for the first data set and the second data set through the first private model (Zhao, Section 4.2, Paragraph 1: “The local model of each client is trained on the shared data from G together with the private data from each client”; the examiner notes that the “local model” of one of the clients corresponds to “a first private model of a first client side device,” and that training the local model on the shared data from G (the random portion of G that was distributed to each client can be the whole set) and the private data from each client inherently involves outputting a second prediction result for the first data set and the second data set, as the output is needed to train the model).” 
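As context, Zhao's data-sharing strategy as quoted above (distributing a warm-up model and a random α portion of a globally shared dataset G to each client, then training each local model on the shared portion together with the private data) can be sketched as follows. The function name and sampling details are illustrative assumptions, not taken from Zhao or the record:

```python
import random

# Illustrative sketch of Zhao's data-sharing strategy (Section 4.2):
# each client receives a random alpha-fraction of the globally shared
# dataset G and trains its local model on that portion together with
# its private data. Names are hypothetical placeholders.

def build_client_training_sets(G, private_data_by_client, alpha, seed=0):
    """Return, per client, the private data combined with a random
    alpha-portion of the globally shared dataset G."""
    rng = random.Random(seed)
    n_shared = int(alpha * len(G))
    training_sets = {}
    for client_id, private_data in private_data_by_client.items():
        shared_portion = rng.sample(G, n_shared)
        training_sets[client_id] = list(private_data) + shared_portion
    return training_sets
```

Because the shared portion is drawn from a uniform distribution over classes, each client's otherwise skewed local distribution is partially rebalanced, which is the mechanism behind the reduced weight divergence Zhao reports.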
Zhao and the instant application both relate to federated learning and are analogous. It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the outputting prediction result steps disclosed by Lin, to include using a second data set not requiring privacy protection with the first shared model, and both a first data set requiring privacy protection and the second data set not requiring privacy protection with the first private model as disclosed by Zhao. One would have been motivated to do so for the purpose of improving training on non-IID data by creating a subset of data which is globally shared, decreasing weight divergence and leading to improved accuracy of federated learning (see Zhao, Abstract).

Regarding claim 14, the rejection of claim 9 is incorporated. The further limitations of the claim correspond to those of claim 6, and the remainder of the rejection follows the same rationale as the rejection of claim 6 above.

Claims 17, 20, 21, and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Alabbasi et al. (WO2021121585) (“Alabbasi”).
Regarding claim 17, Lin discloses “A client side device (Figure 2: box labeled “Client”), wherein the client side device is applicable to a machine learning system, the machine learning system comprises at least two client side devices (3.1: “Given a system of N clients”); including the client side device… wherein the… [client side device] is configured to receive a first shared model sent by a server (Figure 2, (1): “broadcast global model” with arrow pointing from “Server” box to “Client” box); and the… [client side device] is configured to: output a first prediction result for a data set through the first shared model (Figure 2: “prediction” output from “Layer M” of “Layers in Global Model” in “Client” box, and “Local Data” input into “Layer 1”); obtain a first loss value based on the first prediction result (4.2: “For updating the dual-model, we denote by G^M(x, w_i^r) and L_i^M(x, v_i) the respective pre-softmax logit outputs from the last layer of the dual-model (one global and one local), and let l_cross be cross entropy”; Equation (10): l_cross(G^M(x, w_i^r), y); the examiner notes that l_cross(G^M(x, w_i^r), y) corresponds to a first loss value based on the first prediction result, as it is the cross entropy loss of the output from the global model); output a second prediction result for the data set through a first private model of the client side device (Figure 2: “prediction” output from “Layer M” of “Layers in Local Model” in “Client” box, and “Local Data” input into “Layer 1”); obtain a second loss value based on the second prediction result (4.2: “For updating the dual-model, we denote by G^M(x, w_i^r) and L_i^M(x, v_i) the respective pre-softmax logit outputs from the last layer of the dual-model (one global and one local), and let l_cross be cross entropy”; Equation (10): l_cross(L_i^M(x, v_i), y); the examiner notes that l_cross(L_i^M(x, v_i), y) corresponds to a second loss value based on the second
prediction result, as it is the cross entropy loss of the output from the local model); and perform combination processing on the first loss value and the second loss value to obtain a third loss value for updating the first private model (4.2: “For updating the dual-model… Each client i objects to minimize l ( w i r , v i ) = …”; see equation (10); the examiner notes that a third loss value l ( w i r , v i ) is obtained by finding the expected value E of the combination of the first and second loss values, and that updating the dual-model requires updating the local model).” Lin does not appear to explicitly disclose that the client side device “comprises a transceiver and a trainer.” However, Alabbasi discloses “…[a] client side device comprises a transceiver and a trainer (Alabbasi, Figure 7 and [00101]: Client Computing Device 700, Transceiver 701, and Processing Circuit 732; the examiner notes that the processing circuit corresponds to a “trainer” because it “performs respective operations discussed herein” of the client computing device, including training, see [0008]: “The client computing device can perform further operations training the aggregated machine learning model in iterations with inputs”).” Alabbasi and the instant application both relate to federated learning and are analogous. It would have been obvious to one of ordinary skill in the art, prior to the effective filing date of the claimed invention, to have modified the client side device disclosed by Lin to comprise a transceiver and a trainer as disclosed by Alabbasi, and one would have been motivated to do so for the purpose of providing the hardware necessary to allow the client device to send and receive communications to/from a central server, and train a model with the client’s own data to generate model updates for an aggregate model (see Alabbasi, [0008] and [00101]). Regarding claim 20, the rejection of claim 17 is incorporated. 
The further limitations of the claim correspond to those of claim 4, and the remainder of the rejection follows the same rationale as the rejection of claim 4 above.

Regarding claim 21, the rejection of claim 17 is incorporated. The further limitations of the claim correspond to those of claim 5, and the remainder of the rejection follows the same rationale as the rejection of claim 5 above.

Regarding claim 23, the rejection of claim 17 is incorporated. The further limitations of the claim correspond to those of claim 7, and the remainder of the rejection follows the same rationale as the rejection of claim 7 above.

Regarding claim 24, the rejection of claim 23 is incorporated. The further limitations of the claim correspond to those of claim 8, and the remainder of the rejection follows the same rationale as the rejection of claim 8 above.

Claims 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Alabbasi, and further in view of Yang.

Regarding claim 18, the rejection of claim 17 is incorporated. The further limitations of the claim correspond to those of claim 2, and the remainder of the rejection follows the same rationale as the rejection of claim 2 above, except insofar as the combination of Lin and Alabbasi (as opposed to only Lin) is modified to include the teachings of Yang.

Regarding claim 19, the rejection of claim 18 is incorporated. The further limitations of the claim correspond to those of claim 3, and the remainder of the rejection follows the same rationale as the rejection of claim 3 above, except insofar as the combination of Lin and Alabbasi (as opposed to only Lin) is modified to include the teachings of Yang.

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Alabbasi, and further in view of Zhao.

Regarding claim 22, the rejection of claim 17 is incorporated.
The further limitations of the claim correspond to those of claim 6, and the remainder of the rejection follows the same rationale as the rejection of claim 6 above, except insofar as the combination of Lin and Alabbasi (as opposed to only Lin) is modified to include the teachings of Zhao.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GWYNEVERE A DETERDING whose telephone number is (571)272-7657. The examiner can normally be reached Mon-Fri, 7:30am-5pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/G.A.D./
Examiner, Art Unit 2125

/KAMRAN AFSHAR/
Supervisory Patent Examiner, Art Unit 2125

Prosecution Timeline

Jun 02, 2023: Application Filed
Mar 05, 2024: Response after Non-Final Action
Jan 20, 2026: Non-Final Rejection under §101, §102, and §103 (current)


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 100%
With Interview: 99% (+100.0%)
Median Time to Grant: 3y 3m
PTA Risk: Low

Based on 2 resolved cases by this examiner. Grant probability derived from career allow rate.
