Prosecution Insights
Last updated: April 19, 2026
Application No. 18/227,535

GROUP BIAS MITIGATION IN FEDERATED LEARNING SYSTEMS

Non-Final OA — §102, §103
Filed
Jul 28, 2023
Examiner
KATZ, DYLAN MICHAEL
Art Unit
3657
Tech Center
3600 — Transportation & Electronic Commerce
Assignee
Cisco Technology Inc.
OA Round
1 (Non-Final)
Grant Probability: 87% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 87% — above average (242 granted / 279 resolved; +34.7% vs TC avg)
Interview Lift: +20.8% across resolved cases with interview
Typical Timeline: 2y 7m avg prosecution; 45 applications currently pending
Career History: 324 total applications across all art units
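The headline figures above follow from simple arithmetic on the granted/resolved counts. A quick check in Python (illustrative only, not the dashboard's actual code; it assumes the "+34.7% vs TC avg" delta is in percentage points, which is an assumption about the dashboard's convention):

```python
# Reproducing the examiner stats shown above (an illustrative check).
# Assumes "+34.7% vs TC avg" means percentage points above the
# Tech Center average allow rate.
granted, resolved = 242, 279

allow_rate = granted / resolved          # career allow rate, about 0.867
implied_tc_avg = allow_rate - 0.347      # implied TC average, about 0.520

print(f"Career allow rate: {allow_rate:.1%}")    # shown rounded to 87%
print(f"Implied TC average: {implied_tc_avg:.1%}")
```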

Statute-Specific Performance

§101: 7.7% (-32.3% vs TC avg)
§103: 50.0% (+10.0% vs TC avg)
§102: 20.3% (-19.7% vs TC avg)
§112: 16.5% (-23.5% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 279 resolved cases

Office Action

Rejection bases: §102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-4, 6-14, and 16-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhou et al. (US 20230084507, hereinafter Zhou).

Regarding Claim 1, Zhou teaches: 1. A method (see at least "a method for training a primary machine learning model using vertical federated learning." in par. 0022) comprising: generating, by a supervisory device in a federated learning system, an aggregated model that aggregates a plurality of machine learning models trained by trainer nodes in the federated learning system during a training round (see at least "The server 110 may be used to train a centralized global model (referred to hereinafter as a global model) using FL." in par. 0061 and "Instead, the federated learning module 125 executed by the server 110 aggregates the predictions of the local models 136 and propagates an aggregated prediction 658 to the computing systems 102." in par. 0155); computing, by the supervisory device, an accuracy loss metric for the aggregated model (see at least "At 404, the global model 126 is trained. The computing systems 102 of all data owners 1 through k compute their model outputs 360, o.sub.1, . . . , o.sub.k, based on their respective local datasets 140, and send the outputs 360 to the server 110. Based on the outputs 360 received from the computing systems 102 of the data owners 1 through k, the server 110 computes its global prediction 358, o.sub.0=f.sub.0(o.sub.1, . . . , o.sub.k|θ.sub.0) (i.e. the output of the global model 126), and uses its global prediction 358 (i.e. predicted labels) to compute: [0132] The loss related to the task T, using the labels 302;" in par. 0131); computing, by the supervisory device, a fairness loss metric for the aggregated model based on fairness-related metrics associated with the plurality of machine learning models trained by the trainer nodes (see at least "The computing system 102 of an active party can calculate both the loss function with respect to the task T, and the fairness, locally." in par. 0098 and "The DEO (i.e. |{circumflex over (l)}.sup.a(θ)−{circumflex over (l)}.sup.b(θ)|) for each pair of protected classes a and b relative to each other, using the protected class information 304;" in par. 0131 and "Thus, the server 110 has to communicate with the computing system 102a of the task owner to compute the loss and the fairness constraint function, and to generate partitions 354. Thus, those calculations are instead performed by the computing system 102a of the task owner, and the server 110 acts only to distribute this information to the other computing systems 102 and aggregate the outputs 360 of the local models 136." in par. 0176); and initiating, by the supervisory device, an additional training round during which the trainer nodes retrain their machine learning models for aggregation by the supervisory device, in accordance with a constrained optimization problem that seeks to optimize a tradeoff between accuracy and fairness associated with the aggregated model (see at least "At 404, the global model 126 is trained." in par. 0131, "Based on the calculated DEO (i.e. |{circumflex over (l)}.sup.a(θ)−{circumflex over (l)}.sup.b(θ)|) for a given fairness constraint, the server 110 updates the variable λ associated with the constraint. The server 110 then broadcasts λ and respective local gradients 352" in par. 0137 and "As shown in FIG. 4, step 404 is iterated until a convergence condition is satisfied at step 414." in par. 0141).

Regarding Claim 2, Zhou teaches: The method as in claim 1, wherein the supervisory device generates the aggregated model based on model parameters associated with the plurality of machine learning models trained by the trainer nodes (see at least "steps of method 400 shown in FIG. 4 may be performed in parallel, and their sub-steps as described below may include overlapping operations: for example, the training of the global model at 404 may be performed in parallel with training of each local model at 408, wherein each global model and local model is trained during each round of mutual communication. Thus, the sub-steps of 404 and each iteration of 408 described below may refer to the same sub-steps performed in each of the other training steps for other models (i.e., 404 or another iteration of 408)." in par. 0147).

Regarding Claim 3, Zhou teaches: The method as in claim 2, wherein the trainer nodes do not share their training data on which they trained the plurality of machine learning models with the supervisory device (see at least "In contrast, examples described herein may preserve the privacy of the local datasets when training a model using vertically partitioned data 230. In some examples described herein, none of the computing systems 102 of data owners custom-character.sub.1, . . . , custom-character.sub.N exposes its respective private data custom-character.sub.1, . . . , custom-character.sub.N or model parameters, but the computing systems 102 of all the data owners collaboratively use their private data custom-character.sub.1, . . . , custom-character.sub.N to train a model custom-character.sub.FED which has comparable performance to a hypothetical model custom-character.sub.SUM which had been trained using data collected from the computing systems of the data owners." in par. 0075).

Regarding Claim 4, Zhou teaches: 4. The method as in claim 1, further comprising: receiving, at the supervisory device, the fairness-related metrics from the trainer nodes (see at least "The computing system 102 of an active party can calculate both the loss function with respect to the task T, and the fairness, locally." in par. 0098).

Regarding Claim 6, Zhou teaches: 6. The method as in claim 1, further comprising: determining, by the supervisory device, that the additional training round resulting in an optimized aggregated model (see at least "As shown in FIG. 4, step 404 is iterated until a convergence condition is satisfied at step 414." in par. 0141).

Regarding Claim 7, Zhou teaches: 7. The method as in claim 1, wherein the trainer nodes are geographically distributed (see at least "The system 100 includes a plurality of computing systems 102 wherein each computing system 102 is controlled by one of a plurality of different data owners. The computing system 102 of each data owner collects and stores a respective set of private data (also referred to as a local dataset or private dataset) … A computing system 102 may be a server, a collection of servers, an edge device, an end user device (which may include such devices (or may be referred to) as a client device/terminal, user equipment/device (UE), wireless transmit/receive unit (WTRU), mobile station, fixed or mobile subscriber unit, cellular telephone, station (STA), personal digital assistant (PDA), smartphone, laptop, computer, tablet, wireless sensor, wearable device, smart device, machine type communications device, smart (or connected) vehicles, or consumer electronics device, among other possibilities), or may be a network device (which may include (or may be referred to as) a base station (BS), router, access point (AP), personal basic service set (PBSS) coordinate point (PCP), eNodeB, or gNodeB, among other possibilities)." in par. 0059).

Regarding Claim 8, Zhou teaches: 8. The method as in claim 1, further comprising: determining, by the supervisory device, whether a further training round is needed after the additional training round to optimize a tradeoff between accuracy and fairness associated with the aggregated model (see at least "As shown in FIG. 4, step 404 is iterated until a convergence condition is satisfied at step 414." in par. 0141).

Regarding Claim 9, Zhou teaches: 9. The method as in claim 1, wherein the aggregated model is configured to classify sensitive or confidential information (see at least "ML models trained on datasets of personal and financial data can often become biased with respect to certain sensitive attributes such as gender, age etc. This may be the result of strong correlation of such sensitive attributes with other non-sensitive attributes such as salary and education." in par. 0077).

Regarding Claim 10, Zhou teaches: 10. The method as in claim 1, wherein the aggregated model is configured to classify image data (see at least "As used herein, an “input sample” may refer to any data sample used as an input to a machine learning model, such as image data. It may refer to a training data sample used to train a machine learning model, or to a data sample provided to a trained machine learning model which will infer (i.e. predict) an output based on the data sample for the task for which the machine learning model has been trained. Thus, for a machine learning model that performs a task of image classification, an input sample may be a single digital image." in par. 0014).

Regarding Claim 11, Zhou also teaches: An apparatus (see at least "server 110" in par. 0064), comprising: one or more network interfaces (see at least "The server 110 may include one or more network interfaces 122 for wired or wireless communication with the network 104, the computing systems 102, or other entity in the system 100." in par. 0064); a processor coupled to the one or more network interfaces and configured to execute one or more processes (see at least "The server 110 may include one or more processing devices 114, such as a processor, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a dedicated logic circuitry, a dedicated artificial intelligence processor unit, a tensor processing unit, a neural processing unit, a hardware accelerator, or combinations thereof. The one or more processing devices 114 may be jointly referred to herein as a processor 114, processor device 114, or processing device 114." in par. 0063); and a memory configured to store a process that is executable by the processor, the process when executed configured to (see at least "The server 110 may include one or more memories 128, which may include a volatile or non-volatile memory (e.g., a flash memory, a random access memory (RAM), and/or a read-only memory (ROM))." in par. 0066): implement the method of Claim 1 (see Claim 1 analysis for rejection of the method).

Regarding Claim 12, Zhou also teaches: An apparatus for implementing the method of Claim 2 (see Claim 2 analysis for rejection of the method).

Regarding Claim 13, Zhou also teaches: An apparatus for implementing the method of Claim 3 (see Claim 3 analysis for rejection of the method).

Regarding Claim 14, Zhou also teaches: An apparatus for implementing the method of Claim 4 (see Claim 4 analysis for rejection of the method).

Regarding Claim 16, Zhou also teaches: An apparatus for implementing the method of Claim 6 (see Claim 6 analysis for rejection of the method).

Regarding Claim 17, Zhou also teaches: An apparatus for implementing the method of Claim 7 (see Claim 7 analysis for rejection of the method).

Regarding Claim 18, Zhou also teaches: An apparatus for implementing the method of Claim 8 (see Claim 8 analysis for rejection of the method).

Regarding Claim 19, Zhou also teaches: An apparatus for implementing the method of Claim 9 (see Claim 9 analysis for rejection of the method).

Regarding Claim 20, Zhou also teaches: A tangible, non-transitory, computer-readable medium storing program instructions that cause a supervisory device in a federated learning system to execute a process comprising (see at least "The server 110 may include one or more memories 128, which may include a volatile or non-volatile memory (e.g., a flash memory, a random access memory (RAM), and/or a read-only memory (ROM)). The one or more non-transitory memories 128 may be jointly referred to herein as a memory 128 for simplicity. The memory 128 may store processor executable instructions 129 for execution by the processing device(s) 114, such as to carry out examples described in the present disclosure." in par. 0066): The method of Claim 1 (see Claim 1 analysis for rejection of the method).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 20230084507, hereinafter Zhou) in view of Watanabe et al. (US 20230050708, hereinafter Watanabe).

Regarding Claim 5, Zhou teaches: 5. The method as in claim 1. Zhou does not appear to explicitly teach all of the following, but Watanabe does teach: wherein a particular one of the trainer nodes computes a fairness-related metric for its machine learning model based on a difference in ratios of populations of training data that it used to train that machine learning model to that of global populations of training data used across the trainer nodes (see at least "Node analysis module 120 may analyze statistical data that describes the local training data of distributed computing nodes 130A-130N, rather than analyzing the training data itself." in par. 0031 and "In some embodiments, node analysis module 120 identifies bias in local training data sets by comparing the count of data samples for an overrepresented label to the count of data samples for other labels. For example, one or more outliers may be identified corresponding to labels that have more samples as compared to other labels, and the counts of those one or more outlying labels can be compared to the counts of the other labels to determine whether the training data set is biased. In some embodiments, the label having the highest count of data samples may be selected, and compared with the counts of the other labels; if the ratio of counts of one or more of the other labels to the count of most-represented label does not surpass a threshold value, then the training data set may be considered to be biased." in par. 0032 and "This data sample may initially be used for a first iteration of testing, or a small amount of other data samples may also be included before performing a first iteration of testing. At each iteration, the set of training data is used to train a test model, whose accuracy is tested; additionally, at each iteration, the set of training data becomes less biased, as more data samples corresponding to the underrepresented labels are included in the training data. For example, a first iteration may include a test model that is trained using training data that includes one thousand data samples of one particular label and only ten data samples of each other label, and a second iteration may increase the size of the other labels to include twenty samples each, etc." in par. 0035 and "Threshold selection module 125 may select a threshold value corresponding to the ratio of the count of data samples for the one or more underrepresented data labels to the count of data samples for the overrepresented data label when the model's performance does not improve beyond a threshold amount for a number of iterations." in par. 0037).

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Zhou to incorporate the teachings of Watanabe, wherein statistical metrics are computed to assess underrepresented and overrepresented samples in local training datasets of a federated learning system and bias is addressed during the training process by balancing the label representation, in order to arrive at the local fairness metric taught by Zhou being based on how appropriately labels are represented in each local dataset relative to the global dataset. The motivation to incorporate the teachings of Watanabe would be to reduce bias in the global model and reduce the time and computation resources needed for training (see par. 0020).

Regarding Claim 15, Zhou as modified by Watanabe also teaches: An apparatus for implementing the method of Claim 5 (see Claim 5 analysis for rejection of the method).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DYLAN M KATZ, whose telephone number is (571) 272-2776. The examiner can normally be reached Mon-Thurs. 8:00-6:00. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Abby Lin, can be reached on (571) 270-3976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /DYLAN M KATZ/ Primary Examiner, Art Unit 3657
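As the Zhou citations in the Claim 1 analysis describe, the supervisory device's training loop combines a task (accuracy) loss with per-group DEO fairness gaps under Lagrange multipliers λ that are updated each round. A minimal Python sketch of that style of supervisory-side update (illustrative only: the function names and numbers are made up, and this is not code from Zhou or the application):

```python
# Hypothetical sketch of a fairness-constrained federated round of the kind
# the Zhou citations describe: the supervisory device receives per-group
# losses from trainer nodes, computes DEO-style gaps |l_a - l_b|, folds them
# into a Lagrangian accuracy/fairness objective, and raises the multiplier
# lam wherever a fairness constraint is violated before the next round.
from itertools import combinations

def deo_gaps(group_losses):
    """Fairness gap |l_a - l_b| for each pair of protected groups."""
    return {
        (a, b): abs(group_losses[a] - group_losses[b])
        for a, b in combinations(sorted(group_losses), 2)
    }

def lagrangian_objective(task_loss, gaps, lam, eps):
    """Task loss plus multiplier-weighted fairness-constraint violations."""
    return task_loss + sum(lam[p] * max(0.0, g - eps) for p, g in gaps.items())

def update_multipliers(lam, gaps, eps, step=0.1):
    """Dual ascent: increase lam for constraints whose gap exceeds eps."""
    return {p: max(0.0, lam[p] + step * (gaps[p] - eps)) for p in lam}

# One round with made-up per-group losses reported by trainer nodes:
group_losses = {"a": 0.30, "b": 0.55}
gaps = deo_gaps(group_losses)             # {("a", "b"): 0.25}
lam = {p: 0.0 for p in gaps}
eps = 0.05                                # fairness tolerance
obj = lagrangian_objective(0.40, gaps, lam, eps)
lam = update_multipliers(lam, gaps, eps)  # lam grows while the gap > eps
```

In this dual-ascent framing, broadcasting the updated λ to the trainer nodes before they retrain corresponds to the "initiating an additional training round ... in accordance with a constrained optimization problem" step of Claim 1.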

Prosecution Timeline

Jul 28, 2023
Application Filed
Feb 26, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596378
Autonomous Control and Navigation of Unmanned Vehicles
Granted Apr 07, 2026 • 2y 5m to grant
Patent 12594663
ROBOT SYSTEM AND CART
Granted Apr 07, 2026 • 2y 5m to grant
Patent 12589499
Mobile Construction Robot
Granted Mar 31, 2026 • 2y 5m to grant
Patent 12589491
METHODS, SYSTEMS, AND DEVICES FOR MOTION CONTROL OF AT LEAST ONE WORKING HEAD
Granted Mar 31, 2026 • 2y 5m to grant
Patent 12582491
CONTROL OF A SURGICAL INSTRUMENT HAVING BACKLASH, FRICTION, AND COMPLIANCE UNDER EXTERNAL LOAD IN A SURGICAL ROBOTIC SYSTEM
Granted Mar 24, 2026 • 2y 5m to grant
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 87%
With Interview: 99% (+20.8%)
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 279 resolved cases by this examiner. Grant probability derived from career allow rate.
