DETAILED ACTION
This non-final office action is responsive to application 18/127,272 as submitted on 28 March 2023.
Claims 1-20 are currently pending and under examination, of which claims 1 and 12 are independent.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-5, 7, 12, 14-15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnaswamy et al. (US 20220414464 A1), hereinafter Krishnaswamy, in view of Hassanzadeh et al. (US 20230025754 A1), hereinafter Hassanzadeh.
With respect to claim 1, Krishnaswamy teaches:
A system for machine-learning …, the system comprising (Krishnaswamy discloses “a method of federated machine learning, and a system thereof, that seek to … improving the accuracy and/or reliability of federated machine learning” [0040].):
a global server comprising one or more processors configured to (Krishnaswamy discloses “the method 100 may be performed by a server (e.g., may be referred to as a central server or an aggregator server) configured to provide or coordinate (e.g., implement/execute and/or control/manage) federated machine learning as a cloud-based distributed service (e.g., a federated learning plan)” [0042].
Krishnaswamy discloses “The server 200 comprises a memory 202, and at least one processor 204 communicatively coupled to the memory 202” [0059].):
select a local participant subset from a plurality of local participants … (Krishnaswamy discloses data sources (‘local participants’), “the plurality of data sources may be referred to as participants (e.g., a federated learning population) in the federated machine learning provided by the server” [0042].
Krishnaswamy discloses “With respect to the server function 602, at each round t, the aggregator server may select a subset (e.g., a random subset) of m data sources (1 to M) from a set of data sources (1 to N), and sends to the selected subset of data sources (e.g., corresponding to “the plurality of data sources” as described hereinbefore according to various embodiments) the most up-to-date (i.e., current) global model Gt. In various example embodiments, prior to selecting the subset of data sources, the method 600 further comprises binning the set of data sources (1 to N) into a plurality of intervals (bins) of K quality ranges, and then selecting the subset of data sources from the set of data sources (binned in the plurality of intervals) for federation for the current round t” [0096].);
subscribe the local participant subset to a federated learning system to contribute to a global machine-learning model configured to perform a task … (Krishnaswamy discloses “The aggregator server may then receive a plurality of training updates (e.g., a difference ΔLSt in the example) from the selected subset of data sources, respectively (after the respective data source has generated the respective training update in response to the current global model received), and then update the current global model based on the plurality of training updates received and the plurality of data quality parameters (e.g., data quality indices σm² in the example) associated with the subset of data sources, respectively, to generate an updated global model, which then serves as a new current global model” [0096].
Krishnaswamy discloses “a global machine learning model may refer to a machine learning model configured as desired to be trained based on data residing in or stored by a plurality of data sources, that is, based on decentralized data, for a particular desirable practical application, such as a classification task” [0043].);
obtain an update matrix and a quality score from each local participant in the local participant subset (Krishnaswamy discloses “each of the plurality of training updates comprises a difference between the current global machine learning model and the local machine learning model trained by the respective data source based on the current global machine learning model and labelled data stored by the respective data source” [0050].
Krishnaswamy discloses data quality parameter (‘quality score’), “the current global machine learning model comprises determining a weighted average of the plurality of training updates based on the plurality of data quality parameters associated with the plurality of data sources, … each of the plurality of training updates is weighted based on the data quality parameter (e.g., data quality measure or index) associated with the corresponding data source. In various embodiments, the labelled data stored by the respective data source comprises features and labels, and the data quality parameter associated with the respective data source comprises at least one of a feature quality parameter associated with the features and a label quality parameter associated with the labels. In this regard, the feature quality parameter provides a measure or an indication of the quality of the features stored by the data source, and the label quality parameter provides a measure or an indication of the quality of the labels stored by the data source” [0051-0052].
Each data source (‘local participant’) trains a local machine learning model. A training update (‘update matrix’) is obtained by finding a difference between a current global machine learning model and a local machine learning model. A training update is weighted based on a data quality parameter (‘quality score’) associated with a corresponding data source. The current global model then determines a weighted average of training updates and data quality parameters of each data source (therefore training updates and data quality parameters are obtained from each data source).);
update the global machine-learning model based on the update matrix and the quality score from each local participant in the local participant subset (Krishnaswamy discloses “then update the current global model based on the plurality of training updates received and the plurality of data quality parameters (e.g., data quality indices σm² in the example) associated with the subset of data sources, respectively, to generate an updated global model, which then serves as a new current global model” [0096].);
and deploy parameters and parameter weights of the global machine-learning model to each local participant in the local participant subset for building a local machine-learning model configured to perform the task … (Krishnaswamy discloses “the current global model Gt−1 is updated based on the plurality of training updates received and the plurality of data quality parameters associated with the subset of data sources (selected in the immediately previous round), respectively, to generate an updated global model Gt as a new current global model for the current round t, which may then then be transmitted to the selected subset of data sources” [0097].
Krishnaswamy discloses “With respect to the local data source function 606, at each round t, each of the selected subset of data sources (m local data sources) may update the current global model received to a new local model Lt+1(m) by training on their private data, such as shown in FIG. 6.” [0098].
Krishnaswamy discloses a local model performs classification (‘performs a task’), “Lclass(L,D): classification loss of model L tested on dataset D” [0090].
Krishnaswamy discloses Figure 6 (reproduced below) depicting an algorithm for weighted federated learning. An updated global model Gt is updated by aggregating federation weights Wm and training updates of local models ΔLSt(m), see line 3. Each of the local data sources (‘local participants’) generates a new local model Lt+1(m) by setting the received updated global model Gt to the new local model (see line 13) and performing local training on their private data Dm (see line 17).
By setting a received updated global model Gt to a new local model, all of a global model’s parameters and parameter weights are received by a local data source (and therefore a global model’s parameters and parameter weights are deployed to each local data source).
[Krishnaswamy, FIG. 6 (media_image1.png): weighted federated learning algorithm]
).
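For context, the federated learning round that the cited portions of Krishnaswamy describe (subset selection, broadcast of the current global model Gt, local training, and aggregation of the training updates) can be sketched as follows. This is a minimal illustration only; the function names, the use of NumPy arrays as stand-ins for model parameters, and the uniform aggregation weights are assumptions of this sketch, not details of Krishnaswamy.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_training(global_model, private_data):
    # Stand-in for local training on private data: returns a perturbed
    # copy of the received global model (private_data is unused here).
    return global_model + 0.1 * rng.standard_normal(global_model.shape)

def federated_round(global_model, data_sources, m):
    # Server selects a random subset of m data sources (FIG. 6, server function).
    selected = rng.choice(len(data_sources), size=m, replace=False)
    updates = []
    for idx in selected:
        local_model = local_training(global_model, data_sources[idx])
        # Training update: difference between local and current global model.
        updates.append(local_model - global_model)
    # Aggregate the updates (uniform weights here; quality weighting is
    # discussed separately in the record).
    return global_model + np.mean(updates, axis=0)

global_model = np.zeros(4)
data_sources = [None] * 10  # placeholders for private datasets
for t in range(3):
    global_model = federated_round(global_model, data_sources, m=5)
print(global_model.shape)  # (4,)
```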
However, Krishnaswamy does not teach performing a machine-learning task in a hybrid operating room environment and local participants associated with a hybrid operating room, which is taught by Hassanzadeh:
A system for machine-learning in a hybrid operating room environment, the system comprising (The Examiner interprets “hybrid operating room environment” according to its broadest reasonable interpretation (BRI) in view of the applicant’s specification at [0036] as encompassing a surgery center that diagnoses patients using medical imaging, as disclosed by Hassanzadeh below.
Hassanzadeh discloses “systems, methods, and computer-readable storage media that support secure training of machine learning (ML) models that preserves privacy in untrusted environments using distributed executable file packages” [Abstract].
Hassanzadeh discloses clients can be surgery centers (‘hybrid operating room environment’), “The client devices 140 and 150 may be owned, operated by, or otherwise associated with different clients of an entity associated with the server 102. … The various clients may be in the same industry, or related industries. For example, the clients may include different medical institutions, such as hospitals, doctors' offices, medical groups, specialists' offices, emergency rooms, surgery centers, urgent care clinics, outpatient clinics, dentists, psychiatrists, psychologists, therapists, or the like” [0028].
Hassanzadeh discloses “To illustrate, the ML model may be trained to output a predicted diagnosis of a patient based on input health data associated with the patient, and the clients may include or correspond to hospitals, doctors' offices, medical groups, specialists' offices, emergency rooms, surgery centers, urgent care clinics, outpatient clinics, dentists, psychiatrists, psychologists, therapists, or the like. In some implementations, the input medical data may include or indicate symptoms, test results, observations by medical personnel, scan results, medical images (e.g., magnetic resonance imaging (MRI), X-rays, fluoroscopic images, positron-emission tomography (PET) scan images, nuclear medicine imaging, computed tomography (CT) scan images, etc.)” [0034].):
a global server (Hassanzadeh discloses “the techniques described above may be extended to client-side training implementations to provide secure cooperative learning, such as federated learning or split learning. In some such implementations, the server 102 may provide a respective copy of the initial ML model parameter set 112 to each of the client devices 140 and 150 for client-side training” [0043].):
… a plurality of local participants associated with a hybrid operating room (Hassanzadeh discloses clients (‘local participants’) can correspond to surgery centers (‘hybrid operating room’), “various clients may be in the same industry, or related industries. For example, the clients may include different medical institutions, such as hospitals, doctors' offices, medical groups, specialists' offices, emergency rooms, surgery centers, urgent care clinics, outpatient clinics, dentists, psychiatrists, psychologists, therapists, or the like” [0028].);
subscribe the local participant … to a federated learning system to contribute to a global machine-learning model configured to perform a task in the hybrid operating room (Hassanzadeh discloses “the server 102 may provide a respective copy of the initial ML model parameter set 112 to each of the client devices 140 and 150 for client-side training, and the client devices 140 and 150 may execute respective copies of the executable file package 144 to cause the client devices 140 and 150 to implement a respective ML model, encrypt respective client data, and provide the encrypted client data to the respective ML model as training data. After the client devices 140 and 150 train the respective ML models, the client devices may provide respective trained ML model parameter sets to the server 102 for use in constructing the aggregated ML model parameter set 116” [0043].
Hassanzadeh discloses “The aggregate ML model implemented based on the aggregated ML model parameter set 116 may be configured to predict a diagnosis for a patient based on input data associated with the patient” [0040]. See [0034] describing input medical data can be medical images from surgery centers.);
obtain an update matrix … from each local participant … (Hassanzadeh discloses “the client devices may provide respective trained ML model parameter sets to the server 102 for use in constructing the aggregated ML model parameter set 116” [0043].);
and deploy parameters and parameter weights of the global machine-learning model to each local participant … for building a local machine-learning model configured to perform the task in the hybrid operating room (Hassanzadeh discloses “the server 102 may provide a respective copy of the initial ML model parameter set 112 to each of the client devices 140 and 150 for client-side training, and the client devices 140 and 150 may execute respective copies of the executable file package 144 to cause the client devices 140 and 150 to implement a respective ML model, encrypt respective client data, and provide the encrypted client data to the respective ML model as training data” [0043].
Hassanzadeh discloses “Each of the client devices may decrypt a respective received encrypted ML model parameter set based on a respective private key to implement a client-side diagnosis prediction model” [0040]. See [0034] describing medical images from surgery centers can be used to predict a diagnosis.).
Hassanzadeh teaches that performing federated learning to train an aggregate ML model and client-side models using client data from surgery centers is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the federated machine learning system of Krishnaswamy with the client devices disclosed by Hassanzadeh to train a machine learning model to perform diagnostic imaging in surgery centers. By training a machine learning model to perform diagnostic imaging in surgery centers, rapid and automated detection of anomalies can be achieved, thereby reducing the time needed by medical staff to interpret imaging results.
With respect to claim 3, the combination of Krishnaswamy in view of Hassanzadeh teaches:
the system of claim 1, wherein the local participant subset is selected based on a predefined selection criteria comprising at least one of: an availability schedule for each local participant in the federated learning system, the quality score for each local participant in the federated learning system, a mode of subscription to the federated learning system by each local participant in the federated learning system, and a quality of data generated from each local participant in the federated learning system (Krishnaswamy discloses a data source subset (‘local participant subset’) is selected based on quality ranges (‘quality of data’) for each data source, “With respect to the server function 602, at each round t, the aggregator server may select a subset (e.g., a random subset) of m data sources (1 to M) from a set of data sources (1 to N), and sends to the selected subset of data sources (e.g., corresponding to “the plurality of data sources” as described hereinbefore according to various embodiments) the most up-to-date (i.e., current) global model Gt. In various example embodiments, prior to selecting the subset of data sources, the method 600 further comprises binning the set of data sources (1 to N) into a plurality of intervals (bins) of K quality ranges, and then selecting the subset of data sources from the set of data sources (binned in the plurality of intervals) for federation for the current round t. In other words, the selection of the subset of M data sources may be based on data sources having been binned into a plurality (e.g., K) of quality ranges, which advantageously accounts for the varying quality ranges amongst the data sources” [0096].).
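The binning-based selection quoted above ([0096]) can be illustrated with a short sketch. The bin-edge construction, the round-robin sampling across bins, and all identifiers are assumptions of the illustration; Krishnaswamy states only that the set of data sources is binned into K quality ranges before the subset is selected.

```python
import numpy as np

rng = np.random.default_rng(1)

def bin_and_select(quality_scores, k_bins, m):
    # Bin the N data sources into K quality-range intervals, then select
    # m sources across the bins so varying quality ranges are represented.
    scores = np.asarray(quality_scores, dtype=float)
    edges = np.linspace(scores.min(), scores.max(), k_bins + 1)
    bins = np.clip(np.digitize(scores, edges[1:-1]), 0, k_bins - 1)
    selected = []
    # Round-robin over the bins (illustrative choice) until m are picked.
    for b in np.tile(np.arange(k_bins), m):
        members = [i for i in np.flatnonzero(bins == b) if i not in selected]
        if members:
            selected.append(int(rng.choice(members)))
        if len(selected) == m:
            break
    return selected

picked = bin_and_select([0.1, 0.2, 0.5, 0.6, 0.9, 0.95], k_bins=3, m=4)
print(len(picked))  # 4
```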
With respect to claim 4, the combination of Krishnaswamy in view of Hassanzadeh teaches:
the system of claim 1, wherein, to update the global machine-learning model, the one or more processors are further configured to: aggregate weights of parameters in the update matrix from each local participant based on the respective quality score from each local participant (Krishnaswamy discloses “The aggregator server may then receive a plurality of training updates (e.g., a difference ΔLSt in the example) from the selected subset of data sources, respectively (after the respective data source has generated the respective training update in response to the current global model received), and then update the current global model based on the plurality of training updates received and the plurality of data quality parameters (e.g., data quality indices σm² in the example) associated with the subset of data sources, respectively, to generate an updated global model, which then serves as a new current global model” [0096].
Krishnaswamy discloses “the data quality parameter associated with the respective data source comprises at least one of a feature quality parameter associated with the features and a label quality parameter associated with the labels. In this regard, the feature quality parameter provides a measure or an indication of the quality of the features stored by the data source, and the label quality parameter provides a measure or an indication of the quality of the labels stored by the data source” [0052].
An aggregator server receives training updates from each data source (‘local participant’). A training update ΔLSt (‘update matrix’) is the difference between a current global machine learning model and a local machine learning model trained by a respective data source (see [0050]), therefore a training update contains updated parameters.
Krishnaswamy discloses Figure 6 (reproduced above) depicting that a global model Gt is updated by aggregating federation weights Wm and training updates of local models ΔLSt(m), see line 3. Federation weights Wm for a local model are computed using a data quality index σm², see line 21. A data quality index (data quality parameter) measures the quality of the features and labels stored by a data source (therefore data quality parameters are a “quality score”).);
and incorporate the aggregated weights of the parameters in the update matrix into parameter weights of corresponding parameters in the global machine-learning model (Krishnaswamy discloses Figure 6 depicting that a global model Gt is updated by aggregating federation weights Wm and training updates of local models ΔLSt(m), see line 3. By updating a global model using aggregated federation weights and training updates, the weights and parameters of local models are therefore incorporated into the global model.).
With respect to claim 5, the combination of Krishnaswamy in view of Hassanzadeh teaches:
the system of claim 1, wherein to deploy the parameters of the global machine-learning model, the one or more processors are further configured to: send the parameters of the global machine-learning model to each local participant in the federated learning system (Krishnaswamy discloses “an updated global model Gt as a new current global model for the current round t, which may then then be transmitted to the selected subset of data sources” [0097].
Krishnaswamy discloses “With respect to the local data source function 606, at each round t, each of the selected subset of data sources (m local data sources) may update the current global model received to a new local model Lt+1(m) by training on their private data, such as shown in FIG. 6.” [0098].
By setting a received updated global model Gt to a new local model, all of a global model’s parameters and parameter weights are received by a local data source (and therefore a global model’s parameters are sent to each local data source).).
With respect to claim 7, the combination of Krishnaswamy in view of Hassanzadeh teaches:
the system of claim 1, wherein the global machine-learning model comprises weighting of local machine-learning model parameter contributions to the parameters of the global machine-learning model according to one or more of: institute criteria on organizational characteristics of each local participant (Krishnaswamy discloses “each training update received may be modified or adjusted (e.g., weighted) based on the data quality parameter associated with the corresponding data source (i.e., the data source which the training update is received from)” [0045]. See [0096] describing how training updates from data sources (‘local participants’) are aggregated to update a global model.
Krishnaswamy discloses “the data quality parameter associated with the respective data source comprises at least one of a feature quality parameter associated with the features and a label quality parameter associated with the labels. In this regard, the feature quality parameter provides a measure or an indication of the quality of the features stored by the data source … one or more of the plurality of data quality parameters are each based on at least one of a first data quality factor” [0052-0053].
Krishnaswamy discloses “in the case of the data source belonging to a centre or an organization, the first data quality factor may be a centre quality index (Qc) determined based on a centre reputation (R), an annotator competence (C), and the method of annotation (M). In this regard, the centre reputation (R), the annotator competence (C), and the method of annotation (M) may each be assigned (or graded) a value ranging from 0 to 1. For example, a value of 0 may correspond to the worst level and a value of 1 may correspond to the best level. In relation to the centre reputation (R), for example, the most reputable centre may be assigned a value of 1, less reputable centres may be assigned values between 0 and 1 accordingly. In various example embodiments, prospective centres having higher reputations than the centre assigned a value of 1 may be assigned values higher than 1 to reflect their expected superiority in data quality. In relation to the annotator competence (C), for example, annotators may be ranked by amount of experience and specialisation/subspecialisation relevant to the labelling task. The highest ranked annotator may be assigned a value of 1, and other annotators may be assigned values between 0 and 1 according to rank. Similarly, prospective annotators deemed more skillful than the highest ranked annotator may be assigned values higher than 1. In relation to method of annotation (M), manual annotation may be assumed to be the best and assigned a value of 1. The first data quality factor may then be determined by multiplying the values of the centre reputation (R), the annotator competence (C), and the method of annotation (M) together, such as illustrated in FIG. 7A” [0103].
A data source belonging to an organization can determine a centre quality index based on centre reputation, annotator competence, and method of annotation (‘institute criteria’). Each of the criteria can be assigned values to reflect a centre’s reputation, annotator competence, or method of annotation. Therefore, these values describe each criterion (and therefore the values are organizational characteristics).); imaging system criteria based on technical specifications of imaging systems in the hybrid operating rooms of each local participant; procedure criteria based on characteristics of procedures performed in the hybrid operating rooms; healthcare professional criteria based on characteristics of medical personnel working for each local participant; and patient criteria based on characteristics of patients receiving the procedures performed in the hybrid operating rooms.
With respect to claim 12, the rejection of claim 1 is incorporated. The difference in scope is as follows:
A computer-implemented method for machine-learning (Krishnaswamy discloses “a method of federated machine learning, and a system thereof, that seek to … improving the accuracy and/or reliability of federated machine learning” [0040].).
With respect to claim 14, the claim recites similar limitations corresponding to claim 3, therefore the same rationale of rejection is applicable.
With respect to claim 15, the claim recites similar limitations corresponding to claim 4, therefore the same rationale of rejection is applicable.
With respect to claim 17, the claim recites similar limitations corresponding to claim 7, therefore the same rationale of rejection is applicable.
Claims 2 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnaswamy in view of Hassanzadeh, further in view of Aghaei et al. (WO 2021163213 A1), hereinafter Aghaei.
With respect to claim 2, the combination of Krishnaswamy in view of Hassanzadeh teaches the system of claim 1, however, the combination does not teach validating an updated global machine-learning model prior to deployment, which is taught by Aghaei:
wherein the one or more processors are further configured to: ascertain that the updated global machine-learning model needs to be validated prior to the deployment to each local participant in the federated learning system (Aghaei discloses “the performance of the updated global model 112, 114 may also be validated with a validation dataset. If the global model 112, 114 has been improved, the updated global model 112, 114 may be distributed to all or some of the client devices 120, 130, 140” [0055].);
and validate that the updated global machine-learning model performs better than a pre-updated version of the global machine-learning model, by use of a validation dataset stored in the global server from which the global machine-learning model is trained (See [0055] disclosing how a global model can be validated with a validation dataset prior to distributing the global model to client devices. To determine if the performance of a global model has improved, prior performance of the global model must be known (therefore a global model’s performance is validated based on a previous version of the global model).
Aghaei discloses global model training and deployment is performed at a centralized server, “one or more servers 110 configured to maintain and distribute one or more global models 112, 114 … the centralized server 110 may then utilize the local models 128,138, 148, 150 to update the global model 112” [0054].).
Aghaei teaches that determining global model performance with a validation dataset prior to global model deployment is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the federated machine learning system of Krishnaswamy with the technique disclosed by Aghaei to validate a global model before deployment to client devices. By validating a global model before deployment to client devices, only global model updates that have performance-enhancing improvements can be applied to local models, thereby maintaining or improving local model accuracy.
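The validate-before-deploy behavior cited from Aghaei [0055] can be sketched as a simple gate. The function names and the scalar accuracy scores below are hypothetical stand-ins, not Aghaei's implementation.

```python
def validate_and_deploy(updated_model, previous_model, evaluate, deploy):
    # Aghaei [0055]: validate the updated global model on a held-out
    # validation dataset and distribute it only if performance improved.
    new_score = evaluate(updated_model)
    old_score = evaluate(previous_model)
    if new_score > old_score:
        deploy(updated_model)
        return True
    return False  # keep serving the pre-update global model

# Toy usage with hypothetical accuracy numbers.
scores = {"old": 0.80, "new": 0.85}
deployed = []
ok = validate_and_deploy("new", "old", evaluate=lambda m: scores[m],
                         deploy=deployed.append)
print(ok, deployed)  # True ['new']
```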
With respect to claim 13, the claim recites similar limitations corresponding to claim 2, therefore the same rationale of rejection is applicable.
Claims 6, 8-10, 16 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnaswamy in view of Hassanzadeh, further in view of Mu et al. (US 20240023082 A1), hereinafter Mu.
With respect to claim 6, the combination of Krishnaswamy in view of Hassanzadeh teaches the system of claim 1, however, the combination does not teach obtaining local and global statistics, which is taught by Mu:
wherein the one or more processors are further configured to: obtain local statistics on predetermined aspects of each local participant in the local participant subset (Mu discloses “Herein, since the base station may perform data interaction with a plurality of UE, the plurality of UE may participate in the federated learning corresponding to the base station. There is a difference in the probability distribution of data of the local dataset of each of the at least one UE and data of the data set of the plurality of UE associated with the base station, or in the probability distribution of data of the local dataset of each of the at least one UE and data of the global dataset obtained through operations” [0054].
Mu discloses “the probability distribution of the local dataset obtained by the UE through statistics is denoted as P(Xm)=[P(x1), P(x2), …, P(xn)], where P(xi) represents the probability of Xm taking the event as xi. The base station counts the distribution of the global dataset based on the statistical result of the probability distribution of the local dataset reported by each UE, and the probability distribution of the global dataset is denoted as P(Xg)=ΣP(Xm)” [0057].
The local dataset of a UE (‘local participant’) is a predetermined aspect of the UE since the local dataset was stored, organized, and structured before obtaining a probability distribution. A probability distribution of a UE’s local dataset is then obtained using statistical probabilities (therefore a probability distribution is local statistics). A probability distribution is obtained for each of the UE’s that participate in federated learning.);
analyze correlations between the predetermined aspects and a task of the global machine-learning model as represented by differences in the local statistics and global statistics on the predetermined aspects in the federated learning system (Mu discloses “The base station counts the distribution of the global dataset based on the statistical result of the probability distribution of the local dataset reported by each UE, and the probability distribution of the global dataset is denoted as P(Xg)=ΣP(Xm). The base station may obtain the above statistical information of the distribution difference based on the probability distribution of the UE and the probability distribution of the global dataset described above. The statistical information of the distribution difference is denoted as ΔPm=∥P(Xg)−P(Xm)∥, the meaning of which may be the difference in values of the probability distribution of each data type, or the difference in data types included in the probability distribution” [0057].
A probability distribution of the global dataset P(Xg) (‘global statistics’) is obtained by summing the probability distributions of each UE. Statistical information ΔPm (‘correlations’) is obtained by finding the distribution difference between a probability distribution of a UE P(Xm) and a probability distribution of the global dataset P(Xg).
See [0054] above describing how a probability distribution difference can be calculated for each UE.);
and adjust the parameter weights of the parameters in the global machine-learning model based on the correlations (Mu discloses “It should be noted that the above federated learning is a process of model training jointly participated by the base station and each UE. The UE trains the local model locally and reports the training result to the base station. The base station performs weighted averaging and other processing based on the reported result and the weight coefficient of each UE to obtain the global learning model” [0153].
Mu discloses “the weight coefficient of the UE in the federated average learning is calculated based on the probability distribution difference between the local dataset of the UE and the global dataset, which may be expressed as the following formula:
[Image: media_image2.png (Greyscale), depicting the weight coefficient formula]
where M represents the total number of the UE participating in the federated learning, am represents the weight of the local learning model of the UEm in the federated averaging process, and ΔPm represents the probability distribution difference between the local dataset of each user and the global dataset. In the step S804, the base station performs the federated averaging to obtain an updated result of the global learning model” [0230-0232].).
Mu teaches updating a global learning model by using weight coefficients based on probability distribution differences is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the federated machine learning system of Krishnaswamy with the weight coefficients disclosed by Mu to adjust global model parameters based on local and global distribution differences. By adjusting global model parameters based on local and global distribution differences, a local model’s contribution to a global model can be weighted based on how different their distributions are, thereby improving global model convergence.
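The mechanism described above can be illustrated with a short sketch. The exact weight formula is reproduced only as an image in Mu, so the inverse-difference form below is an assumption for illustration only (the function names, the normalized mean standing in for ΣP(Xm), and the smoothing constant are all illustrative, not from the references); it merely encodes that a smaller distribution difference ΔPm yields a larger weight am:

```python
import numpy as np

def local_distribution(labels, num_classes):
    # Empirical P(Xm): per-class probabilities of a client's local dataset.
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

def federated_weights(local_dists):
    # Delta_Pm = ||P(Xg) - P(Xm)||, with a normalized mean as a stand-in
    # for the summed global distribution.
    global_dist = np.mean(local_dists, axis=0)
    deltas = np.array([np.linalg.norm(global_dist - p) for p in local_dists])
    # Assumed form: a smaller distribution difference yields a larger weight.
    inv = 1.0 / (deltas + 1e-8)
    return inv / inv.sum()
```

In a weighted federated-averaging step, each local model's update would then be scaled by its coefficient am before aggregation.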
With respect to claim 8, Krishnaswamy teaches:
the system of claim 1, further comprising: a local client of a local participant in the local participant subset (Krishnaswamy discloses data sources (‘local participants’) can be portable computers (‘local clients’), “the plurality of data sources may be referred to as participants (e.g., a federated learning population) in the federated machine learning provided by the server. For example, the plurality of data sources may each be embodied as a device or system having data (labelled data for training) stored therein, such as but not limited to, a storage system (e.g., for an enterprise or an organization, such as a local data storage server) or a storage device (e.g., for an individual, such as a mobile phone, a tablet, a portable computer” [0042].
Krishnaswamy discloses “With respect to the server function 602, at each round t, the aggregator server may select a subset (e.g., a random subset) of m data sources (1 to M) from a set of data sources (1 to N)” [0096].),
the local client comprising one or more local processors configured to (See [0042] describing how a data source (‘local participant’) can be a portable computer (‘local client’). A processor is implied by the use of a portable computer.):
build the local machine-learning model configured to perform the task … of the local participant based on the parameters and the parameter weights of the deployed global machine-learning model (Krishnaswamy discloses a local model performs classification (‘performs a task’), “Lclass(L,D): classification loss of model L tested on dataset D” [0090].
Krishnaswamy discloses Figure 6 (reproduced above) depicting an algorithm for weighted federated learning. An updated global model Gt is updated by aggregating Federation weights Wm and training updates of local models ΔLSt(m), see line 3. Each of the local data sources (‘local participants’) generates a new local model Lt+1(m) by setting the received updated global model Gt to the new local model (see line 13) and performing local training on their private data Dm (see line 17).
By setting a received updated global model Gt to a new local model, all of a global model’s parameters and parameter weights are received by a local data source (and therefore a global model’s parameters and parameter weights are deployed to each local data source for local training).),
wherein the local participant includes a local dataset with image data and parameters associated with medical imaging equipment … of the local participant (Krishnaswamy discloses “the plurality of data sources may each be embodied as a device or system having data (labelled data for training) stored therein” [0042].
Krishnaswamy discloses “the features of the labelled data are related to images (i.e., features of images), and the second data quality factor is based on at least one of the image acquisition characteristics and the level of image artifacts in the images. By way of examples only and without limitation, the image acquisition characteristics may be defined based on specifications of the imaging equipment employed, parameter settings for image acquisition, and/or consistency of the patient history to requirements for high quality scans. For example, it may be possible that images acquired with different equipment or settings could be of lower quality (lower feature quality). Further, over or under exposure and/or presence of motion artifacts may make interpretation difficult for some images” [0104].);
train the local machine-learning model with selected instances from the local dataset … of the local participant (Krishnaswamy discloses “With respect to the local data source function 606, at each round t, each of the selected subset of data sources (m local data sources) may update the current global model received to a new local model Lt+1(m) by training on their private data, such as shown in FIG. 6.” [0098].);
generate an update matrix representing a collection of respective differences in weights between respective parameters of the local machine-learning model before and after a learning cycle based on the local dataset (Krishnaswamy discloses Figure 6 (reproduced above) depicting an algorithm for weighted federated learning. An updated global model Gt is updated by aggregating Federation weights Wm and training updates of local models ΔLSt(m) from current round t, see line 3. Each of the local data sources (‘local participants’) generates a new local model Lt+1(m) by setting the received updated global model Gt to the new local model (see line 13) and performing local training on their private data Dm (see line 17). A training update ΔLSt+1 (‘update matrix’) for the next round t+1 is then acquired by finding the difference between updated global model Gt and the trained new local model Lt+1(m), see line 19. Global model Gt represents a previous round’s aggregate local model parameters, and a trained local model Lt+1(m) represents local model parameters after the current round t.);
send, to the global server, the update matrix, the quality score … (Krishnaswamy discloses “The aggregator server may then receive a plurality of training updates (e.g., a difference ΔLSt in the example) from the selected subset of data sources, respectively (after the respective data source has generated the respective training update in response to the current global model received), and then update the current global model based on the plurality of training updates received and the plurality of data quality parameters (e.g., data quality indices σm² in the example) associated with the subset of data sources, respectively, to generate an updated global model, which then serves as a new current global model” [0096].
A global model is updated based on training updates and data quality parameters (‘quality score’), therefore training updates and data quality parameters are sent to an aggregator server.);
receive the parameters and the parameter weights of the global machine-learning model updated based on the update matrix, the quality score, … sent from each local participant in the federated learning system (Krishnaswamy discloses [0096] above describing how a global model can be updated based on training updates and data quality parameters received from the subset of data sources (‘local participants’).
Krishnaswamy discloses “With respect to the local data source function 606, at each round t, each of the selected subset of data sources (m local data sources) may update the current global model received to a new local model Lt+1(m) by training on their private data, such as shown in FIG. 6.” [0098].
By setting a received updated global model to a new local model Lt+1(m), all of a global model’s updated parameters and parameter weights are received by a local data source.
and update the local machine-learning model for the local participant based on the updated parameters and the updated parameter weights of the global machine-learning model (See Figure 6 disclosed above describing how each of the local data sources (‘local participants’) generates a new local model Lt+1(m) by setting the received updated global model Gt to the new local model (see line 13) and performing local training on their private data Dm (see line 17).).
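One round of the local-client procedure mapped above (receive the global model, train on private data, and return the weight-difference update matrix) can be sketched as follows; `toy_gradient` is a hypothetical stand-in for real backpropagation on the private data Dm, and all names are illustrative rather than taken from Krishnaswamy:

```python
import numpy as np

def toy_gradient(params, data):
    # Hypothetical placeholder for a gradient computed on private data Dm.
    return params - data.mean(axis=0)

def local_round(global_params, local_data, lr=0.1, steps=1):
    # L_{t+1}(m) <- G_t: the client starts from the received global model.
    params = global_params.copy()
    for _ in range(steps):
        params -= lr * toy_gradient(params, local_data)
    # Update matrix: difference in weights before and after local training.
    return params - global_params
```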
However, Krishnaswamy does not teach performing a machine-learning task in a hybrid operating room environment of a local participant, which is taught by Hassanzadeh:
build the local machine-learning model configured to perform the task in the hybrid operating room of the local participant (Hassanzadeh discloses client devices (‘local participants’) can implement local models, “the server 102 may provide a respective copy of the initial ML model parameter set 112 to each of the client devices 140 and 150 for client-side training, and the client devices 140 and 150 may execute respective copies of the executable file package 144 to cause the client devices 140 and 150 to implement a respective ML model, encrypt respective client data, and provide the encrypted client data to the respective ML model as training data” [0043].
Hassanzadeh discloses local models can predict a diagnosis, “Each of the client devices may decrypt a respective received encrypted ML model parameter set based on a respective private key to implement a client-side diagnosis prediction model” [0040].
Hassanzadeh discloses medical images from surgery centers (‘hybrid operating room’) can be used to predict a diagnosis, “To illustrate, the ML model may be trained to output a predicted diagnosis of a patient based on input health data associated with the patient, and the clients may include or correspond to … surgery centers, … the input medical data may include or indicate symptoms, test results, observations by medical personnel, scan results, medical images (e.g., magnetic resonance imaging (MRI), X-rays, fluoroscopic images, positron-emission tomography (PET) scan images, nuclear medicine imaging, computed tomography (CT) scan images, etc.)” [0034].),
wherein the local participant includes a local dataset with image data and parameters associated with medical imaging equipment of the hybrid operating room of the local participant (Hassanzadeh discloses “input medical data may include or indicate symptoms, test results, observations by medical personnel, scan results, medical images (e.g., magnetic resonance imaging (MRI), X-rays, fluoroscopic images, positron-emission tomography (PET) scan images, nuclear medicine imaging, computed tomography (CT) scan images, etc.), patient information (e.g., height, weight, age, gender, etc.), patient medical history, patient family history, current medications, currently diagnosed conditions, other information” [0034].);
Hassanzadeh teaches training client-side models using client data from surgery centers is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the federated machine learning system of Krishnaswamy with the client-side machine learning models disclosed by Hassanzadeh to train a machine learning model to perform diagnostic imaging in surgery centers. By training a machine learning model to perform diagnostic imaging in surgery centers, rapid and automated detection of anomalies can be achieved, thereby reducing the time needed by medical staff to interpret imaging results.
Furthermore, the combination of Krishnaswamy in view of Hassanzadeh does not teach training a machine-learning model with local statistics of a local participant, which is taught by Mu:
train the local machine-learning model with selected instances from the local dataset and local statistics of the local participant (Mu discloses a UE (‘local participant’) can participate in federated learning, “Herein, since the base station may perform data interaction with a plurality of UE, the plurality of UE may participate in the federated learning corresponding to the base station” [0054].
Mu discloses “the probability distribution of the local dataset obtained by the UE through statistics is denoted as P(Xm)=[P(x1), P(x2), …, P(xn)], where P(xi) represents the probability of Xm taking the event as xi. The base station counts the distribution of the global dataset based on the statistical result of the probability distribution of the local dataset reported by each UE, and the probability distribution of the global dataset is denoted as P(Xg)=ΣP(Xm). The base station may obtain the above statistical information of the distribution difference based on the probability distribution of the UE and the probability distribution of the global dataset described above. The statistical information of the distribution difference is denoted as ΔPm=∥P(Xg)-P(Xm)∥, the meaning of which may be the difference in values of the probability distribution of each data type, or the difference in data types included in the probability distribution” [0057].
A probability distribution of a UE’s local dataset is obtained using statistical probabilities (therefore a probability distribution of the UE is “local statistics”). A probability distribution P(Xm) is obtained for each of the UEs that participate in federated learning. A probability distribution of the global dataset P(Xg) is obtained by summing the probability distributions of each UE. Statistical information ΔPm is obtained by finding the distribution difference between a probability distribution of a UE P(Xm) and a probability distribution of the global dataset P(Xg).
Mu discloses a distribution difference can be used to calculate a weight coefficient, “the weight coefficient of the UE in the federated average learning is calculated based on the probability distribution difference between the local dataset of the UE and the global dataset, which may be expressed as the following formula … the base station performs the federated averaging to obtain an updated result of the global learning model” [0230-0232].
Mu discloses “It should be noted that the above federated learning is a process of model training jointly participated by the base station and each UE. The UE trains the local model locally and reports the training result to the base station. The base station performs weighted averaging and other processing based on the reported result and the weight coefficient of each UE to obtain the global learning model” [0153].
Mu discloses “in the federated learning, the UE perceives and collects data to generate the local dataset, and processes the local dataset to generate a local training set; the UE randomly initializes a local model parameter, performs local learning model training by using a local training set, and uploads the training result to the core network or data center; the base station requests the local training result of the UE from the core network or data center, and obtains an updated result of the global learning model by performing the federated average learning using the local learning result of each UE; the base station feeds back the updated result to the UE via the network, and the UE fine tunes the local model based on the result of the feedback; the above process is repeated until the model accuracy satisfies the requirements” [0059].
A weight coefficient (derived from a probability distribution of a UE’s local dataset (local statistics)) and local learning results of each UE can be used to update a global learning model. The updated global learning model then sends an updated result that a UE can use to fine tune its local model (therefore a local model is trained using a local dataset and local statistics of a UE).);
send, to the global server, the update matrix, … and the local statistics for the local participant (See [0059, 0153] above describing how a weight coefficient and local learning results (‘update matrix’) from each UE (‘local participant’) are used to update a global learning model (and therefore, weight coefficients and local learning results are sent to the base station that performs federated learning).);
receive the parameters and the parameter weights of the global machine-learning model updated based on the update matrix, …, and the local statistics sent from each local participant in the federated learning system (See [0059, 0153] above describing how a weight coefficient and local learning results from each UE (‘local participant’) are used to update a global learning model.
See also [0059] describing how an updated global learning model sends an updated result to each UE so that each UE can fine tune its local model.
Mu discloses an updated result of a global learning model is weights, “the updated result of the global learning model is: wt=Σm=1M am·wt−1m(K)” [0233].).
Mu teaches training local models by using weight coefficients based on probability distribution differences is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the federated machine learning system of Krishnaswamy with the weight coefficients disclosed by Mu to train local models with weight coefficients based on distribution differences. By training local models with weight coefficients based on distribution differences, a local model’s contribution to a global model can be weighted based on how different their distributions are, thereby improving global model convergence.
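The global update quoted above is a weighted average of the locally trained parameter vectors, with each participant's contribution scaled by its coefficient am. A minimal sketch (function and variable names assumed for illustration):

```python
import numpy as np

def federated_average(local_params, coeffs):
    # w_t = sum over m of a_m * w_{t-1}^m (Mu [0233]): weighted average of
    # the parameter vectors trained locally by each of the M participants.
    return sum(a * w for a, w in zip(coeffs, local_params))
```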
With respect to claim 9, the combination of Krishnaswamy in view of Hassanzadeh, further in view of Mu teaches:
the system of claim 8, wherein the one or more local processors are further configured to: compute the quality score of the local participant to indicate a level of contribution to the global machine-learning model by the local participant based on the local dataset (Krishnaswamy discloses data quality parameter (‘quality score’) can be used to weigh a training update, “each training update received may be modified or adjusted (e.g., weighted) based on the data quality parameter associated with the corresponding data source (i.e., the data source which the training update is received from)” [0045]. See [0096] describing how training updates from data sources (‘local participants’) are aggregated to update a global model.
Krishnaswamy discloses “one or more of the plurality of data quality parameters are each based on at least one of a first data quality factor” [0052-0053].
Krishnaswamy discloses “in the case of the data source belonging to a centre or an organization, the first data quality factor may be a centre quality index (Qc) determined based on a centre reputation (R), an annotator competence (C), and the method of annotation (M). In this regard, the centre reputation (R), the annotator competence (C), and the method of annotation (M) may each be assigned (or graded) a value ranging from 0 to 1. For example, a value of 0 may correspond to the worst level and a value of 1 may correspond to the best level. In relation to the centre reputation (R), for example, the most reputable centre may be assigned a value of 1, less reputable centres may be assigned values between 0 and 1 accordingly” [0103].
By adjusting a training update based on a data quality parameter, the resulting weight represents how much a data source’s training update contributes towards updating a global model.).
With respect to claim 10, the combination of Krishnaswamy in view of Hassanzadeh, further in view of Mu teaches:
the system of claim 8, wherein the local statistics stored in the local dataset comprises historical data generated from or observed at the local participant (Mu discloses “in the federated learning, the UE perceives and collects data to generate the local dataset, and processes the local dataset to generate a local training set; the UE randomly initializes a local model parameter, performs local learning model training by using a local training set, and uploads the training result to the core network or data center” [0059].
Mu discloses “During the process of the federated learning, the UE needs to utilize locally collected data. The locally collected data may be wireless network data, that is, data generated in the process of business use of a user. The UE generates the local dataset based on the collected data. If the data volume of the local dataset is large, data extraction may also be carried out, for example, a portion of data is extracted using sampling as the local training dataset” [0158].
See [0057] describing how a probability distribution (‘local statistics’) of a UE’s local dataset is obtained using statistical probabilities.).
Mu teaches generating a local dataset based on data collected from a user participating in federated learning is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the federated machine learning system of Krishnaswamy with the local dataset disclosed by Mu to train local models with user-specific data. By training local models with user-specific data, a user can train a model that is highly customized to their data, thus developing a personalized, domain-specific model.
With respect to claim 16, the claim recites similar limitations corresponding to claim 6, therefore the same rationale of rejection is applicable.
With respect to claim 18, the claim recites similar limitations corresponding to claim 8, therefore the same rationale of rejection is applicable.
With respect to claim 19, the claim recites similar limitations corresponding to claim 9, therefore the same rationale of rejection is applicable.
Claims 11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnaswamy in view of Hassanzadeh, further in view of Mu and Luo et al. (“No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data”), hereinafter Luo.
With respect to claim 11, the combination of Krishnaswamy in view of Hassanzadeh, further in view of Mu teaches the system of claim 8, however, the combination does not teach downloading a local learning tool from a global server, which is taught by Luo:
wherein the one or more local processors are further configured to: download one or more local learning tools from the global server (Luo discloses “the server first sends the feature extractor fθ̂ of the trained global model to clients” (P. 6, Sec. 4, Last Paragraph). A server sends a feature extractor (‘local learning tool’) to clients (therefore clients must download the feature extractor to use it).);
analyze image data stored in the local dataset of the local participant by use of the one or more local learning tools (Luo discloses “Client k has a local dataset Dk, and we set D … as the whole dataset. … Denote by (x, y) ∈ X × [C] a sample in D, where x is an image in the input space X and y is its corresponding label. … Given a sample (x, y), the feature extractor fθ: X → Z, parameterized by θ, maps the input image x into a feature vector z = fθ(x) ∈ Rd in the feature space Z. Then the classifier … parameterized by φ, produces a probability distribution gφ(z) as the prediction for x. Denote by w = (θ, φ) the parameter of the classification model” (P. 3-4, Sec. 3.1, ¶1).
A feature extractor (‘local learning tool’) is used to extract features of an image sample into a feature vector z (therefore analyzing image data stored in a dataset of a client). A classifier uses feature vector z to generate a prediction for the image sample.);
and send statistics resulting from the analyzed image data to the global server with weights of features, from the image data, parametrized in the local machine-learning model (Luo discloses “Client k produces features {zc,k,1, . . . , zc,k,Nc,k} for class c, … and computes local mean μc,k and covariance Σc,k of Dck” (P. 6, Sec. 4, Last Paragraph).
Luo discloses Equation 2 on P. 6 (reproduced below) depicting equations for calculating local mean and covariance using features extracted from images in a client dataset (therefore local mean and covariance are statistics resulting from analyzing image data).
[Image: media_image3.png (Greyscale), reproducing Equation 2 of Luo for the local mean and covariance]
Luo discloses local mean and covariance are sent to a server, “Then client k uploads {(μc,k, Σc,k): c ∈ [C]} to server. For the server to compute the global statistics of D, it is sufficient to represent the global mean μc and covariance Σc using μc,k’s and Σc,k’s for each class c” (P. 7, Sec. 4, ¶1).
Luo discloses optimized parameters are sent to a server, “Federated learning proceeds through the communication between clients and the server in a round-by-round manner. In round t of the process, the server sends the current model parameter w(t−1) to a set U(t) of selected clients. Then each client k ∈ U(t) locally updates the received parameter w(t−1) to wk(t) with the following objective … where L is the loss function. … In the end of round t, the selected clients send the optimized parameter back to the server and the server updates the parameter by aggregating heterogeneous parameters” (P. 4, Sec. 3.1, ¶2).
Luo discloses Equation 1 on P. 4 (reproduced below) depicting an equation for updating a received parameter w(t−1) to wk(t). A client updates the received parameter using a loss function that uses image samples from a local dataset.
[Image: media_image4.png (Greyscale), reproducing Equation 1 of Luo]
),
wherein the weights of the features are calculated based on contribution of the parametrized features from the image data to a task set configured to be performed by the local machine-learning model (Luo discloses Equation 1 on P. 4 (reproduced above) depicting an equation for updating a received parameter w(t−1) to wk(t). A client updates the received parameter using a loss function that uses image samples from a local dataset. Local weights updated by a loss function indicate how much influence a client’s local image samples have towards a prediction.).
Luo teaches sending a feature extractor to clients for feature image extraction is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the federated machine learning system of Krishnaswamy with the feature extractor disclosed by Luo to have local models use the same feature extractor tool. By having each local model use the same feature extractor tool, each local model can extract features from images in a consistent and uniform manner, thereby ensuring extracted features are comparable and stable for aggregation.
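The per-class statistics Luo has each client compute and upload (local mean μc,k and covariance Σc,k of the extracted features) can be sketched as below; `features` and `labels` are assumed inputs produced by the shared feature extractor, and the function name is illustrative:

```python
import numpy as np

def local_class_stats(features, labels, num_classes):
    # For each class c, the local mean and covariance of the feature
    # vectors z that the client extracted for that class (Luo, Sec. 4).
    stats = {}
    for c in range(num_classes):
        z = features[labels == c]
        stats[c] = (z.mean(axis=0), np.cov(z, rowvar=False))
    return stats
```

The server can then combine the uploaded (μc,k, Σc,k) pairs, e.g. by sample-size weighting, to recover the global mean μc and covariance Σc for each class.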
With respect to claim 20, the claim recites similar limitations corresponding to claim 11, therefore the same rationale of rejection is applicable.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PEDRO J MORALES whose telephone number is (571)272-6106. The examiner can normally be reached 8:30 AM - 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA M HUANG can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PEDRO J MORALES/Examiner, Art Unit 2124
/VINCENT GONZALES/Primary Examiner, Art Unit 2124