Office Action Analysis: 17934098 — MODEL DECORRELATION AND SUBSPACING FOR FEDERATED LEARNING

Office Action

§102 §103
Detailed Action
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
This action is in response to the amendment filed on Nov. 25th, 2025. The amendments are linked to the original application filed on Sept. 21st. 2022.

Response to Amendment
The Examiner thanks the applicant for the remarks, edits and arguments.
Regarding Claim Rejections – 35 USC 102
Applicant Remarks:
	The applicant argues that Goodsitt does not disclose every limitation set for in the claims. The applicant states as an example that Goodsitt does not disclose, “partitioning a machine learning model into a plurality of partitions”.
	Further, the applicant argues that Goodsitt fails to disclose “receiving, from a server, information defining an orthogonal partition a machine learning model to be update and constrains for updating the orthogonal partition.” As stated in claim 15.
	According to the applicant since Goodsitt does not explicitly teach each and every limitation set forth in the claims, therefore the applicant argues that the rejection under 35 U.S.C. 102 should be withdraw.

Examiner Response:
	The applicant argues that Godsitt fails to disclose, “partitioning a machine learning model into a plurality of partitions”. The examiner would like to point to the MPEP 2131 which discloses Anticipation for 35 U.S.C. 102 rejections, it states, “A claimed invention may be rejected under 35 U.S.C. 102 when the invention is anticipated (or is "not novel") over a disclosure that is available as prior art. To reject a claim as anticipated by a reference, the disclosure must teach every element required by the claim under its broadest reasonable interpretation”. This section states that an art can be used to reject claims if the art discloses every element of the claim under the broadest reasonable interpretation. The broadest reasonable interpretation of the limitation “partitioning a machine learning model into a plurality of partitions” would be dividing a machine learning model in multiple subsets of the original model. This interpretation includes the elements of dividing, or partitioning, a machine learning model into a set of sub models, plurality of partitions. Considering this, Goodsitt discloses the use of “distributed instances” of a machine learning model. These “distributed instances” are sent to different client devices in a federated system, which is stated in the specification. Goodsitt further discloses a “distributed instance” as, “As an example, some embodiments may characterize a distributed instance of a machine learning model with a target machine learning hyperparameter, where the hyperparameter may include a number of hidden layers, a learning rate for an optimization algorithm, a value indicating an activation function, a number of activation units in a layer, a number of clusters for a clustering analysis operation, a pooling size of a convolutional neural network, a filter size of a convolutional neural network, etc. Some embodiments may determine a required set of metrics based on an obtained machine learning hyperparameter, where the hyperparameter may be mapped to a set of requirements for client computing device metrics.” (Goodsitt, Col. 11, Ln. 29-41). As stated, a “distributed instance” can be characterized as different hyperparameters, this would include a “number of hidden layers” and other machine learning parameters such as activations, weights and layer values. Using this, the examiner believes that the “distributed instances” disclosed in Goodsitt can be considered to be sub models of a global model. This is further disclosed in Goodsitt which discloses, “Some embodiments may send a distributed instance of a machine learning model having multiple sub-models to a client computing device, where the sub-models of the distributed instance stored on the client computing device may be described as client sub-models. For example, some embodiments may distribute an instance of a machine learning model having a first sub-model to determine context tokens from a sequence of tokens and a second sub-model to determine whether a target token is sensitive based on the context tokens. As described elsewhere, a client computing device may then update the first client sub-model and the second client sub-model by using information stored on the client computing device, entered via a UI, or accessed via an application program interface (API).” (Goodsitt, Col. 11, Ln. 3-16). This again teaches that the distributed instances disclosed in Goodsitt can be considered to be portioned instances of a global machine learning model. Finally, Goodsitt discloses the actual partitioning of a model as well. Goodsit discloses, “Some embodiments may select a hyperparameter based on information provided by a client computing device and send the selected hyperparameter to the client computing device. For example, based on a determination that a first client computing device reported that it had 10 GB of RAM available or that it had a processor of a first set of processors, some embodiments may select a first hyperparameter value "4" representing four hidden layers. Based on a determination that a second client computing device reported that it had 4 GB of RAM available or that it had a processor of a second set of processors, some embodiments may select a second hyperparameter value "2" representing two hidden layers.” (Goodsitt, Col. 11, Ln. 53-65). This discloses the server is able to evaluate a client’s device and determine what distributed instance should be sent. This teaches that a global model would need to be split, or partition, in some way to accommodate the system requirements of a client’s device. The examiner, as stated above, believes that Goodsitt discloses every element of the limitation in question. Goodsitt discloses the use of partitions, splitting/dividing, a machine learning model into distributed instances and sending the distributed instance to different computing devices in a federated system.
	Next the applicant argues that Goodsitt fails to disclose, “receiving from a server, information defining an orthogonal partition in a machine learning model to be updated and contains for updating the orthogonal partition.”. The examiner would fist like to point to the above description of a “distributed instance” and the definition for Anticipation stated in MPEP 2131. Taking those into consideration, the examiner would like to point to Goodsitt which discloses a server, “The client computing device 102 or other computing devices may send and receive messages through the network 150 to communicate with a server 120, where the server 120 may include a non-transitory storage medium storing program instructions to perform one or more operations of subsystems 124-126.” (Goodsitt, Col. 3, Ln. 31-36) which discloses a computing system that communicates with client devices via a network connection. Further Goodsitt discloses, “For example, the server 120 may send, to a client computing device, a set of parameters representing a distributed instance of a random forest model. The client computing device may then perform a set of learning operations that causes the client computing device to update the distributed instance and send the updated model parameters back to the server 120.” (Goodsitt, Col. 3, Ln 59-65), which states that the server communicates with client devices and is able to send the client devices a “distributed instance”, again the examiner is using the definition of a distributed instance as stated above. This would disclose that Goodsitt teaches a system which contains a server which is able to send and receive information with client devices. Further as stated, the client devices will receive the sent information and is able to update the sent information and send that updated information back to the server for processing. Finally, the examiner would like to look at the interpretation of an Orthogonal partition. The examiner would like to point to the applicant specification which states, “The orthogonal partition generally includes a partition for a first participating device in a federated learning scheme that is decorrelated from a partition for a second participating device in the federated learning scheme.” (Brief Summary, pp. 2, [0006]). The definition of decorrelation, which is not a valid word in either the Oxford English Dictionary and the Merriam-Webster dictionary, is split into the prefix “de-” and the valid word “correlate”. Per Merriam-Webster the prefix “de-” means: “do the opposite of”, “reduce”, “remove (a specified thing) from” or “remove from (a specified thing)” (https://www.merriam-webster.com/dictionary/DE).  The definition of “correlate” per the Merriam-Webster dictionary, “either of two things so related that one directly implies or is complementary to the other (such as husband and wife)” (https://www.merriam-webster.com/dictionary/correlate). Combining the prefix and the base word would lead to the interpretation: removing correlations between correlated objects or things. With this definition and the limitation, “receiving from a server, information defining an orthogonal partition in a machine learning model to be updated and contains for updating the orthogonal partition.” would lead one to believe that the “orthogonal partition” is a decorrelated partition, meaning it contains information which is different from one partition to the next. Goodsitt also teaches this, it is stated in Goodsitt, “In some embodiments, the cloud system 210 may distribute the machine learning model 202 to the computing devices 222-224. The first computing device 222 may receive the first distributed instance 232, the second computing device 223 may receive the second distributed instance 242, and the third computing device 224 may receive the third distributed instance 252. In some embodiments, different computing devices may receive different hyperparameters that cause the different computing devices to have different initial versions of their respective distributed instances. For example, the first computing device 222 and the second computing device 223 may receive a first hyperparameter value that causes each computing device to implement the first distributed instance 232 and the second distributed instance 242, respectively, where each respective instance of the distributed instances has three hidden layers. Similarly, the third computing device 224 may receive a second hyperparameter value that causes the third computing device 224 to implement the third distributed instance 252, where the third distributed instance 252 has two hidden layers.” (Goodsitt, Col. 9, Ln. 6-26), which discloses the use of distributing different distributed instances to different client devices. This, as the examiner believes discloses the that the sever is able to send different, decorrelated, distributed instances, which as stated above can be hyperparameters, weights, activations, sub-models etc., to different computing devices in a federated system as claimed in the independent claims.
	Finally, the examiner has reviewed the claims and the arguments made by the applicant. The examiner has disclosed the interpretations used for the claims and has disclosed support for the proposed art. Further the examiner has performed a through and complete search, as performed after each amendment, and has not found art which discloses the current claims as well as the cited previous art. For the reasons above and the reasons stated in the rejection under 35 U.S.C. 102, the examiner believes the rejection under 35 U.S.C. 102 is upheld.

Regarding Claim Rejections – 35 USC 103
Applicant Remarks:
	The applicant argues that Goodsitt fails to teach the limitations the independent claims, 1, 15, and 22 as stated above. Because of this, the applicant argues that the arts used in the rejection under 35 U.S.C. 103 would fails to cure the deficiencies of Goodsitt and would therefore not properly teach the claims as a whole. For these reasons, the applicant argues that the rejection under 35 U.S.C. 103 should be withdrawn.

Examiner Response:
	The examiner has reviewed the claims and arguments made by the applicant on the rejection under 35 U.S.C. 102 and has found the rejection is upheld. This would mean that the examiner believes that Goodsitt properly teaches the amended claims as stated and that the art used for the rejection under 35 U.S.C. 103 would not need to cure the deficiency of Goodsitt. Further, the applicant has not given any further arguments as to why they believe the proposed arts fail to teach the elements of the dependent claims besides their dependency to the independent claims. Finally, the examiner has performed a through and complete search, as required for each amendment, and has not found art which is able to teach the proposed claims in their current state as well as the proposed art. Therefore, for the reasons stated above, the examiner believes the rejection under 35 U.S.C. 103 is properly applied and the rejection is upheld.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 2, 11-18, 21-23, and 27-30 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Goodsitt (Goodsitt, "Hidden Machine Learning for Federated Learning", US 12,242,589 B2, Filed Aug 10th, 2022, hereinafter "Goodsitt").

Regarding claim 1, Goodsitt discloses, "A computer-implemented method, comprising:" (Abstract, pp. 1; "A method includes providing a distributed instance of a machine learning model to a client computing device and access to a predetermined token associated with a predetermined label to a client-side application at the client computing device." This Patent discloses a method which divided a global model into multiple models. This method will distribute portions of the global model to different devices in the network.)
"partitioning a machine learning model into a plurality of partitions;" (Detailed Description, Col. 13, In. 11-17; "In some embodiments, the set of model parameters of the distributed instances of the machine learning model may include weights, biases, hyperparameters, or other values characterizing a neural network. As should be understood, when the distributed instance is stored on a client computing device, the distributed instance may be described as a client model instance." A global model in a federated system is divided into multiple distributed instances. Each of the instances is linked to a client-based device. There are multiple client devices in this system, meaning there are multiple distributed instances of the global model.)
"transmitting, to each respective participating device of a plurality of participating devices in a federated learning scheme, a request to update a respective partition of the plurality of partitions in the machine learning model based on unique data at the respective participating device;" (Detailed Description, Col. 10, In. 43-53; "Some embodiments may send a distributed instance of a machine learning model to a set of client computing devices, as indicated by block 302. Sending a distributed instance of a machine learning model data may include sending parameter values, values indicating the architecture of the neural network model itself, functions, operators, or other data characterizing a machine learning model. Furthermore, some embodiments may send a client-side training application to the client computing device, where the client computing device may be for training the distributed instance." The global model is divided into distributed instances. The instances are then sent to the different devices in the network for the purpose of training the global model.)
"receiving, from the plurality of participating devices, updates to one or more partitions in the machine learning model; and" (Detailed Description, Col. 10, Ln. 13-22; "A device of the computing devices 222-224 may send results of their respective training operation back to the cloud system 210. For example, the first computing device 222 may send neural network weights of the first distributed instance 232 to the cloud system 210, the second computing device 223 may send neural network weights of the second distributed instance 242 to the cloud system 210, and the third computing device 224 may send neural network weights of the third distributed instance 252 to the cloud system 210." Each of the devices have received a portion of the global model. Each device in turn trains their respected portion and will send the updates back to server to update the global model.)
"updating the machine learning model based on the received updates." (Detailed Description, Col. 10, Ln. 23-36; "After receiving different sets of machine learning parameters from different devices, the cloud system 210 may combine model parameters from different devices. In some embodiments, the cloud system 210 may segregate different machine learning parameters based on their corresponding hyperparameters. For example, the cloud system 210 may combine neural network weights of the first distributed instance 232 and the second distributed instance 242 by determining a measure of central tendency for their respective weights. Additionally, the cloud system 210 may combine neural network weights of the third distributed instance 252 with neural network weights of other distributed instances by determining a measure of central tendency for their respective weights." After the models have been receive by the different devices the global server reviews the update for the distributed instances. The server will then update the global model accordingly.)

Regarding claim 2, Goodsitt discloses, "wherein partitioning the machine learning model into the plurality of partitions comprises partitioning the machine learning model into a common subnetwork and one or more non-common subnetworks." (Detailed Description, Col. 11, Ln. 3-7; "Some embodiments may send a distributed instance of a machine learning model having multiple sub-models to a client computing device, where the sub-models of the distributed instance stored on the client computing device may be described as client sub-models." The global model is divided into different distributed instances. These instances can contain model parameters as well as sub models of the overall global model.)

Regarding claim 11, Goodsitt discloses, "receiving reports from each participating device of the plurality of participating devices; and" (Detailed Description, Col. 12, Ln. 8-15; "In some embodiments, the set of client computing devices may report information to a server or other computing devices used to distribute or update machine learning models. For example, the set of client computing devices may report information indicating the use of an application or a component of an application, where such use may indicate the availability of a target sensitive information type." The client devices may communicate with the central server. This commination can be for multiple reasons including but not limited to information about training or parameters.)
"configuring, for each respective participating device of the plurality of participating devices, constraints for updating a partition in the machine learning model based on the received reports for the respective participating device." (Detailed Description, Col. 9, Ln. 39-52; "Each respective device of the computing devices 222-224 may perform respective training operations to update their respective distributed instances. For example, the first computing device 222 may perform training operations to update the first distributed instance 232, the second computing device 223 may perform training operations to update the second distributed instance 242, and the third computing device 224 may perform training operations to update the third distributed instance 252. Each of the training operations for each device may be performed independently, synchronously, semi-asynchronously, asynchronously, etc. Each of the computing devices 222-224 may perform different numbers of training operations, use different data for training, perform training at different times, etc." Each of the distributed instances can be distributed to different client devices. During the training phase the global model may send each client device in the network different portions of the model or different constraints about the training including but not limited to training schedule and/or training operations.)

Regarding claim 12, Goodsitt discloses, "the one or more participating devices comprise one or more user equipments (UEs) in a wireless communication system," (Detailed Description, Col. 3, In. 19-26; "FIG. 1 shows an illustrative system for updating a federated learning model, in accordance with one or more embodiments. A system 100 includes a client computing device 102. While shown as a mobile computing device, the client computing devices 102 may include other types of computing devices, such as a desktop computer, a wearable headset, a smartwatch, another type of mobile computing device, etc." Each of the user devices in the network can be different user devices. This includes mobile devices and other devices connected to a wireless network.)
"the machine learning model comprises a model for predicting parameters for wireless communications between a network entity and a user equipment (UE), and" (Detailed Description, Col. 4, Ln. 14-20; "The communication subsystem 124 may retrieve 15 the learning model from the set of databases 130 or another memory accessible to the server 120. For example, the server 120 may access a set of neural network weights for a plurality of layers of a neural network and send the set of neural network weights to one or more client computing 20 devices, such as the client computing device 102." This system can be used with many different machine learning models. This system is connected to a set of databased which can contain many different models and model parameters, which can include prediction models.)
"the data at each respective device comprises one or more radio measurements at each respective device." (Detailed Description, Col. 5, Ln. 19-26; "In some implementations, the client computing device 102 may update its distributed instance of a machine learning model stored on a client memory ("client model instance") by performing a training operation based on the inputs received by the client computing device 102. Alternatively, or additionally, the client computing device 102 may update its client model instance based on data stored or otherwise accessible to the client computing device 102." This system proposed can perform many different actions on different types of machine learning models. The data of these models can pertain to any form of data that the model uses. The client during an update may use its own data to update the distributed instance.)

Regarding claim 13, Goodsitt discloses, "wherein updating the machine learning model based on the received updates comprises aggregating weight updates from each participating device of the plurality of participating devices." (Detailed Description, Col. 10, Ln. 23- 36; "After receiving different sets of machine learning parameters from different devices, the cloud system 210 may combine model parameters from different devices. In some embodiments, the cloud system 210 may segregate different machine learning parameters based on their corresponding hyperparameters. For example, the cloud system 210 may combine neural network weights of the first distributed instance 232 and the second distributed instance 242 by determining a measure of central tendency for their respective weights. Additionally, the cloud system 210 may combine neural network weights of the third distributed instance 252 with neural network weights of other distributed instances by determining a measure of central tendency for their respective weights." The client devices will update their instance of the model and send their data to a cloud system. This system can choose to update the global model by evaluating and combining weights and parameters sent by the client devices. This can choose to combine weights or keep the separate depending on the model parameters.)

Regarding claim 14, Goodsitt discloses, "wherein the method is performed by a network entity in a wireless communication system, and wherein the plurality of participating devices comprises user equipments (UEs) served by the network entity." (Detailed Description, Col. 3, In. 19-36; "FIG. 1 shows an illustrative system for updating a federated learning model, in accordance with one or more embodiments. A system 100 includes a client computing device 102. While shown as a mobile computing device, the client computing devices 102 may include other types of computing devices, such as a desktop computer, a wearable headset, a smartwatch, another type of mobile computing device, etc. In some embodiments, the client computing devices 102 may communicate with various other computing devices via a network 150, where the network 150 may include the Internet, a local area network, a peer-to-peer network, etc. The client computing device 102 or other computing devices may send and receive messages through the network 150 to communicate with a server 120, where the server 120 may include a non-transitory storage medium storing program instructions to perform one or more operations of subsystems 124-126." This system as seen in figure 1 consist of client devices connected via a network to other network entities like servers and databases. The Communication subsystem, Data Aggregation Subsystem and the Model Update Subsystem are implemented on a server to communicate through network to the client device.)

Regarding claim 15, Goodsitt discloses, "A computer-implemented method, comprising:" (Abstract, pp. 1; "A method includes providing a distributed instance of a machine learning model to a client computing device and access to a predetermined token associated with a predetermined label to a client-side application at the client computing device." This Patent discloses a method which divided a global model into multiple models. This method will distribute portions of the global model to different devices in the network.)
"receiving, from a server, information defining an orthogonal partition in a machine learning model to be updated and constraints for updating the orthogonal partition, wherein the orthogonal partition comprises a partition for a first participating device in a federated learning scheme that is decorrelated from a partition for a second participating device in the federated learning scheme;" (Detailed Description, Col. 9, In. 6-16; "In some embodiments, the cloud system 210 may distribute the machine learning model 202 to the computing devices 222-224. The first computing device 222 may receive the first distributed instance 232, the second computing device 223 may receive the second distributed instance 242, and the third computing device 224 may receive the third distributed instance 252. In some embodiments, different computing devices may receive different hyperparameters that cause the different computing devices to have different initial versions of their respective distributed instances." This system will send model information and parameters to different client devices within the network. Each of the devices are not designed to communicate their private data with other clients within the network. Each device will be given a distributed instance of the global model to update separately from other client devices.)
"updating the orthogonal partition in the machine learning model based on local data; and" (Detailed Description, Col. 9, In. 39-52; "Each respective device of the computing devices 222-224 may perform respective training operations to update their respective distributed instances. For example, the first computing device 222 may perform training operations to update the first distributed instance 232, the second computing device 223 may perform training operations to update the second distributed instance 242, and the third computing device 224 may perform training operations to update the third distributed instance 252. Each of the training operations for each device may be performed independently, synchronously, semi-asynchronously, asynchronously, etc. Each of the computing devices 222-224 may perform different numbers of training operations, use different data for training, perform training at different times, etc." After receiving a request or model data, the client devices can update the portion of the machine learning model. This update is completed separate from other client devices and can be performed at different times and with the same or different model parameters.)
"transmitting, to the server, information defining the updated orthogonal partition in the machine learning model." (Detailed Description, Col. 10, In. 13-22; "A device of the computing devices 222-224 may send results of their respective training operation back to the cloud system 210. For example, the first computing device 222 may send neural network weights of the first distributed instance 232 to the cloud system 210, the second computing device 223 may send neural network weights of the second distributed instance 242 to the cloud system 210, and the third computing device 224 may send neural network weights of the third distributed instance 252 to the cloud system 210." The client devices are sent the distributed instances of the model. After receiving the instance, the client device will update that specified instance and then send it to the global server where the global model is updated.)

Regarding claim 16, Goodsitt discloses, "transmitting, to the server, one or more reports, wherein the constraints for updating the orthogonal partition are based on the one or more reports." (Detailed Description, Col. 11, In. 17-24; "As an example, some embodiments may characterize a distributed instance of a machine learning model with a target machine learning hyperparameter, where the hyperparameter may include a number of hidden layers, a learning rate for an optimization algorithm, a value indicating an activation function, a number of activation units in a layer, a number of clusters for a clustering analysis operation, a pooling size of a convolutional neural network, a filter size of a convolutional neural network, etc. Some embodiments may determine a required set of metrics based on an obtained machine learning hyperparameter, where the hyperparameter may be mapped to a set of requirements for client computing device metrics." In this system the client devices can have different computing abilities. The computing capabilities of a client device is sent to the central server to be stored. The central server can use this information to properly distribute to capable devices or it can alter the parameters to meet the devices constraints.)

Regarding claim 17, Goodsitt discloses, "wherein the one or more reports include generalized information about the local data." (Detailed Description, Col. 12, In. 4-19; "Some embodiments may select a set of client computing devices based on the type of information that is stored on the client computing devices. For example, some embodiments may obtain a target sensitive information type, such as "user name," "vehicle identifier," "disease," etc. In some embodiments, the set of client computing devices may report information to a server or other computing devices used to distribute or update machine learning models. For example, the set of client computing devices may report information indicating the use of an application or a component of an application, where such use may indicate the availability of a target sensitive information type. Some embodiments may then perform operations to filter the information reported by the set of client computing devices to select a subset of the set of client computing devices that may indicate the availability of the target sensitive information type." The client devices can communicate with the central server over the network. Local user data can be sent to the central server and stored. This data can be filtered by the server to determine which client devices should be used for training different machine learning models.)

Regarding claim 18, Goodsitt discloses, "wherein the one or more reports comprise radio measurements corresponding to a general classification of radio conditions at a device." (Detailed Description, Col. 5, Ln. 19-26; "In some implementations, the client computing device 102 may update its distributed instance of a machine learning model stored on a client memory ("client model instance") by performing a training operation based on the inputs received by the client computing device 102. Alternatively, or additionally, the client computing device 102 may update its client model instance based on data stored or otherwise accessible to the client computing device 102." This system proposed can perform many different actions on different types of machine learning models. The data of these models can pertain to any form of data that the model uses. The client during an update may use its own data to update the distributed instance. The client devices in this system can be mobile devices. Radio information can be stored as metadata taken from a mobile device component or module.)

Regarding claim 21, Goodsitt discloses, "wherein the method is performed by a user equipment (UE) in a wireless communication system, and the server is associated with a network entity serving the UE in the wireless communication system." (Detailed Description, Col. 3, In. 19-43; "FIG. 1 shows an illustrative system for updating a federated learning model, in accordance with one or more embodiments. A system 100 includes a client computing device 102. While shown as a mobile computing device, the client computing devices 102 may include other types of computing devices, such as a desktop computer, a wearable headset, a smartwatch, another type of mobile computing device, etc. In some embodiments, the client computing devices 102 may communicate with various other computing devices via a network 150, where the network 150 may include the Internet, a local area network, a peer-to-peer network, etc. The client computing device 102 or other computing devices may send and receive messages through the network 150 to communicate with a server 120, where the server 120 may include a non-transitory storage medium storing program instructions to perform one or more operations of subsystems 124-126. Further, while one or more operations are described herein as being performed by particular components of the system 100, those operations may be performed by other components of the system 100 in some embodiments. One or more operations described in this disclosure as being performed by the server 120 may instead be performed by the client computing device 102 or other computing devices described in this disclosure." Figure 1 shows a model of this system. This system includes a client device which is connected to a network. The client device can consist of many different kinds of electronic devices include mobile devices. This network connects the client devices to a database and a server. This server is a network entity that communicates over the network to client devices.)

Regarding claim 22, Goodsitt discloses, "A processing system comprising: memory comprising computer-executable instructions stored thereon; and one or more processors configured to execute the computer-executable instructions to cause the processing system to:" (Detailed Description, Col. 19, In. 38-45 "In some embodiments, the various computer systems and subsystems illustrated in FIG. 1 may include one or more computing devices that are programmed to perform the functions described herein. The computing devices may include one or more electronic storages (e.g., the set of databases 130), one or more physical processors programmed with one or more computer program instructions, and/or other components." Each of the devices disclosed in this system contain memory which store computer readable instructions and each of the systems can contain some form of processing unit.)
"partition a machine learning model into a plurality of partitions;" (Detailed Description, Col. 13, In. 11-17; "In some embodiments, the set of model parameters of the distributed instances of the machine learning model may include weights, biases, hyperparameters, or other values characterizing a neural network. As should be understood, when the distributed instance is stored on a client computing device, the distributed instance may be described as a client model instance." A global model in a federated system is divided into multiple distributed instances. Each of the instances is linked to a client-based device. There are multiple client devices in this system, meaning there are multiple distributed instances of the global model.)
"transmit, to each respective participating device of a plurality of participating devices in a federated learning scheme, a request to update a respective partition of the plurality of partitions in the machine learning model based on unique data at the respective participating device;" (Detailed Description, Col. 10, In. 43-53; "Some embodiments may send a distributed instance of a machine learning model to a set of client computing devices, as indicated by block 302. Sending a distributed instance of a machine learning model data may include sending parameter values, values indicating the architecture of the neural network model itself, functions, operators, or other data characterizing a machine learning model. Furthermore, some embodiments may send a client-side training application to the client computing device, where the client computing device may be for training the distributed instance." The global model is divided into distributed instances. The instances are then sent to the different devices in the network for the purpose of training the global model.)
"receive, from the plurality of participating devices, updates to one or more partitions in the machine learning model; and" (Detailed Description, Col. 10, Ln. 13-22; "A device of the computing devices 222-224 may send results of their respective training operation back to the cloud system 210. For example, the first computing device 222 may send neural network weights of the first distributed instance 232 to the cloud system 210, the second computing device 223 may send neural network weights of the second distributed instance 242 to the cloud system 210, and the third computing device 224 may send neural network weights of the third distributed instance 252 to the cloud system 210." Each of the devices have received a portion of the global model. Each device in turn trains their respected portion and will send the updates back to server to update the global model.)
"update the machine learning model based on the received updates." (Detailed Description, Col. 10, Ln. 23-36; "After receiving different sets of machine learning parameters from different devices, the cloud system 210 may combine model parameters from different devices. In some embodiments, the cloud system 210 may segregate different machine learning parameters based on their corresponding hyperparameters. For example, the cloud system 210 may combine neural network weights of the first distributed instance 232 and the second distributed instance 242 by determining a measure of central tendency for their respective weights. Additionally, the cloud system 210 may combine neural network weights of the third distributed instance 252 with neural network weights of other distributed instances by determining a measure of central tendency for their respective weights." After the models have been receive by the different devices the global server reviews the update for the distributed instances. The server will then update the global model accordingly.)

Regarding claim 23, Goodsitt discloses, "partition the machine learning model into a common subnetwork and one or more non-common subnetworks, or orthogonalize the partitions.". (Detailed Description, Col. 11, Ln. 3-7; "Some embodiments may send a distributed instance of a machine learning model having multiple sub-models to a client computing device, where the sub-models of the distributed instance stored on the client computing device may be described as client sub-models." The global model is divided into different distributed instances. These instances can contain model parameters as well as sub models of the overall global model.)

Regarding claim 27, Goodsitt discloses, "receive reports from each participating device of the plurality of participating devices; and" (Detailed Description, Col. 12, Ln. 8-15; "In some embodiments, the set of client computing devices may report information to a server or other computing devices used to distribute or update machine learning models. For example, the set of client computing devices may report information indicating the use of an application or a component of an application, where such use may indicate the availability of a target sensitive information type." The client devices may communicate with the central server. This commination can be for multiple reasons including but not limited to information about training or parameters.) 
"configure, for each respective participating device of the plurality of participating devices, constraints for updating a partition in the machine learning model based on the received reports for the respective participating device." (Detailed Description, Col. 9, Ln. 39-52; "Each respective device of the computing devices 222-224 may perform respective training operations to update their respective distributed instances. For example, the first computing device 222 may perform training operations to update the first distributed instance 232, the second computing device 223 may perform training operations to update the second distributed instance 242, and the third computing device 224 may perform training operations to update the third distributed instance 252. Each of the training operations for each device may be performed independently, synchronously, semi-asynchronously, asynchronously, etc. Each of the computing devices 222-224 may perform different numbers of training operations, use different data for training, perform training at different times, etc." Each of the distributed instances can be distributed to different client devices. During the training phase the global model may send each client device in the network different portions of the model or different constraints about the training including but not limited to training schedule and/or training operations.)

Regarding claim 28, Goodsitt discloses, "the plurality of participating devices comprises one or more user equipments (UEs) in a wireless communication system," (Detailed Description, Col. 3, In. 19-26; "FIG. 1 shows an illustrative system for updating a federated learning model, in accordance with one or more embodiments. A system 100 includes a client computing device 102. While shown as a mobile computing device, the client computing devices 102 may include other types of computing devices, such as a desktop computer, a wearable headset, a smartwatch, another type of mobile computing device, etc." Each of the user devices in the network can be different user devices. This includes mobile devices and other devices connected to a wireless network.)
"the machine learning model comprises a model for predicting parameters for wireless communications between a network entity and a user equipment (UE), and" (Detailed Description, Col. 4, Ln. 14-20; "The communication subsystem 124 may retrieve 15 the learning model from the set of databases 130 or another memory accessible to the server 120. For example, the server 120 may access a set of neural network weights for a plurality of layers of a neural network and send the set of neural network weights to one or more client computing 20 devices, such as the client computing device 102." This system can be used with many different machine learning models. This system is connected to a set of databased which can contain many different models and model parameters, which can include prediction models.)
"the data at each respective device comprises one or more radio measurements at each respective device." (Detailed Description, Col. 5, Ln. 19-26; "In some implementations, the client computing device 102 may update its distributed instance of a machine learning model stored on a client memory ("client model instance") by performing a training operation based on the inputs received by the client computing device 102. Alternatively, or additionally, the client computing device 102 may update its client model instance based on data stored or otherwise accessible to the client computing device 102." This system proposed can perform many different actions on different types of machine learning models. The data of these models can pertain to any form of data that the model uses. The client during an update may use its own data to update the distributed instance.)

Regarding claim 29, Goodsitt discloses, "wherein in order to update the machine learning model, the one or more processors are configured to aggregate weight updates from each participating device of the plurality of participating devices." (Detailed Description, Col. 10, Ln. 23- 36; "After receiving different sets of machine learning parameters from different devices, the cloud system 210 may combine model parameters from different devices. In some embodiments, the cloud system 210 may segregate different machine learning parameters based on their corresponding hyperparameters. For example, the cloud system 210 may combine neural network weights of the first distributed instance 232 and the second distributed instance 242 by determining a measure of central tendency for their respective weights. Additionally, the cloud system 210 may combine neural network weights of the third distributed instance 252 with neural network weights of other distributed instances by determining a measure of central tendency for their respective weights." The client devices will update their instance of the model and send their data to a cloud system. This system can choose to update the global model by evaluating and combining weights and parameters sent by the client devices. This can choose to combine weights or keep the separate depending on the model parameters.)

Regarding claim 30, Goodsitt discloses, "A processing system comprising: memory having computer-executable instructions stored thereon; and one or more processors configured to execute the computer-executable instructions to:" (Detailed Description, Col. 19, In. 38-45 "In some embodiments, the various computer systems and subsystems illustrated in FIG. 1 may include one or more computing devices that are programmed to perform the functions described herein. The computing devices may include one or more electronic storages (e.g., the set of databases 130), one or more physical processors programmed with one or more computer program instructions, and/or other components." Each of the devices disclosed in this system contain memory which store computer readable instructions and each of the systems can contain some form of processing unit.)
"receive, from a server, information defining an orthogonal partition in a machine learning model to be updated and constraints for updating the orthogonal partition, wherein the orthogonal partition comprises a partition for a first participating device in a federated learning scheme that is decorrelated from a partition for a second participating device in the federated learning scheme;" (Detailed Description, Col. 9, In. 6-16; "In some embodiments, the cloud system 210 may distribute the machine learning model 202 to the computing devices 222-224. The first computing device 222 may receive the first distributed instance 232, the second computing device 223 may receive the second distributed instance 242, and the third computing device 224 may receive the third distributed instance 252. In some embodiments, different computing devices may receive different hyperparameters that cause the different computing devices to have different initial versions of their respective distributed instances." This system will send model information and parameters to different client devices within the network. Each of the devices are not designed to communicate their private data with other clients within the network. Each device will be given a distributed instance of the global model to update separately from other client devices.)
"update the orthogonal partition in the machine learning model based on local data; and" (Detailed Description, Col. 9, In. 39-52; "Each respective device of the computing devices 222-224 may perform respective training operations to update their respective distributed instances. For example, the first computing device 222 may perform training operations to update the first distributed instance 232, the second computing device 223 may perform training operations to update the second distributed instance 242, and the third computing device 224 may perform training operations to update the third distributed instance 252. Each of the training operations for each device may be performed independently, synchronously, semi-asynchronously, asynchronously, etc. Each of the computing devices 222-224 may perform different numbers of training operations, use different data for training, perform training at different times, etc." After receiving a request or model data, the client devices can update the portion of the machine learning model. This update is completed separate from other client devices and can be performed at different times and with the same or different model parameters.)
"transmit, to the server, information defining the updated orthogonal partition in the machine learning model." (Detailed Description, Col. 10, In. 13-22; "A device of the computing devices 222-224 may send results of their respective training operation back to the cloud system 210. For example, the first computing device 222 may send neural network weights of the first distributed instance 232 to the cloud system 210, the second computing device 223 may send neural network weights of the second distributed instance 242 to the cloud system 210, and the third computing device 224 may send neural network weights of the third distributed instance 252 to the cloud system 210." The client devices are sent the distributed instances of the model. After receiving the instance, the client device will update that specified instance and then send it to the global server where the global model is updated.)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3, 4, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Goodsitt in view of Kaminski et al. (Kaminski et al, "Kernel Orthonormalization in Radial Basis Function Neural Networks", Sept. 1997, Hereinafter "Kaminski").

Regarding claim 3, Goodsitt fails to explicitly disclose, "wherein partitioning the machine learning model into the plurality of partitions comprises orthogonalizing the partitions.". However, Kaminski discloses (Introduction, pp. 1178; "In this paper the following two-stage procedure for calculation of an RBF (Radical basis function) network output weights is proposed to address this problem. First, RBF's are transformed into the set of orthonormal functions for which the optimum network output layer weights are computed. This allows elimination of the requirement to compute the off-diagonal terms in the linear set of equations for computations of the optimum network weight set. Second, these weights are recomputed in such a way that their values can be fitted back into the original RBF network structure (i.e., with kernel functions unchanged). This article discloses a method which uses orthogonalization in a machine learning network. This article discloses a process which will reduce the weights of a model and ensure it is orthogonal.)
It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Goodsitt and Kaminski. Goodsitt teaches a machine learning model training method in a federated environment which divides a machine learning model into distributed instances, distributes those instances to client devices for training, receives updated distributed instances from the client devices and aggregates the trained instances to update a machine learning model. Kaminski teaches a method which uses the Gram-Schmidt orthogonalization on machine learning modes. One of ordinary skill would have motivation to combine a method to update a machine learning model using a federated system with a system which can orthogonalize a machine learning model using the Gram-Schmidt process, "An efficient noniterative technique for computations of the RBF network weights has been proposed in which RBF kernels are transformed into an orthonormal set of functions. This has allowed the requirement, to compute the off-diagonal terms in the linear set of equations for computations of the optimum network weight set to be eliminated." (Kaminski, Summary, pp. 1182).

Regarding claim 4, Goodsitt fails to explicitly disclose, "wherein orthogonalizing the partitions comprises generating the plurality of partitions based on Gram-Schmidt orthogonalization.". However, Kaminski discloses (Orthonormalization of Kernal Functions, pp. 1179; "In general, however, the set of functions                         
                            
                                    ϕ
                                
                                    k
                                
                                    x
                                
                            k
                            =
                            1
                            ,
                             
                            .
                             
                            .
                             
                            .
                             
                            ,
                            M
                        
                     resulting from the choice of basis functions and their centers is not orthogonal. The underpinning idea behind the proposed method for calculating RBF network output weights is based upon transforming the network radial functions                         
                            
                                    ϕ
                                
                                    k
                                
                                    x
                                
                     into a set of orthonormal basis functions. Namely, the standard Gram-Schmidt orthonormalization algorithm is applied to radial functions                         
                            
                                    ϕ
                                
                                    k
                                
                                    x
                                
                     in order to obtain an orthonormal set of basis functions                         
                            
                                    ϕ
                                
                                    k
                                
                                    x
                                
                    " This article discloses a method which uses the standard gram-Schmidt orthonormalization algorithm. This will be used to again compute the weights of the model and possibly alter them.)

Regarding claim 24, Goodsitt fails to explicitly disclose, "wherein in order to orthogonalize the partitions, the one or more processors are configured to generate the plurality of partitions based on one or more of Gram-Schmidt orthogonalization, singular value decomposition, or Cholesky decomposition.". However, Kaminski discloses, (Orthonormalization of Kernal Functions, pp. 1179; "In general, however, the set of functions                         
                            
                                    ϕ
                                
                                    k
                                
                                    x
                                
                            k
                            =
                            1
                            ,
                             
                            .
                             
                            .
                             
                            .
                             
                            ,
                            M
                        
                     resulting from the choice of basis functions and their centers is not orthogonal. The underpinning idea behind the proposed method for calculating RBF network output weights is based upon transforming the network radial functions                         
                            
                                    ϕ
                                
                                    k
                                
                                    x
                                
                     into a set of orthonormal basis functions. Namely, the standard Gram-Schmidt orthonormalization algorithm is applied to radial functions                         
                            
                                    ϕ
                                
                                    k
                                
                                    x
                                
                     in order to obtain an orthonormal set of basis functions                         
                            
                                    ϕ
                                
                                    k
                                
                                    x
                                
                    " This article discloses a method which uses the standard gram-Schmidt orthonormalization algorithm. This will be used to again compute the weights of the model and possibly alter them.)

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Goodsitt and Kaminski in view of Kanjilal et al. (Kanjilal et al, "On the Application of Orthogonal Transformation for the Design and Analysis of Feedforward Networks", Sept. 1995, Hereinafter "Kanjilal").

Regarding claim 5 Goodsitt and Kaminski fail to explicitly disclose, "wherein orthogonalizing the partitions comprises generating the plurality of partitions based on singular value decomposition.". However, Kanjilal discloses, (Introduction, pp. 1061; "In this paper, two broad approaches have been used for optimizing the size of the NN for representative modelling: 1) Use time domain data and optimize on a) the number of input nodes and b) the number of links and nodes within the network, using SVD and QRcp factorization-based subset selection. 2) Use (orthogonally) transformed data and optimize on the number of nodes and links within the network." This article discloses a method using singular value decomposition. This method is used to optimize input nodes of a Neural Network.)
It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Goodsitt, Kaminski and Kanjilal. Goodsitt teaches a machine learning model training method in a federated environment which divides a machine learning model into distributed instances, distributes those instances to client devices for training, receives updated distributed instances from the client devices and aggregates the trained instances to update a machine learning model. Kaminski teaches a method which uses the Gram-Schmidt orthogonalization on machine learning modes. Kanjilal teaches a method which performs orthogonal transformations on machine learning models with using Singular Value Decomposition. One of ordinary skill would have motivation to combine a method to update a machine learning model using a federated system with a system which can orthogonalize a machine learning model using the Gram-Schmidt process, and a process that uses Singular Value Decomposition to refine a machine learning model, "Robust and efficient methods based on orthogonal transformation have been presented for optimizing the size of neural networks. The optimization of the network size is important because any model has to be rather parsimonious than exhaustive (or overparameterized) to be truly representative. It has been shown that SVD and QRcp factorization in particular can be successfully used for the optimizing the size of feedforward networks;" (Kanjilal, Conclusions, pp. 1068).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Goodsitt and Kaminski in view of Li et al. (Li et al., "Second-Order Convolutional Neural Network Based on Cholesky Compression Strategy", 2021, Hereinafter "Li").

Regarding claim 6, Goodsitt and Kaminski fail to explicitly disclose, "wherein orthogonalizing the partitions comprises generating the plurality of partitions based on Cholesky decomposition.". However, Li discloses (The Proposed Second-Order CNN Model, pp. 345; "The overall architecture and main steps of the proposed second-order CNN model can be shown in Fig. 3. By combing the first order and second-order information in Cov-lV, the model can capture more useful features in the input data, and Cholesky decomposition is then used to reduce the number of parameters. Finally, the vectorized H* is output to the classifier for label prediction." This article discloses a method which uses Cholesky decomposition to reduce the number of parameters. This method uses the Cholesky decomposition at the end of a training process however the use of Cho le sky decomposition is fundamentally the same, it will alter a model and refine it.)
It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Goodsitt, Kaminski and Li. Goodsitt teaches a machine learning model training method in a federated environment which divides a machine learning model into distributed instances, distributes those instances to client devices for training, receives updated distributed instances from the client devices and aggregates the trained instances to update a machine learning model. Kaminski teaches a method which uses the Gram-Schmidt orthogonalization on machine learning modes. Li teaches a new CNN system uses Cholesky decomposition for model compression. One of ordinary skill would have motivation to combine a method to update a machine learning model using a federated system with a system which can orthogonalize a machine learning model using the Gram-Schmidt process, and a process which uses Cholesky Decomposition to compress a machine learning model, "Table 3 shows the number of trainable parameters. In convolution part, the number of parameters of all models are the same. In the second-order part, the vectorized output of lower triangular matrix obtained by the Cholesky decomposition is used to reduce the full connection parameters of Models 3-5. Therefore, compared with Model2, Models 3- 5 have much less parameters. Especially Models 4-5 can achieve superior classification performance with the least performance cost of parameters among the six second-order models." (Li, Analysis of Results, pp. 349-350).

Claims 7, 8, 19, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Goodsitt in view of Porwik et al. (Porwik et al., "Feature projection k-NN classifier model for imbalanced and incomplete medical data", 2016, Hereinafter "Porwik").

Regarding claim 7, Goodsitt fails to explicitly disclose, "wherein partitioning the machine learning model comprises generating the plurality of partitions in the machine learning model based on a feature projection technique.". However, Porwik discloses (Modified k-NN on feature projections (FP k-NN) classifier, pp. 646; "For this task we propose so called feature projection classifier, which eliminates the need of feature normalization and solves the problem of missing values. In a standard k-NN classifier can appear data collection that two or more features have the same Euclidean distance to the classified feature. To solve that problem, the neighbor area scaling method has been used - when there are more neighbors in the same distance, then all of them will be taken into consideration. It can be shown graphically as feature projection into one axis only (see Fig. 1):" This article discloses a method which uses feature projection in training a machine learning model. Similarly, this method will keep track of features found and their distances.)
It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Goodsitt and Porwik. Goodsitt teaches a machine learning model training method in a federated environment which divides a machine learning model into distributed instances, distributes those instances to client devices for training, receives updated distributed instances from the client devices and aggregates the trained instances to update a machine learning model. Porwik teaches machine learning method which uses feature projection in a classification model which learnings from unequal sets of data. One of ordinary skill would have motivation to combine a method to update a machine learning model using a federated system with a system that is able to use feature projection to help improve and train a machine learning model, "Experiments show that proposed modified version of the k-NN on Feature Projections classifier performs well on all tested artificially and naturally degraded datasets. Its algorithm, after slight modification, can be also used as a wrapper type feature selection tool. In this role it not only improves the quality of classification of the proposed classifier, but also other state-of-the-art classification algorithms." (Porwik, Conclusion, pp. 655).

Regarding claim 8, Goodsitt discloses, "each partition of the plurality of partitions is associated with a set of weights;" (Detailed Description, Col. 13, In. 11-14; "In some embodiments, the set of model parameters of the distributed instances of the machine learning model may include weights, biases, hyperparameters, or other values characterizing a neural network." Each of the distributed instances contains a set model parameter. This includes model weights.)
"a first partition of the plurality of partitions comprises a common partition having a common set of weights; and" (Detailed Description, Col. 14-15, Ln. 62-67 and 1; "For example, some embodiments may receive a model parameter of a first distribution instance characterized by a first hyperparameter value that causes the first distribution instance to include two neural network layers and, in response, update a server-side machine learning model characterized by the same first hyperparameter value." In some instances, the model can be set up to have the first distributed instance be a specified set of network parameters. This teaches a common set of model parameters being evaluated.)
"a plurality of second partitions of the plurality of partitions comprise partitions having weights associated with one of a plurality of defined scenarios for which the machine learning model is trained." (Detailed Description, Col. 11, Ln. 29-37; "As an example, some embodiments may characterize a distributed instance of a machine learning model with a target machine learning hyperparameter, where the hyperparameter may include a number of hidden layers, a learning rate for an optimization algorithm, a value indicating an activation function, a number of activation units in a layer, a number of clusters for a clustering analysis operation, a pooling size of a convolutional neural network, a filter size of a convolutional neural network, etc." Each of the distributed instances can contain specified model parameters to be updated. Each of the client devices in the network can be used to train a distributed instance of the global model.)

Regarding claim 19, Goodsitt fails to explicitly disclose, "wherein updating the orthogonal partition in the machine learning model comprises updating the orthogonal partition based on a feature projection technique and the local data.". However, Porwik discloses (Modified k-NN on feature projections (FP k-NN) classifier, pp. 646; "For this task we propose so called feature projection classifier, which eliminates the need of feature normalization and solves the problem of missing values. In a standard k-NN classifier can appear data collection that two or more features have the same Euclidean distance to the classified feature. To solve that problem, the neighbor area scaling method has been used - when there are more neighbors in the same distance, then all of them will be taken into consideration. It can be shown graphically as feature projection into one axis only (see Fig. 1):" This article discloses a method which uses feature projection in training a machine learning model. Similarly, this method will keep track of features found and their distances. This method was designed to execute on a computing system and its attached data stored in memory.)

Regarding claim 25, Goodsitt discloses, "each partition of the plurality of partitions is associated with a set of weights;" (Detailed Description, Col. 13, In. 11-14; "In some embodiments, the set of model parameters of the distributed instances of the machine learning model may include weights, biases, hyperparameters, or other values characterizing a neural network." Each of the distributed instances contains a set model parameter. This includes model weights.)
"a first partition of the plurality of partitions comprises a common partition having a common set of weights; and" (Detailed Description, Col. 14-15, Ln. 62-67 and 1; "For example, some embodiments may receive a model parameter of a first distribution instance characterized by a first hyperparameter value that causes the first distribution instance to include two neural network layers and, in response, update a server-side machine learning model characterized by the same first hyperparameter value." In some instances, the model can be set up to have the first distributed instance be a specified set of network parameters. This teaches a common set of model parameters being evaluated.)
"a plurality of second partitions of the plurality of partitions comprise partitions having weights associated with one of a plurality of defined scenarios for which the machine learning model is trained." (Detailed Description, Col. 11, Ln. 29-37; "As an example, some embodiments may characterize a distributed instance of a machine learning model with a target machine learning hyperparameter, where the hyperparameter may include a number of hidden layers, a learning rate for an optimization algorithm, a value indicating an activation function, a number of activation units in a layer, a number of clusters for a clustering analysis operation, a pooling size of a convolutional neural network, a filter size of a convolutional neural network, etc." Each of the distributed instances can contain specified model parameters to be updated. Each of the client devices in the network can be used to train a distributed instance of the global model.)
Goodsitt fails to explicitly disclose, "generate the plurality of partitions in the machine learning model based on a feature projection technique,". However, Porwik discloses (Modified k-NN on feature projections (FP k-NN) classifier, pp. 646; "For this task we propose so called feature projection classifier, which eliminates the need of feature normalization and solves the problem of missing values. In a standard k-NN classifier can appear data collection that two or more features have the same Euclidean distance to the classified feature. To solve that problem, the neighbor area scaling method has been used - when there is more neighbors in the same distance, then all of them will be taken into consideration. It can be shown graphically as feature projection into one axis only (see Fig. 1):" This article discloses a method which uses feature projection in training a machine learning model. Similarly, this method will keep track of features found and their distances.) 

Claims 9, 10, 20, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Goodsitt in view of Saha et al. (Saha et al., "GRADIENT PROJECTION MEMORY FOR CONTINUAL LEARNING", 2021, Hereinafter "Saha").

Regarding claim 9, Goodsitt fails to explicitly disclose, "wherein partitioning the machine learning model comprises generating the plurality of partitions in the machine learning model based on a gradient projection technique.". However, Saha discloses, (Introduction, pp. 2; "Using Singular Value Decomposition (SVD) on these activations, we show how to obtain the minimum set of bases of the CGS by which past knowledge is preserved and learnability for the new tasks is ensured. We store these bases in the memory which we define as Gradient Projection Memory (GPM). In our method, we propose to learn any new task by taking gradient steps in the orthogonal direction to the space (CGS) spanned by the G PM. Our analysis shows that such orthogonal gradient descent induces minimum to no interference with the old learning, and thus effective in alleviating catastrophic forgetting. We evaluate our approach in the context of image classification with mini lmageNet, CIFAR-100, PM NIST and sequence of 5-Datasets on a variety of network architectures including ResNet." This article discloses a method which will use gradient steps to continually learning and evolve a machine learning model. This method will ensure the model evolves orthogonally.)
It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Goodsitt and Saha. Goodsitt teaches a machine learning model training method in a federated environment which divides a machine learning model into distributed instances, distributes those instances to client devices for training, receives updated distributed instances from the client devices and aggregates the trained instances to update a machine learning model. Saha teaches a machine learning model which uses gradient projection which can learn orthogonally. One of ordinary skill would have motivation to combine a method to update a machine learning model using a federated system with a system that uses gradient projection to learn orthogonally, "We show how to analyze the network representations to obtain minimum number of bases of these subspaces by which past information is preserved and learnability for the new tasks is ensured. Evaluation on diverse image classification tasks with different network architectures and comparisons with state-of-the-art algorithms show the effectiveness of our approach in achieving high classification performance while mitigating forgetting. We also show our algorithm is fast, makes efficient use of memory and is capable of learning long sequence of tasks in deeper networks preserving data privacy." (Saha, Conclusion, pp. 9).

Regarding claim 10, Goodsitt discloses, "the plurality of partitions comprises a null space and a non-null space, and" (Detailed Description, Col. 12, In. 4-8; "Some embodiments may select a set of client computing devices based on the type of information that is stored on the client computing devices. For example, some embodiments may obtain a target sensitive information type, such as "user name," "vehicle identifier," "disease," etc." The server in this patent is able to determine which client device to send a specified distributed instance. Under the broadest reasonable interpretation this teaches some client devices could have a null label type.)
"the requests to update the one or more partitions in the plurality of partitions comprises requests to update subspaces in the non-null space." (Detailed Description, Col. 12, In. 8-19; "In some embodiments, the set of client computing devices may report information to a server or other computing devices used to distribute or update machine learning models. For example, the set of client computing devices may report information indicating the use of an application or a component of an application, where such use may indicate the availability of a target sensitive information type. Some embodiments may then perform operations to filter the information reported by the set of client computing devices to select a subset of the set of client computing devices that may indicate the availability of the target sensitive information type." This method is able to select which client device it can send a disturbed instance too for updating. Since not all clients are needed to update a model, some clients might not have versions of the current global model being updated. Therefore, this system under the broadest reasonable interpretation, could filter devices which do not contain a local model and have them update their model or obtain a version from the database.)

Regarding claim 20, Goodsitt fails to explicitly disclose "wherein updating the orthogonal partition in the machine learning model comprises updating the orthogonal partition based on a gradient projection technique and the local data.". However, Saha discloses (Introduction, pp. 2; "Using Singular Value Decomposition (SVD) on these activations, we show how to obtain the minimum set of bases of the CGS by which past knowledge is preserved and learnability for the new tasks is ensured. We store these bases in the memory which we define as Gradient Projection Memory (GPM). In our method, we propose to learn any new task by taking gradient steps in the orthogonal direction to the space (CGS) spanned by the G PM. Our analysis shows that such orthogonal gradient descent induces minimum to no interference with the old learning, and thus effective in alleviating catastrophic forgetting. We evaluate our approach in the context of image classification with mini lmageNet, CIFAR-100, PM NIST and sequence of 5-Datasets on a variety of network architectures including ResNet." This article discloses a method which will use gradient steps to continually learning and evolve a machine learning model. This method will ensure the model evolves orthogonally. This method was designed using computer-based datasets and could be performed on local devices or servers and their attached data.)

Regarding claim 26, Goodsitt discloses, "the plurality of partitions comprises a null space and a non-null space, and" (Detailed Description, Col. 12, In. 4-8; "Some embodiments may select a set of client computing devices based on the type of information that is stored on the client computing devices. For example, some embodiments may obtain a target sensitive information type, such as "user name," "vehicle identifier," "disease," etc." The server in this patent is able to determine which client device to send a specified distributed instance. Under the broadest reasonable interpretation this teaches some client devices could have a null label type.) 
"the requests to update the one or more partitions in the plurality of partitions comprises requests to update subspaces in the non-null space." (Detailed Description, Col. 12, In. 8-19; "In some embodiments, the set of client computing devices may report information to a server or other computing devices used to distribute or update machine learning models. For example, the set of client computing devices may report information indicating the use of an application or a component of an application, where such use may indicate the availability of a target sensitive information type. Some embodiments may then perform operations to filter the information reported by the set of client computing devices to select a subset of the set of client computing devices that may indicate the availability of the target sensitive information type." This method is able to select which client device it can send a disturbed instance too for updating. Since not all clients are needed to update a model, some clients might not have versions of the current global model being updated. Therefore, this system under the broadest reasonable interpretation, could filter devices which do not contain a local model and have them update their model or obtain a version from the database.)
Goodsitt fails to explicitly disclose, "in order to partition the machine learning model, the one or more processors are further configured to generate the plurality of partitions in the machine learning model based on a gradient projection technique,". However, Saha discloses (Introduction, pp. 2; "Using Singular Value Decomposition (SVD) on these activations, we show how to obtain the minimum set of bases of the CGS by which past knowledge is preserved and learnability for the new tasks is ensured. We store these bases in the memory which we define as Gradient Projection Memory (GPM). In our method, we propose to learn any new task by taking gradient steps in the orthogonal direction to the space (CGS) spanned by the GPM. Our analysis shows that such orthogonal gradient descent induces minimum to no interference with the old learning, and thus effective in alleviating catastrophic forgetting. We evaluate our approach in the context of image classification with mini lmageNet, CIFAR-100, PM NIST and sequence of 5-Datasets on a variety of network architectures including ResNet." This article discloses a method which will use gradient steps to continually learning and evolve a machine learning model. This method will ensure the model evolves orthogonally.)

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL MICHAEL GALVIN-SIEBENALER whose telephone number is (571)272-1257. The examiner can normally be reached Monday - Friday 8AM to 5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PAUL M GALVIN-SIEBENALER/Examiner, Art Unit 2147                                                                                                                                                                                                        
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147
Read full office action
MODEL DECORRELATION AND SUBSPACING FOR FEDERATED LEARNING

Interview Optional

Examiner Intelligence

Statute-Specific Performance

Office Action

Prosecution Timeline

AI Strategy Recommendation

Prosecution Projections

Ready to respond to this office action?

MODEL DECORRELATION AND SUBSPACING FOR FEDERATED LEARNING

Interview Optional

Examiner Intelligence

Statute-Specific Performance

Office Action

Prosecution Timeline

AI Strategy Recommendation

Prosecution Projections

Ready to respond to this office action?

Sign in with your work email