Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 09/23/2025 has been entered.
Response to Remarks
Claim Rejections – 35 USC § 103
Applicant’s prior art arguments have been fully considered and they are persuasive.
Applicant argues (pgs. 10-13) that neither Zhong nor Ly teaches the newly amended limitations, which further clarify that the federated learning involves aggregating at respective different network entities of a disaggregated open radio access network (O-RAN), wherein the different network entities comprise a distributed unit (DU).
Examiner agrees. Accordingly, a new reference, Aamer et al. ("Entropy-Driven Stochastic Federated Learning in Non-IID 6G Edge-RAN") has been added to the rejection, as further detailed below.
The foregoing applies to all independent claims and their dependent claims.
Claim Rejections – 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-30 are rejected under 35 U.S.C. 103 as being unpatentable over Zhong et al. (“Communication-Efficient Federated Learning with Multi-layered Compressed Model Update and Dynamic Weighting Aggregation”), hereinafter Zhong, in view of Ly et al. (US 20220101204 A1), hereinafter Ly, and further in view of Aamer et al. ("Entropy-Driven Stochastic Federated Learning in Non-IID 6G Edge-RAN"), hereinafter Aamer.
Regarding independent claim 1, Zhong teaches:
…
…
and instructions stored in the memory and executable by the processor to cause the apparatus to: receive, from a network entity, a first set of neural network weights corresponding to a first subset of a plurality of hierarchical layers of a neural network, wherein at least two of the plurality of hierarchical layers of the neural network are aggregated at respective different network entities … and associated with different training frequencies in federated training … (Zhong [Page 425, Figure 3]: Zhong teaches that the client models are updated through download from a central server, which is a network entity. Zhong [Page 422, Paragraph 3]: “We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the local clients receive a set of parameters (weights) corresponding to the feature (shallow) layer of the global model. Zhong [Page 423, Paragraph 5]: “The frequency of local model update is the proportion of the number of local dataset updates in the total number of training rounds.” Zhong teaches that the frequency of local model update differs across clients. Zhong [Page 422, Paragraph 3]: “In the proposed multi-layered model update strategy, parameters in the feature layer will be updated more frequently than that in the composition layer.” Indeed, Zhong teaches different training frequencies, as the feature layer is updated more frequently than the composition layer.)
train, according to a first training frequency, a first layer of the plurality of hierarchical layers based at least in part on the first set of neural network weights and a set of training data at the UE, wherein the first layer is outside of the first subset of the plurality of hierarchical layers; (Zhong [Page 422, Paragraph 3]: “We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that after downloading the feature (shallow) layer, the local model trains the composition (deep) layer, which is outside of the first subset that corresponds to the feature layer.)
and perform a transmission to the network entity, wherein the transmission is processed at the UE through the plurality of hierarchical layers of the neural network in accordance with the training. (Zhong [Page 422, Paragraph 3]: “We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the parameters of both layers are updated synchronously in local clients and central server, which is done through transmission from local client to central server. Zhong [Page 425, Figure 3] Zhong teaches that the central server is updated through upload (transmission) from local client to central server.)
Zhong does not teach:
An apparatus for wireless communication at a user equipment (UE), comprising: a processor;
memory coupled with the processor;
However, Ly teaches:
An apparatus for wireless communication at a user equipment (UE), comprising: a processor; (Ly ¶ [0014]: “user equipment … and/or processing system. “Ly teaches apparatus for wireless communication at UE with a processor.)
memory coupled with the processor; (Ly ¶ [0008]: “a client device for wireless communication includes a memory; and one or more processors coupled to the memory” Ly teaches memory coupled to processor.)
Zhong and Ly are in the same field of endeavor as the present invention, as the
references are directed to entities jointly training machine learning models through federated learning. It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art to modify the layer-by-layer updating of the neural network taught by Zhong to incorporate channel state information feedback into the training data, as taught by Ly, because the combination would allow federated learning, layer by layer, of a model that indicates an aspect of the strength of the wireless connection between entities. This has the potential benefit of letting each entity train only the parts of the model on which its wireless connection depends, saving time and increasing efficiency.
Zhong and Ly do not explicitly teach:
… of a disaggregated open radio access network (O-RAN) …, wherein the different network entities comprise a distributed unit (DU);
However, Aamer teaches:
… of a disaggregated open radio access network (O-RAN) …, wherein the different network entities comprise a distributed unit (DU); (Aamer [Page 2, Column 2, Paragraph 3]: “FL [federated learning] strategies to make central units (CUs) at 6G Edge-RAN collaborate in learning a certain resource usage” Aamer teaches federated learning using a 6G Edge-RAN, which builds upon the O-RAN framework, including its open and disaggregated architecture. Aamer [Page 2, Column 2, Paragraph 4]: “the considered network corresponds to a 6G edge-RAN under the central unit (CU)/distributed unit (DU) functional split, where each transmission/reception point (TRP) is co-located with its DU, while all CUs are hosted in an edge cloud where they run as virtual network functions (VNFs).” Aamer teaches that the different network entities comprise a distributed unit, each with its own TRP.)
Aamer is in the same field of endeavor as the present invention, since it is directed to facilitating federated learning using radio access networks that are open and disaggregated. It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art to modify the layer-by-layer federated training of a model indicating an aspect of wireless connection strength, as taught by Zhong in view of Ly, to use radio access networks that are open and disaggregated, as taught by Aamer, because the combination would allow the network entities to run as virtual network functions in the cloud. This has the potential benefit of increasing efficiency by lowering latency while scaling properly.
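For clarity of the record, the multi-layered update schedule read onto claim 1 above may be summarized by the following minimal sketch (Python). This is the examiner's own illustration under stated assumptions, not code from any cited reference: local training is stubbed as a small random perturbation, and the uniform-mean fedavg stands in for Zhong's dynamic weighting aggregation.

    import numpy as np

    def local_train(weights, rng):
        # Stand-in for local SGD at a client (illustrative only).
        return weights + 0.01 * rng.standard_normal(weights.shape)

    def fedavg(updates):
        # Uniform averaging for brevity; Zhong uses dynamic weighting aggregation.
        return np.mean(list(updates), axis=0)

    rng = np.random.default_rng(0)
    server = {"feature": np.zeros(4), "composition": np.zeros(4)}  # shallow / deep layers
    clients = range(3)

    for r in range(3):                    # one loop = three training rounds (Zhong p. 422)
        update_deep = (r == 2)            # composition layer updated only on the last round
        feat_updates, comp_updates = [], []
        for _ in clients:
            feat_updates.append(local_train(server["feature"], rng))       # every round
            if update_deep:
                comp_updates.append(local_train(server["composition"], rng))
        server["feature"] = fedavg(feat_updates)         # aggregated at the central server
        if update_deep:
            server["composition"] = fedavg(comp_updates)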
Regarding dependent claim 2, Zhong and Ly teach:
The apparatus of claim 1,
Ly teaches:
wherein the instructions are further executable by the processor to cause the apparatus to: transmit, to the network entity, at least a portion of the set of training data at the UE. (Ly ¶ [0058]: “compress measurements in a way that limits compression loss. The client device may transmit the compressed measurements to the server device.” Ly teaches transmitting measurements associated with reference signals, which constitute training data, to the server.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 3, Zhong and Ly teach:
The apparatus of claim 2,
Ly teaches:
wherein the portion of the set of training data comprises channel state information feedback. (Ly ¶ [0057]: “In some examples, the client device may measure reference signals during a beam management process for channel state feedback (CSF)” Ly teaches transmitting to the server training data that includes the CSF.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 4, Zhong and Ly teach:
The apparatus of claim 1,
Zhong teaches:
wherein the instructions are further executable by the processor to cause the apparatus to: train a second layer of the plurality of hierarchical layers based at least in part on training the first layer and the set of training data at the UE, wherein the first set of neural network weights corresponds to the second layer; (Zhong [Page 422, Paragraph 3]: “We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that after downloading the feature (shallow) layer, the local model trains the feature (shallow) layer, which corresponds to the first set of weights because the first set of weights is the feature layer from the global model.)
and transmit, to the network entity, a second set of neural network weights for the second layer based at least in part on training the second layer. (Zhong [Page 422, Paragraph 3]: “We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the feature (shallow) layer is trained and transmitted to the global model.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 5, Zhong and Ly teach:
The apparatus of claim 1,
Zhong teaches:
wherein the instructions are further executable by the processor to cause the apparatus to: combine the first set of neural network weights corresponding to a second layer of the plurality of hierarchical layers and a second set of neural network weights produced from training the first layer to obtain a combined set of neural network weights; (Zhong [Page 422, Paragraph 3]: “We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the feature (shallow) layer is trained and produces updated parameters, which together with the downloaded parameters yield a combined set of neural network weights.)
train the plurality of hierarchical layers of the neural network based at least in part on the combined set of neural network weights and the set of training data at the UE, the training producing a third set of neural network weights; (Zhong [Page 422, Paragraph 3]: “The process of federated learning consists of a large number of training rounds. We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the local model is trained repeatedly with the local training data and produces the third set of neural network weights (on the third round of each loop, with additional loops as needed).)
and perform the transmission to the network entity comprising the third set of neural network weights based at least in part on training the plurality of hierarchical layers. (Zhong [Page 422, Paragraph 3]: “The process of federated learning consists of a large number of training rounds. We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that after the training produces the third set of weights (with additional rounds as needed), the weights are uploaded to the global model.)
The reasons to combine are substantially similar to those of claim 1.
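The combining-and-retraining sequence mapped onto claim 5 above may be illustrated by continuing the sketch given after claim 1 (reusing the local_train, fedavg, rng, and server names defined there); again, this is the examiner's illustrative reading, not Zhong's code.

    # Second set: produced by training the deep layer locally (illustrative stub).
    local_composition = local_train(server["composition"], rng)
    combined = {
        "feature": server["feature"],        # first set, received from the network entity
        "composition": local_composition,    # together: the combined set of weights
    }
    # One further round over all hierarchical layers yields the third set,
    # which would then be uploaded to the central server (the upload of Figure 3).
    third_set = {k: local_train(v, rng) for k, v in combined.items()}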
Regarding dependent claim 6, Zhong and Ly teach:
The apparatus of claim 1,
Zhong teaches:
wherein the instructions to perform the transmission are executable by the processor to cause the apparatus to: apply, at a first time, a second layer of the plurality of hierarchical layers to a set of data, wherein the first set of neural network weights corresponds to the second layer; (Zhong [Page 427, Table 1]: Zhong teaches that the MCDW FedAvg model, which includes the feature (shallow) layer corresponding to the second layer, is applied to various sets of data, such as MNIST data.)
and apply, at a second time after the first time, the first layer to the set of data to obtain the transmission. (Zhong [Page 427, Table 1]: Zhong teaches that the MCDW FedAvg model, which includes the composition (deep) layer that is mapped to the first layer, is applied to various sets of data, such as MNIST data. The input starts at the shallow layers and moves to the deep layers.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 7, Zhong and Ly teach:
The apparatus of claim 1,
Zhong teaches:
wherein the instructions to perform the transmission are executable by the processor to cause the apparatus to: combine the first set of neural network weights corresponding to a second layer of the plurality of hierarchical layers and a second set of neural network weights produced from training the first layer to obtain a combined set of neural network weights; (Zhong [Page 422, Paragraph 3]: “The process of federated learning consists of a large number of training rounds. We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that over two training rounds, the deep layer is trained and its weights updated after the first round, and the shallow layer is updated in the second round; from the point of view of the entire model, the two sets are thus combined into a combined set of neural network weights.)
apply, at a first time, the second layer of the plurality of hierarchical layers to a set of data, wherein the second layer is trained according to the combined set of neural network weights; (Zhong [Page 427, Table 1]: Zhong teaches that the MCDW FedAvg model, which includes the feature (shallow) layer that is mapped to the second layer, is applied to various sets of data, such as MNIST data. Note that the second layer, i.e., the feature layer, is trained according to the combined neural network weights.)
and apply, at a second time after the first time, the first layer to the set of data to obtain the transmission. (Zhong [Page 427, Table 1]: Zhong teaches that the MCDW FedAvg model, which includes the composition (deep) layer that is mapped to the first layer, is applied to various sets of data, such as MNIST data. Under reasonable interpretations, the first layer is applied at a later time than the second layer, since the input starts at the shallow layers and moves to the deep layers.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 8, Zhong and Ly teach:
The apparatus of claim 1,
Zhong teaches:
wherein the instructions are further executable by the processor to cause the apparatus to: train, according to a second training frequency, one or more copies of the first layer of the plurality of hierarchical layers based at least in part on the first set of neural network weights and an additional set of training data at the UE. (Zhong [Page 423, Paragraph 5]: “The frequency of local model update is the proportion of the number of local dataset updates in the total number of training rounds.” Zhong teaches that the frequency of local model update differs both across clients and, for a given client, across rounds. Zhong [Page 427, Table 1]: Zhong teaches that the MCDW FedAvg model trains the deep layer (first layer) on different datasets, which shows that different “copies” of the deep layer are used when training on different datasets. In addition, the layers are based on the first set of weights because of the incremental, turn-based nature of the training of the shallow and deep layers.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 9, Zhong and Ly teach:
The apparatus of claim 1,
Ly teaches:
wherein the instructions to train the first layer are further executable by the processor to cause the apparatus to: determine the UE is part of a group of UEs corresponding to a radio unit (RU). (Ly ¶ [0057]: “may measure signal strength of inter-radio access technology (e.g., WiFi) networks.” Ly teaches that the client device measures the strength of radio signals across devices, which shows that the UE is part of a group of UEs corresponding to a radio unit.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 10, Zhong and Ly teach:
The apparatus of claim 9,
Ly teaches:
wherein the RU is part of a group of RUs corresponding to the DU, and a second layer of the plurality of hierarchical layers is trained by the RU based at least in part on the set of training data at the UE, a set of training data at the RU, or both. (Ly ¶ [0057]: “may measure signal strength of inter-radio access technology (e.g., WiFi) networks.” Ly teaches that the client device measures the strength of radio signals across devices, which shows that the group of UEs is within a radio unit.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 11, Zhong and Ly teach:
The apparatus of claim 10,
Zhong teaches:
wherein the DU is part of a group of DUs corresponding to a centralized unit (CU), and a third layer of the plurality of hierarchical layers is trained by the DU based at least in part on the set of training data at the UE, the set of training data at the RU, a set of training data at the DU, or any combination thereof. (Zhong [Page 422, Paragraph 3]: “The process of federated learning consists of a large number of training rounds. We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the training data at the radio unit (local client) are used to train a third layer of the hierarchical layers.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 12, Zhong and Ly teach:
The apparatus of claim 11,
Zhong teaches:
wherein the CU is part of a group of CUs within a core network, and a fourth layer of the plurality of hierarchical layers is trained by the CU based at least in part on the set of training data at the UE, the set of training data at the RU, the set of training data at the DU, a set of training data at the CU, or any combination thereof. (Zhong [Page 422, Paragraph 3]: “The process of federated learning consists of a large number of training rounds. We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the training data at the radio unit (local client) are used to train a fourth layer of the hierarchical layers.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 13, Zhong and Ly teach:
The apparatus of claim 1,
Zhong teaches:
wherein the instructions are further executable by the processor to cause the apparatus to: process the transmission using an auto-encoder, wherein the hierarchical layers are trained at the auto-encoder based at least in part on the set of training data at the UE, a set of training data at the network entity, or both. (Zhong [Page 424, Paragraph 3] “In order to further reduce the communication overhead of federated learning, the model is compressed by pruning and Huffman coding” Zhong teaches that the model is compressed using Huffman encoding. This is the transmission that is sent to the global model.)
The reasons to combine are substantially similar to those of claim 1.
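For context on the pruning-and-Huffman-coding compression cited in claim 13 above, the following standalone sketch (the examiner's illustration; the toy update data and the huffman_code helper are hypothetical, not Zhong's implementation) shows how a pruned, quantized model update could be entropy-coded before transmission.

    import heapq
    from collections import Counter

    def huffman_code(symbols):
        # Build a prefix code from symbol frequencies (classic heap construction).
        freq = Counter(symbols)
        heap = [[f, [sym, ""]] for sym, f in sorted(freq.items())]
        heapq.heapify(heap)
        while len(heap) > 1:
            lo, hi = heapq.heappop(heap), heapq.heappop(heap)
            for pair in lo[1:]:
                pair[1] = "0" + pair[1]
            for pair in hi[1:]:
                pair[1] = "1" + pair[1]
            heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
        return dict((sym, code) for sym, code in heap[0][1:])

    # Toy update: pruning zeroes most weights; quantization leaves a few levels.
    update = [0, 0, 0, 0, 1, 0, 0, 2, 0, 1, 0, 0]
    table = huffman_code(update)
    bits = sum(len(table[s]) for s in update)
    print(table, bits, "bits vs", 32 * len(update), "bits uncompressed")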
Regarding dependent claim 14, Zhong and Ly teach:
The apparatus of claim 13,
Zhong teaches:
wherein the first layer is an outermost layer of the plurality of hierarchical layers trained at the auto-encoder, an innermost layer of the plurality of hierarchical layers trained at the auto-encoder, or both. (Zhong [Page 422, Paragraph 3]: “The process of federated learning consists of a large number of training rounds. We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the deep layer, which corresponds to the first layer and is the outermost layer relative to the shallow layer, is also trained. Zhong [Page 424, Paragraph 3]: “In order to further reduce the communication overhead of federated learning, the model is compressed by pruning and Huffman coding” Zhong teaches that this model is compressed using the encoder.)
The reasons to combine are substantially similar to those of claim 1.
Regarding independent claim 15, Zhong teaches:
An apparatus for wireless communication at a network entity, comprising: a processor; (Zhong [Page 420, Paragraph 7]: “First, the central server initializes the model parameters and broadcasts them to all clients.” Zhong teaches a central server in wireless communication with the clients; given the functions the central server performs, it necessarily includes a processor.)
…
and instructions stored in the memory and executable by the processor to cause the apparatus to: transmit, to one or more user equipment (UE), a first set of neural network weights corresponding to a first subset of a plurality of hierarchical layers of a neural network, wherein at least two of the plurality of hierarchical layers of the neural network are aggregated at respective different network entities … and associated with different training frequencies in federated training, … (Zhong [Page 425, Figure 3]: Zhong teaches that the client models are updated through download from a central server, which is a network entity; that is, the central server transmits the weights to the clients. Zhong [Page 422, Paragraph 3]: “We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the local clients receive a set of parameters (weights) corresponding to the feature (shallow) layer of the global model. Zhong [Page 423, Paragraph 5]: “The frequency of local model update is the proportion of the number of local dataset updates in the total number of training rounds.” Zhong teaches that the frequency of local model update differs across clients. Zhong [Page 422, Paragraph 3]: “In the proposed multi-layered model update strategy, parameters in the feature layer will be updated more frequently than that in the composition layer.” Indeed, Zhong teaches different training frequencies, as the feature layer is updated more frequently than the composition layer.)
train, according to a first frequency, a first layer of the plurality of hierarchical layers based at least in part on the first set of neural network weights and one or more UE updates to the plurality of hierarchical layers of the neural network; (Zhong [Page 422, Paragraph 3]: “We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the layer parameters are updated at the central server as well as at the local clients. Zhong [Page 423, Paragraph 1]: “The process of the federated learning consists of six training rounds (n, n + 1, . . ., n + 5). Point (n, client1) means client1 is participating in training the central server during the stage of training round ‘n’” Zhong teaches that the central server is being trained.)
and receive, from the one or more UE, a transmission and process the transmission through the plurality of hierarchical layers of the neural network in accordance with the training. (Zhong [Page 422, Paragraph 3]: “We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the parameters of both layers are updated synchronously in local clients and central server, which is done through transmission from local client to central server. Zhong [Page 425, Figure 3] Zhong teaches that the central server is updated through upload (transmission) from local client to central server.)
Ly teaches:
memory coupled with the processor; (Ly ¶ [0008]: “a client device for wireless communication includes a memory; and one or more processors coupled to the memory” Ly teaches memory coupled to processor.)
Aamer teaches:
… of a disaggregated open radio access network (O-RAN) … wherein the different network entities comprise a distributed unit (DU); (Aamer [Page 2, Column 2, Paragraph 3]: “FL [federated learning] strategies to make central units (CUs) at 6G Edge-RAN collaborate in learning a certain resource usage” Aamer teaches federated learning using a 6G Edge-RAN, which builds upon the O-RAN framework, including its open and disaggregated architecture. Aamer [Page 2, Column 2, Paragraph 4]: “the considered network corresponds to a 6G edge-RAN under the central unit (CU)/distributed unit (DU) functional split, where each transmission/reception point (TRP) is co-located with its DU, while all CUs are hosted in an edge cloud where they run as virtual network functions (VNFs).” Aamer teaches that the different network entities comprise a distributed unit, each with its own TRP.)
The reasons to combine are substantially similar to those of claim 1.
Claim 16 is substantially similar to claim 2, but instead of “transmit, to the network entity”, it has the following element:
Zhong teaches:
receive, from at least one UE of the one or more UE (Zhong [Page 425, Figure 3] Zhong teaches that the central server is updated through upload (transmission) from local client to central server – the central server receives the training data from the UE.)
The reasons to combine are substantially similar to those of claim 1.
Claim 17 is rejected on the same grounds under 35 U.S.C. 103 as claim 3, as they are substantially similar. Mutatis mutandis.
Regarding dependent claim 18, Zhong and Ly teach:
The apparatus of claim 15,
Zhong teaches:
wherein the instructions are further executable by the processor to cause the apparatus to: receive the transmission from at least one UE of the one or more UE comprising a second set of neural network weights for the first layer; (Zhong [Page 422, Paragraph 3]: “We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the feature (shallow) layer is received by the global model.)
combine the second set of neural network weights for the at least one UE of the one or more UE; (Zhong [Page 422, Paragraph 3]: “We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the updated feature (shallow) layer parameters from the local clients are combined at the central server to obtain a combined set of neural network weights.)
and train the first layer based at least in part on the combined second set of neural network weights, wherein the one or more UE updates comprise the combined second set of neural network weights. (Zhong [Page 422, Paragraph 3]: “The process of federated learning consists of a large number of training rounds. We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the layer at the central server is trained based on the combined parameter updates received from the local clients over repeated training rounds.)
The reasons to combine are substantially similar to those of claim 1.
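The server-side combination mapped onto claim 18 above may be illustrated by the following standalone sketch (the examiner's illustration; the numeric updates are hypothetical, and the uniform mean stands in for Zhong's dynamic weighting aggregation).

    import numpy as np

    # Feature-layer updates (the "second set" of weights) received from several UEs.
    ue_updates = [np.array([0.1, 0.2]), np.array([0.3, 0.0]), np.array([0.2, 0.4])]
    combined_second_set = np.mean(ue_updates, axis=0)  # combine the per-UE weight sets
    server_feature_layer = combined_second_set         # basis for further training at the server
    print(combined_second_set)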
Claims 19, 20, 21, 22, and 23 are rejected on the same grounds under 35 U.S.C. 103 as claims 9, 10, 11, 12, and 13, respectively, as they are substantially similar. Mutatis mutandis.
Regarding dependent claim 24, Zhong and Ly teach:
The apparatus of claim 23,
Zhong teaches:
wherein a second layer of the plurality of hierarchical layers is associated with a first UE of the one or more UE and a third layer of the plurality of hierarchical layers is associated with a second UE of the one or more UE, and wherein the first layer, the second layer, and the third layer are trained at the auto-encoder. (Zhong [Page 422, Paragraph 3]: “The process of federated learning consists of a large number of training rounds. We record three training rounds as one loop. In each loop, only parameters of the feature layer are updated during the first two training rounds both in local clients and the central server. And only in the last training round, parameters of both the feature layer and the composition layer are updated synchronously” Zhong teaches that the second layer and the third layer can come from different UEs. Zhong [Page 424, Paragraph 3]: “In order to further reduce the communication overhead of federated learning, the model is compressed by pruning and Huffman coding” Zhong teaches that the model is compressed using Huffman encoding.)
The reasons to combine are substantially similar to those of claim 1.
Claims 25, 26, 27, 28, 29, and 30 are rejected on the same grounds under 35 U.S.C. 103 as claims 14, 1, 2, 4, 5, and 15, respectively, as they are substantially similar. Mutatis mutandis.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYU HYUNG HAN whose telephone number is (703) 756-5529. The examiner can normally be reached M-F, 9-5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Kyu Hyung Han/
Examiner
Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123