Prosecution Insights
Last updated: April 19, 2026
Application No. 18/734,103

FEDERATED LEARNING FOR WIRELESS COMMUNICATIONS SYSTEMS

Status: Non-Final OA (§103)
Filed: Jun 05, 2024
Examiner: TURRIATE GASTULO, JUAN CARLOS
Art Unit: 2446
Tech Center: 2400 — Computer Networks
Assignee: Apple Inc.
OA Round: 1 (Non-Final)

Grant Probability: 72% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 2m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 72% (270 granted / 376 resolved; +13.8% vs TC avg; above average)
Interview Lift: +35.9% allow rate for resolved cases with an interview
Typical Timeline: 3y 2m average prosecution; 28 applications currently pending
Career History: 404 total applications across all art units

Statute-Specific Performance

§101: 13.8% (-26.2% vs TC avg)
§103: 55.4% (+15.4% vs TC avg)
§102: 14.3% (-25.7% vs TC avg)
§112: 8.4% (-31.6% vs TC avg)

TC averages are estimates. Based on career data from 376 resolved cases.
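For concreteness, the headline figures above are straightforward arithmetic on the career data shown on this page (270 granted of 376 resolved); the sketch below reproduces that arithmetic, with the implied Tech Center average being an inference from the displayed +13.8% delta, not a figure stated on the page:

```python
# Figures taken from this page: 270 granted out of 376 resolved cases.
granted, resolved = 270, 376

career_allow_rate = granted / resolved
print(f"Career allow rate: {career_allow_rate:.1%}")   # 71.8%, displayed as 72%

# The "+13.8% vs TC avg" delta implies a Tech Center average of roughly:
implied_tc_avg = career_allow_rate - 0.138
print(f"Implied TC average: {implied_tc_avg:.1%}")     # ~58%
```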

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This action is in response to the application filed 06/05/2024. Claims 1-20 are pending in this application.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 10/17/2025 has been placed in the record and considered by the examiner.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6, 8-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Keshavamurthy et al. (US 2025/0371370 A1) in view of Krouka et al. (US 2025/0265500 A1), and further in view of Park et al. ("FedFwd: Federated Learning without Backpropagation", published June 19, 2023).

Regarding claim 1, Keshavamurthy discloses a method for federated learning for a wireless communication system ([0001]: performing training of a model using federated learning), the method comprising: initializing parameters of models for user equipment (UE) devices ([0149]: a local training result reported by a particular UE may comprise updates for parameters of the local model at the particular UE; in some embodiments, the updates for the parameters of the local model are the values of the parameters of the local model when training of the local model has been completed at the UE); selecting, via a network node, two or more of the UE devices as selected UE devices to participate in federated training ([0174]: the selection of K UEs for local training can in some situations be random or based on any suitable UE selection scheme that takes into account the information included in the received FL reports (provided by the UEs); in the following example one of the selected UEs is UE1 300a… other selected UEs such as for example UE3 300c could also have been selected as a first UE along with UE1 300a); aggregating, at the network node, the local models to generate a global model ([0037]: combining the local training results to generate aggregated training results for the model); iteratively selecting the two or more of the UE devices ([0147]: the FL aggregator 400 can send training configuration to the K selected UEs to enable the selected UEs to perform local training; the FL aggregator 400 is configured to then send a signal to each of the selected UEs (e.g., UE1 and UE3) to perform local training of their local models; for example, for iteration N 451, FL is performed with UE1 and UE3 as shown by the dashed box 411); and aggregating the local models until the global model converges ([0161]: the FL iteration sets can be repeated or continued until the FL model converges, i.e., until the parameters of the global model are optimized).

However, Keshavamurthy does not disclose selecting, at the selected UE devices, one or more model layers as selected model layers to participate in the federated training. In an analogous art, Krouka discloses selecting, at the selected UE devices, one or more model layers as selected model layers to participate in the federated training ([0073]: the selected cut-layer index affects the processing energy (e.g., memory access and computation); moreover, although the devices may share the same ML model architecture, the energy consumption may be different even for devices that select the same cut-layer index). Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Keshavamurthy to comprise "selecting, at the selected UE devices, one or more model layers as selected model layers to participate in the federated training" as taught by Krouka. One of ordinary skill in the art would have been motivated because it would have enabled the devices to process a fraction of the ML model and transmit the output of the partitioning layer to the centralized server at every communication round (Krouka, [0068]).

However, Keshavamurthy-Krouka does not disclose performing, at the selected UE devices, forward-forward learning on the selected model layers to generate local models for the selected model layers of the selected UE devices; and iteratively selecting the one or more of the model layers and performing the forward-forward learning on the selected model layers. In an analogous art, Park discloses performing, at the selected UE devices, forward-forward learning on the selected model layers to generate local models for the selected model layers of the selected UE devices (pg. 2, right column, [0001]-[0002]: a federated learning algorithm, FedFwd, which follows the core steps of FedAvg, encompassing three primary stages: 1) the selection of a client subset at each iteration, 2) execution of local parameter updates, and 3) subsequent aggregation of these updates at the server; the Forward-Forward algorithm, a greedy layer-wise learning technique, adopts an alternative approach to the traditional backpropagation's one forward and one backward pass by employing two forward passes; this algorithm uniquely trains each layer by leveraging a measure of goodness); and iteratively selecting the one or more of the model layers and performing the forward-forward learning on the selected model layers (pg. 3, left column, [0003]: to ensure a fair comparison, we design FedFwd and FedAvg models to have the same number of layers and a similar number of parameters with a marginal parameter difference of approximately 1%). Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Keshavamurthy-Krouka to comprise "performing, at the selected UE devices, forward-forward learning on the selected model layers to generate local models for the selected model layers of the selected UE devices; iteratively selecting the one or more of the model layers, performing the forward-forward learning on the selected model layers" as taught by Park. One of ordinary skill in the art would have been motivated because it would have enabled the forward-forward algorithm to reduce the computational burden on local clients by eliminating the need to store all intermediate activations in memory (Park, pg. 1, right column, [0001]).
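As an editorial illustration of the forward-forward technique quoted from Park above: the sketch below is reconstructed from the quoted description only (two forward passes and a per-layer "goodness" score, trained greedily with no backpropagation through other layers). It is not code from Park or from the application; the goodness function (sum of squared activations), the threshold `theta`, and the learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_layer_ff(W, x_pos, x_neg, lr=0.03, theta=2.0, steps=200):
    """Forward-forward update for one layer: raise the layer's 'goodness'
    (sum of squared activations) on positive data and lower it on negative
    data, using only a locally computed gradient -- no backprop through
    other layers."""
    for _ in range(steps):
        for x, label in ((x_pos, 1.0), (x_neg, 0.0)):
            h = np.maximum(W @ x, 0.0)            # forward pass (ReLU layer)
            g = np.sum(h ** 2)                    # layer-local goodness score
            p = 1.0 / (1.0 + np.exp(theta - g))   # sigmoid(g - theta)
            # Gradient of the logistic loss w.r.t. W, derived for this layer
            # alone: dL/dg = p - label, dg/dh = 2h, dh/dW uses the ReLU mask.
            dW = (p - label) * np.outer(2.0 * h * (h > 0), x)
            W -= lr * dW
    return W

d_in, d_hidden = 8, 16
W = rng.normal(scale=0.1, size=(d_hidden, d_in))
x_pos = rng.normal(size=d_in)   # stand-in for "real" (positive) data
x_neg = rng.normal(size=d_in)   # stand-in for corrupted (negative) data
W = train_layer_ff(W, x_pos, x_neg)

goodness = lambda x: np.sum(np.maximum(W @ x, 0.0) ** 2)
print(goodness(x_pos) > goodness(x_neg))   # positive data should score higher
```

In a multi-layer model each layer would be trained this way in sequence on the (normalized) output of the layer below, which is what makes the method layer-wise and memory-light.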
Regarding claim 2, Keshavamurthy-Krouka-Park disclose the method for federated learning of claim 1, wherein the parameters are initialized with values that are random, pre-trained, or obtained from the network node (Keshavamurthy, [0138]: the central node provides a global model comprising parameters or data to the distributed nodes and each of the distributed nodes performs local training of a local model (referred to hereinafter as local model training) using a dataset comprising data of the distributed node during an iteration of FL).

Regarding claim 3, Keshavamurthy-Krouka-Park disclose the method for federated learning of claim 1, wherein the UE devices are selected based on their availability, reliability, or randomly (Keshavamurthy, [0145]: the selection of UEs by the FL aggregator 400 can in some situations be random or based on any suitable UE selection scheme that takes into account the obtained FL reports (provided by the UEs)).

Regarding claim 4, Keshavamurthy-Krouka-Park disclose the method for federated learning of claim 1, wherein the one or more model layers are selected based on criteria determined by each of the selected UE devices, wherein a number of one or more model layers selected by the selected UE devices is based on available computational resources, transmission bandwidth, battery power, model evaluation errors, or local dataset size (Krouka, [0073]: the selected cut-layer index affects the processing energy (e.g., memory access and computation); moreover, although the devices may share the same ML model architecture, the energy consumption may be different even for devices that select the same cut-layer index, because the transmission energy E_t depends on the output size of the cut-layer as well as the radio channel conditions of the devices). The same rationale applies as in claim 1.

Regarding claim 5, Keshavamurthy-Krouka-Park disclose the method for federated learning of claim 1, wherein aggregating is performed by averaging layers (Park, pg. 3, left column, [0002]: comparing FedFwd (FF) and FedAvg (BP) on the MNIST dataset by varying the size and depth of the hidden layers, then evaluating the training speed of FedFwd and FedAvg based on the size of the mini-batch; see Park's Table 1). The same rationale applies as in claim 1.

Regarding claim 6, Keshavamurthy-Krouka-Park disclose the method for federated learning of claim 1, wherein model aggregation is a layer-wise operation (Krouka, [0149]: network node 150 may aggregate the gradients received from devices 110-1 and 110-2, and any other devices participating in the collaborative training; aggregation of gradients may comprise any suitable method for combining the gradients of respective instances of the second ML model, for example averaging them; note that in case of the SL mode part of the gradients (up to the cut-layer) are determined by devices 110 and the rest of them are determined by network node 150, while in case of the FL mode all gradients are determined by a device). The same rationale applies as in claim 1.

Regarding claim 8, Keshavamurthy-Krouka-Park disclose the method for federated learning of claim 1, wherein each of the two or more selected UE devices chooses a set of layers for the federated training, wherein a first layer from the set of layers determines an amount of training (Krouka, [0130]: the energy consumption of device 110-1 associated with transmission/reception of data (e.g., transmission of the output of the cut-layer and/or reception of respective gradients) may be estimated based on the transmit energy of device 110-1 and the transmission time, which may be dependent on the amount of the training output data of the cut-layer and the gradients). The same rationale applies as in claim 1.
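The layer-wise averaging aggregation at issue in claims 5 and 6 can be illustrated with a minimal sketch. The dictionary layout and UE names below are hypothetical; the only idea taken from the record is that the global model's layers are element-wise averages over whichever UEs reported that layer:

```python
import numpy as np

def aggregate_layerwise(local_models):
    """Layer-wise federated averaging: each layer of the global model is the
    element-wise mean of that layer across the participating UE models.
    `local_models` maps a UE id to {layer_name: weight_array}; a UE may have
    trained (and reported) only a subset of layers."""
    layer_names = set().union(*(m.keys() for m in local_models.values()))
    global_model = {}
    for name in layer_names:
        updates = [m[name] for m in local_models.values() if name in m]
        global_model[name] = np.mean(updates, axis=0)
    return global_model

# Two UEs report overlapping subsets of layers (partial-layer participation).
ue_updates = {
    "UE1": {"layer0": np.array([1.0, 3.0]), "layer1": np.array([2.0, 2.0])},
    "UE3": {"layer0": np.array([3.0, 1.0])},
}
agg = aggregate_layerwise(ue_updates)
print(agg["layer0"])   # [2. 2.] -- mean of the two UE reports
print(agg["layer1"])   # [2. 2.] -- only UE1 reported this layer
```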
Regarding claim 9, Keshavamurthy-Krouka-Park disclose the method for federated learning of claim 1, wherein the local models that the UE devices send include partial layers of a neural network (Krouka, [0108]: split learning with cut-layer index i (SL^i), federated learning with the index of the cut-layer corresponding to the final layer of the second ML model, or an idle mode indicative of the respective device 110-1, 110-2, 110-3 not participating in the collaborative training of the second ML model; the training modes of devices 110 may therefore be indicative of the respective cut-layers at which devices 110 are configured to provide the training output data of the second ML model). The same rationale applies as in claim 1.

Regarding claim 10, Keshavamurthy discloses a network node apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to: initialize parameters of models for user equipment (UE) devices ([0149]: a local training result reported by a particular UE may comprise updates for parameters of the local model at the particular UE; in some embodiments, the updates for the parameters of the local model are the values of the parameters of the local model when training of the local model has been completed at the UE); select two or more of the UE devices as selected UE devices to participate in federated training ([0174]: the selection of K UEs for local training can in some situations be random or based on any suitable UE selection scheme that takes into account the information included in the received FL reports (provided by the UEs); in the following example one of the selected UEs is UE1 300a… other selected UEs such as for example UE3 300c could also have been selected as a first UE along with UE1 300a); and aggregate, at the network node, the local models to generate a global model ([0037]: combining the local training results to generate aggregated training results for the model).

However, Keshavamurthy does not disclose instructing the selected UE devices to select one or more model layers as selected model layers to participate in the federated training. In an analogous art, Krouka discloses instructing the selected UE devices to select one or more model layers as selected model layers to participate in the federated training ([0106]: a reward (e.g., R=1) may be provided if the SL mode is selected with cut-layer index i that results in the minimum estimated energy consumption for a device among the selectable cut-layer indices and the energy consumption is below a threshold E_max (e.g., maximum allowed energy consumption)). Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Keshavamurthy to comprise "instruct the selected UE devices to select one or more model layers as selected model layers to participate in the federated training" as taught by Krouka. One of ordinary skill in the art would have been motivated because it would have enabled the devices to process a fraction of the ML model and transmit the output of the partitioning layer to the centralized server at every communication round (Krouka, [0068]).

However, Keshavamurthy-Krouka does not disclose receiving local models generated by the selected UE devices performing forward-forward learning on the selected model layers. In an analogous art, Park discloses receiving local models generated by the selected UE devices performing forward-forward learning on the selected model layers (pg. 2, right column, [0001]-[0002]: a federated learning algorithm, FedFwd, which follows the core steps of FedAvg, encompassing three primary stages: 1) the selection of a client subset at each iteration, 2) execution of local parameter updates, and 3) subsequent aggregation of these updates at the server; the Forward-Forward algorithm, a greedy layer-wise learning technique, adopts an alternative approach to the traditional backpropagation's one forward and one backward pass by employing two forward passes; this algorithm uniquely trains each layer by leveraging a measure of goodness; pg. 3, left column, [0003]: to ensure a fair comparison, we design FedFwd and FedAvg models to have the same number of layers and a similar number of parameters with a marginal parameter difference of approximately 1%). Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Keshavamurthy-Krouka to comprise "receive local models generated by the selected UE devices performing forward-forward learning on the selected model layers" as taught by Park. One of ordinary skill in the art would have been motivated because it would have enabled the forward-forward algorithm to reduce the computational burden on local clients by eliminating the need to store all intermediate activations in memory (Park, pg. 1, right column, [0001]).

Regarding claim 11, the claim is interpreted and rejected for the same reasons as set forth in claim 2. Regarding claim 12, the claim is interpreted and rejected for the same reasons as set forth in claim 3. Regarding claim 13, the claim is interpreted and rejected for the same reasons as set forth in claim 5. Regarding claim 14, the claim is interpreted and rejected for the same reasons as set forth in claim 6. Regarding claim 16, the claim is interpreted and rejected for the same reasons as set forth in claim 9.
Regarding claim 17, Keshavamurthy discloses a user equipment (UE) apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to: initialize parameters of models for the UE apparatus ([0149]: a local training result reported by a particular UE may comprise updates for parameters of the local model at the particular UE; in some embodiments, the updates for the parameters of the local model are the values of the parameters of the local model when training of the local model has been completed at the UE); receive an indication from a network node that the UE apparatus is selected to participate in a federated training ([0174]: the selection of K UEs for local training can in some situations be random or based on any suitable UE selection scheme that takes into account the information included in the received FL reports (provided by the UEs); in the following example one of the selected UEs is UE1 300a… other selected UEs such as for example UE3 300c could also have been selected as a first UE along with UE1 300a); and send the local model to the network node for aggregation to generate a global model ([0037]: combining the local training results to generate aggregated training results for the model).

However, Keshavamurthy does not disclose selecting one or more model layers as selected model layers to participate in the federated training. In an analogous art, Krouka discloses selecting one or more model layers as selected model layers to participate in the federated training ([0106]: a reward (e.g., R=1) may be provided if the SL mode is selected with cut-layer index i that results in the minimum estimated energy consumption for a device among the selectable cut-layer indices and the energy consumption is below a threshold E_max (e.g., maximum allowed energy consumption)). Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Keshavamurthy to comprise "select one or more model layers as selected model layers to participate in the federated training" as taught by Krouka. One of ordinary skill in the art would have been motivated because it would have enabled the devices to process a fraction of the ML model and transmit the output of the partitioning layer to the centralized server at every communication round (Krouka, [0068]).

However, Keshavamurthy-Krouka does not disclose performing forward-forward learning on the selected model layers to generate local models for the selected model layers. In an analogous art, Park discloses performing forward-forward learning on the selected model layers to generate local models for the selected model layers (pg. 2, right column, [0001]-[0002]: a federated learning algorithm, FedFwd, which follows the core steps of FedAvg, encompassing three primary stages: 1) the selection of a client subset at each iteration, 2) execution of local parameter updates, and 3) subsequent aggregation of these updates at the server; the Forward-Forward algorithm, a greedy layer-wise learning technique, adopts an alternative approach to the traditional backpropagation's one forward and one backward pass by employing two forward passes; this algorithm uniquely trains each layer by leveraging a measure of goodness; pg. 3, left column, [0003]: to ensure a fair comparison, we design FedFwd and FedAvg models to have the same number of layers and a similar number of parameters with a marginal parameter difference of approximately 1%).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Keshavamurthy-Krouka to comprise "perform forward-forward learning on the selected model layers to generate local models for the selected model layers" as taught by Park. One of ordinary skill in the art would have been motivated because it would have enabled the forward-forward algorithm to reduce the computational burden on local clients by eliminating the need to store all intermediate activations in memory (Park, pg. 1, right column, [0001]).

Regarding claim 18, the claim is interpreted and rejected for the same reasons as set forth in claim 2. Regarding claim 19, the claim is interpreted and rejected for the same reasons as set forth in claim 3. Regarding claim 20, the claim is interpreted and rejected for the same reasons as set forth in claim 4.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Keshavamurthy in view of Krouka and Park, as applied to claim 1, and further in view of Mo et al. (US 2024/0144009 A1).

Regarding claim 7, Keshavamurthy-Krouka-Park disclose the method for federated learning of claim 1. However, Keshavamurthy-Krouka-Park does not disclose wherein the selected model layers aggregation can be homogeneous or heterogeneous, wherein homogeneous aggregation comprises averaging neural network coefficients on a same layer among all UE devices, and wherein heterogeneous aggregation comprises averaging neural network coefficients on different layers. In an analogous art, Mo discloses wherein the selected model layers aggregation can be homogeneous or heterogeneous, wherein homogeneous aggregation comprises averaging neural network coefficients on a same layer among all UE devices, and wherein heterogeneous aggregation comprises averaging neural network coefficients on different layers ([0237]: a method of federated learning 330; at stage 232, the method identifies the model part; if the model part is an encoder 12 it is aggregated by averaging 336; if the model part is a predictor 14 it is determined if it is homogeneous or heterogeneous with a predictor 14 that is being averaged at stage 334; [0245]: homogeneous aggregation can occur for predictors 14 associated with the same computational resource class, and heterogeneous aggregation can occur across the aggregated predictors 14 of the different classes). Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Keshavamurthy-Krouka-Park to comprise "wherein the selected model layers aggregation can be homogeneous or heterogeneous, wherein homogeneous aggregation comprises averaging neural network coefficients on a same layer among all UE devices, and wherein heterogeneous aggregation comprises averaging neural network coefficients on different layers" as taught by Mo. One of ordinary skill in the art would have been motivated because it would have enabled performing federated learning for the neural network based on homogeneous or heterogeneous predictors (Mo, [0237]).

Regarding claim 15, the claim is interpreted and rejected for the same reasons as set forth in claim 7.

Additional References

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Balevi et al., US 2023/0316062 A1: Layer by Layer Training for Federated Learning.
Li et al., US 2023/0082173: Data Processing Method, Federated Learning Training Method, and Related Apparatus and Device.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUAN C TURRIATE GASTULO, whose telephone number is (571) 272-6707. The examiner can normally be reached Monday through Friday, 8 am to 4 pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Brian J Gillis, can be reached at 571-272-7952. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.C.T/ Examiner, Art Unit 2446
/BRIAN J. GILLIS/ Supervisory Patent Examiner, Art Unit 2446
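As an editorial aside on the rejection of claims 7 and 15: one plausible reading of the "homogeneous" versus "heterogeneous" aggregation recited in claim 7 (averaging coefficients on a same layer among all UE devices versus averaging coefficients on different layers) can be sketched as follows. This is an interpretation of the claim language only, not Mo's disclosed method; the function names and data layout are hypothetical.

```python
import numpy as np

def homogeneous_aggregate(updates):
    """Average coefficients of the SAME layer across all UE devices:
    every array in `updates` is that one layer's report from a different UE."""
    return np.mean(updates, axis=0)

def heterogeneous_aggregate(updates_by_layer):
    """Average coefficients across DIFFERENT layers: first average each
    layer's UE reports, then average the per-layer results together
    (which requires the layers to share a common shape)."""
    per_layer = [np.mean(u, axis=0) for u in updates_by_layer.values()]
    return np.mean(per_layer, axis=0)

layer2_reports = [np.array([1.0, 5.0]), np.array([3.0, 1.0])]
print(homogeneous_aggregate(layer2_reports))        # [2. 3.]

mixed = {"layer2": layer2_reports, "layer3": [np.array([4.0, 1.0])]}
print(heterogeneous_aggregate(mixed))               # [3. 2.]
```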

Prosecution Timeline

Jun 05, 2024: Application Filed
Feb 21, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603795: INFORMATION PROCESSING TERMINAL, INFORMATION PROCESSING DEVICE, AND SYSTEM (granted Apr 14, 2026; 2y 5m to grant)
Patent 12587432: Visual Map for Network Alerts (granted Mar 24, 2026; 2y 5m to grant)
Patent 12574436: BLOCKCHAIN MACHINE BROADCAST PROTOCOL WITH LOSS RECOVERY (granted Mar 10, 2026; 2y 5m to grant)
Patent 12566427: Method and System for Synchronizing Configuration Data in a Plant (granted Mar 03, 2026; 2y 5m to grant)
Patent 12568059: UPDATING COMMUNICATIONS WITH MACHINE LEARNING AND PLATFORM CONTEXT (granted Mar 03, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 72%
With Interview: 99% (+35.9%)
Median Time to Grant: 3y 2m
PTA Risk: Low

Based on 376 resolved cases by this examiner. Grant probability derived from the career allow rate.
