DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on March 14, 2025 has been entered. Applicant has made no amendments. The status of the claims is as follows:
Claims 1-10 remain pending in the application.
Response to Arguments
Applicant's arguments filed in response to the rejections under 35 U.S.C. 103 have been fully considered and are partially persuasive. Specifically, Examiner has determined that the previously cited portions of Ahn fall short of teaching the claimed limitations; however, the previously applied combination still teaches the claimed limitations, with different passages of Ahn now cited. Details follow.
Applicant argues that “Ahn discloses a different kind of distributed DNN as follows, regarding to Caffe-MPI.”
Examiner agrees: the cited passage of Ahn does not teach averaging of gradients in Ahn’s invention, but is rather a reference to another system entirely.
Applicant argues that “Ahn’s method for distributed DNN follows a different approach: a parameter server updates the global weight whenever a gradient arrives from a worker, without aggregating all gradients that may arrive early or late from distributed workers. The parameter server and workers exchange weight parameters learned by them, calculate the difference between the two weight vectors, and update their own weight parameters by adding the scaled difference to it.” Applicant also states that “while the master worker calculated the average of the gradients, it also updates the master weights, meaning that this is not a federation among workers” and “there is no weight parameter update function in the federated reinforcement learning.” Applicant then appears to further argue that Zhang does not cure Ahn’s deficiencies, but does not make any particular argument against Zhang beyond conclusory statements that it is “neither necessary nor relevant to the current invention” and that the “concept is fundamentally different.” Examiner can identify no particular argument against the mapping of limitations to which to respond, but points out that the test for obviousness is not bodily incorporation of one reference into another, but rather what the combined teachings of the references would have suggested to one of ordinary skill in the art.
Examiner respectfully disagrees. Applicant is partially correct in that the parameter server and workers exchange weight parameters and calculate the difference to update their own weight parameters. However, this does not mean that there is no aggregation of gradients at the server. Indeed, Ahn states that the global weights are updated using a “moving averaging rate,” as shown on page 1121, top right, including Equation (4). Furthermore, the global weights are not only aggregated as a “moving average” of weight updates, but these global weights are indeed shared with the workers. Ahn, page 1122, Section G states: “Thus, each worker’s main thread reads the global weight (Wg) at the start point of every iteration (T1) and updates the local weight by calculating weight increment from the difference between the global weight and the local weight (T2).” Thus, Examiner disagrees that “there is no weight parameter update function” or that “this is not a federation among workers”. Indeed, updated global parameters are shared with each worker at the beginning of each training iteration, and since each worker shares its updates with the parameter server, this is indeed a federation of workers. Examiner’s understanding of the term “federated learning” is that the broadest reasonable interpretation encompasses a distributed training scenario in which raw training data is not shared with a server, but instead weights or gradients are shared and communicated between server and workers. Examiner disagrees that Ahn’s system is not “federated”.
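For clarity of the record, the update scheme paraphrased above can be written out. The following rendering is illustrative only and follows the standard EASGD formulation rather than quoting Ahn's equations (2)-(4) directly; the notation (Wx: a worker's local weight, Wg: the global weight, η: the learning rate, α: the moving averaging rate, ∇L(Wx): the local gradient) is assumed for illustration:

    $W_x \leftarrow W_x - \eta\, \nabla L(W_x)$    (local SGD step; cf. Ahn's (2))
    $W_x \leftarrow W_x - \alpha\, (W_x - W_g)$    (worker-side update; cf. Ahn's (3))
    $W_g \leftarrow W_g + \alpha\, (W_x - W_g)$    (server-side update; cf. Ahn's (4))

Under this scheme the global weight Wg tracks a moving average of the workers' weights, which is the sense in which Examiner reads Ahn's "moving averaging rate" as an aggregation of the workers' updates.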
Examiner notes that the same combination of references is applied below, but different sections of Ahn that better match the claimed limitations are now set forth in the rejections.
In the section “Prior Art of Record”, Examiner lists additional prior art found during the new round of search, and particularly points out Blanchard, which states that aggregating and averaging parameters is typical in the art.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:
Claim 1: “a plurality of device controllers configured to perform each of reinforcement learnings”
Claim 2: “a federated reinforcement learning unit configured to generate a learning model” and “is characterized in that the federated reinforcement learning is performed to complete the reinforcement learning in earlier stage than individually processed reinforcement learnings”
Claim 2: “a gradient reporting unit configured to calculate the gradient”
Claim 2: “an average gradient receiving unit configured to receive the average gradient”
Claim 2: “a learning parameter reporting unit configured to report the learning parameter”
Claim 2: “a learning parameter receiving unit configured to receive a first reported learning parameter”
Claim 4: “a device control unit configured to control the device”
Claim 4: “a device state information providing unit configured to provide a state information”
Claim 5: “a gradient receiving unit configured to request and receive the gradient”
Claim 5: “a gradient sharing unit configured to transmit and share the average gradient”
Claim 5: “a learning parameter receiving unit configured to receive the learning parameter”
Claim 5: “a learning parameter providing unit configured to provide and transfer the received learning parameter”
Claim 6: “a device state information receiving unit configured to receive device state information”
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-6 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The following claim limitations invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.
- Claim 1: “a plurality of device controllers configured to perform each of reinforcement learnings”
- Claim 2: “a federated reinforcement learning unit configured to generate a learning model” and “is characterized in that the federated reinforcement learning is performed to complete the reinforcement learning in earlier stage than individually processed reinforcement learnings”
- Claim 2: “a gradient reporting unit configured to calculate the gradient”
- Claim 2: “an average gradient receiving unit configured to receive the average gradient”
- Claim 2: “a learning parameter reporting unit configured to report the learning parameter”
- Claim 2: “a learning parameter receiving unit configured to receive a first reported learning parameter”
- Claim 4: “a device control unit configured to control the device”
- Claim 4: “a device state information providing unit configured to provide a state information”
- Claim 5: “a gradient receiving unit configured to request and receive the gradient”
- Claim 5: “a gradient sharing unit configured to transmit and share the average gradient”
- Claim 5: “a learning parameter receiving unit configured to receive the learning parameter”
- Claim 5: “a learning parameter providing unit configured to provide and transfer the received learning parameter”
- Claim 6: “a device state information receiving unit configured to receive device state information”
Examiner notes that neither the claims nor the specification recites any structure describing what any “controller” or “unit” comprises. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claim 3 is rejected because it inherits the deficiencies of Claim 1.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mirhoseini et al. (Device Placement Optimization with Reinforcement Learning, hereinafter "Mirhoseini") in view of Ahn et al. (ShmCaffe: A Distributed Deep Learning Platform with Shared Memory Buffer for HPC Architecture, hereinafter "Ahn") and Zhang et al. (Stay Fresh: Speculative Synchronization for Fast Distributed Machine Learning, hereinafter "Zhang").
Regarding Claim 1,
Mirhoseini teaches a system for controlling multiple devices through a federated reinforcement learning, comprising:
a plurality of device controllers (Figure 3, "Our framework consists of several controllers" sec. 3.4, p. 4) configured to perform each of reinforcement learnings ("neural networks and reinforcement learning for combinatorial optimization" sec. 2, p. 2) to control each of a plurality of devices ("we apply our proposed method to assign computations to devices" sec. 4, p. 4) and report gradients calculated in a process of the reinforcement learnings and a learning parameter according to completion of each of the reinforcement learnings to a federated reinforcement learning managing server ("Each replica independently performs forward and backward passes to compute the model’s gradients with respect to a minibatch of 32 images and then updates the parameters asynchronously." sec. 4.4, p. 7; "All of the controllers interact with a single shared parameter server… When all of the running times are received, the controller uses the running times to scale the corresponding gradients to asynchronously update the controller parameters that reside in the parameter server." sec. 3.4, p. 4).
Mirhoseini does not explicitly teach wherein the federated reinforcement learning managing server is configured to average the reported gradients, share the average gradient with the plurality of device controllers, and transfer the reported learning parameter to at least more than one of the devices in which corresponding reinforcement learning is not completed.
Ahn teaches wherein the federated reinforcement learning managing server is configured to average the reported gradients, share the average gradient with the plurality of device controllers (Page 1123, top right: "On the other hand, the worker using the EASGD method updates the local weight from local gradient learned by itself after the training of each minibatch in (2). The updated local weight and the global weight are exchanged between workers and the parameter server, and they update their own weight based on the difference of exchanged weight. Equation (3) is the second weight update formula in the workers, and (4) is the weight update formula in the parameter server. As workers use the learning rate (η) when updating the local weight in (2), the moving averaging rate (α) is used as in (3) and (4).
[media_image1.png: weight update equations reproduced from Ahn as a greyscale image]
ShmCaffe use the SGD optimizer of Caffe to update the local weight. ShmCaffe workers calculate weight increment (ΔWx) in (5) and update the local weights (6). The workers store the weight increment in the shared buffer of the SMB server, and updates the global weights (Wg) by accumulating the weight increment into the global weights of the SMB server (7).”), and
transfer the reported learning parameter to at least more than one of the devices in which corresponding reinforcement learning is not completed (Page 1122 Section F: “The global weight parameter (Wg) buffer created in the SMB server is shared by all deep learning workers. Each worker allocates a shared memory buffer (ΔWx) at the SMB server to store the weight increment computed from the difference between its local weight and the global weight. This buffer is not shared among the other workers. The values of ΔWx are accumulated to the global weight buffer (Wg), as shown in Fig. 5.” and Section G: “Thus, each worker’s main thread reads the global weight (Wg) at the start point of every iteration (T1) and updates the local weight by calculating weight increment from the difference between the global weight and the local weight (T2).” Examiner notes that the recitation of an iterative process also teaches that the reinforcement learning is not completed, with the only exception being the final iteration.)
Mirhoseini and Ahn are analogous art because both are directed to asynchronous distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous training method of Mirhoseini with the asynchronous training method of Ahn. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training time, as suggested by Ahn ("ShmCaffe is 10.1 times faster than Caffe and 2.8 times faster than Caffe-MPI for deep neural network training" sec. Abstract, p. 1118) and also because the hybrid system in which both workers and the parameter server update the weights is faster than the previous “Downpour” method, as stated by Ahn at the bottom left of Page 1121: “Weights are exchanged between the parameter server and the workers when using the EASGD scheme. EASGD is more efficient than the Downpour SGD, in which the weight update is performed by the parameter server. In the EASGD method, the weight update is performed by both the worker and the parameter server.”
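To further clarify the mechanism relied upon, the following is a minimal illustrative sketch, not Ahn's code, of the shared-buffer interaction quoted above; all names (SMBServer, global_w, delta) and the exact sign and scaling of the weight increment are assumptions for purposes of illustration only:

    import numpy as np

    class SMBServer:
        """Toy stand-in for Ahn's SMB server: holds the shared global weight
        buffer (Wg) and one per-worker increment buffer (ΔWx) each."""
        def __init__(self, dim):
            self.global_w = np.zeros(dim)   # Wg, shared by all workers
            self.delta = {}                 # per-worker ΔWx buffers, not shared

        def accumulate(self, worker_id):
            # cf. Ahn's (7): accumulate the stored increment into the global weights
            self.global_w += self.delta[worker_id]

    def worker_iteration(server, worker_id, local_w, grad, lr, alpha):
        global_w = server.global_w.copy()     # T1: read Wg at the start of every iteration
        local_w = local_w - lr * grad         # local SGD step on the worker's own data
        delta = alpha * (global_w - local_w)  # T2: increment from the global/local difference
        local_w = local_w + delta             # cf. Ahn's (6): update the local weight
        server.delta[worker_id] = delta       # store ΔWx in the worker's shared buffer
        server.accumulate(worker_id)          # cf. Ahn's (7): fold ΔWx into Wg
        return local_w

Note that in this reading only weight increments, and never raw training data, leave the worker, consistent with Examiner's interpretation of "federated learning" set forth above.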
The combination of Mirhoseini and Ahn does not explicitly teach wherein the system is characterized in that the reinforcement learning is completed earlier than individually processed reinforcement learnings by performing the federated reinforcement learning in coalition with the reinforcement learnings through the shared average gradient and transferred learning parameter.
Zhang teaches wherein the system is characterized in that the reinforcement learning is completed earlier than individually processed reinforcement learnings by performing the federated reinforcement learning in coalition with the reinforcement learnings through the shared average gradient and transferred learning parameter ("Fig. 6: Illustration of speculative synchronization. Worker-1 speculatively aborts computation after observing the two pushes made by two peers. It pulls parameters again and starts over." p. 102; "Instead of imposing an arbitrary delay without justification, we let the worker asynchronously proceed to the next iteration immediately, while at the same time speculating about the updates made by others. Once the worker learns that the global parameters have been updated “enough” times, it will abort the ongoing iteration, pull the fresher parameters to start over—if that is not too late yet. Continuing the example in Fig. 2, we apply speculative synchronization and illustrate workers’ behaviors in Fig. 6. We start to focus on worker-1. After finishing the first iteration, it pulls parameters and starts the next iteration immediately. Shortly after it starts, it learns that two other peers have pushed updates to servers (highlighted in Fig. 6), which it views as a significant-enough change made to the global parameters. Worker-1 hence aborts the ongoing iteration, re-synchronizes with servers to include those two recent updates, and starts over with much fresher parameters. The abort-and-restart decision is also made by worker-4 upon its notice of two updates pushed shortly after the second iteration starts. In contrast, workers 2 and 3 choose not to restart as they do not see enough updates (two in this example) pushed to servers since their last pulls." p. 102-103).
Mirhoseini and Zhang are analogous art because both are directed to distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous distributed training of the Mirhoseini/Ahn combination with the speculative synchronization of Zhang. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training, as suggested by Zhang ("Experimental results show that speculative synchronization achieves up to 3× speedup over the asynchronous parallel scheme in many machine learning applications, with little additional communication overhead." sec. Abstract).
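As an illustration of the Zhang behavior relied upon, the following sketch shows a worker aborting its ongoing iteration and re-pulling once it observes a threshold number of peer pushes. It is not Zhang's implementation; the callables (pull, push, peer_push_count, compute_steps) and the threshold value are assumptions for illustration only:

    def speculative_iteration(pull, push, peer_push_count, compute_steps, threshold=2):
        """One worker iteration with speculative synchronization (after Zhang):
        abort and start over with fresher parameters once `threshold` peer
        pushes have been observed since this worker's last pull."""
        while True:
            params = pull()                        # pull parameters, start the iteration
            seen = peer_push_count()               # peer pushes observed at pull time
            update = None
            for update in compute_steps(params):   # incremental local computation
                if peer_push_count() - seen >= threshold:
                    update = None                  # significant-enough global change:
                    break                          # abort, re-synchronize, start over
            if update is not None:
                push(update)                       # finished without aborting
                return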
Regarding Claim 2,
The Mirhoseini/Ahn/Zhang combination teaches the system of claim 1.
Mirhoseini further teaches wherein the plurality of device controllers further comprises a federated reinforcement learning unit configured to generate a learning model for controlling the device through the federated reinforcement learning ("We train Inception-V3 on the ImageNet dataset (Russakovsky et al., 2015) until the model reaches the accuracy of 72% on the validation set." sec. 4.4, p. 7),
wherein the federated reinforcement learning unit comprises: a gradient reporting unit configured to calculate the gradient for the reinforcement learning currently being performed according to a request of the federated reinforcement learning managing server and report the calculated gradient to the federated reinforcement learning managing server ("Each replica independently performs forward and backward passes to compute the model’s gradients with respect to a minibatch of 32 images and then updates the parameters asynchronously." sec. 4.4, p. 7; "All of the controllers interact with a single shared parameter server… When all of the running times are received, the controller uses the running times to scale the corresponding gradients to asynchronously update the controller parameters that reside in the parameter server." sec. 3.4, p. 4);
Mirhoseini does not explicitly teach an average gradient receiving unit configured to receive the average gradient obtained by calculating average of the plurality of gradients reported from the federated reinforcement learning managing server; a learning parameter reporting unit configured to report the learning parameter to the federated reinforcement learning managing server; and a learning parameter receiving unit configured to receive the first reported learning parameter from the federated reinforcement learning managing server; wherein the federated reinforcement learning unit is characterized in that the federated reinforcement learning […] by performing continuously the reinforcement learnings by using the received average gradient and the received learning parameter, in case that the learning parameter are received under the state that corresponding reinforcement learning is not completed
Ahn teaches an average gradient receiving unit configured to receive the average gradient obtained by calculating average of the plurality of gradients reported from the federated reinforcement learning managing server (Page 1123, top right: "On the other hand, the worker using the EASGD method updates the local weight from local gradient learned by itself after the training of each minibatch in (2). The updated local weight and the global weight are exchanged between workers and the parameter server, and they update their own weight based on the difference of exchanged weight. Equation (3) is the second weight update formula in the workers, and (4) is the weight update formula in the parameter server. As workers use the learning rate (η) when updating the local weight in (2), the moving averaging rate (α) is used as in (3) and (4).");
a learning parameter reporting unit configured to report the learning parameter to the federated reinforcement learning managing server; and a learning parameter receiving unit configured to receive the first reported learning parameter from the federated reinforcement learning managing server (Page 1122 Section F: “The global weight parameter (Wg) buffer created in the SMB server is shared by all deep learning workers. Each worker allocates a shared memory buffer (ΔWx) at the SMB server to store the weight increment computed from the difference between its local weight and the global weight. This buffer is not shared among the other workers. The values of ΔWx are accumulated to the global weight buffer (Wg), as shown in Fig. 5.” and Section G: “Thus, each worker’s main thread reads the global weight (Wg) at the start point of every iteration (T1) and updates the local weight by calculating weight increment from the difference between the global weight and the local weight (T2).” Examiner notes that the recitation of an iterative process also teaches that the reinforcement learning is not completed, with the only exception being the final iteration.)
wherein the federated reinforcement learning unit is characterized in that the federated reinforcement learning […] by performing continuously the reinforcement learnings by using the received average gradient and the received learning parameter (Page 1123, top right: "On the other hand, the worker using the EASGD method updates the local weight from local gradient learned by itself after the training of each minibatch in (2). The updated local weight and the global weight are exchanged between workers and the parameter server, and they update their own weight based on the difference of exchanged weight. Equation (3) is the second weight update formula in the workers, and (4) is the weight update formula in the parameter server. As workers use the learning rate (η) when updating the local weight in (2), the moving averaging rate (α) is used as in (3) and (4).") in case that the learning parameter are received under the state that corresponding reinforcement learning is not completed ("Asynchronous SGD (ASGD) is one of the most widely used asynchronous distributed variants of SGD. The ASGD has been proposed to address the disadvantage of SSGD; namely, workers have to wait until the slowest worker finishes calculating gradient [11], [12]… ASGD uses a parameter server to share parameters asynchronously between workers." sec. II, p. 1119; "The asynchronous method is a way in which the parameter server updates the global weight whenever gradient arrives from a worker, without aggregating all the gradients arriving late or early from the distributed workers. The synchronous method has a large aggregation overhead because there is a variation in the training time of each deep learning worker. However, since the asynchronous method can eliminate such aggregation overhead, it can train DNN quickly without sacrificing the accuracy." sec. II, p. 1119; this teaches that an asynchronous method can reach completion without waiting for late workers that are continuing to work.).
Mirhoseini and Ahn are analogous art because both are directed to asynchronous distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous training method of Mirhoseini with the asynchronous training method of Ahn. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training time, as suggested by Ahn ("ShmCaffe is 10.1 times faster than Caffe and 2.8 times faster than Caffe-MPI for deep neural network training" sec. Abstract, p. 1118).
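To illustrate the asynchronous update described in the passages of Ahn quoted above, the following is a minimal sketch (not code from any cited reference; all names are assumed) in which the parameter server applies each gradient as it arrives, without aggregating the gradients from all workers:

    import queue
    import threading
    import numpy as np

    def parameter_server(grad_queue, weights, lr, num_updates):
        for _ in range(num_updates):
            grad = grad_queue.get()   # whenever a gradient arrives from a worker...
            weights -= lr * grad      # ...update the global weight immediately, in place

    weights = np.zeros(4)
    grad_queue = queue.Queue()
    server = threading.Thread(target=parameter_server,
                              args=(grad_queue, weights, 0.1, 8))
    server.start()
    for _ in range(8):                # workers push gradients at arbitrary times
        grad_queue.put(np.random.randn(4))
    server.join()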
Mirhoseini does not explicitly teach wherein the federated reinforcement learning unit is characterized in that the federated reinforcement learning is performed to complete the reinforcement learning in earlier stage than individually processed reinforcement learnings […] in case that the learning parameter are received under the state that corresponding reinforcement learning is not completed.
Zhang teaches wherein the federated reinforcement learning unit is characterized in that the federated reinforcement learning is performed to complete the reinforcement learning in earlier stage than individually processed reinforcement learnings […] in case that the learning parameter are received under the state that corresponding reinforcement learning is not completed ("Fig. 6: Illustration of speculative synchronization. Worker-1 speculatively aborts computation after observing the two pushes made by two peers. It pulls parameters again and starts over." p. 102; "Instead of imposing an arbitrary delay without justification, we let the worker asynchronously proceed to the next iteration immediately, while at the same time speculating about the updates made by others. Once the worker learns that the global parameters have been updated “enough” times, it will abort the ongoing iteration, pull the fresher parameters to start over—if that is not too late yet. Continuing the example in Fig. 2, we apply speculative synchronization and illustrate workers’ behaviors in Fig. 6. We start to focus on worker-1. After finishing the first iteration, it pulls parameters and starts the next iteration immediately. Shortly after it starts, it learns that two other peers have pushed updates to servers (highlighted in Fig. 6), which it views as a significant-enough change made to the global parameters. Worker-1 hence aborts the ongoing iteration, re-synchronizes with servers to include those two recent updates, and starts over with much fresher parameters. The abort-and-restart decision is also made by worker-4 upon its notice of two updates pushed shortly after the second iteration starts. In contrast, workers 2 and 3 choose not to restart as they do not see enough updates (two in this example) pushed to servers since their last pulls." p. 102-103).
Mirhoseini and Zhang are analogous art because both are directed to distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous distributed training of the Mirhoseini/Ahn combination with the speculative synchronization of Zhang. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training, as suggested by Zhang ("Experimental results show that speculative synchronization achieves up to 3× speedup over the asynchronous parallel scheme in many machine learning applications, with little additional communication overhead." sec. Abstract).
Regarding Claim 3,
The Mirhoseini/Ahn/Zhang combination teaches the system of claim 1. Ahn further teaches wherein the gradient is the rate at which the reinforcement learning is performed and is characterized in that a plurality of reinforcement learnings performed through the plurality of device controllers are proceeded at average rate of the plurality of reinforcement learnings by sharing the gradient ("The master worker gathers the computed gradients by slave workers, takes the average of them, updates master weights, and finally distributes the updated master weights to slave workers." sec. IV. C, p. 1123).
Mirhoseini and Ahn are analogous art because both are directed to asynchronous distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous training method of the Mirhoseini/Zhang combination with the asynchronous training method of Ahn. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training time, as suggested by Ahn ("ShmCaffe is 10.1 times faster than Caffe and 2.8 times faster than Caffe-MPI for deep neural network training" sec. Abstract, p. 1118).
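For completeness, the gather-average-update-distribute cycle quoted from Ahn's section IV.C reduces to a few lines; the sketch below is illustrative only, with all names assumed:

    import numpy as np

    def master_step(master_weights, worker_grads, lr):
        avg_grad = np.mean(worker_grads, axis=0)          # take the average of the gathered gradients
        master_weights = master_weights - lr * avg_grad   # update the master weights
        return master_weights                             # then distribute back to the slave workers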
Regarding Claim 4,
The Mirhoseini/Ahn/Zhang combination teaches the system of claim 1. Mirhoseini further teaches wherein each of the plurality of device controllers further comprises: a device control unit configured to control the device using the generated learning model; and a device state information providing unit configured to provide a state information of each of the device controllers to the federated reinforcement learning managing server (Figure 3, Parameter Server, Controller 1, Controller 2, … Controller K, p. 4; "We speed up the training process of our model using asynchronous distributed training, as shown in Figure 3. Our framework consists of several controllers, each of which execute the current policy defined by the attentional sequence-to-sequence model as described in Section 3.2. All of the controllers interact with a single shared parameter server… each controller receives a signal that indicates it should sample K placements… the controller uses the running times to scale the corresponding gradients to asynchronously update the controller parameters that reside in the parameter server." sec. 3.4, p. 4).
Regarding Claim 5,
The Mirhoseini/Ahn/Zhang combination teaches the system of claim 1. The Mirhoseini/Ahn/Zhang combination further teaches the following:
Mirhoseini teaches wherein the federated reinforcement learning managing server further comprises: a gradient receiving unit configured to request and receive the gradient from the plurality of device controllers ("Each replica independently performs forward and backward passes to compute the model’s gradients with respect to a minibatch of 32 images and then updates the parameters asynchronously." sec. 4.4, p. 7; "All of the controllers interact with a single shared parameter server… When all of the running times are received, the controller uses the running times to scale the corresponding gradients to asynchronously update the controller parameters that reside in the parameter server." sec. 3.4, p. 4).
Ahn teaches a gradient sharing unit configured to transmit and share the average gradient obtained by the average of the received gradients to the plurality of device controllers (Page 1123, top right: "On the other hand, the worker using the EASGD method updates the local weight from local gradient learned by itself after the training of each minibatch in (2). The updated local weight and the global weight are exchanged between workers and the parameter server, and they update their own weight based on the difference of exchanged weight. Equation (3) is the second weight update formula in the workers, and (4) is the weight update formula in the parameter server. As workers use the learning rate (η) when updating the local weight in (2), the moving averaging rate (α) is used as in (3) and (4).").
Mirhoseini and Ahn are analogous art because both are directed to asynchronous distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous training method of the Mirhoseini/Zhang combination with the asynchronous training method of Ahn. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training time, as suggested by Ahn ("ShmCaffe is 10.1 times faster than Caffe and 2.8 times faster than Caffe-MPI for deep neural network training" sec. Abstract, p. 1118).
Zhang teaches a learning parameter receiving unit configured to receive the learning parameter reported from the device controller in which the reinforcement learning is completed using the shared gradient; and a learning parameter providing unit configured to provide and transfer the received learning parameter to at least more than one of the device controllers in which the reinforcement learning is not completed ("Fig. 6: Illustration of speculative synchronization. Worker-1 speculatively aborts computation after observing the two pushes made by two peers. It pulls parameters again and starts over." p. 102; "Instead of imposing an arbitrary delay without justification, we let the worker asynchronously proceed to the next iteration immediately, while at the same time speculating about the updates made by others. Once the worker learns that the global parameters have been updated “enough” times, it will abort the ongoing iteration, pull the fresher parameters to start over—if that is not too late yet. Continuing the example in Fig. 2, we apply speculative synchronization and illustrate workers’ behaviors in Fig. 6. We start to focus on worker-1. After finishing the first iteration, it pulls parameters and starts the next iteration immediately. Shortly after it starts, it learns that two other peers have pushed updates to servers (highlighted in Fig. 6), which it views as a significant-enough change made to the global parameters. Worker-1 hence aborts the ongoing iteration, re-synchronizes with servers to include those two recent updates, and starts over with much fresher parameters. The abort-and-restart decision is also made by worker-4 upon its notice of two updates pushed shortly after the second iteration starts. In contrast, workers 2 and 3 choose not to restart as they do not see enough updates (two in this example) pushed to servers since their last pulls." p. 102-103).
Mirhoseini and Zhang are analogous art because both are directed to distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous distributed training of the Mirhoseini/Ahn combination with the speculative synchronization of Zhang. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training, as suggested by Zhang ("Experimental results show that speculative synchronization achieves up to 3× speedup over the asynchronous parallel scheme in many machine learning applications, with little additional communication overhead." sec. Abstract).
Regarding Claim 6,
The Mirhoseini/Ahn/Zhang combination teaches the system of claim 5. Zhang further teaches wherein the federated reinforcement learning managing server further comprises: a device state information receiving unit configured to receive device state information resulting from controlling the corresponding devices from the plurality of device controllers ("Fig. 6: Illustration of speculative synchronization. Worker-1 speculatively aborts computation after observing the two pushes made by two peers. It pulls parameters again and starts over." p. 102; "Instead of imposing an arbitrary delay without justification, we let the worker asynchronously proceed to the next iteration immediately, while at the same time speculating about the updates made by others. Once the worker learns that the global parameters have been updated “enough” times, it will abort the ongoing iteration, pull the fresher parameters to start over—if that is not too late yet. Continuing the example in Fig. 2, we apply speculative synchronization and illustrate workers’ behaviors in Fig. 6. We start to focus on worker-1. After finishing the first iteration, it pulls parameters and starts the next iteration immediately. Shortly after it starts, it learns that two other peers have pushed updates to servers (highlighted in Fig. 6), which it views as a significant-enough change made to the global parameters. Worker-1 hence aborts the ongoing iteration, re-synchronizes with servers to include those two recent updates, and starts over with much fresher parameters. The abort-and-restart decision is also made by worker-4 upon its notice of two updates pushed shortly after the second iteration starts. In contrast, workers 2 and 3 choose not to restart as they do not see enough updates (two in this example) pushed to servers since their last pulls." p. 102-103; viewing if the global parameters has been updated enough times teaches the state information); and
wherein the federated reinforcement learning managing server is configured to re-perform the federated reinforcement learning by transmitting the re-execution command for the federated reinforcement learning to the plurality of device controllers, in case that the received state information is monitored and the monitoring result is outside the preset threshold range ("Fig. 6: Illustration of speculative synchronization. Worker-1 speculatively aborts computation after observing the two pushes made by two peers. It pulls parameters again and starts over." p. 102; "Instead of imposing an arbitrary delay without justification, we let the worker asynchronously proceed to the next iteration immediately, while at the same time speculating about the updates made by others. Once the worker learns that the global parameters have been updated “enough” times, it will abort the ongoing iteration, pull the fresher parameters to start over—if that is not too late yet. Continuing the example in Fig. 2, we apply speculative synchronization and illustrate workers’ behaviors in Fig. 6. We start to focus on worker-1. After finishing the first iteration, it pulls parameters and starts the next iteration immediately. Shortly after it starts, it learns that two other peers have pushed updates to servers (highlighted in Fig. 6), which it views as a significant-enough change made to the global parameters. Worker-1 hence aborts the ongoing iteration, re-synchronizes with servers to include those two recent updates, and starts over with much fresher parameters. The abort-and-restart decision is also made by worker-4 upon its notice of two updates pushed shortly after the second iteration starts. In contrast, workers 2 and 3 choose not to restart as they do not see enough updates (two in this example) pushed to servers since their last pulls." p. 102-103; the number of times the global parameters has been updated enough times teaches the preset threshold range).
Mirhoseini and Zhang are analogous art because both are directed to distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous distributed training of the Mirhoseini/Ahn combination with the speculative synchronization of Zhang. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training, as suggested by Zhang ("Experimental results show that speculative synchronization achieves up to 3× speedup over the asynchronous parallel scheme in many machine learning applications, with little additional communication overhead." sec. Abstract).
Regarding Claim 7,
Mirhoseini teaches a method for controlling multiple devices through a federated reinforcement learning comprising:
in a plurality of device controllers (Figure 3, "Our framework consists of several controllers" sec. 3.4, p. 4), individually performing each of the reinforcement learnings ("neural networks and reinforcement learning for combinatorial optimization" sec. 2, p. 2) to control each of the plurality of devices ("we apply our proposed method to assign computations to devices" sec. 4, p. 4) and reporting a gradient calculated in a process of the reinforcement learning according to a request of a federated reinforcement learning managing server to the federated reinforcement learning managing server ("Each replica independently performs forward and backward passes to compute the model’s gradients with respect to a minibatch of 32 images and then updates the parameters asynchronously." sec. 4.4, p. 7; "All of the controllers interact with a single shared parameter server… When all of the running times are received, the controller uses the running times to scale the corresponding gradients to asynchronously update the controller parameters that reside in the parameter server." sec. 3.4, p. 4).
Mirhoseini does not explicitly teach in the federated reinforcement learning managing server, sharing an average gradient by providing the average gradient calculated for a plurality of the gradients reported from the plurality of device controllers; in the plurality of device controllers, continuing the reinforcement learning using the shared averaged gradient; in at least one of the plurality of device controllers, when the reinforcement learning using the average gradient is completed, reporting a learning parameter according to a completed result to the federated reinforcement learning managing server; in the federated reinforcement learning managing server, transferring the learning parameter by transmitting a first reported and received learning parameter to at least one device controller for which the reinforcement learning is not completed; and in the at least one device controller, continuously performing the reinforcement learning by using the received learning parameter.
Ahn teaches in the federated reinforcement learning managing server, sharing an average gradient by providing the average gradient calculated for a plurality of the gradients reported from the plurality of device controllers (Page 1123, top right: "On the other hand, the worker using the EASGD method updates the local weight from local gradient learned by itself after the training of each minibatch in (2). The updated local weight and the global weight are exchanged between workers and the parameter server, and they update their own weight based on the difference of exchanged weight. Equation (3) is the second weight update formula in the workers, and (4) is the weight update formula in the parameter server. As workers use the learning rate (η) when updating the local weight in (2), the moving averaging rate (α) is used as in (3) and (4).");
in the plurality of device controllers, continuing the reinforcement learning using the shared averaged gradient (Page 1122 Section F: “The global weight parameter (Wg) buffer created in the SMB server is shared by all deep learning workers. Each worker allocates a shared memory buffer (ΔWx) at the SMB server to store the weight increment computed from the difference between its local weight and the global weight. This buffer is not shared among the other workers. The values of ΔWx are accumulated to the global weight buffer (Wg), as shown in Fig. 5.” and Section G: “Thus, each worker’s main thread reads the global weight (Wg) at the start point of every iteration (T1) and updates the local weight by calculating weight increment from the difference between the global weight and the local weight (T2).” Examiner notes that the recitation of an iterative process also teaches that the reinforcement learning is not completed, with the only exception being the final iteration.);
in at least one of the plurality of device controllers, when the reinforcement learning using the average gradient is completed, reporting a learning parameter according to a completed result to the federated reinforcement learning managing server; in the federated reinforcement learning managing server, transferring the learning parameter by transmitting a first reported and received learning parameter to at least one device controller for which the reinforcement learning is not completed; and in the at least one device controller, continuously performing the reinforcement learning by using the received learning parameter ("Asynchronous SGD (ASGD) is one of the most widely used asynchronous distributed variants of SGD. The ASGD has been proposed to address the disadvantage of SSGD; namely, workers have to wait until the slowest worker finishes calculating gradient [11], [12]… ASGD uses a parameter server to share parameters asynchronously between workers." sec. II, p. 1119; "The asynchronous method is a way in which the parameter server updates the global weight whenever gradient arrives from a worker, without aggregating all the gradients arriving late or early from the distributed workers. The synchronous method has a large aggregation overhead because there is a variation in the training time of each deep learning worker. However, since the asynchronous method can eliminate such aggregation overhead, it can train DNN quickly without sacrificing the accuracy." sec. II, p. 1119; this teaches that an asynchronous method can reach completion without waiting for late workers that are continuing to work.).
Mirhoseini and Ahn are analogous art because both are directed to asynchronous distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous training method of Mirhoseini with the asynchronous training method of Ahn. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training time, as suggested by Ahn ("ShmCaffe is 10.1 times faster than Caffe and 2.8 times faster than Caffe-MPI for deep neural network training" sec. Abstract, p. 1118).
Mirhoseini does not explicitly teach wherein the method is characterized in that overall reinforcement learning is completed earlier than individually performed reinforcement learnings by performing the federated reinforcement learning in coalition with the reinforcement learnings through the sharing of the averaged gradient and the transferring of the learning parameter.
Zhang teaches wherein the method is characterized in that overall reinforcement learning is completed earlier than individually performed reinforcement learnings by performing the federated reinforcement learning in coalition with the reinforcement learnings through the sharing of the averaged gradient and the transferring of the learning parameter ("Fig. 6: Illustration of speculative synchronization. Worker-1 speculatively aborts computation after observing the two pushes made by two peers. It pulls parameters again and starts over." p. 102; "Instead of imposing an arbitrary delay without justification, we let the worker asynchronously proceed to the next iteration immediately, while at the same time speculating about the updates made by others. Once the worker learns that the global parameters have been updated “enough” times, it will abort the ongoing iteration, pull the fresher parameters to start over—if that is not too late yet. Continuing the example in Fig. 2, we apply speculative synchronization and illustrate workers’ behaviors in Fig. 6. We start to focus on worker-1. After finishing the first iteration, it pulls parameters and starts the next iteration immediately. Shortly after it starts, it learns that two other peers have pushed updates to servers (highlighted in Fig. 6), which it views as a significant-enough change made to the global parameters. Worker-1 hence aborts the ongoing iteration, re-synchronizes with servers to include those two recent updates, and starts over with much fresher parameters. The abort-and-restart decision is also made by worker-4 upon its notice of two updates pushed shortly after the second iteration starts. In contrast, workers 2 and 3 choose not to restart as they do not see enough updates (two in this example) pushed to servers since their last pulls." p. 102-103).
Mirhoseini and Zhang are analogous art because both are directed to distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous distributed training of the Mirhoseini/Ahn combination with the speculative synchronization of Zhang. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training, as suggested by Zhang ("Experimental results show that speculative synchronization achieves up to 3× speedup over the asynchronous parallel scheme in many machine learning applications, with little additional communication overhead." sec. Abstract).
Regarding Claim 8,
The Mirhoseini/Ahn/Zhang combination teaches the method of claim 7. Ahn further teaches wherein the gradient is a rate at which the reinforcement learning is performed and is characterized in that a plurality of reinforcement learnings performed through the plurality of device controllers are proceeded at average rate of the plurality of reinforcement learnings by sharing the gradient (Page 1122 Section F: “The global weight parameter (Wg) buffer created in the SMB server is shared by all deep learning workers. Each worker allocates a shared memory buffer (ΔWx) at the SMB server to store the weight increment computed from the difference between its local weight and the global weight. This buffer is not shared among the other workers. The values of ΔWx are accumulated to the global weight buffer (Wg), as shown in Fig. 5.” and Section G: “Thus, each worker’s main thread reads the global weight (Wg) at the start point of every iteration (T1) and updates the local weight by calculating weight increment from the difference between the global weight and the local weight (T2).” Examiner notes that the recitation of an iterative process also teaches that the reinforcement learning is not completed, with the only exception being the final iteration.)
Mirhoseini and Ahn are analogous art because both are directed to asynchronous distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous training method of the Mirhoseini/Zhang combination with the asynchronous training method of Ahn. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training time, as suggested by Ahn ("ShmCaffe is 10.1 times faster than Caffe and 2.8 times faster than Caffe-MPI for deep neural network training" sec. Abstract, p. 1118).
Regarding Claim 9,
The Mirhoseini/Ahn/Zhang combination teaches the method of claim 7. Mirhoseini further teaches wherein the method for controlling multiple devices through the federated reinforcement learning, further comprising: in the plurality of device controllers, controlling the corresponding devices by corresponding learning model generated through the federated reinforcement learning; and in the plurality of device controllers, providing state information of the devices according to the result of controlling the devices to the federated reinforcement learning managing server (Figure 3, Parameter Server, Controller 1, Controller 2, … Controller K, p. 4; "We speed up the training process of our model using asynchronous distributed training, as shown in Figure 3. Our framework consists of several controllers, each of which execute the current policy defined by the attentional sequence-to-sequence model as described in Section 3.2. All of the controllers interact with a single shared parameter server… each controller receives a signal that indicates it should sample K placements… the controller uses the running times to scale the corresponding gradients to asynchronously update the controller parameters that reside in the parameter server." sec. 3.4, p. 4).
Mirhoseini does not explicitly teach wherein the method is characterized in that the reinforcement learning is performed again in the plurality of device controllers in case that a re-execution command for the federated reinforcement learning is received from the federated reinforcement learning managing server according to a result of monitoring the state information.
Zhang teaches wherein the method is characterized in that the reinforcement learning is performed again in the plurality of device controllers in case that a re-execution command for the federated reinforcement learning is received from the federated reinforcement learning managing server according to a result of monitoring the state information ("Fig. 6: Illustration of speculative synchronization. Worker-1 speculatively aborts computation after observing the two pushes made by two peers. It pulls parameters again and starts over." p. 102; "Instead of imposing an arbitrary delay without justification, we let the worker asynchronously proceed to the next iteration immediately, while at the same time speculating about the updates made by others. Once the worker learns that the global parameters have been updated “enough” times, it will abort the ongoing iteration, pull the fresher parameters to start over—if that is not too late yet. Continuing the example in Fig. 2, we apply speculative synchronization and illustrate workers’ behaviors in Fig. 6. We start to focus on worker-1. After finishing the first iteration, it pulls parameters and starts the next iteration immediately. Shortly after it starts, it learns that two other peers have pushed updates to servers (highlighted in Fig. 6), which it views as a significant-enough change made to the global parameters. Worker-1 hence aborts the ongoing iteration, re-synchronizes with servers to include those two recent updates, and starts over with much fresher parameters. The abort-and-restart decision is also made by worker-4 upon its notice of two updates pushed shortly after the second iteration starts. In contrast, workers 2 and 3 choose not to restart as they do not see enough updates (two in this example) pushed to servers since their last pulls." p. 102-103).
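The abort-and-restart behavior Zhang describes in this passage may be sketched as follows. The sketch is illustrative only; the server interface (pull, push, updates_since, training_done) and the threshold parameter are assumptions, not Zhang's implementation.

    def speculative_worker(server, model, data, threshold):
        # Illustrative sketch of Zhang's speculative synchronization
        # (pp. 102-103); interface names are assumptions.
        while not server.training_done():
            params, version = server.pull()  # (re-)synchronize
            model.set_params(params)
            aborted = False
            for _ in model.iteration_steps(data):
                # Speculate about peers: count updates pushed to the
                # server since this worker's last pull.
                if server.updates_since(version) >= threshold:
                    # "Enough" peer updates observed: abort the
                    # ongoing iteration and start over with fresher
                    # parameters.
                    aborted = True
                    break
            if aborted:
                continue
            # Iteration completed without aborting: asynchronously
            # push this worker's update and proceed.
            server.push(model.compute_update())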
Mirhoseini and Zhang are analogous art because both are directed to distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous distributed training of the Mirhoseini/Ahn combination with the speculative synchronization of Zhang. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training, as suggested by Zhang ("Experimental results show that speculative synchronization achieves up to 3× speedup over the asynchronous parallel scheme in many machine learning applications, with little additional communication overhead." sec. Abstract).
Regarding Claim 10,
The Mirhoseini/Ahn/Zhang combination teaches the method of claim 7. Zhang further teaches wherein the method for controlling multiple devices through the federated reinforcement learning, further comprising: in the federated reinforcement learning managing server, receiving state information of the devices resulting from controlling the devices from the plurality of device controllers (Fig. 6, p. 102, and the passage at p. 102-103 quoted above with respect to claim 9; a worker observing whether the global parameters have been updated enough times teaches the state information),
wherein the method is characterized in that the reinforcement learning is performed again by transmitting a re-execution command for the federated reinforcement learning to the plurality of device controllers in the federated reinforcement learning managing server in case that the received state information of the devices is monitored, and the monitoring result is out of a preset threshold range (Fig. 6, p. 102, and the passage at p. 102-103 quoted above with respect to claim 9; the number of peer updates deemed “enough” to trigger an abort-and-restart teaches the preset threshold range).
Mirhoseini and Zhang are analogous art because both are directed to distributed training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the asynchronous distributed training of the Mirhoseini/Ahn combination with the speculative synchronization of Zhang. The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training, as suggested by Zhang ("Experimental results show that speculative synchronization achieves up to 3× speedup over the asynchronous parallel scheme in many machine learning applications, with little additional communication overhead." sec. Abstract).
Prior Art of Record
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Zhuo et al. (“Federated Reinforcement Learning”) discloses a federated learning version of reinforcement learning.
Blanchard et al. (US 2020/0380340 A1) discloses in [0008]: “Distributed implementations of SGD (see reference [33]) typically take the following form: A single parameter server is in charge of updating the parameter vector, while worker processes perform the actual update estimation, based on the share of data they have access to. More specifically, the parameter server executes learning rounds, during each of which the parameter vector is broadcast to the workers. In turn, each worker computes an estimate of the update to apply (an estimate of the gradient), and the parameter server aggregates all results to finally update the parameter vector. Nowadays, this aggregation is typically implemented through averaging.”
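The parameter-server round Blanchard describes in [0008] may be sketched as follows. This is an illustrative sketch only; the workers list and its estimate_gradient interface are assumed names for exposition, not Blanchard's disclosure.

    import numpy as np

    def sgd_round(params, workers, lr):
        # Illustrative sketch of the distributed SGD round described
        # in Blanchard [0008]; interface names are assumptions.
        # Broadcast the parameter vector; each worker returns a
        # gradient estimate computed on its share of the data.
        grads = [w.estimate_gradient(params) for w in workers]
        # Aggregate the estimates by averaging, then update the
        # parameter vector.
        return params - lr * np.mean(grads, axis=0)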
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached on (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LEONARD A SIEGER/Examiner, Art Unit 2126