Prosecution Insights
Last updated: April 19, 2026
Application No. 17/964,536

METHOD AND APPARATUS FOR GENERATING MULTI-DRONE NETWORK COOPERATIVE OPERATION PLAN BASED ON REINFORCEMENT LEARNING

Status: Non-Final OA (§102, §103)
Filed: Oct 12, 2022
Examiner: HWANG, MEGAN ELIZABETH
Art Unit: 2143
Tech Center: 2100 — Computer Architecture & Software
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
OA Round: 1 (Non-Final)

Grant Probability: 47% (Moderate)
Expected OA Rounds: 1-2
To Grant: 3y 0m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 47% (9 granted / 19 resolved; -7.6% vs TC avg)
Interview Lift: +60.2% (resolved cases with interview vs. without); a strong lift
Typical Timeline: 3y 0m average prosecution; 25 applications currently pending
Career History: 44 total applications across all art units

Statute-Specific Performance

§101: 34.9% (-5.1% vs TC avg)
§103: 41.0% (+1.0% vs TC avg)
§102: 7.4% (-32.6% vs TC avg)
§112: 15.3% (-24.7% vs TC avg)

Tech Center average values are estimates • Based on career data from 19 resolved cases

Office Action

§102, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-7 and 18-20, filed 10/12/2022, are presented for examination.

Election/Restrictions

Applicant’s election without traverse of Claims 8-17 in the reply filed on 12/31/2025 is acknowledged.

Priority

Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. KR10-2022-0033925, filed 03/18/2022.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 10/12/2022 and 01/08/2026 have been considered by the examiner.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “an input unit configured to”, “a learning unit configured to” and “a plan generation unit configured to” in claims 18-20.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 4, 6, and 18-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhu et al. (“Building a Connected Communication Network for UAV Clusters using DE-MADDPG”, published 08/20/2021), hereinafter Zhu.
Regarding Claim 1, Zhu teaches A method of generating a multi-drone network operation plan based on reinforcement learning (Zhu: “The multi-agent reinforcement learning (MARL) algorithm is a solution model framework for the collaborative control of UAV clusters. It enables multiple agents to complete complex tasks through collaborative decision-making in a high-dimensional, dynamic environment.” [Section 4. Methods]), the method comprising steps of:

(a) defining a reinforcement learning hyperparameter and training an actor neural network for each drone agent by using a multi-agent deep deterministic policy gradient (MADDPG) algorithm based on the defined hyperparameter (Zhu: “The UAV obtained its state by observing the environment and produced the control values according to the set control policy. The UAV then used the feedback of the environment to adjust the control policy and form a closed-loop training process. The training hyperparameters are presented in Table 2.” [Section 5.2. Training Configuration]; “As shown in Figure 3, the MADDPG adopts the framework of centralized training with decentralized execution. Each agent is associated with an actor network and a centralized critic network. In the training, a single agent observes its state, outputs actions based on the actor network, and then obtains the corresponding rewards and new states from the environment.” [Section 4.2. Multi-Agent DDPG Approach]);

(b) generating Markov game formalization information based on multi-drone network task information and generating state-action history information by using the trained actor neural network based on the formalization information (Zhu: “We simulate the training process of reinforcement learning based on a locally observable Markov game. The Markov process of 𝑛 agents is represented by a high-dimensional tuple <𝑆, 𝐴, 𝑅, 𝑃, 𝛾>, where 𝑆 = [𝑠1, 𝑠2, ⋯, 𝑠𝑛] denotes the state space of the Markov decision process, 𝐴 = [𝑎1, 𝑎2, ⋯, 𝑎𝑛] is the joint action set of all agents, 𝑅 = [𝑟1, 𝑟2, ⋯, 𝑟𝑛], 𝑟𝑖 is the reward of the agent 𝑖, 𝑃: 𝑆 × 𝐴 × 𝑆 → [0,1] is the state transfer function, and 𝛾 is the attenuation coefficient of the cumulative discount reward.” [Section 4.1. Markov Game for Multi-UAV Cooperation]; “The critic network obtains the action and state information of all agents and outputs the state–action value for the state and action from a global perspective. Then, the actor network is updated by the state–action value.” [Section 4.2. Multi-Agent DDPG Approach]; In light of Paragraph [00113] of the specification, which states “The plan generation unit 230 generates Markov game formalization information based on task information. That is, the plan generation unit 230 converts the task information into Markov game formalization information (e.g., a state of the Markov game, the observation model, the action model, and the reward model)”, BRI would support that “Markov game formalization information” encompasses a Markov game model including observations, actions, states and rewards by and for the drones); and

(c) generating a multi-drone network operation plan based on the state-action history information (Zhu: “In a multi-agent system, all agents execute a joint policy to generate a new state. The reward of each agent depends on the joint policy executed by all of them. According to the relationship between the agents, MARL can be divided into complete cooperation, complete competition, and a competition/cooperation hybrid. Our objective is to train the UAV clusters to learn the policy of constructing connected networks with complete cooperation. In this setting, the objective of each agent is to maximize the common reward.” [Section 4.1. Markov Game for Multi-UAV Cooperation]; “During the online execution, each UAV outputs control commands based on the trained model, which effectively saves time for making decisions on the fly.” [Section 1. Introduction]).

Regarding Claim 18, it is a system claim that corresponds with the method of Claim 1. Therefore, it is rejected for the same reasons as Claim 1 above.

Regarding Claim 4, Zhu teaches the method of Claim 1, wherein: the state-action history information comprises location information of a drone for each decision step (Zhu: “State: The state of each agent is the location coordinates of the UAV in the cluster coordinate system, represented by a vector (𝑥, 𝑦, 𝑧).” [Section 4.4. Design of DE-MADDPG]), and the step (c) comprises generating flight path information of the drone included in the operation plan based on the location information (Zhu: “Action: The action of each agent is the velocity of the UAV in the north–east–down coordinate system, denoted by a vector (𝑣𝑖, 𝛽𝑖, 𝜉𝑖), where 𝑣𝑖 = |𝒗′𝑖|, 𝛽𝑖 is the flight path angle of the UAV 𝑖 and 𝜉𝑖 is the heading angle.” [Section 4.4. Design of DE-MADDPG]).

Regarding Claim 6, Zhu teaches the method of Claim 1, wherein: the state-action history information comprises network topology history information for each decision step (Zhu: “Assuming that the UAV cluster network is homogeneous, each UAV can be represented as a node, and the two-way communication links between UAVs are represented by the edges. Therefore, the UAV nodes 𝒇 = {𝑓1, 𝑓2, ⋯, 𝑓𝑛} and their corresponding communication links 𝒆 = {𝑒1, 𝑒2, ⋯, 𝑒𝑚} form a three-dimensional network topology, expressed as a graph 𝐺(𝒇, 𝒆), and 𝑚 is the number of edges in the graph.” [Section 3.3. Network Connectivity Model]), and the step (c) comprises generating topology information included in the operation plan based on the topology history information (Zhu: “Our objective is to train the UAV clusters to learn the policy of constructing connected networks with complete cooperation.” [Section 4.1. Markov Game for Multi-UAV Cooperation]; “During the movement of the cluster, each UAV that is moving to the center facilitates communication and interconnection within the cluster. Based on this, we consider a Gaussian shaping reward function. At time t, we denote the distance between the UAV 𝑖 and the virtual navigator by 𝑑𝑖, and the difference between this distance at the present time instant and at the previous time instant is denoted by Δ𝑑𝑖 = 𝑑𝑖(𝑡) − 𝑑𝑖(𝑡 − 1).” [Section 4.4. Design of DE-MADDPG]).

Regarding Claim 19, Zhu teaches the system of Claim 18, wherein the learning unit generates tuple data comprising observation, an action, a reward, and next observation for each drone agent by using the MADDPG algorithm based on the reinforcement learning hyperparameter (Zhu: “We simulate the training process of reinforcement learning based on a locally observable Markov game. The Markov process of 𝑛 agents is represented by a high-dimensional tuple <𝑆, 𝐴, 𝑅, 𝑃, 𝛾>, where 𝑆 = [𝑠1, 𝑠2, ⋯, 𝑠𝑛] denotes the state space of the Markov decision process, 𝐴 = [𝑎1, 𝑎2, ⋯, 𝑎𝑛] is the joint action set of all agents, 𝑅 = [𝑟1, 𝑟2, ⋯, 𝑟𝑛], 𝑟𝑖 is the reward of the agent 𝑖, 𝑃: 𝑆 × 𝐴 × 𝑆 → [0,1] is the state transfer function, and 𝛾 is the attenuation coefficient of the cumulative discount reward.” [Section 4.1. Markov Game for Multi-UAV Cooperation]; See [Algorithm 1, Lines 3-7], in which a state (observation), action, new state and reward are determined and stored into the tuple D), and trains the actor neural network for each drone agent based on a mini-batch of the tuple data (Zhu: See [Algorithm 1, Lines 12-18], in which the actor neural network for each agent is trained on a randomly sampled minibatch of K samples from D).
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-3, 5, 7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu, as applied to Claims 1 and 18 above, in view of Hu et al. (“Cooperative Internet of UAVs: Distributed Trajectory Design by Multi-Agent Deep Reinforcement Learning”, published 08/03/2020), hereinafter Hu.

Regarding Claim 2, Zhu teaches the method of Claim 1, wherein the multi-drone network task information comprises information on a drone agent and information on communication (Zhu: “With a GPS receiver and a set of radio transceivers, each UAV can obtain its real-time position and communicate with other UAVs if their physical distance is small enough. When the cluster communication network is intact, each UAV can communicate with all other UAVs through the communication link.” [Section 3. Model]). However, it fails to expressly disclose wherein the multi-drone network task information comprises information on a base station, information on a target point, and a task termination condition.

In the same field of endeavor, Hu teaches wherein the multi-drone network task information comprises information on a base station, information on a target point, and a task termination condition (Hu: “UAVs have been widely used in critical sensing applications, where they need to execute multiple sensing tasks and transmit the results to base stations (BSs) over cellular networks.” [Section I. Introduction]; “UAVs know the state of the environment from the information exchange with the BS at the beginning of each cycle.” [Section IV.A. Markov Decision Process Formulation]; “Each Task j involves a sensing target, which is located at x_tj = (x_tj, y_tj, 0), to be sensed by the UAVs.” [Section II. System Model]; “After a UAV has finished the transmission cycles for reporting its sensing results, the BS will inform the UAV of the AoI of the selected task at the beginning of the next cycle. If the AoI of the selected task is not reduced to one cycle, the UAV knows that its previously transmitted sensing result was invalid. Then, the UAV will begin another sensing cycle at once and will subsequently transmit the new sensing result. The sequence of sensing cycle and transmission cycles will be repeated until a valid sensing result is received at the BS. Moreover, if multiple UAVs are executing the same task, they will consider the task completed when any valid sensing result has been received for the task at the BS.” [Section III. Distributed Sense-and-Send Protocol]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein the multi-drone network task information comprises information on a base station, information on a target point, and a task termination condition, as taught by Hu to the method of Zhu because both of these methods are directed towards cooperative multi-agent deep reinforcement learning by modeling UAVs in a Markov game setup. In making this combination and accounting for information on a base station and target point as well as establishing a termination condition, it would allow the method of Zhu to execute multiple sensing tasks at once and better simulate “certain sensing applications, such as traffic monitoring, collapsed building detection, and forest fire surveillance”, where “UAVs need to continuously sense and transmit results to base stations, in order to keep the sensing results as fresh as possible” (Hu: [Section I. Introduction]), as well as indicate to the base station when it has completed a task so it can be assigned a new task without idling or overlap (Hu: [Section III. Distributed Sense-and-Send Protocol]).

Regarding Claim 3, Zhu teaches the method of Claim 1, wherein the step (b) comprises steps of:

(b1) generating the formalization information based on the task information (Zhu: “The multi-agent reinforcement learning (MARL) algorithm is a solution model framework for the collaborative control of UAV clusters. It enables multiple agents to complete complex tasks through collaborative decision-making in a high-dimensional, dynamic environment. The MARL is autonomous, distributed, coordinated, and near to the optimal joint policy based on the core idea of centralized learning and distributed execution. It is based on the actor–critic framework and effectively solves the problems of non-stationarity in the multi-agent environment and the failure of experience playback.” [Section 4. Methods]; “We simulate the training process of reinforcement learning based on a locally observable Markov game. The Markov process of 𝑛 agents is represented by a high-dimensional tuple <𝑆, 𝐴, 𝑅, 𝑃, 𝛾>, where 𝑆 = [𝑠1, 𝑠2, ⋯, 𝑠𝑛] denotes the state space of the Markov decision process, 𝐴 = [𝑎1, 𝑎2, ⋯, 𝑎𝑛] is the joint action set of all agents, 𝑅 = [𝑟1, 𝑟2, ⋯, 𝑟𝑛], 𝑟𝑖 is the reward of the agent 𝑖, 𝑃: 𝑆 × 𝐴 × 𝑆 → [0,1] is the state transfer function, and 𝛾 is the attenuation coefficient of the cumulative discount reward.” [Section 4.1. Markov Game for Multi-UAV Cooperation]);

(b2) initializing a state of each drone agent based on the formalization information (Zhu: See [Algorithm 1, Line 3], in which the state of each agent is initialized);

(b3) obtaining observation for each drone agent based on the state of each drone agent (Zhu: “𝑜𝑖 is the observation of the agent 𝑖, 𝑄𝜋𝑖(𝑠, 𝑎1, ⋯, 𝑎𝑛) is the centralized state–action value function of the agent 𝑖, 𝑠 = [𝑜1, ⋯, 𝑜𝑛] consists of the observations of all agents” [Section 4.2. Multi-Agent DDPG Approach]);

(b4) inferring an action of each drone agent by inputting the observation to the actor neural network (Zhu: “Each UAV is regarded as an agent and outputs actions based on its observations and the value of its actor network.” [Section 1. Introduction]); and

(b5) obtaining a next state of each drone agent based on the state and the action (Zhu: “𝑠′ = [𝑜′1, ⋯, 𝑜′𝑛] consists of the new observations of all agents after executing the actions” [Section 4.2. Multi-Agent DDPG Approach]).

However, it fails to expressly disclose (b6) determining whether a task termination condition included in the task information has been satisfied based on the next state, repeating the steps (b3) to (b5) when the task termination condition is not satisfied, and generating the state-action history information by synthesizing the state and the action when the task termination condition is satisfied.

In the same field of endeavor, Hu teaches (b6) determining whether a task termination condition included in the task information has been satisfied based on the next state, repeating the steps (b3) to (b5) when the task termination condition is not satisfied (Hu: “After a UAV has finished the transmission cycles for reporting its sensing results, the BS will inform the UAV of the AoI of the selected task at the beginning of the next cycle. If the AoI of the selected task is not reduced to one cycle, the UAV knows that its previously transmitted sensing result was invalid. Then, the UAV will begin another sensing cycle at once and will subsequently transmit the new sensing result. The sequence of sensing cycle and transmission cycles will be repeated until a valid sensing result is received at the BS. Moreover, if multiple UAVs are executing the same task, they will consider the task completed when any valid sensing result has been received for the task at the BS.” [Section III. Distributed Sense-and-Send Protocol]), and generating the state-action history information by synthesizing the state and the action when the task termination condition is satisfied (Hu: “in the formulated MDP, when the selected task of a UAV has not been completed successfully, the action of the UAV is to continue executing the current selected task. Therefore, the state transitions between consecutive cycles cannot be used directly as the experience for the training of the UAVs’ action selection policies. To handle this problem, we propose that UAV i records the state transition only during decision cycles, where its previous selected task is completed. Denote the replay buffer of UAV i by D_i = {e_i}. The stored experience in the replay buffer is given by e_i = (s̃_i, ã_i, s̃′_i, r̃_i), where s̃_i denotes the state of the cycle when UAV i is in a certain decision cycle, ã_i is the action of UAV i determined in that cycle, s̃′_i denotes the state of the cycle when the selected task is executed, and r̃_i denotes the sum of the rewards for UAV i during the state transition.” [Section V.C. Training Process]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated (b6) determining whether a task termination condition included in the task information has been satisfied based on the next state, repeating the steps (b3) to (b5) when the task termination condition is not satisfied, and generating the state-action history information by synthesizing the state and the action when the task termination condition is satisfied, as taught by Hu to the method of Zhu because both of these methods are directed towards cooperative multi-agent deep reinforcement learning by modeling UAVs in a Markov game setup. In making this combination and generating the state-action history after the termination condition is satisfied, it would allow the method of Zhu to update the actor network policies on state-actions stored in the replay buffer when a task is completed successfully (Hu: [Section V.C. Training Process]).

Regarding Claim 20, it is a system claim that corresponds with the method of Claim 3. Therefore, it is rejected for the same reasons as Claim 3 above.

Regarding Claim 5, Zhu teaches the method of Claim 1. However, it fails to expressly disclose wherein: the state-action history information comprises a task time of a drone and location information of the drone for each decision step, and the step (c) comprises generating speed information of the drone included in the operation plan based on the task time and the location information.

In the same field of endeavor, Hu teaches wherein: the state-action history information comprises a task time of a drone and location information of the drone for each decision step (Hu: “The timeline of the protocol is divided into cycles, and the duration of each cycle is denoted by t_c.” [Section III. Distributed Sense-and-Send Protocol]; “In the information exchange, each UAV reports its state to the BS, which includes its current location, selected task, sensing location for that task, and the amount of data from the previous sensing result that is still awaiting transmission. Then, the BS broadcasts the state of the system, including the states of all the UAVs and the AoI of each task.” [Section III. Distributed Sense-and-Send Protocol]), and the step (c) comprises generating speed information of the drone included in the operation plan based on the task time and the location information (Hu: “The task execution process begins with one decision cycle, where the UAV determines its next selected task and the corresponding sensing location. After the decision cycle, the UAV moves towards its new sensing location directly with maximum speed v_max.” [Section III. Distributed Sense-and-Send Protocol]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein: the state-action history information comprises a task time of a drone and location information of the drone for each decision step, and the step (c) comprises generating speed information of the drone included in the operation plan based on the task time and the location information, as taught by Hu to the method of Zhu because both of these methods are directed towards cooperative multi-agent deep reinforcement learning by modeling UAVs in a Markov game setup. In making this combination and analyzing a task time and location to determine an operation plan that accounts for speed, it would allow the method of Zhu to account for a given UAV’s top speed, distance from a target and time in each cycle to balance resource management with efficiency in task completion (Hu: [Section III. Distributed Sense-and-Send Protocol]).

Regarding Claim 7, Zhu teaches the method of Claim 1. However, it fails to expressly disclose wherein: the state-action history information comprises task intent of a drone and an action of the drone for each decision step, and the step (c) comprises generating task execution information included in the operation plan based on the task intent and the action of the drone.

In the same field of endeavor, Hu teaches wherein: the state-action history information comprises task intent of a drone and an action of the drone for each decision step (Hu: “In this protocol, each UAV executes only one task at a time. For convenience, we define the task which the UAV is executing as its selected task and define the location where the UAV performs sensing as its sensing location.” [Section III. Distributed Sense-and-Send Protocol]), and the step (c) comprises generating task execution information included in the operation plan based on the task intent and the action of the drone (Hu: “The process for a UAV to execute its selected task can be described as follows, see also Fig. 3. When a UAV is informed that a valid sensing result for its selected task has been received by the BS, it starts a new task execution process. The task execution process begins with one decision cycle, where the UAV determines its next selected task and the corresponding sensing location.” [Section III. Distributed Sense-and-Send Protocol]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein: the state-action history information comprises task intent of a drone and an action of the drone for each decision step, and the step (c) comprises generating task execution information included in the operation plan based on the task intent and the action of the drone, as taught by Hu to the method of Zhu because both of these methods are directed towards cooperative multi-agent deep reinforcement learning by modeling UAVs in a Markov game setup. In making this combination and accounting for the intent of a given agent in decision making at various parts of executing a task, it would allow the method of Zhu to minimize the time elapsed between transmissions by preventing a scenario in which “the UAV started moving to its next selected task or towards the BS during the transmission cycle,” and “the UAV would have to fly back and sense the target again if the BS ultimately determined the sensing result was invalid” (Hu: [Section III. Distributed Sense-and-Send Protocol]).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Xuan et al. (“UAV Swarm Attack-Defense Confrontation Based on Multi-agent Reinforcement Learning”) discusses a multi-agent deep reinforcement learning approach to train UAVs in simultaneously cooperative and competitive environments using a MADDPG algorithm.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MEGAN E HWANG whose telephone number is (703)756-1377. The examiner can normally be reached Monday-Thursday 10:00-7:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch can be reached at (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.E.H./
Examiner, Art Unit 2143

/JENNIFER N WELCH/
Supervisory Patent Examiner, Art Unit 2143
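The §102 rejection reads claim 1's training step and claim 19's tuple and mini-batch limitations onto Zhu's MADDPG framework: one actor network per agent for decentralized execution, a centralized critic per agent for training, and a replay buffer of (observation, action, reward, next observation) tuples sampled in mini-batches. The following is a minimal sketch of that general scheme only, not the applicant's or Zhu's actual code; the network sizes, hyperparameters, and function names are assumptions, and target networks and exploration noise are omitted for brevity.

import random
from collections import deque

import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 6, 3        # assumed sizes, not taken from the filings
GAMMA, BATCH_SIZE, LR = 0.95, 32, 1e-3      # assumed "reinforcement learning hyperparameters"

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

# One actor per drone agent (decentralized execution) and one centralized critic per
# agent that scores the joint observation-action of all agents (centralized training).
actors = [mlp(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]
critics = [mlp(N_AGENTS * (OBS_DIM + ACT_DIM), 1) for _ in range(N_AGENTS)]
actor_opts = [torch.optim.Adam(m.parameters(), lr=LR) for m in actors]
critic_opts = [torch.optim.Adam(m.parameters(), lr=LR) for m in critics]

replay = deque(maxlen=10_000)  # tuples of (observations, actions, rewards, next observations)

def select_actions(obs):
    """Each agent acts from its own observation only; tanh keeps the actions bounded."""
    with torch.no_grad():
        return torch.stack([torch.tanh(actors[i](obs[i])) for i in range(N_AGENTS)])

def train_step():
    """Sample a mini-batch of stored tuples and update every agent's critic and actor."""
    if len(replay) < BATCH_SIZE:
        return
    batch = random.sample(list(replay), BATCH_SIZE)
    obs = torch.stack([b[0] for b in batch])        # (B, N_AGENTS, OBS_DIM)
    actions = torch.stack([b[1] for b in batch])    # (B, N_AGENTS, ACT_DIM)
    rewards = torch.stack([b[2] for b in batch])    # (B, N_AGENTS)
    next_obs = torch.stack([b[3] for b in batch])   # (B, N_AGENTS, OBS_DIM)

    with torch.no_grad():
        next_actions = torch.stack(
            [torch.tanh(actors[i](next_obs[:, i])) for i in range(N_AGENTS)], dim=1)

    for i in range(N_AGENTS):
        # Critic i sees every agent's observation and action, so the value it learns
        # reflects the joint behavior of the whole drone network.
        q_in = torch.cat([obs.flatten(1), actions.flatten(1)], dim=1)
        q_next = torch.cat([next_obs.flatten(1), next_actions.flatten(1)], dim=1)
        with torch.no_grad():
            target = rewards[:, i:i + 1] + GAMMA * critics[i](q_next)
        critic_loss = nn.functional.mse_loss(critics[i](q_in), target)
        critic_opts[i].zero_grad(); critic_loss.backward(); critic_opts[i].step()

        # Actor i is updated through the centralized critic's value of its own action.
        joint = [actions[:, j] for j in range(N_AGENTS)]
        joint[i] = torch.tanh(actors[i](obs[:, i]))
        actor_in = torch.cat([obs.flatten(1), torch.cat(joint, dim=1)], dim=1)
        actor_loss = -critics[i](actor_in).mean()
        actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()

# Synthetic transitions only, to show how the buffer is filled; a real environment
# would supply the observations and rewards.
for _ in range(BATCH_SIZE):
    o, o2 = torch.randn(N_AGENTS, OBS_DIM), torch.randn(N_AGENTS, OBS_DIM)
    replay.append((o, select_actions(o), torch.randn(N_AGENTS), o2))
train_step()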
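Steps (b) and (c) of claim 1, as characterized in the rejection, amount to rolling the trained actors out over a Markov-game formalization of the task until a termination condition is met, recording the state and action at each decision step, and then reading the operation plan (for example, per-drone flight paths) off that history. The sketch below illustrates that rollout in outline only: the actor is a stand-in function, and the data structures, toy dynamics, and termination rule are assumptions rather than anything taken from the application.

from dataclasses import dataclass

import numpy as np

@dataclass
class MarkovGameSpec:
    """Stand-in for the 'Markov game formalization information' derived from task information."""
    n_drones: int
    start: np.ndarray          # (n_drones, 3) initial positions
    target: np.ndarray         # (3,) target point taken from the task information
    max_steps: int = 200       # part of an assumed task termination condition

def trained_actor(drone_id: int, observation: np.ndarray) -> np.ndarray:
    """Placeholder for the actor network trained in step (a): fly toward the target."""
    direction = observation[3:] - observation[:3]
    norm = np.linalg.norm(direction)
    return direction / norm if norm > 1e-6 else np.zeros(3)

def generate_history(spec: MarkovGameSpec):
    """(b2)-(b6): initialize states, then observe, act, and step until termination."""
    state = spec.start.astype(float)
    history = []  # the "state-action history information", one entry per decision step
    for step in range(spec.max_steps):
        observations = [np.concatenate([state[i], spec.target]) for i in range(spec.n_drones)]
        actions = np.array([trained_actor(i, observations[i]) for i in range(spec.n_drones)])
        history.append((step, state.copy(), actions))
        state = state + actions  # next state obtained from the current state and action
        if np.all(np.linalg.norm(state - spec.target, axis=1) < 1.0):
            break  # task termination condition satisfied: every drone is near the target
    return history

def generate_operation_plan(history):
    """(c): derive per-drone flight path information from the recorded history."""
    n_drones = history[0][1].shape[0]
    return {f"drone_{i}": [tuple(state[i]) for _, state, _ in history] for i in range(n_drones)}

spec = MarkovGameSpec(n_drones=2, start=np.zeros((2, 3)), target=np.array([10.0, 5.0, 3.0]))
plan = generate_operation_plan(generate_history(spec))
print({name: path[:2] for name, path in plan.items()})  # first two waypoints per drone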

Prosecution Timeline

Oct 12, 2022: Application Filed
Jan 26, 2026: Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12456093: Corporate Hierarchy Tagging (2y 5m to grant; granted Oct 28, 2025)
Patent 12437514: VIDEO DOMAIN ADAPTATION VIA CONTRASTIVE LEARNING FOR DECISION MAKING (2y 5m to grant; granted Oct 07, 2025)
Patent 12437517: VIDEO DOMAIN ADAPTATION VIA CONTRASTIVE LEARNING FOR DECISION MAKING (2y 5m to grant; granted Oct 07, 2025)
Patent 12437518: VIDEO DOMAIN ADAPTATION VIA CONTRASTIVE LEARNING FOR DECISION MAKING (2y 5m to grant; granted Oct 07, 2025)
Patent 12437519: VIDEO DOMAIN ADAPTATION VIA CONTRASTIVE LEARNING FOR DECISION MAKING (2y 5m to grant; granted Oct 07, 2025)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 47%
With Interview: 99% (+60.2%)
Median Time to Grant: 3y 0m
PTA Risk: Low

Based on 19 resolved cases by this examiner. Grant probability derived from career allow rate.
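As a quick check of how the headline figures relate, the short sketch below reproduces the arithmetic under two assumptions: the grant probability equals the career allow rate (as the note above states), and the "vs TC avg" figure is a simple percentage-point difference; the interview-lift adjustment is not modeled here.

granted, resolved = 9, 19
career_allow_rate = granted / resolved           # 0.4737, displayed as 47%
implied_tc_average = career_allow_rate + 0.076   # "-7.6% vs TC avg" puts the TC average near 55%
print(f"career allow rate {career_allow_rate:.1%}; implied TC average {implied_tc_average:.1%}")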
