DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is in reply to the amendments filed on 10/15/2025 for Application No. 17/945,549.
Claims 1, 2 and 4 – 8 are currently pending and have been examined. Claims 1, 2, 7 and 8 have been amended.
This action is made FINAL.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2 and 4 – 8 are rejected under 35 U.S.C. 103 as being unpatentable over Al-Saadi et al. (Multi-Rate Medium Access Protocol Based on Reinforcement Learning) in view of MAHKONEN et al. (US 20200105151 A1), hereinafter MAHKONEN.
Regarding claim 1, Al-Saadi teaches a learning device (Al-Saadi: Page 1, Abstract: “Many wireless devices employ multi-rate techniques to improve network performance. However, despite the significant amount of research aimed at dynamically adjusting the transmission rate, the majority of this effort considers neither the competing nodes in wireless mesh networks nor the congestion in the nodes. This work employs distributed intelligent agents to observe the surrounding environment in order to dynamically adjust the individual node transmission rates. Reinforcement learning is employed to control the way each node updates its transmission rate based on the transmission rate of the adjacent node as well as the traffic load. This work is validated through extensive simulations that compare the proposed model with three of the most widely cited schemes. The results indicate significant improvement in system throughput.”; Page 2, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 2: “In this work, a new reinforcement learning algorithm, named rate adaptation based on reinforcement learning (RARE), is proposed.”
Supplemental Note: the wireless devices are interpreted to function as a learning device which is able to employ the learning functions taught by the art. RARE is the learning model)
… the learning device comprising: (Al-Saadi: Page 1, Abstract: “Many wireless devices employ multi-rate techniques to improve network performance. However, despite the significant amount of research aimed at dynamically adjusting the transmission rate, the majority of this effort considers neither the competing nodes in wireless mesh networks nor the congestion in the nodes. This work employs distributed intelligent agents to observe the surrounding environment in order to dynamically adjust the individual node transmission rates. Reinforcement learning is employed to control the way each node updates its transmission rate based on the transmission rate of the adjacent node as well as the traffic load. This work is validated through extensive simulations that compare the proposed model with three of the most widely cited schemes. The results indicate significant improvement in system throughput.”; Page 2, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 2: “In this work, a new reinforcement learning algorithm, named rate adaptation based on reinforcement learning (RARE), is proposed.”
Supplemental Note: the wireless devices are interpreted to function as a learning device which is able to employ the learning functions taught by the art. RARE is the learning model)
… wherein the processor is configured to set a first transmission rate for an information transfer rate of a communication device controlled by the computer using the learned model; (Al-Saadi: Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”,
Supplemental Note: the first transmission rate is the maximum rate the physical devices can support set by the RARE learning model)
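For illustration only, the following is a minimal sketch of the per-node load estimation described in the passage quoted above. Al-Saadi computes an average queue length via equation (2) (cited from [7]), whose exact form is not reproduced in the quotation, so an exponentially weighted moving average is assumed here; the weight `w` and all names are hypothetical.

```python
# Illustrative sketch of per-node load estimation as an average queue length.
# Al-Saadi's equation (2) (cited from [7]) is not reproduced in the quoted
# passage, so an exponentially weighted moving average is assumed; the weight
# w and all names are hypothetical.

def update_avg_queue_length(avg_qlen: float, current_qlen: int, w: float = 0.1) -> float:
    """Blend the previous average with the newly observed queue length."""
    return (1.0 - w) * avg_qlen + w * current_qlen

# Example: a node starts from an empty average and observes queue samples.
avg = 0.0
for sample in [3, 5, 4, 8, 6]:
    avg = update_avg_queue_length(avg, sample)
print(f"estimated load (average queue length): {avg:.2f}")
```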
allow the information transfer rate of a transmission rate of a learning model to reinforcement learn such that a reward given in an environment, in which the first transmission rate is set, when a communication is successful is maximized; (Al-Saadi: Page 2, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 1: “Reinforcement learning is a machine learning technique, which aims to find the optimal action to perform in a dynamic environment. It employs trial and error to evaluate the selected action and find the optimal action through a mathematical formulation. The Q-learning algorithm is one of the most well-known approaches of reinforcement learning applied to wireless networks [12]. In Q-learning, each time (ti) an action is executed, a reward R(ti) is calculated based on feedback from the environment. Then, using (1), the agent re-computes the Q-value, which is subsequently used to estimate the best action again.
[Al-Saadi, equation (1): Q-value update rule, reproduced as image media_image1.png (greyscale)]
(1) where α is the learning rate (0 ≤ α ≤ 1), t_i is the current time, t_i-1 is the previous time and γ is the discount value. If α = 0 then there is no learning in the algorithm; if γ = 0 the reinforcement learning is opportunistic, which maximizes only the current reward. In this work, a new reinforcement learning algorithm, named rate adaptation based on reinforcement learning (RARE), is proposed. RARE is an agent-based algorithm where each node acts as an intelligent agent.”; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 5: “RARE uses equation (1) to maximise the probability of accessing the wireless channel (LP) by learning from the previous updates of transmission rate. The reward function of Q-learning employs both LP [9] and reward weight (RW). RW is either a positive value, to improve the chance of increasing the transmission rate, or a negative value, to increase the probability of reducing the transmission rate.”,
Supplemental Note: the learning model sends a reward per a successful transmission of a transmission rate. The process starts with setting the data rate to the maximum value the physical device can support and narrows down by giving rewards which increase or decreases transmission rates depending on the successful transmissions)
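For illustration only, the following is a minimal sketch of the Q-value update and reward described in the quoted passage. Equation (1) of Al-Saadi appears above only as an image, so the conventional single-action Q-learning form is assumed; the reward combining LP and RW is likewise an assumption, not the paper's exact formula.

```python
# Illustrative sketch of the Q-value update and reward described in the quoted
# passage. The conventional single-action Q-learning form is assumed, and the
# reward built from LP and RW is hypothetical, not the reference's exact formula.

def q_update(q_prev: float, reward: float, alpha: float = 0.5, gamma: float = 0.9) -> float:
    """One Q-learning step: alpha = 0 disables learning; gamma = 0 uses only the current reward."""
    return q_prev + alpha * (reward + gamma * q_prev - q_prev)

def reward(lp: float, rw: float) -> float:
    """Hypothetical reward built from channel-access probability LP and reward weight RW."""
    return lp * rw

# A positive RW (after consecutive successes) pushes Q up and favors a higher
# transmission rate; a negative RW (after consecutive failures) pushes Q down.
q = 0.0
q = q_update(q, reward(lp=0.8, rw=+1.0))
q = q_update(q, reward(lp=0.4, rw=-1.0))
print(f"Q(t_i) = {q:.3f}")
```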
extract, as the learned model, a learning model in which a number of learning steps is equal to or greater than a predetermined number; (Al-Saadi; Page 2, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 2: “In this work, a new reinforcement learning algorithm, named rate adaptation based on reinforcement learning (RARE), is proposed. RARE is an agent-based algorithm where each node acts as an intelligent agent. Each agent calculates the probability of accessing the communication medium based on the number of unsuccessful transmissions and the current transmission rate. In addition, each node receives a “hello” message periodically from its neighbors containing the transmission rate, the probability of access to the channel and the estimated traffic load. Reinforcement learning is utilized by each node to calculate whether the probability of accessing the channel has improved since the last transmission message. Thus, it learns from previous actions whether it is necessary to update the transmission rate.”,
Supplemental Note: The RARE algorithm is described above and Fig. 1 shows the various steps the RARE learning model performs. The predetermined number of steps are the steps performed in the flow chart)
[Al-Saadi, Figure 1: flowchart of the RARE algorithm, reproduced as image media_image2.png (greyscale)]
determine whether performance of the learned model extracted by the model extraction unit has reached first performance requirement; (Al-Saadi: Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”,
Supplemental Note: the maximum rate identified by the RARE learning model is the rate the physical device can support)
update the first transmission rate to a second transmission rate lower than the first transmission rate when the processor determines that performance of the learned model has reached the first performance requirement; and (Al-Saadi; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “RARE updates the transmission rate in order to reduce the interference on the neighbor nodes and increase LP. In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant;”,
Supplemental Note: the transmission rate is maximized to what the physical devices can support, interpreted as the first performance requirement. The rate is then evaluated by the RARE learning model which if the failure threshold is met by the maximized rate, the rate is then reduced, interpreted as the second transmission rate. This process is shown in Fig. 1 below)
[Al-Saadi, Figure 1: flowchart of the RARE algorithm, reproduced as image media_image2.png (greyscale)]
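For illustration only, the following is a minimal sketch of the success/failure branching quoted in Paragraph 7 above and shown in Figure 1: consecutive successes above Sth raise the transmission rate when the link has improved and neighbor load is not high, while consecutive failures above Fth lower it when the Q-value has dropped and local load is low. The rate ladder, helper inputs and exact conditions below are assumptions; only the threshold structure mirrors the quoted passage.

```python
# Illustrative sketch of the rate-update branching from the quoted Paragraph 7.
# The rate ladder, the helper inputs and the exact conditions are assumptions;
# only the success/failure threshold structure mirrors the passage.

RATES_MBPS = [6, 12, 24, 36, 48, 54]   # example rate ladder (not from the reference)
S_TH, F_TH = 3, 4                      # threshold values the simulations selected

def update_rate(rate_idx: int, successes: int, failures: int,
                q_now: float, q_prev: float,
                neighbor_load_high: bool, local_load_low: bool) -> int:
    """Return the new index into RATES_MBPS after one decision round."""
    if successes > S_TH and q_now > q_prev and not neighbor_load_high:
        # Link improved and the shared channel is not heavily loaded: step up.
        return min(rate_idx + 1, len(RATES_MBPS) - 1)
    if failures > F_TH and q_now < q_prev and local_load_low:
        # Repeated failures and a falling Q-value: step down.
        return max(rate_idx - 1, 0)
    return rate_idx

idx = len(RATES_MBPS) - 1              # start at the maximum supported rate
idx = update_rate(idx, successes=0, failures=5, q_now=0.3, q_prev=0.6,
                  neighbor_load_high=False, local_load_low=True)
print(f"new transmission rate: {RATES_MBPS[idx]} Mb/s")
```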
the processor determines whether performance of the learned model updated to the second transmission rate satisfies second performance requirement being the same as or inferior to the first performance requirement, (Al-Saadi; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “RARE updates the transmission rate in order to reduce the interference on the neighbor nodes and increase LP. In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant;”
Supplemental Note: the maximized rate is determined by the rate that the physical device can support. If this rate meets the failure threshold, the rate is reduced so it can perform the same as the maximized rate without any congestion)
when the processor does not determine that performance of the learned model updated to the second transmission rate satisfies the second performance requirement, repeats the reinforcement learning until it is determined that the performance of the learned model has reached the first performance requirement, (Al-Saadi; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “RARE updates the transmission rate in order to reduce the interference on the neighbor nodes and increase LP. In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant;”
Supplemental Note: the maximized rate (interpreted as the claimed first transmission rate) is determined by the rate that the physical device can support (interpreted as the claimed first performance requirement). If this rate meets the failure threshold, the rate is reduced (interpreted as the claimed second transmission rate) so it can perform the same as the maximized rate without any congestion based on the amount of consecutive successful transmissions (interpreted as the claimed second performance requirement). However, the transmission rate increases when the consecutive successful transmissions exceed the success threshold, thus the rate can be increased until it reaches the maximum rate (interpreted as the claimed first transmission rate) as long as the success threshold keeps being met. This process is shown in Fig. 1 below)
[Al-Saadi, Figure 1: flowchart of the RARE algorithm, reproduced as image media_image2.png (greyscale)]
when the processor determines that performance of the learned model updated to the second transmission rate satisfies the second performance requirement, (Al-Saadi; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased.”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7 – 8: “3 and 4 are selected for Sth and Fth respectively, as they provide the highest throughput. Finally, RARE updates Qlend(ti), Qd(ti) and LPd(ti) based on the ‘hello’ messages that each node receives periodically, and proceeds with the next available MDPU.” ,
Supplemental Note: based on the successful transmissions, the system is able to identify the success and failure thresholds to be applied)
… that satisfies the second performance requirement at the second transmission rate (Al-Saadi: Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING , Paragraph 7: “If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant; however, 3 and 4 are selected for Sth and Fth respectively, as they provide the highest throughput.”; Page 6, V. CONCLUSION, Paragraph 1: “This paper introduces a new reinforcement algorithm, which adaptively updates the transmission rate in order to increase the success rate of accessing the channel without interfering with the other nodes in WMN. The algorithm learns from previous updates to avoid unnecessary change in the transmission rate (e.g. due to channel error rather than interference), which causes packets loss.”,
Supplemental Note: the second transmission rate is the one lower than the maximum rate the physical device can support. The second transmission rate is modified depending on the consecutive transmission failures exceeding a failure threshold. The model is able to increase and decrease the transmission rate to satisfy the success rate. In the example above, the threshold values are adjusted to what provides the highest throughput, thus this also updates the second transmission rate to what provides the highest throughput)
… repeat reinforcement learning of the learned model until it is determined that the performance of the learned model satisfies the second performance requirement when performance of the learned model updated to the second transmission rate is not determined to satisfy the second performance requirement. (Al-Saadi: Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING , Paragraph 7: “If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant; however, 3 and 4 are selected for Sth and Fth respectively, as they provide the highest throughput.”; Page 6, V. CONCLUSION, Paragraph 1: “This paper introduces a new reinforcement algorithm, which adaptively updates the transmission rate in order to increase the success rate of accessing the channel without interfering with the other nodes in WMN. The algorithm learns from previous updates to avoid unnecessary change in the transmission rate (e.g. due to channel error rather than interference), which causes packets loss.”,
Supplemental Note: the second transmission rate is the one lower than the maximum rate the physical device can support. The second transmission rate is modified depending on the consecutive transmission failures exceeding a failure threshold. The model is able to increase and decrease the transmission rate to satisfy the success rate. In the example above, the threshold values are adjusted to what provides the highest throughput, thus this also updates the second transmission rate to what provides the highest throughput. This process is also shown below in Figure 1, which shows the process continuously repeating as it updates its parameters)
[Al-Saadi, Figure 1: flowchart of the RARE algorithm, reproduced as image media_image2.png (greyscale)]
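For illustration only, the following is a minimal control-flow sketch of the claimed sequence mapped above: learn at the first (maximum) transmission rate until a first performance requirement is reached, lower the rate, then learn again until a second requirement (the same as or inferior to the first) is satisfied. The model, the one-step update and the requirement predicates are placeholders, not the algorithm of either reference.

```python
# Illustrative control-flow sketch of the claimed "learn, check requirement,
# lower the rate, learn again" loop. The model, the one-step update and the
# requirement predicates are placeholders, not the reference's algorithm.

def train_one_step(model: dict) -> dict:
    """Placeholder for a single reinforcement-learning update."""
    return {"score": model["score"] + 0.1 * (1.0 - model["score"])}

def train_until(model: dict, requirement, max_steps: int = 1000) -> dict:
    """Repeat learning steps until the performance requirement is met."""
    for _ in range(max_steps):
        if requirement(model):
            break
        model = train_one_step(model)
    return model

first_requirement = lambda m: m["score"] >= 0.90    # at the first transmission rate
second_requirement = lambda m: m["score"] >= 0.85   # same as or inferior to the first

model = {"score": 0.0}
model = train_until(model, first_requirement)       # learn at the first (maximum) rate
# ... the transmission rate is then lowered to the second rate ...
model = train_until(model, second_requirement)      # learn until the second requirement holds
print(f"final performance score: {model['score']:.2f}")
```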
In sum, Al-Saadi teaches a learning device comprising: wherein the processor is configured to set a first transmission rate for an information transfer rate of a communication device controlled by the computer using the learned model; allow the information transfer rate of a transmission rate of a learning model to reinforcement learn such that a reward given in an environment, in which the first transmission rate is set, when a communication is successful is maximized; extract, as the learned model, a learning model in which a number of learning steps is equal to or greater than a predetermined number; determine whether performance of the learned model extracted by the model extraction unit has reached first performance requirement; update the first transmission rate to a second transmission rate lower than the first transmission rate when the processor determines that performance of the learned model has reached the first performance requirement; and the processor determines whether performance of the learned model updated to the second transmission rate satisfies second performance requirement being the same as or inferior to the first performance requirement, when the processor does not determine that performance of the learned model updated to the second transmission rate satisfies the second performance requirement, repeats the reinforcement learning until it is determined that the performance of the learned model has reached the first performance requirement, when the processor determines that performance of the learned model updated to the second transmission rate satisfies the second performance requirement, that satisfies the second performance requirement at the second transmission rate and repeat reinforcement learning of the learned model until it is determined that the performance of the learned model satisfies the second performance requirement when performance of the learned model updated to the second transmission rate is not determined to satisfy the second performance requirement. Al-Saadi however does not teach a learned model to be installed in a computer of an unmanned vehicle having the computer to learn and a communication device, a processor controls the operation of each unit of the learning device, and a storage unit is a memory that stores the learned model, and select the learned model to be installed in the computer whereas Mahkonen does.
Mahkonen teaches for allowing a learned model to be installed in a computer of an unmanned vehicle having the computer to learn and a communication device, (Mahkonen: Paragraph 0003: “Unmanned Aerial Vehicle (UAV), also sometime referred to as a Drone, is a radio controlled or automated aircraft. Typically, UAVs are controlled by the users over analog radio controlled (RC) channel, but today autopilot software (SW) may be utilized in the aircraft to let them fly beyond the line of sight of the human operators. Autonomous UAVs still require connectivity to the network in order for an operator to recall or change the mission the UAV is executing. Typically for such communication the UAVs will be equipped with Mobile Internet interfaces (e.g. 3GPP radio or WiFi).”; Paragraph 0119: “UAV 150 includes a computing device 601, which includes a processor 602. A memory 603 is included as part of computing device 601, but in other embodiments, memory 603 may be separate from the computing device 601. Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”; Paragraph 0140: “It is to be noted that memory 803 and 903 may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable (e.g., computer-readable) media, such as computer-readable storage media as described earlier in the disclosure. The instructions (e.g. computer program) executed on the processor cause the network node 801, 901 to receive and rebroadcast the geolocation information and the other information of an UAV as described above. Memory 803, 903 may be a computer-readable storage medium storing instructions which, when executed by the processor 802, 902, cause the network node 801, 901 to perform the method of receiving and rebroadcasting the geolocation information and the other information of an UAV as described above”,
Supplemental Note: a UAV is an unmanned vehicle. The computer is the computing device onto which multiple pieces of software, such as the learning model taught by Al-Saadi, can be installed and used)
… a processor controls the operation of each unit of the learning device, and a storage unit is a memory that stores the learned model, (Mahkonen: Paragraph 0010: “UAV that comprises a radio transceiver to transmit and receive radio communication, a processor, and a memory, in which the memory contains instructions, which when the instructions are executed by the processor,”; Paragraph 0119: “UAV 150 includes a computing device 601, which includes a processor 602. A memory 603 is included as part of computing device 601, but in other embodiments, memory 603 may be separate from the computing device 601. Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.” ,
Supplemental Note: the UAV’s processor and memory are used to store and operate the learning model when combined with Al-Saadi)
… select the learned model to be installed in the computer, wherein (Mahkonen: Paragraph 40: “A memory referenced in the specification may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using non-transitory machine-readable (e.g., computer-readable) media, such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (e.g., electrical, optical, radio, acoustical or other form of propagated signals—such as carrier waves, infrared signals)”: Paragraph 0119: “Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”; Paragraph 0120: “The various instructions (e.g. 604, 605, 606, 607) may be a computer program comprising instructions which, when executed by the processor 602, cause the UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above. Memory 603 may be a computer-readable storage medium storing instructions which, when executed by the processor 602, cause UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above”;
Supplemental Note: the processor is able to execute various software used to perform wireless communication tasks which when combined with Al-Saadi, can be used to install the learning model for transmission rates for wireless communication as will be described further in the rejection)
… the processor selects the learned model (Mahkonen: Paragraph 40: “A memory referenced in the specification may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using non-transitory machine-readable (e.g., computer-readable) media, such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (e.g., electrical, optical, radio, acoustical or other form of propagated signals—such as carrier waves, infrared signals)”: Paragraph 0119: “Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”; Paragraph 0120: “The various instructions (e.g. 604, 605, 606, 607) may be a computer program comprising instructions which, when executed by the processor 602, cause the UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above. Memory 603 may be a computer-readable storage medium storing instructions which, when executed by the processor 602, cause UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above”;
Supplemental Note: the processor is able to execute various software used to perform wireless communication tasks which when combined with Al-Saadi, can be used to install the learning model for transmission rates for wireless communication as will be described further in the rejection)
… as the learned model to be installed in the computer, and (Mahkonen: Paragraph 40: “A memory referenced in the specification may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using non-transitory machine-readable (e.g., computer-readable) media, such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (e.g., electrical, optical, radio, acoustical or other form of propagated signals—such as carrier waves, infrared signals)”: Paragraph 0119: “Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”; Paragraph 0120: “The various instructions (e.g. 604, 605, 606, 607) may be a computer program comprising instructions which, when executed by the processor 602, cause the UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above. Memory 603 may be a computer-readable storage medium storing instructions which, when executed by the processor 602, cause UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above”;
Supplemental Note: the processor is able to execute various software used to perform wireless communication tasks which when combined with Al-Saadi, can be used to install the learning model for transmission rates for wireless communication as will be described further in the rejection)
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the invention disclosed by Al-Saadi with the teachings of Mahkonen with a reasonable expectation of success. Mahkonen teaches a plurality of unmanned aerial vehicles that utilize wireless devices to communicate with each other and with other components. Wireless mesh networks currently use nodes as relays that work wirelessly to aid in propagating data from one source to another by using multi-hop paths. To send data faster, these nodes need a higher transmission rate; however, that increases the interference among the different nodes, which overall reduces the network throughput. The technology of Al-Saadi applies a learning model which controls the transmission speeds for each of the nodes in the network to lower the impact of congestion. One of ordinary skill in the art would find it obvious to try, or would have motivation, to incorporate the teachings of Al-Saadi's model into Mahkonen's system of unmanned aerial vehicles. An unmanned vehicle's advantage of not requiring a user inside the vehicle is also one of its disadvantages, as making changes to how the vehicle operates requires the timely transmission of data between the user and the vehicles. Al-Saadi teaches a model which reduces the impact of congestion when wireless data is being transmitted, and is thus able to mitigate transmission congestion issues between the unmanned aerial vehicles of Mahkonen. Al-Saadi further teaches this ability by repeating the process of finding a transmission rate that has the highest success rate, in which the success/failure thresholds can be adjusted to find this transmission rate. For example, if a large data set with multiple operations is wirelessly sent to an unmanned aerial vehicle, the probability that the data will be impacted by congestion is mitigated if the learning model of Al-Saadi is implemented, thus increasing the effectiveness of the vehicles in receiving and sending data in a timely manner.
Regarding claim 2, Al-Saadi, as modified, teaches wherein the processor changes the second transmission rate in accordance with the second performance requirement (Al-Saadi; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network” Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “RARE updates the transmission rate in order to reduce the interference on the neighbor nodes and increase LP. In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant;”,
Supplemental Note: the transmission rate is maximized to what the physical devices can support, interpreted as the first performance requirement. The rate is then evaluated by the RARE learning model; if the failure threshold is met by the maximized rate, the rate is then reduced, interpreted as the second transmission rate. This process is shown in Fig. 1 below)
[Al-Saadi, Figure 1: flowchart of the RARE algorithm, reproduced as image media_image2.png (greyscale)]
Regarding claim 4, Al-Saadi, as modified, teaches a communication device configured to perform communication based on control using the learned model learned by the learning device according to claim 1. (Al-Saadi: Page 1, Abstract: “Many wireless devices employ multi-rate techniques to improve network performance. However, despite the significant amount of research aimed at dynamically adjusting the transmission rate, the majority of this effort considers neither the competing nodes in wireless mesh networks nor the congestion in the nodes. This work employs distributed intelligent agents to observe the surrounding environment in order to dynamically adjust the individual node transmission rates. Reinforcement learning is employed to control the way each node updates its transmission rate based on the transmission rate of the adjacent node as well as the traffic load. This work is validated through extensive simulations that compare the proposed model with three of the most widely cited schemes. The results indicate significant improvement in system throughput.”,
Supplemental Note: the wireless devices are used in this prior art to perform the learning model)
Regarding claim 5, Al-Saadi, as modified, teaches a communication device, wherein the computer performs communication through the communication device using the learned model. (Al-Saadi: Page 1, Abstract: “Many wireless devices employ multi-rate techniques to improve network performance. However, despite the significant amount of research aimed at dynamically adjusting the transmission rate, the majority of this effort considers neither the competing nodes in wireless mesh networks nor the congestion in the nodes. This work employs distributed intelligent agents to observe the surrounding environment in order to dynamically adjust the individual node transmission rates. Reinforcement learning is employed to control the way each node updates its transmission rate based on the transmission rate of the adjacent node as well as the traffic load. This work is validated through extensive simulations that compare the proposed model with three of the most widely cited schemes. The results indicate significant improvement in system throughput.”,
Supplemental Note: the wireless devices are used in this prior art to perform the learning model)
In sum, Al-Saadi teaches a communication device used to perform communication by its learned model. Al-Saadi however does not teach an unmanned vehicle able to install the learning model whereas Mahkonen does.
Mahkonen teaches an unmanned vehicle comprising: a computer in which the learned model learned by the learning device according to claim 1 is installed; and (Mahkonen: Paragraph 0119: “UAV 150 includes a computing device 601, which includes a processor 602. A memory 603 is included as part of computing device 601, but in other embodiments, memory 603 may be separate from the computing device 601. Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”; Paragraph 0140: “It is to be noted that memory 803 and 903 may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable (e.g., computer-readable) media, such as computer-readable storage media as described earlier in the disclosure. The instructions (e.g. computer program) executed on the processor cause the network node 801, 901 to receive and rebroadcast the geolocation information and the other information of an UAV as described above. Memory 803, 903 may be a computer-readable storage medium storing instructions which, when executed by the processor 802, 902, cause the network node 801, 901 to perform the method of receiving and rebroadcasting the geolocation information and the other information of an UAV as described above”; Paragraph 0119: “Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”,
Supplemental Note: unmanned aerial vehicles are equivalent to the claimed unmanned vehicle that comprises a computing device with various software installed, and the processor can select particular software to be used. This can be combined with the prior art of Al-Saadi as described below)
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the invention disclosed by Al-Saadi with the teachings of Mahkonen with a reasonable expectation of success. As discussed in claim 1, Al-Saadi teaches a learning model which is used to improve network performance, with the use of reinforcement learning, particularly for the wireless devices used for communication. Mahkonen teaches a plurality of unmanned aerial vehicles that utilize wireless devices to communicate with each other and with other components. Wireless mesh networks currently use nodes as relays that work wirelessly to aid in propagating data from one source to another by using multi-hop paths. To send data faster, these nodes need a higher transmission rate; however, that increases the interference among the different nodes, which overall reduces the network throughput. The technology of Al-Saadi applies a learning model which controls the transmission speeds for each of the nodes in the network to lower the impact of congestion. One of ordinary skill in the art would find it obvious to try, or would have motivation, to incorporate the teachings of Al-Saadi's model into Mahkonen's system of unmanned aerial vehicles. An unmanned vehicle's advantage of not requiring a user inside the vehicle is also one of its disadvantages, as making changes to how the vehicle operates requires the timely transmission of data between the user and the vehicles. Al-Saadi teaches a model which reduces the impact of congestion when wireless data is being transmitted, and is thus able to mitigate transmission congestion issues between the unmanned aerial vehicles of Mahkonen. For example, if a large data set with multiple operations is wirelessly sent to an unmanned aerial vehicle, the probability that the data will be impacted by congestion is mitigated if the learning model of Al-Saadi is implemented, thus increasing the effectiveness of the vehicles in receiving and sending data in a timely manner.
Regarding claim 6, Al-Saadi does not teach a wireless communication system comprising a plurality of unmanned vehicles each corresponding to the unmanned vehicle according to claim 5, whereas Mahkonen does.
Mahkonen teaches a wireless communication system comprising a plurality of unmanned vehicles each corresponding to the unmanned vehicle according to claim 5. (Mahkonen: Abstract: “Broadcasting geolocation information of an Unmanned Aerial Vehicle (UAV) from the UAV by determining current geolocation of the UAV by communicating with a geolocation service and utilizing the geolocation service to geolocate the UAV.”; Paragraph 0132: “The radio wirelessly communicates with the radio network access device when broadcasting the geolocation information and the other information and communicates with other UAVs when communicating using peer-to-peer communication, such as the sidelink communication.”,
Supplemental Note: the system taught in the prior art can be used in multiple UAVs)
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the invention disclosed by Al-Saadi with the teachings of Mahkonen with a reasonable expectation of success. As discussed in claim 1, Al-Saadi teaches a learning model which is used to improve network performance, with the use of reinforcement learning, particularly for the wireless devices used for communication. Mahkonen teaches a plurality of unmanned aerial vehicles that utilize wireless devices to communicate with each other and with other components. Wireless mesh networks currently use nodes as relays that work wirelessly to aid in propagating data from one source to another by using multi-hop paths. To send data faster, these nodes need a higher transmission rate; however, that increases the interference among the different nodes, which overall reduces the network throughput. The technology of Al-Saadi applies a learning model which controls the transmission speeds for each of the nodes in the network to lower the impact of congestion. One of ordinary skill in the art would find it obvious to try, or would have motivation, to incorporate the teachings of Al-Saadi's model into Mahkonen's system of unmanned aerial vehicles. An unmanned vehicle's advantage of not requiring a user inside the vehicle is also one of its disadvantages, as making changes to how the vehicle operates requires the timely transmission of data between the user and the vehicles. Al-Saadi teaches a model which reduces the impact of congestion when wireless data is being transmitted, and is thus able to mitigate transmission congestion issues between the unmanned aerial vehicles of Mahkonen. For example, if a large data set with multiple operations is wirelessly sent to an unmanned aerial vehicle, the probability that the data will be impacted by congestion is mitigated if the learning model of Al-Saadi is implemented, thus increasing the effectiveness of the vehicles in receiving and sending data in a timely manner. This relates to the plurality of unmanned vehicles, as they will all be able to utilize this learning model; thus the advantages extend to all of the vehicles.
Regarding claim 7, Al-Saadi teaches a learning method for allowing a learned model to be installed in a computer to learn, (taught by Mahkonen) the learning method comprising: (Al-Saadi: Page 1, Abstract: “Many wireless devices employ multi-rate techniques to improve network performance. However, despite the significant amount of research aimed at dynamically adjusting the transmission rate, the majority of this effort considers neither the competing nodes in wireless mesh networks nor the congestion in the nodes. This work employs distributed intelligent agents to observe the surrounding environment in order to dynamically adjust the individual node transmission rates. Reinforcement learning is employed to control the way each node updates its transmission rate based on the transmission rate of the adjacent node as well as the traffic load. This work is validated through extensive simulations that compare the proposed model with three of the most widely cited schemes. The results indicate significant improvement in system throughput.”; Page 2, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 2: “In this work, a new reinforcement learning algorithm, named rate adaptation based on reinforcement learning (RARE), is proposed.”
Supplemental Note: the wireless devices are interpreted to function as a learning device which is able to employ the learning functions taught by the art. RARE is the learning model)
setting a first transmission rate for an information transfer rate a of a communication device controlled by the computer using the learned model; (Al-Saadi: Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”,
Supplemental Note: the first transmission rate is the maximum rate the physical devices can support set by the RARE learning model)
allowing the information transfer rate of a transmission rate of a learning model to reinforcement learn such that a reward given in an environment, in which the first transmission rate is set, when a communication is successful is maximized; (Al-Saadi: Page 2, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 1: “Reinforcement learning is a machine learning technique, which aims to find the optimal action to perform in a dynamic environment. It employs trial and error to evaluate the selected action and find the optimal action through a mathematical formulation. The Q-learning algorithm is one of the most well-known approaches of reinforcement learning applied to wireless networks [12]. In Q-learning, each time (ti) an action is executed, a reward R(ti) is calculated based on feedback from the environment. Then, using (1), the agent re-computes the Q-value, which is subsequently used to estimate the best action again.
[Al-Saadi, equation (1): Q-value update rule, reproduced as image media_image1.png (greyscale)]
(1) where α is the learning rate (0 ≤ α ≤ 1), t_i is the current time, t_i-1 is the previous time and γ is the discount value. If α = 0 then there is no learning in the algorithm; if γ = 0 the reinforcement learning is opportunistic, which maximizes only the current reward. In this work, a new reinforcement learning algorithm, named rate adaptation based on reinforcement learning (RARE), is proposed. RARE is an agent-based algorithm where each node acts as an intelligent agent.”; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 5: “RARE uses equation (1) to maximise the probability of accessing the wireless channel (LP) by learning from the previous updates of transmission rate. The reward function of Q-learning employs both LP [9] and reward weight (RW). RW is either a positive value, to improve the chance of increasing the transmission rate, or a negative value, to increase the probability of reducing the transmission rate.”,
Supplemental Note: the learning model sends a reward per a successful transmission of a transmission rate. The process starts with setting the data rate to the maximum value the physical device can support and narrows down by giving rewards which increase or decreases transmission rates depending on the successful transmissions)
extracting, as the learned model, a learning model in which a number of learning steps is equal to or greater than a predetermined number; (Al-Saadi; Page 2, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 2: “In this work, a new reinforcement learning algorithm, named rate adaptation based on reinforcement learning (RARE), is proposed. RARE is an agent-based algorithm where each node acts as an intelligent agent. Each agent calculates the probability of accessing the communication medium based on the number of unsuccessful transmissions and the current transmission rate. In addition, each node receives a “hello” message periodically from its neighbors containing the transmission rate, the probability of access to the channel and the estimated traffic load. Reinforcement learning is utilized by each node to calculate whether the probability of accessing the channel has improved since the last transmission message. Thus, it learns from previous actions whether it is necessary to update the transmission rate.”,
Supplemental Note: The RARE algorithm is described above and Fig. 1 shows the various steps the RARE learning model performs. The predetermined number of steps are the steps performed in the flow chart)
[Al-Saadi, Figure 1: flowchart of the RARE algorithm, reproduced as image media_image2.png (greyscale)]
determining whether performance of the extracted learned model has reached first performance requirement; (Al-Saadi: Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”,
Supplemental Note: the maximum rate identified by the RARE learning model is the rate the physical device can support)
updating the first transmission rate to a second transmission rate lower than the first transmission rate when performance of the learned model is determined to have reached the first performance requirement; (Al-Saadi; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “RARE updates the transmission rate in order to reduce the interference on the neighbor nodes and increase LP. In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant;”,
Supplemental Note: the transmission rate is maximized to what the physical devices can support, interpreted as the first performance requirement. The rate is then evaluated by the RARE learning model which if the failure threshold is met by the maximized rate, the rate is then reduced, interpreted as the second transmission rate. This process is shown in Fig. 1 below)
[Al-Saadi, Figure 1: flowchart of the RARE algorithm, reproduced as image media_image2.png (greyscale)]
determining whether performance of the learned model updated to the second transmission rate satisfies second performance requirement being the same as or inferior to the first performance requirement; (Al-Saadi; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “RARE updates the transmission rate in order to reduce the interference on the neighbor nodes and increase LP. In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant;”
Supplemental Note: the maximized rate is determined by the rate that the physical device can support. If this rate meets the failure threshold, the rate is reduced so it can perform the same as the maximized rate without any congestion)
repeating the reinforcement learning until it is determined that the performance of the learned model has reached the first performance requirement when performance of the learned model updated to the second transmission rate is determined to satisfy the second performance requirement, (Al-Saadi; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “RARE updates the transmission rate in order to reduce the interference on the neighbor nodes and increase LP. In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant;”
Supplemental Note: the maximized rate (interpreted as the claimed first transmission rate) is determined by the rate that the physical device can support (interpreted as the claimed first performance requirement). If this rate meets the failure threshold, the rate is reduced (interpreted as the claimed second transmission rate) so that it can perform comparably to the maximized rate without congestion, based on the number of consecutive successful transmissions (interpreted as the claimed second performance requirement). However, the transmission rate is increased again when the number of consecutive successful transmissions exceeds the success threshold, so the rate can be raised until it again reaches the maximum rate (interpreted as the claimed first transmission rate) as long as that threshold keeps being met. This process is shown in Fig. 1 below)
[Image: Al-Saadi, Figure 1 – flowchart of the RARE algorithm (media_image2.png, greyscale)]
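For clarity of the mapping only, one possible sketch of the claimed repeat-until-requirement behavior, as interpreted in the supplemental note above, is set out below. The function and attribute names (train_step, evaluate, max_supported_rate, lower_rate) and the requirement values are hypothetical placeholders; they are not elements of Al-Saadi or of the claims themselves.

# Hypothetical sketch of the repeat-until-requirement interpretation set out above.
def learn_until_requirements(model, env, first_req, second_req, max_steps=10_000):
    """Run reinforcement-learning steps, lowering the rate once the first requirement is met,
    and stopping once the second requirement is met at the lower rate."""
    rate = env.max_supported_rate              # first transmission rate (maximum supported)
    for _ in range(max_steps):
        model.train_step(env, rate)            # one reinforcement-learning update
        perf = model.evaluate(env, rate)
        if rate == env.max_supported_rate:
            if perf >= first_req:              # first performance requirement reached
                rate = env.lower_rate(rate)    # update to the second (lower) transmission rate
        elif perf >= second_req:               # second requirement satisfied at the lower rate
            return model
    return model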
… that satisfies the second performance requirement at the second transmission rate (Al-Saadi: Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING , Paragraph 7: “If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant; however, 3 and 4 are selected for Sth and Fth respectively, as they provide the highest throughput.”; Page 6, V. CONCLUSION, Paragraph 1: “This paper introduces a new reinforcement algorithm, which adaptively updates the transmission rate in order to increase the success rate of accessing the channel without interfering with the other nodes in WMN. The algorithm learns from previous updates to avoid unnecessary change in the transmission rate (e.g. due to channel error rather than interference), which causes packets loss.”,
Supplemental Note: the second transmission rate is the one lower than the maximum rate the physical device can support. The second transmission rate is modified depending on whether the number of consecutive transmission failures exceeds the failure threshold. The model is able to increase and decrease the transmission rate to satisfy the success rate. In the example above, the threshold values are adjusted to those that provide the highest throughput, which in turn sets the second transmission rate to the value that provides the highest throughput)
… when performance of the learned model updated to the second transmission rate is determined to satisfy the second performance requirement, and (Al-Saadi; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased.”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7 – 8: “3 and 4 are selected for Sth and Fth respectively, as they provide the highest throughput. Qlend LPd. Finally, RARE updates (ti), Qd(ti) and (ti) based on the ‘hello’ messages that each node receives periodically, and proceeds with the next available MDPU.” ,
Supplemental Note: based on the successful transmissions, the system is able to identify the success and failure thresholds to be applied)
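The cited passage reports simulating Sth and Fth values of 2, 3, 4 and 5 and keeping the pair giving the highest throughput. A minimal sketch of that kind of threshold selection follows; the simulate_throughput callable is a hypothetical stand-in for the simulations reported in Al-Saadi.

import itertools

# Hypothetical sweep over the candidate threshold values reported in Al-Saadi (2, 3, 4, 5).
def select_thresholds(simulate_throughput, candidates=(2, 3, 4, 5)):
    """Return the (S_th, F_th) pair whose simulated average system throughput is highest."""
    best_pair, best_tp = None, float("-inf")
    for s_th, f_th in itertools.product(candidates, repeat=2):
        tp = simulate_throughput(s_th, f_th)
        if tp > best_tp:
            best_pair, best_tp = (s_th, f_th), tp
    return best_pair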
repeating reinforcement learning of the learned model until it is determined that the performance of the learned model satisfies the second performance requirement when performance of the learned model updated to the second transmission rate is not determined to satisfy the second performance requirement. (Al-Saadi: Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING , Paragraph 7: “If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant; however, 3 and 4 are selected for Sth and Fth respectively, as they provide the highest throughput.”; Page 6, V. CONCLUSION, Paragraph 1: “This paper introduces a new reinforcement algorithm, which adaptively updates the transmission rate in order to increase the success rate of accessing the channel without interfering with the other nodes in WMN. The algorithm learns from previous updates to avoid unnecessary change in the transmission rate (e.g. due to channel error rather than interference), which causes packets loss.”,
Supplemental Note: the second transmission rate is the one lower than the maximum rate the physical device can support. The second transmission rate is modified depending on whether the number of consecutive transmission failures exceeds the failure threshold. The model is able to increase and decrease the transmission rate to satisfy the success rate. In the example above, the threshold values are adjusted to those that provide the highest throughput, which in turn sets the second transmission rate to the value that provides the highest throughput. This process is also shown below in Figure 1, which shows the process continuously repeating as it updates its parameters)
[Image: Al-Saadi, Figure 1 – flowchart of the RARE algorithm (media_image2.png, greyscale)]
In sum, Al-Saadi teaches a learning method for allowing a learned model to be installed in a computer to learn, (taught by Mahkonen) the learning method comprising: setting a first transmission rate for an information transfer rate a of a communication device controlled by the computer using the learned model; allowing the information transfer rate of a transmission rate of a learning model to reinforcement learn such that a reward given in an environment, in which the first transmission rate is set, when a communication is successful is maximized; extracting, as the learned model, a learning model in which a number of learning steps is equal to or greater than a predetermined number; determining whether performance of the extracted learned model has reached first performance requirement; updating the first transmission rate to a second transmission rate lower than the first transmission rate when performance of the learned model is determined to have reached the first performance requirement; determining whether performance of the learned model updated to the second transmission rate satisfies second performance requirement being the same as or inferior to the first performance requirement; repeating the reinforcement learning until it is determined that the performance of the learned model has reached the first performance requirement when performance of the learned model updated to the second transmission rate is determined to satisfy the second performance requirement, that satisfies the second performance requirement at the second transmission rate when performance of the learned model updated to the second transmission rate is determined to satisfy the second performance requirement, and repeating reinforcement learning of the learned model until it is determined that the performance of the learned model satisfies the second performance requirement when performance of the learned model updated to the second transmission rate is not determined to satisfy the second performance requirement. Al-Saadi however does not teach a learning method for allowing a learned model to be installed in a computer to learn, and selecting the learned model as the learned model to be installed in the computer whereas Mahkonen does.
Mahkonen teaches a learning method for allowing a learned model to be installed in a computer to learn, (Mahkonen: Paragraph 0003: “Unmanned Aerial Vehicle (UAV), also sometime referred to as a Drone, is a radio controlled or automated aircraft. Typically, UAVs are controlled by the users over analog radio controlled (RC) channel, but today autopilot software (SW) may be utilized in the aircraft to let them fly beyond the line of sight of the human operators. Autonomous UAVs still require connectivity to the network in order for an operator to recall or change the mission the UAV is executing. Typically for such communication the UAVs will be equipped with Mobile Internet interfaces (e.g. 3GPP radio or WiFi).”; Paragraph 0119: “UAV 150 includes a computing device 601, which includes a processor 602. A memory 603 is included as part of computing device 601, but in other embodiments, memory 603 may be separate from the computing device 601. Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”; Paragraph 0140: “It is to be noted that memory 803 and 903 may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable (e.g., computer-readable) media, such as computer-readable storage media as described earlier in the disclosure. The instructions (e.g. computer program) executed on the processor cause the network node 801, 901 to receive and rebroadcast the geolocation information and the other information of an UAV as described above. Memory 803, 903 may be a computer-readable storage medium storing instructions which, when executed by the processor 802, 902, cause the network node 801, 901 to perform the method of receiving and rebroadcasting the geolocation information and the other information of an UAV as described above”,
Supplemental Note: a UAV is an unmanned vehicle. The computer is the computing device which is able to have multiple software, such as the learning model as taught by Al-Saadi, to be used to install onto itself)
… and selecting the learned model (Mahkonen: Paragraph 40: “A memory referenced in the specification may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using non-transitory machine-readable (e.g., computer-readable) media, such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (e.g., electrical, optical, radio, acoustical or other form of propagated signals—such as carrier waves, infrared signals)”: Paragraph 0119: “Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”; Paragraph 0120: “The various instructions (e.g. 604, 605, 606, 607) may be a computer program comprising instructions which, when executed by the processor 602, cause the UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above. Memory 603 may be a computer-readable storage medium storing instructions which, when executed by the processor 602, cause UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above”;
Supplemental Note: the processor is able to execute various software used to perform wireless communication tasks which when combined with Al-Saadi, can be used to install the learning model for transmission rates for wireless communication as will be described further in the rejection)
… as the learned model to be installed in the computer (Mahkonen: Paragraph 40: “A memory referenced in the specification may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using non-transitory machine-readable (e.g., computer-readable) media, such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (e.g., electrical, optical, radio, acoustical or other form of propagated signals—such as carrier waves, infrared signals)”: Paragraph 0119: “Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”; Paragraph 0120: “The various instructions (e.g. 604, 605, 606, 607) may be a computer program comprising instructions which, when executed by the processor 602, cause the UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above. Memory 603 may be a computer-readable storage medium storing instructions which, when executed by the processor 602, cause UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above”;
Supplemental Note: the processor is able to execute various software used to perform wireless communication tasks which when combined with Al-Saadi, can be used to install the learning model for transmission rates for wireless communication as will be described further in the rejection)
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the invention disclosed by Al-Saadi with the teachings of Mahkonen, with a reasonable expectation of success. Please refer to the rejection of claim 1 above, as both claims recite the same claimed functions and are therefore rejected under the same rationale.
Regarding claim 8, Al-Saadi teaches a non-transitory computer-readable storage medium storing a learning program for allowing a learned model to be installed in a computer to learn using a learning device serving as another computer, (taught by Mahkonen) the learning program causing the learning device to perform: (Al-Saadi: Page 1, Abstract: “Many wireless devices employ multi-rate techniques to improve network performance. However, despite the significant amount of research aimed at dynamically adjusting the transmission rate, the majority of this effort considers neither the competing nodes in wireless mesh networks nor the congestion in the nodes. This work employs distributed intelligent agents to observe the surrounding environment in order to dynamically adjust the individual node transmission rates. Reinforcement learning is employed to control the way each node updates its transmission rate based on the transmission rate of the adjacent node as well as the traffic load. This work is validated through extensive simulations that compare the proposed model with three of the most widely cited schemes. The results indicate significant improvement in system throughput.”; Page 2, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 2: “In this work, a new reinforcement learning algorithm, named rate adaptation based on reinforcement learning (RARE), is proposed.”
Supplemental Note: the wireless devices are interpreted to function as a learning device which is able to employ the learning functions taught by the art. RARE is the learning model)
setting a first transmission rate for an information transfer rate a of a communication device controlled by the computer using the learned model; (Al-Saadi: Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”,
Supplemental Note: the first transmission rate is the maximum rate the physical devices can support set by the RARE learning model)
allowing the information transfer rate of a transmission rate of a learning model to reinforcement learn such that a reward given in an environment, in which the first transmission rate is set, when a communication is successful is maximized; (Al-Saadi: Page 2, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 1: “Reinforcement learning is a machine learning technique, which aims to find the optimal action to perform in a dynamic environment. It employs trial and error to evaluate the selected action and find the optimal action through a mathematical formulation. The Q-learning algorithm is one of the most well-known approaches of reinforcement learning applied to wireless networks [12]. In Q-learning, each time (ti) an action is executed, a reward R(ti) is calculated based on feedback from the environment. Then, using (1), the agent re-computes the Q-value, which is subsequently used to estimate the best action again.
[Image: Al-Saadi, equation (1) – Q-learning update formula (media_image1.png, greyscale)]
(1) where a is the learning rate (0≤a≤1), t_i is the current time, t_i-1 is the previous time and y is the discount value. If a =0 then there is no learning in the algorithm; if y=0 the reinforcement learning is opportunistic, which maximizes only the current reward. In this work, a new reinforcement learning algorithm, named rate adaptation based on reinforcement learning (RARE), is proposed. RARE is an agent-based algorithm where each node acts as an intelligent agent.”; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 5: “RARE uses equation (1) to maximise the probability of accessing the wireless channel (LP) by learning from the previous updates of transmission rate. The reward function of Q-learning employs both LP [9] and reward weight (RW). RW is either a positive value, to improve the chance of increasing the transmission rate, or a negative value, to increase the probability of reducing the transmission rate.”,
Supplemental Note: the learning model assigns a reward for each successful transmission at a given transmission rate. The process starts by setting the data rate to the maximum value the physical device can support and then narrows down by giving rewards that increase or decrease the transmission rate depending on the successful transmissions)
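The image referenced above embeds Al-Saadi's equation (1). For readability, the general textbook Q-learning update that the quoted passage describes, written with the passage's own symbols (a for the learning rate, y for the discount value, R(t_i) for the reward), has the following form. It is offered only as background and is not asserted to be an exact reproduction of equation (1):

Q(t_i) \;=\; Q(t_{i-1}) \;+\; a\,\big[\, R(t_i) \;+\; y \,\max_{a'} Q(t_{i-1}, a') \;-\; Q(t_{i-1}) \,\big]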
extracting, as the learned model, a learning model in which a number of learning steps is equal to or greater than a predetermined number; (Al-Saadi; Page 2, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 2: “In this work, a new reinforcement learning algorithm, named rate adaptation based on reinforcement learning (RARE), is proposed. RARE is an agent-based algorithm where each node acts as an intelligent agent. Each agent calculates the probability of accessing the communication medium based on the number of unsuccessful transmissions and the current transmission rate. In addition, each node receives a “hello” message periodically from its neighbors containing the transmission rate, the probability of access to the channel and the estimated traffic load. Reinforcement learning is utilized by each node to calculate whether the probability of accessing the channel has improved since the last transmission message. Thus, it learns from previous actions whether it is necessary to update the transmission rate.”,
Supplemental Note: the RARE algorithm is described above, and Fig. 1 shows the various steps the RARE learning model performs. The predetermined number of learning steps corresponds to the steps performed in the flow chart)
[Image: Al-Saadi, Figure 1 – flowchart of the RARE algorithm (media_image2.png, greyscale)]
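As illustrative context for the agent-based structure quoted above, the following sketch shows the kind of per-neighbor state a node could build from the periodic “hello” messages; the class, field, and function names are hypothetical and are not taken from Al-Saadi.

from dataclasses import dataclass

# Hypothetical per-neighbor state carried by the periodic "hello" messages described above.
@dataclass
class NeighborInfo:
    transmission_rate: float    # neighbor's current transmission rate
    access_probability: float   # neighbor's probability of accessing the channel
    estimated_load: float       # neighbor's estimated traffic load (queue length)

def average_queue_length(neighbors):
    """Average the load reported by known neighbors (cf. the average queue length of the cited equation (2))."""
    if not neighbors:
        return 0.0
    return sum(n.estimated_load for n in neighbors) / len(neighbors)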
determining whether performance of the extracted learned model has reached first performance requirement; (Al-Saadi: Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”,
Supplemental Note: the maximum rate identified by the RARE learning model is the rate the physical device can support)
updating the first transmission rate to a second transmission rate lower than the first transmission rate when performance of the learned model is determined to have reached the first performance requirement; (Al-Saadi; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “RARE updates the transmission rate in order to reduce the interference on the neighbor nodes and increase LP. In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant;”,
Supplemental Note: the transmission rate is first maximized to what the physical device can support, which is interpreted as the first performance requirement. The rate is then evaluated by the RARE learning model; if the failure threshold is met at the maximized rate, the rate is reduced, which is interpreted as the second transmission rate. This process is shown in Fig. 1 below)
[Image: Al-Saadi, Figure 1 – flowchart of the RARE algorithm (media_image2.png, greyscale)]
determining whether performance of the learned model updated to the second transmission rate satisfies second performance requirement being the same as or inferior to the first performance requirement; (Al-Saadi: Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “RARE updates the transmission rate in order to reduce the interference on the neighbor nodes and increase LP. In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant;”
Supplemental Note: the maximized rate is the rate that the physical device can support. If transmissions at this rate meet the failure threshold, the rate is reduced so that the node can perform comparably to the maximized rate without congestion)
repeating the reinforcement learning until it is determined that the performance of the learned model has reached the first performance requirement when performance of the learned model updated to the second transmission rate is determined to satisfy the second performance requirement, (Al-Saadi; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3: “The algorithm explores the network environment by setting the data rate to the maximum value that the physical device can support. Then, it initializes other parameters to zero as shown in Figure 1. In order to estimate the load on each node, equation (2) from [7] is employed to calculate the average queue length [variable] where [variable] is the average queue length of nodes [variable] is the set of all available nodes in the network”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “RARE updates the transmission rate in order to reduce the interference on the neighbor nodes and increase LP. In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant;”
Supplemental Note: the maximized rate (interpreted as the claimed first transmission rate) is determined by the rate that the physical device can support (interpreted as the claimed first performance requirement). If this rate meets the failure threshold, the rate is reduced (interpreted as the claimed second transmission rate) so that it can perform comparably to the maximized rate without congestion, based on the number of consecutive successful transmissions (interpreted as the claimed second performance requirement). However, the transmission rate is increased again when the number of consecutive successful transmissions exceeds the success threshold, so the rate can be raised until it again reaches the maximum rate (interpreted as the claimed first transmission rate) as long as that threshold keeps being met. This process is shown in Fig. 1 below)
[Image: Al-Saadi, Figure 1 – flowchart of the RARE algorithm (media_image2.png, greyscale)]
… that satisfies the second performance requirement at the second transmission rate (Al-Saadi: Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING , Paragraph 7: “If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant; however, 3 and 4 are selected for Sth and Fth respectively, as they provide the highest throughput.”; Page 6, V. CONCLUSION, Paragraph 1: “This paper introduces a new reinforcement algorithm, which adaptively updates the transmission rate in order to increase the success rate of accessing the channel without interfering with the other nodes in WMN. The algorithm learns from previous updates to avoid unnecessary change in the transmission rate (e.g. due to channel error rather than interference), which causes packets loss.”,
Supplemental Note: the second transmission rate is the one lower than the maximum rate the physical device can support. The second transmission rate is modified depending on whether the number of consecutive transmission failures exceeds the failure threshold. The model is able to increase and decrease the transmission rate to satisfy the success rate. In the example above, the threshold values are adjusted to those that provide the highest throughput, which in turn sets the second transmission rate to the value that provides the highest throughput)
… when performance of the learned model updated to the second transmission rate is determined to satisfy the second performance requirement, and (Al-Saadi; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7: “In case of a successful transmission, if the number of consecutive successful transmissions (S) is higher than success threshold (Sth), then RW is set to a positive value and the status of the wireless channel is recalculated using (3). If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased.”; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7 – 8: “3 and 4 are selected for Sth and Fth respectively, as they provide the highest throughput. Qlend LPd. Finally, RARE updates (ti), Qd(ti) and (ti) based on the ‘hello’ messages that each node receives periodically, and proceeds with the next available MDPU.” ,
Supplemental Note: based on the successful transmissions, the system is able to identify the success and failure thresholds to be applied)
repeating reinforcement learning of the learned model until it is determined that the performance of the learned model satisfies the second performance requirement when performance of the learned model updated to the second transmission rate is not determined to satisfy the second performance requirement. (Al-Saadi: Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING , Paragraph 7: “If the wireless link shows improvement since the last transmission and the load in the nodes that share the wireless channel is not high, then the transmission rate is increased. Conversely, if the transmission fails, and the number of consecutive transmissions failure (F) exceeds the failure threshold (Fth) then RW is set to a negative value, and Q(ti) is recalculated using (3). Then, if the Q(ti) is smaller than Q(ti-1) and the load on the node is low, then RARE decreases the transmission rate. For Sth and Fth thresholds, the simulation uses the following values: 2, 3, 4 and 5. The results show that the impact of these threshold values on the average system throughput is insignificant; however, 3 and 4 are selected for Sth and Fth respectively, as they provide the highest throughput.”; Page 6, V. CONCLUSION, Paragraph 1: “This paper introduces a new reinforcement algorithm, which adaptively updates the transmission rate in order to increase the success rate of accessing the channel without interfering with the other nodes in WMN. The algorithm learns from previous updates to avoid unnecessary change in the transmission rate (e.g. due to channel error rather than interference), which causes packets loss.”,
Supplemental Note: the second transmission rate is the one lower than the maximum rate the physical device can support. The second transmission rate is modified depending on whether the number of consecutive transmission failures exceeds the failure threshold. The model is able to increase and decrease the transmission rate to satisfy the success rate. In the example above, the threshold values are adjusted to those that provide the highest throughput, which in turn sets the second transmission rate to the value that provides the highest throughput. This process is also shown below in Figure 1, which shows the process continuously repeating as it updates its parameters)
[Image: Al-Saadi, Figure 1 – flowchart of the RARE algorithm (media_image2.png, greyscale)]
In sum, Al-Saadi teaches a non-transitory computer-readable storage medium storing a learning program for allowing a learned model to be installed in a computer to learn using a learning device serving as another computer, setting a first transmission rate for an information transfer rate a of a communication device controlled by the computer using the learned model; allowing the information transfer rate of a transmission rate of a learning model to reinforcement learn such that a reward given in an environment, in which the first transmission rate is set, when a communication is successful is maximized; extracting, as the learned model, a learning model in which a number of learning steps is equal to or greater than a predetermined number; determining whether performance of the extracted learned model has reached first performance requirement; updating the first transmission rate to a second transmission rate lower than the first transmission rate when performance of the learned model is determined to have reached the first performance requirement; determining whether performance of the learned model updated to the second transmission rate satisfies second performance requirement being the same as or inferior to the first performance requirement; repeating the reinforcement learning until it is determined that the performance of the learned model has reached the first performance requirement when performance of the learned model updated to the second transmission rate is determined to satisfy the second performance requirement that satisfies the second performance requirement at the second transmission rate when performance of the learned model updated to the second transmission rate is determined to satisfy the second performance requirement, and repeating reinforcement learning of the learned model until it is determined that the performance of the learned model satisfies the second performance requirement when performance of the learned model updated to the second transmission rate is not determined to satisfy the second performance requirement. Al-Saadi however does not teach the ability of a learning model to be extracted and installed onto a computer whereas Mahkonen does.
Mahkonen teaches a non-transitory computer-readable storage medium storing a learning program for allowing a learned model to be installed in a computer to learn using a learning device serving as another computer (Mahkonen: Paragraph 0003: “Unmanned Aerial Vehicle (UAV), also sometime referred to as a Drone, is a radio controlled or automated aircraft. Typically, UAVs are controlled by the users over analog radio controlled (RC) channel, but today autopilot software (SW) may be utilized in the aircraft to let them fly beyond the line of sight of the human operators. Autonomous UAVs still require connectivity to the network in order for an operator to recall or change the mission the UAV is executing. Typically for such communication the UAVs will be equipped with Mobile Internet interfaces (e.g. 3GPP radio or WiFi).”; Paragraph 0119: “UAV 150 includes a computing device 601, which includes a processor 602. A memory 603 is included as part of computing device 601, but in other embodiments, memory 603 may be separate from the computing device 601. Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”; Paragraph 0140: “It is to be noted that memory 803 and 903 may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable (e.g., computer-readable) media, such as computer-readable storage media as described earlier in the disclosure. The instructions (e.g. computer program) executed on the processor cause the network node 801, 901 to receive and rebroadcast the geolocation information and the other information of an UAV as described above. Memory 803, 903 may be a computer-readable storage medium storing instructions which, when executed by the processor 802, 902, cause the network node 801, 901 to perform the method of receiving and rebroadcasting the geolocation information and the other information of an UAV as described above”,
Supplemental Note: a UAV is an unmanned vehicle. The computer is the computing device which is able to have multiple software, such as the learning model as taught by Al-Saadi, to be used to install onto itself)
… selecting the learned model (Mahkonen: Paragraph 40: “A memory referenced in the specification may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using non-transitory machine-readable (e.g., computer-readable) media, such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (e.g., electrical, optical, radio, acoustical or other form of propagated signals—such as carrier waves, infrared signals)”: Paragraph 0119: “Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”; Paragraph 0120: “The various instructions (e.g. 604, 605, 606, 607) may be a computer program comprising instructions which, when executed by the processor 602, cause the UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above. Memory 603 may be a computer-readable storage medium storing instructions which, when executed by the processor 602, cause UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above”;
Supplemental Note: the processor is able to execute various software used to perform wireless communication tasks which when combined with Al-Saadi, can be used to install the learning model for transmission rates for wireless communication as will be described further in the rejection)
… as the learned model to be installed in the computer (Mahkonen: Paragraph 40: “A memory referenced in the specification may store code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using non-transitory machine-readable (e.g., computer-readable) media, such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (e.g., electrical, optical, radio, acoustical or other form of propagated signals—such as carrier waves, infrared signals)”: Paragraph 0119: “Memory 603 stores instructions that when executed by processor 602 cause the UAV 150 to perform various operations dependent on the particular software being executed.”; Paragraph 0120: “The various instructions (e.g. 604, 605, 606, 607) may be a computer program comprising instructions which, when executed by the processor 602, cause the UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above. Memory 603 may be a computer-readable storage medium storing instructions which, when executed by the processor 602, cause UAV 150 to perform the method of broadcasting the geolocation information and the other information of the UAV as described above”;
Supplemental Note: the processor is able to execute various software used to perform wireless communication tasks which when combined with Al-Saadi, can be used to install the learning model for transmission rates for wireless communication as will be described further in the rejection)
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the invention disclosed by Al-Saadi with the teachings of Mahkonen, with a reasonable expectation of success. Please refer to the rejection of claim 1 above, as both claims recite the same claimed functions and are therefore rejected under the same rationale.
Response to Arguments
Applicant’s arguments, see section Rejection under 35 U.S.C. 103 of the REMARKS, filed 10/15/2025, with respect to the 35 U.S.C. 103 prior art rejection of claims 1, 2 and 4 – 8 have been considered but are not persuasive. Applicant states that neither Al-Saadi nor Mahkonen teaches the amended limitations of “when the processor does not determine that performance of the learned model updated to the second transmission rate satisfies the second performance requirement, repeats the reinforcement learning until it is determined that the performance of the learned model has reached the first performance requirement,”. Applicant states that Al-Saadi reduces the transmission rate when conditions are not met during repetitive processing, whereas the amended limitations recite that the created model satisfies a first required performance at the first transmission speed, searches for conditions that satisfy the first required performance, and uses the results to set a model that satisfies the second required performance at the second transmission speed. Examiner respectfully disagrees that Al-Saadi does not teach the amended claim limitations. For example, the claimed second performance requirement is interpreted as a transmission rate meeting a success threshold based on the consecutive number of successful transmissions. However, if the success threshold is not met and the failure threshold is met based on the consecutive number of failed transmissions, the transmission rate is adjusted and the reinforcement learning repeats; when the success threshold is met based on the consecutive number of successful transmissions, the transmission rate is increased. The transmission rate can thus be increased, over successive transmissions that meet the success threshold, to a higher transmission rate which can reach the first performance requirement, interpreted as the maximum transmission rate the devices can support (Al-Saadi; Page 3, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 3; Page 4, III. RATE ADAPTATION BASED ON REINFORCEMENT LEARNING, Paragraph 7; Fig. 1). Therefore the RARE algorithm is able to repeat reinforcement learning until the success threshold is met, or to keep increasing the transmission rate until it reaches the maximum rate the devices can support, which is interpreted as the first performance requirement. Please see the rejection above in section Claim Rejections - 35 USC § 103 for the corresponding claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Tsujumoto et al. (US 20110269410 A1) – teaches a radio communication apparatus able to evaluate transmission rates and configure frequencies
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIVAM SHARMA whose telephone number is (703)756-1726. The examiner can normally be reached Monday-Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Erin Bishop can be reached at 571-270-3713. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHIVAM SHARMA/Examiner, Art Unit 3665
/Erin D Bishop/Supervisory Patent Examiner, Art Unit 3665