DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Preliminary Amendment
The Preliminary Amendment filed on 07/26/2023 is entered.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 07/26/2023 has been placed in the record and considered by the examiner.
Claim Objections
Claims 2-9 and 13-20 are objected to because of the following informalities:
In each of claims 2-9, line 1: replace “The method according to claim 1” with --The method according to claim 1,--.
In each of claims 13-20, line 1: replace “The network node according to claim 12” with --The network node according to claim 12,--.
Appropriate action required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 10-11 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Claim 10 recites “A computer program comprising instructions, which when executed by a processor, causes the processor to perform actions according to claim 1.”
The claim could be interpreted as directed to a signal per se, which does not contain at least one structural limitation, has no physical or tangible form, and thus does not fall within any statutory category. See MPEP 2106.03.
The claim can be amended to recite “A non-transitory computer program …” to overcome the rejection.
Claim 11 recites “A carrier comprising the computer program of claim 10, wherein the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.”
The claim could be interpreted as directed to a signal per se, which does not contain at least one structural limitation, has no physical or tangible form, and thus does not fall within any statutory category. See MPEP 2106.03.
The claim can be amended to recite “A non-transitory computer program …” or “A non-transitory computer-readable storage medium” to overcome the rejection.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) or 102(a)(2) as being anticipated by Géza Szabó et al. (“Information Gain Regulation In Reinforcement Learning With The Digital Twins’ Level of Realism,” 2020 IEEE 31st Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 31 August 2020, pages 1-7, XP033837540, DOI: 10.1109/PIMRC48278.2020.9217201; hereinafter “GEZASZABO”; provided in IDS).
Examiner’s note: in what follows, citations refer to GEZASZABO unless otherwise noted.
With respect to the independent claims:
Regarding claim 12, GEZASZABO teaches a network node (the 5G Radio in Fig. 1) comprising a processor and a memory (the 5G radio device must have a processor and memory), wherein said memory comprises instructions executable by said processor whereby said network node (the 5G Radio in Fig. 1) is configured to apply machine learning in a wireless communication network, for training a communication policy controlling radio resources for communication of messages between the network node and a control node (the ARIAC solution) operating a remotely controlled device (the robot arm as the remotely controlled device) (the aforesaid 5G radio has “an architecture in which the radio access control happens automatically to minimize the utilized radio resources while still maximizing the production KPIs of the robot cell”, “To achieve this, we apply Reinforcement Learning (RL) in a simulated environment to explore the environment fast, while the Digital Twin (DT) ensures that the learned policy can be applied on the real world environment as well. We show that the application of Ultra Reliable Low Latency Communication (URLLC) connection can be reduced to approx. 30% of the total radio time while achieving real-world accurate robot control”: Abstract; “The robot arm is connected to the robot controller with an evaluated network setup. The edge cloud consists of the robotic controller, a physical simulator that realizes the robotic scenario, a factory cell scheduler that processes the order, and the robot control that is connected to the real robot arm with the access controller. The access controller can switch between the QoC phases. The performance of the robot cell deployment is evaluated with productivity KPIs.”: Section II, page 2),
[media_image1.png and media_image2.png: greyscale figures reproduced from GEZASZABO]
the network node (the 5G Radio in Fig. 1) further being configured to:
obtain said messages during one or more communication phases communicated when an initial first communication policy is applied for controlling a Quality of Service, QoS, mode in said communication, wherein the QoS mode is adapted to set to one of at least two predefined QoS modes having different levels of QoS for each of said one or more communication phases (the aforesaid 5G Radio with an automatic QoS process, “The manual QoC-tagging is replaced with an automatic process performed in the 5G radio. First, the uplink packets are processed by a Deep Packet Inspection (DPI) module, which ensures that the automatic QoC setup module is aware of the current status of the robot. The status messages are used to feed the ML algorithms whose output sets up the packet scheduler in the 5G radio.”: Section III; “All the 15 ARIAC competition scenarios are considered during the training. At every policy evaluation episode one scenario is randomly selected. The maximum score for this scenario is known apriori and provided to the evaluation environment (max score). At every second the policy is queried based on the current observations, and an action is selected from the discrete action space whether to switch to high or low QoC mode”: Section III),
train a machine learning model based on said messages and the first communication policy (“All the 15 ARIAC competition scenarios are considered during the training”: Section III; “we suggest an implementation example in one of the state-of-the-art Reinforcement Learning (RL) trainers. We choose Proximal Policy Optimization (PPO) [21] and its implementation in Ray [22]. PPO performs comparably or better than state-of-the-art approaches while being much simpler to implement and tune. Its simple implementation enables us to focus on the improvements. Also PPO uses simple synchronous sampling. It means that the trajectories are not buffered for later replay anywhere in the system; the PPO learner uses them once”: Section IV, C),
produce a second communication policy based on the machine learning model, wherein the second communication policy comprises at least one adjusted QoS mode for at least one of the one or more communication phases (“The QoC switching action is realized as in [4] by selecting a low Modulation and Coding Scheme (MCS) (QPSK and ½ rate coding) for the high QoC phase and a high MCS (64-QAM and 2/3 rate coding) for the low QoC phase. … It is a network delay in the end that the robot and the robot controller experience. There can be various options to implement the low and high QoC phases on radio. This one is given as an example. There can be setups in which the robustness of packet delivery is not affected. The overall goal of the two phases is to relax the radio requirements and utilize the network in a use case optimized way.”: Section IV, B),
determine a performance score for the second communication policy in the one or more communication phases based on the radio resources used when communicating using the second communication policy and further based on a reduced operation precision when said one or more communication phases are communicated using the adjusted QoS mode (modifying the PPO policy in Listing 1 with “default_policy”: PPOTrainer_dt = PPOTrainer.with_updates(name="PPOTrainer_dt", default_policy=PPOTFPolicy_dt, make_workers=make_workers_with_dt): Section V; specific scores are provided in Section V),
when the determined performance score indicates a performance exceeding a predetermined performance, apply the second communication policy to said communication between the network node and the control node (“We checked the ARIAC scores and ARIAC Total Processing Times (TPT) [14] and the low QoC ratios for the three cases. Figure 6 shows the results. First it is important to note that we do not teach robot control but influence an existing well-performing robot control over the radio quality switching policy. After successful learning, the ARIAC scores remain maximal in all cases. This is due to the fact that the policy could be learnt and the reward function worked well. Based on the experienced difference between the low QoC ratios in various setups we can deduce the expected situation that the real robot with real network is a noisier environment in total compared to the fully simulated setup. The fully simulated setup (Setup A) can achieve a 90% low QoC ratio, while the fully DT case (Setup C) can achieve about 71% without compromising on the perfect ARIAC scores. The policy with 90% low QoC ratio trained fully in simulation can achieve 39% low QoC ratio in the DT setup (Setup B). The difference in the noise of the observation space is most visible in this case. Also the cost of time spent in the low QoC phase can be observed in the increased TPT. The robot controller had to compensate at certain points more and had to wait for a high QoC phase to do the accurate positioning. Note that this is the default built-in behavior of the robot controller. This is also an expected behavior, in that the accuracy loss of a low quality network can be compensated by reducing the speed of the robot [26].”: Section V).
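For clarity, the “low QoC ratio” metric quoted above (the fraction of radio time spent in the low QoC mode, e.g. the 90%, 71%, and 39% figures) can be sketched as follows; this is an illustrative reconstruction, not code from GEZASZABO, and the function name is hypothetical:

```python
# Illustrative sketch (not from GEZASZABO): the "low QoC ratio" metric,
# i.e. the fraction of one-second radio-time samples spent in low QoC mode.

def low_qoc_ratio(modes: list[str]) -> float:
    """Fraction of one-second samples in which the low QoC mode was active."""
    if not modes:
        raise ValueError("empty mode trace")
    return modes.count("low") / len(modes)

# Example trace: 9 of 10 seconds in low QoC -> ratio 0.9 (cf. Setup A's ~90%)
print(low_qoc_ratio(["low"] * 9 + ["high"]))  # 0.9
```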
Regarding claim 1, GEZASZABO teaches a method in a network node for applying machine learning in a wireless communication network for training a communication policy controlling radio resources for communication of messages between the network node and a control node operating a remotely controlled device, the method comprising:
obtaining said messages during one or more communication phases communicated when an initial first communication policy is applied for controlling a Quality of Service, QoS, mode in said communication, wherein the QoS mode is set to one of at least two predefined QoS modes having different levels of QoS for each of said one or more communication phases,
training a machine learning model based on said messages and the first communication policy,
producing a second communication policy based on the machine learning model, wherein the second communication policy comprises at least one adjusted QoS mode for at least one of the one or more communication phases,
determining a performance score for the second communication policy in the one or more communication phases based on the radio resources used when communicating using the second communication policy and further based on a reduced operation precision when said one or more communication phases are communicated using the adjusted QoS mode,
when the determined performance score indicates a performance exceeding a predetermined performance, applying the second communication policy to said communication between the network node and the control node (Regarding claim 1, the claim is interpreted and rejected for the same reason as set forth in claim 12).
Regarding claim 10, GEZASZABO teaches a computer program comprising instructions, which when executed by a processor, causes the processor to perform actions according to claim 1 (the claim is interpreted and rejected for the same reasons as set forth in claim 1).
Regarding claim 11, GEZASZABO teaches a carrier comprising the computer program of claim 10, wherein the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium (the claim is interpreted and rejected for the same reasons as set forth in claim 1).
Regarding claim 2, GEZASZABO teaches the method according to claim 1, wherein said messages comprise a status indication received from the control node and control operations sent to the control node for controlling the remotely controlled device, and wherein applying the second communication policy to said communication between the network node and the control node comprises sending the control operations to the control node and receiving the status indication from the control node using the second communication policy (“the uplink packets are processed by a Deep Packet Inspection (DPI) module, which ensures that the automatic QoC setup module is aware of the current status of the robot. The status messages are used to feed the ML algorithms whose output sets up the packet scheduler in the 5G radio”: Section III).
Regarding claim 3, GEZASZABO teaches the method according to claim 1, wherein determining a performance score for the second communication policy further comprises computing the performance score for the second communication policy based on an intermediate reward for selecting a high level or low level QoS mode for the at least one adjusted QoS mode and further based on an end reward for a change in operation precision caused by said selection (“the uplink packets are processed by a Deep Packet Inspection (DPI) module, which ensures that the automatic QoC setup module is aware of the current status of the robot. The status messages are used to feed the ML algorithms whose output sets up the packet scheduler in the 5G radio.”: Section III; modifying the PPO policy in Listing 1 with “default_policy”: PPOTrainer_dt = PPOTrainer.with_updates(name="PPOTrainer_dt", default_policy=PPOTFPolicy_dt, make_workers=make_workers_with_dt): Section V; specific scores are provided in Section V).
Regarding claim 4, GEZASZABO teaches the method according to claim 1, wherein determining a performance score for the second communication policy comprises any of simulating or measuring the communication performed between the network node and the control node using the second communication policy (modifying the PPO policy in Listing 1 with “default_policy”: PPOTrainer_dt = PPOTrainer.with_updates(name="PPOTrainer_dt", default_policy=PPOTFPolicy_dt, make_workers=make_workers_with_dt): Section V; specific scores are provided in Section V).
Regarding claim 5, GEZASZABO teaches the method according to claim 1, wherein training the machine learning model is further based on a first performance score of the first communication policy (“All the 15 ARIAC competition scenarios are considered during the training”: Section III; “we suggest an implementation example in one of the state-of-the-art Reinforcement Learning (RL) trainers. We choose Proximal Policy Optimization (PPO) [21] and its implementation in Ray [22]. PPO performs comparably or better than state-of-the-art approaches while being much simpler to implement and tune. Its simple implementation enables us to focus on the improvements. Also PPO uses simple synchronous sampling. It means that the trajectories are not buffered for later replay anywhere in the system; the PPO learner uses them once”: Section IV, C).
Regarding claim 6, GEZASZABO teaches the method according to claim 5, wherein the machine learning model is further trained based on a third communication policy, second messages communicated between the network node and the control node using the third communication policy, and a third performance score associated with the third communication policy (“The QoC switching action is realized as in [4] by selecting a low Modulation and Coding Scheme (MCS) (QPSK and ½ rate coding) for the high QoC phase and a high MCS (64-QAM and 2/3 rate coding) for the low QoC phase. … There can be various options to implement the low and high QoC phases on radio. This one is given as an example. There can be setups in which the robustness of packet delivery is not affected. The overall goal of the two phases is to relax the radio requirements and utilize the network in a use case optimized way.”: Section IV, B).
Regarding claim 7, GEZASZABO teaches the method according to claim 1, wherein the at least one adjusted QoS mode is changed from a high level QoS to a low level QoS (“The manual QoC-tagging is replaced with an automatic process performed in the 5G radio. First, the uplink packets are processed by a Deep Packet Inspection (DPI) module, which ensures that the automatic QoC setup module is aware of the current status of the robot. The status messages are used to feed the ML algorithms whose output sets up the packet scheduler in the 5G radio.”: Section III; “The scoring of the ARIAC environment is used together with a reward for the agent after every second to encourage the usage of the low QoC channel. Using the low QoC channel provides 10 points to the agent, while using the high QoC channel is penalized with 1 point after every second.”: Section III).
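The per-second reward scheme quoted above (10 points per second in the low QoC mode, 1-point penalty per second in the high QoC mode) can be sketched as follows; this is an illustrative reconstruction, not code from GEZASZABO, and the function names are hypothetical:

```python
# Illustrative sketch (not from GEZASZABO): the quoted per-second reward
# scheme, i.e. +10 points per second in low QoC and -1 point per second
# in high QoC, which encourages use of the low QoC channel.

def qoc_reward(mode: str) -> int:
    """Per-second reward for the selected QoC mode."""
    if mode == "low":
        return 10   # encourage use of the low QoC channel
    if mode == "high":
        return -1   # penalize use of the high QoC channel
    raise ValueError(f"unknown QoC mode: {mode}")

def episode_reward(modes: list[str]) -> int:
    """Total reward over an episode sampled once per second."""
    return sum(qoc_reward(m) for m in modes)

# Example: 3 seconds in low QoC, 2 seconds in high QoC -> 3*10 - 2*1 = 28
print(episode_reward(["low", "low", "low", "high", "high"]))  # 28
```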
Regarding claim 8, GEZASZABO teaches the method according to claim 1, wherein a high level QoS mode comprises the network node demanding Ultra-Reliable Low-Latency Communication, URLLC, for communicating with the control node (“We showed that the application of URLLC connection can be reduced to approx. 30% of the total radio time while achieving real world accurate robot control.”: Section VIII).
Regarding claim 9, GEZASZABO teaches the method according to claim 1, wherein applying the second communication policy requires the determined performance score to indicate a performance exceeding a predefined performance by a predefined threshold (“The fully simulated setup (Setup A) can achieve a 90% low QoC ratio, while the fully DT case (Setup C) can achieve about 71% without compromising on the perfect ARIAC scores. The policy with 90% low QoC ratio trained fully in simulation can achieve 39% low QoC ratio in the DT setup (Setup B). The difference in the noise of the observation space is most visible in this case.”: Section V, B).
Regarding claims 13-20, the claims are interpreted and rejected for the same reasons as set forth in claims 2-9.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to M MOSTAZIR RAHMAN whose telephone number is (571)272-4785. The examiner can normally be reached 8:30am-5:00pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Derrick Ferris can be reached at 571-272-3123. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M Mostazir Rahman/Examiner, Art Unit 2411
/DERRICK W FERRIS/Supervisory Patent Examiner, Art Unit 2411