Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mark et al. (IEEE 2020 54th Asilomar Conference on Signals, Systems, and Computers, “Network Performance Adaptation in Wireless Control with Reinforcement Learning,” Pages 413-417), hereinafter “Mark,” in view of Szigeti et al. (U.S. Patent Application Publication No. US 2022/0321467 A1), hereinafter “Szigeti.”
Regarding Claim 1, Mark teaches,
A device of a network for providing dynamic quality of service (QoS) to multiple devices using QoS-aware controls, the device to identify first state information received from a first device using the network; identify second state information received from a second device using the network, the first state information and the second state information received using a wireless communication medium shared by the first device and the second device; -Page 414, Fig. 1 (Fig. 1 shows that the states of multiple devices (sensors) are shared over a shared wireless channel to determine local control inputs. As shown in the figure, sensors 1 and 2 send state information x1,t, x2,t, etc. to the Edge processor.)
generate, using machine learning, based on the first state information, a first dynamic QoS to be applied to the first device at a first time, -(Fig. 1 shows a control system that takes the first state information x1,t and, using a machine learning algorithm, adaptively generates the reliability (QoS) to be applied to the first device. Page 415, col. 1, Paragraph 3 recites, “We emphasize that the formulation of the QoS-aware co-design problem in (7a)-(7b) is distinct from other related formulations in wireless control” Abstract recites, “We utilize deep reinforcement learning techniques to train a target reliability policy…” Reliability is QoS. Page 415, col. 2, Paragraph 4 recites, “To apply DRL to the problem in (7a)-(7b), it suffices to map the given problem to DRL framework described above” As explained, the decision process problem of equations (7a)-(7b) is solved using deep reinforcement learning (DRL), which takes in the states of the devices (7b) to determine the quality (QoS) to be applied to each device.)
wherein the first dynamic QoS minimizes a network resource cost function of the first dynamic QoS while providing a first allocation of resources needed by the first device at the first time; -(Abstract recites, “We formulate a constrained Markov decision process that minimizes a cost of network performance as a means of maximizing resource efficiency, while maintaining control specific constraints such as task completion. We utilize deep reinforcement learning techniques to train a target reliability policy” Section III, Page 414, Paragraph 1 recites, “framework to dynamically adapt the network performance level targets, i.e. reliability qi,t and latency τi,t, of each plant to maximize the total network efficiency.” Section IV, Page 415, col. 2, last paragraph-Page 416, col. 1, Paragraph 1 recites, “To apply DRL to the problem in (7a)-(7b), it suffices to map the given problem to DRL framework described above. Given that the control systems operate independently of one another, we can assume each plant operates its own independent RL problem. Indeed, much of this mapping is immediate. For plant i, the action at is given by the network performance parameters qi,t and τi,t. Likewise, the state st is given by the plant state xi,t with observation ŝt given by yi,t. The state transition probability can then be derived from the dynamical equations and associated parameters” i is 1 for the first device)
generate, using the machine learning, based on the second state information, a second dynamic QoS to be applied to the second device at the first time, wherein the second dynamic QoS minimizes a network resource cost function of the second dynamic QoS while providing a second allocation of resources needed by the second device at the first time; -(Section IV, Page 415, col. 2, last paragraph-Page 416, col. 1, Paragraph 1 recites, “To apply DRL to the problem in (7a)-(7b), it suffices to map the given problem to DRL framework described above. Given that the control systems operate independently of one another, we can assume each plant operates its own independent RL problem. Indeed, much of this mapping is immediate. For plant i, the action at is given by the network performance parameters qi,t and τi,t. Likewise, the state st is given by the plant state xi,t with observation ŝt given by yi,t. The state transition probability can then be derived from the dynamical equations and associated parameters” i is 2 for the second device)
allocate the first allocation of resources to the first device, based on the first dynamic QoS, at the first time; and allocate the second allocation of resources to the second device, based on the second dynamic QoS, at the first time. -Page 416, Section V (recites, “The scheduling algorithm takes as inputs the set of reliability requirements {qi,t}m i=1 for all plants and uses a combination of probabilistic selective scheduling and rate selection to meet the reliability targets.” Reliability targets are QoS)
Although implicit and well understood, Mark does not explicitly mention,
A device of a network for providing dynamic quality of service (QoS) to multiple devices using QoS-aware controls, the device comprising processing circuitry coupled to storage,
However, in an analogous invention, Szigeti teaches,
A device of a network for providing dynamic quality of service (QoS) to multiple devices using QoS-aware controls, the device comprising processing circuitry coupled to storage, -Paragraphs [0031] and [0042] ([0042] recites, “In some cases, adaptive QoS process 248 may assess the captured telemetry data on a per-flow or per-packet basis. In other embodiments, adaptive QoS process 248 may assess telemetry data for a plurality of traffic flows based on any number of different conditions. For example, traffic flows may be grouped based on their sources, destinations, temporal characteristics (e.g., flows that occur around the same time, etc.), combinations thereof, or based on any other set of flow characteristics.” [0031] recites, “FIG. 2 is a schematic block diagram of an example node/device 200 (e.g., an apparatus) that may be used with one or more embodiments described herein, e.g., as any of the nodes or devices shown in FIG. 1 above or described in further detail below. The device 200 may comprise one or more network interfaces 210 (e.g., wired, wireless, etc.), at least one processor 220, and a memory 240 interconnected by a system bus 250”)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the “Network Performance Adaptation in Wireless Control with Reinforcement Learning” proposed by Mark to include the concept of “A device of a network for providing dynamic quality of service (QoS) to multiple devices using QoS-aware controls, the device comprising processing circuitry coupled to storage” of Szigeti. One of ordinary skill in the art would have been motivated to make this modification in order to adaptively change the QoS process (Szigeti, [0033]).
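As an illustrative aid only, and not part of the record of either cited reference: the scheme mapped to claim 1 above (each device's reported state is fed to a learned policy that outputs a per-device QoS action (reliability, latency), which then drives resource allocation) can be sketched as follows. All function names and the toy policy below are hypothetical stand-ins for Mark's trained DRL policy.

```python
# Illustrative sketch only: the per-plant mapping described by Mark, where
# each device i reports its state x_{i,t} and a policy outputs the QoS
# action (reliability q_{i,t}, latency tau_{i,t}) used to allocate resources.
# The policy here is a hypothetical toy rule, not the reference's DRL policy.

def qos_policy(state):
    """Stand-in for a trained policy pi(x_{i,t}; theta).

    Returns a (reliability, latency) QoS target; this toy rule demands
    higher reliability and tighter latency when the plant state deviates more.
    """
    deviation = abs(state)
    reliability = min(0.99, 0.5 + 0.1 * deviation)  # q_{i,t} in [0, 1)
    latency = 1 if deviation > 2.0 else 2           # tau_{i,t} in slots
    return reliability, latency

def allocate(states):
    """Map each device's reported state to its QoS-based allocation."""
    allocations = {}
    for device_id, x in states.items():
        q, tau = qos_policy(x)           # action a_t = (q_{i,t}, tau_{i,t})
        allocations[device_id] = {"reliability": q, "latency": tau}
    return allocations

# Two devices share the wireless medium and report states x_{1,t}, x_{2,t}.
print(allocate({1: 0.4, 2: 3.1}))
```

In this sketch the device whose state deviates more receives the stricter QoS target, mirroring the adaptive, per-device allocation read onto the claim.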
Regarding Claim 2, Mark and Szigeti teach the limitations of Claim 1.
Mark further teaches,
The device of claim 1, wherein the processing circuitry is further configured to: generate, using the machine learning, a third dynamic QoS to be applied to the first device at a second time, wherein the third dynamic QoS minimizes a network resource cost function of the third dynamic QoS while providing a third allocation of resources needed by the first device at the second time; generate, using the machine learning, a fourth dynamic QoS to be applied to the second device at the second time, wherein the fourth dynamic QoS minimizes a network resource cost function of the fourth dynamic QoS while providing a fourth allocation of resources needed by the second device at the second time; allocate the third allocation of resources to the first device, based on the third dynamic QoS, at the second time; and allocate the fourth allocation of resources to the second device, based on the fourth dynamic QoS, at the second time. – Section III, Fig. 1, Page 414 (Claim 2 is the same as Claim 1; the only difference is that the third and fourth dynamic QoS are assigned to the first and second devices in a second time interval. For dynamic optimization, it is readily understood by one of ordinary skill that the output (QoS) changes over time and is allocated among the devices accordingly. Page 414 recites, “In this paper we devise a framework to dynamically adapt the network performance level targets, i.e. reliability qi,t and latency τi,t, of each plant to maximize the total network efficiency. To adapt such targets based on the state of the underlying plant, we consider functional policies πqi(·; θqi) and πτi(·; θτi) that determine the respective reliability and latency needed in the state transmission, i.e., qi,t := πqi(yi,t−1; θqi) (5), and τi,t := argmax over t′ ∈ {1, …, t} of {πτi(yi,t−t′; θτi) − t′ | πτi(yi,t−t′; θτi) + t′ ≤ t} (6). Observe in (5)-(6) that the policies πqi and πτi are functions of the previous measurements of the plant, and have the form of some specific architecture, such as a deep neural network (DNN), parameterized by vectors θqi ∈ Rnqi and θτi ∈ Rnτi, respectively.”)
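The policy structure quoted above (equations (5)-(6): QoS targets computed as functions of the previous plant measurement) can be illustrated with a minimal sketch. This is a hypothetical stand-in only: the linear parameterization and the theta values are invented here for illustration, whereas Mark parameterizes the policies with deep neural networks.

```python
# Illustrative sketch only: q_{i,t} = pi_q(y_{i,t-1}; theta_q), per Eq. (5).
# The linear form and theta values are hypothetical stand-ins for the DNN
# policy described in the reference.

def reliability_policy(y_prev, theta=(0.6, 0.05)):
    """Toy pi_q: map the previous measurement to a reliability target in [0, 0.99]."""
    bias, gain = theta
    return max(0.0, min(0.99, bias + gain * abs(y_prev)))

# Successive measurements yield successive, time-varying QoS targets for the
# same device, i.e. the "third dynamic QoS" of claim 2 at a second time.
targets = [reliability_policy(y) for y in (0.2, 4.0)]
print(targets)
```

The point of the sketch is only that the same policy, applied at successive times to changing measurements, produces the time-varying per-device QoS targets that claim 2 recites.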
Regarding Claim 3, Mark and Szigeti teach the limitations of Claim 1.
Mark further teaches,
The device of claim 1, wherein the first dynamic QoS minimizes a network resource cost of the first device operating in a first state at the first time, wherein the first state information is indicative of the first state, wherein the second dynamic QoS minimizes a network resource cost of the second device operating in a second state at the first time, and wherein the second state information is indicative of the second state. -Section III/IV; Abstract, Pages 413-416 (Abstract recites, “We formulate a constrained Markov decision process that minimizes a cost of network performance as a means of maximizing resource efficiency, while maintaining control specific constraints such as task completion. We utilize deep reinforcement learning techniques to train a target reliability policy” Section III, Page 414 recites, “framework to dynamically adapt the network performance level targets, i.e. reliability qi,t and latency τi,t, of each plant to maximize the total network efficiency.” Section IV, Pages 415-416 recites, “To apply DRL to the problem in (7a)-(7b), it suffices to map the given problem to DRL framework described above. Given that the control systems operate independently of one another, we can assume each plant operates its own independent RL problem. Indeed, much of this mapping is immediate. For plant i, the action at is given by the network performance parameters qi,t and τi,t. Likewise, the state st is given by the plant state xi,t with observation ŝt given by yi,t. The state transition probability can then be derived from the dynamical equations and associated parameters” i = 1, 2 can be for the first and second devices at time t, and xi,t is the respective state at time t)
Regarding Claim 4, Mark and Szigeti teach the limitations of Claim 1.
Mark further teaches,
The device of claim 1, wherein the machine learning comprises a first stage configured to learn a first ideal control policy for the first device using network conditions with no packet loss or delay, and to learn a second ideal control policy for the second device using network conditions with no packet loss or delay. -Page 416, Section V (recites, “we train a network performance adaptation policy in two cases: (i) Reliability adaptation with fixed delay τi,k = 1 and (ii) Delay adaptation with fixed packet loss probability qi,t = 0. Case (i) is trained using the state of the art Soft Actor-Critic (SAC) algorithm [23] for continuous actions, while case (ii) is trained using the Dueling DQN algorithm [24].” The learning (training) process is done for each individual device i = 1, 2 with the ideal packet loss probability qi,t = 0.)
Regarding Claim 5, Mark and Szigeti teach the limitations of Claim 4.
Mark further teaches,
The device of claim 4, wherein the machine learning further comprises a second stage configured to estimate, using reinforcement learning or supervised learning, a first current state of the first device at the first time based on the first state information, network latency, and network reliability, and a second current state of the second device at the first time based on the second state information, the network latency, and the network reliability, wherein the network latency and the network reliability are based on samples of states of the first device and the second device. -Section IV, Page 415; Section V, Page 416 (recites, “For plant i, the action at is given by the network performance parameters qi,t and τi,t. Likewise, the state st is given by the plant state xi,t with observation ŝt given by yi,t. The state transition probability can then be derived from the dynamical equations and associated parameters” qi,t, τi,t, and xi,t are samples of the quality, latency, and state of device i at time t.)
Regarding Claim 6, Mark and Szigeti teach the limitations of Claim 5.
Mark further teaches,
The device of claim 5, wherein the machine learning further comprises a third stage configured to minimize, using reinforcement learning, the network resource cost function of the first dynamic QoS while providing the first allocation of resources needed by the first device at the first time and to minimize, using reinforcement learning, the network resource cost function of the second dynamic QoS while providing the second allocation of resources needed by the second device at the first time. -Abstract; Section III/IV, Pages 413-416 (Abstract recites, “We formulate a constrained Markov decision process that minimizes a cost of network performance as a means of maximizing resource efficiency, while maintaining control specific constraints such as task completion. We utilize deep reinforcement learning techniques to train a target reliability policy” Section III, Page 414 recites, “framework to dynamically adapt the network performance level targets, i.e. reliability qi,t and latency τi,t, of each plant to maximize the total network efficiency.” Section IV, Pages 415-416 recites, “To apply DRL to the problem in (7a)-(7b), it suffices to map the given problem to DRL framework described above. Given that the control systems operate independently of one another, we can assume each plant operates its own independent RL problem. Indeed, much of this mapping is immediate. For plant i, the action at is given by the network performance parameters qi,t and τi,t. Likewise, the state st is given by the plant state xi,t with observation ŝt given by yi,t. The state transition probability can then be derived from the dynamical equations and associated parameters” As described above, the QoS (reliability parameter qi,t) is dynamically calculated and resources are allocated accordingly over time; the parameters are updated adaptively at each time step (1st, 2nd, 3rd, …) and resources are allocated to the devices accordingly.)
Regarding Claim 7, Mark and Szigeti teach the limitations of Claim 6.
Mark further teaches,
The device of claim 6, wherein the machine learning further comprises a fourth stage configured to generate the first dynamic QoS and the second dynamic QoS using reinforcement learning. -Section III/IV, Pages 414-416 (recites, “To apply DRL to the problem in (7a)-(7b), it suffices to map the given problem to DRL framework described above. Given that the control systems operate independently of one another, we can assume each plant operates its own independent RL problem. Indeed, much of this mapping is immediate. For plant i, the action at is given by the network performance parameters qi,t and τi,t. Likewise, the state st is given by the plant state xi,t with observation ŝt given by yi,t. The state transition probability can then be derived from the dynamical equations and associated parameters” Section III, Page 414 recites, “We thus consider a loss function Ci : [0,1]×R+ → R that measures the cost of achieving a set of targets {q,τ} for plant i.” {q,τ} are the reliability/quality and latency, which are the QoS in this case. Also, the process is adaptive and can run over as many stages as needed.)
Regarding Claim 8, Mark and Szigeti teach the limitations of Claim 1.
Mark further teaches,
The device of claim 1, further comprising a transceiver configured to transmit and receive wireless signals comprising the first state information and the second state information. -Fig. 1, Page 414 (Fig. 1 shows the Edge processor (transceiver) transmitting (y1,t, y2,t, etc.) and receiving (x1,t, x2,t, etc.) state information over a wireless channel in the wireless control system. Sensing data is processed into state information x1,t at the edge, and the states of multiple control systems are sent over a shared wireless channel to determine local control inputs ui,t = gi(xi,t).)
Regarding Claim 9, Mark and Szigeti teach the limitations of Claim 8.
Mark further teaches,
The device of claim 8, further comprising an antenna coupled to the transceiver to cause to send the first state information and the second state information. -Section V, Page 416 (recites, “we simulate a large series of robotic pick and place tasks in a WiFi 6 network with a 20 MHz bandwidth. All robotic systems share the channel and must complete their downlink transmissions from the centralized access point (AP) within a scheduling opportunity duration of 1 millisecond. The scheduling algorithm takes as inputs the set of reliability requirements {qi,t}m i=1 for all plants and uses a combination of probabilistic selective scheduling and rate selection to meet the reliability targets.” As explained, transmission occurs over a WiFi 6 network, with communication between the AP and the multiple robotic systems (STAs). The wireless transmission necessarily occurs via antennas attached to the AP and each STA.)
Claim 10 is the computer-readable-medium counterpart of apparatus claim 1. The Applicant’s attention is drawn to claim 1 above, which is rejected. Claim 10 is rejected under the same rationale as claim 1. It will be apparent to those skilled in the art that other processor and memory types, including various computer-readable media, may be used to store and execute program instructions pertaining to the techniques described herein.
Claim 11 is essentially the same as claim 2, except that claim 11 depends from independent claim 10, while claim 2 depends from independent claim 1. The Applicant’s attention is drawn to claim 2 above, which is rejected. Claim 11 is rejected under the same rationale as claim 2.
Claim 12 is essentially the same as claim 3. The Applicant’s attention is drawn to claim 3 above, which is rejected. Claim 12 is rejected under the same rationale as claim 3.
Claim 13 is essentially the same as claim 4. The Applicant’s attention is drawn to claim 4 above, which is rejected. Claim 13 is rejected under the same rationale as claim 4.
Claim 14 is essentially the same as claim 5. The Applicant’s attention is drawn to claim 5 above, which is rejected. Claim 14 is rejected under the same rationale as claim 5.
Claim 15 is essentially the same as claim 6. The Applicant’s attention is drawn to claim 6 above, which is rejected. Claim 15 is rejected under the same rationale as claim 6.
Claim 16 is essentially the same as claim 7. The Applicant’s attention is drawn to claim 7 above, which is rejected. Claim 16 is rejected under the same rationale as claim 7.
Claim 17 is the method claim corresponding to apparatus claim 1. The Applicant’s attention is directed to claim 1 above, which is rejected. Claim 17 is rejected under the same rationale as claim 1.
Claim 18 is the method claim corresponding to apparatus claim 2. The Applicant’s attention is directed to claim 2 above, which is rejected. Claim 18 is rejected under the same rationale as claim 2.
Claim 19 is the method claim corresponding to apparatus claim 3. The Applicant’s attention is directed to claim 3 above, which is rejected. Claim 19 is rejected under the same rationale as claim 3.
Regarding Claim 20, Mark and Szigeti teach the limitations of Claim 17.
Mark further teaches,
The method of claim 17, wherein the machine learning comprises four stages and uses reinforcement learning. -Section IV, Page 415 (recites, “Deep reinforcement learning (DRL) is comprised of a set of numerical algorithms that solve MDPs by means of sampling in place of modeling… In this section, we describe how the co-design problem in (7a)-(7b) can be mapped to a standard DRL problem formulation and discuss a particular set of model-free learning algorithms that we use to train the weights of the communication policies in (5) and (6). An RL algorithm is designed to find solutions to any Markov Decision Process (MDP), which can be defined by the tuple (S,A,Π,r), where the state space S is continuous and the action space A can be discrete or continuous.” It is readily understood by one of ordinary skill in the art that the number of stages (four) depends on the design objective.)
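The MDP tuple (S, A, Π, r) quoted above can be rendered concretely as a minimal sketch. This is an illustrative aid only: the data structure, the toy action set, and the toy reward below are hypothetical, not drawn from Mark.

```python
# Illustrative sketch only: a minimal rendering of the MDP tuple (S, A, Pi, r)
# that the co-design problem is mapped onto. All fields and the toy reward
# are hypothetical examples, not the reference's formulation.
from typing import Callable, List, NamedTuple, Tuple

class MDP(NamedTuple):
    state_space: str                              # S: continuous plant state, described symbolically
    actions: List[Tuple[float, int]]              # A: QoS actions (reliability q, latency tau)
    reward: Callable[[Tuple[float, int]], float]  # r: negative network-resource cost

def cost(action):
    """Toy cost: higher reliability and tighter latency consume more resources."""
    q, tau = action
    return q / tau

mdp = MDP(state_space="x in R^n",
          actions=[(0.9, 1), (0.5, 2)],
          reward=lambda a: -cost(a))

# The learned policy would pick actions trading off reward (resource cost)
# against the control-specific constraints described in the reference.
print([mdp.reward(a) for a in mdp.actions])
```

The sketch only fixes terminology: an action is a (reliability, latency) pair, and the reward penalizes network-resource cost, matching the constrained-MDP description quoted from the Abstract.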
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AHMED SAIFUDDIN whose telephone number is (703)756-4581. The examiner can normally be reached Monday-Friday 8:30am-6:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KHALED M KASSIM can be reached on 571-270-3770. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AHMED SAIFUDDIN/Examiner, Art Unit 2475
/KHALED M KASSIM/Supervisory Patent Examiner, Art Unit 2475