Last updated: May 29, 2026

Application No. 18/698,168

ENERGY-AWARE ROUTING BASED ON REINFORCEMENT LEARNING

Non-Final OA §103 §112

Filed

Apr 03, 2024

Priority

Aug 03, 2021 — nonprovisional of PCTSE2021050769

Examiner

ABU ROUMI, MAHRAN Y

Art Unit

2455

Tech Center

2400 — Computer Networks

Assignee

Telefonaktiebolaget Lm Ericsson (Publ)

OA Round

1 (Non-Final)

Interview Optional

Examiner has a relatively high allowance rate (72%) and a +34.3% interview lift. A written response may suffice.

Based on 598 resolved cases, 2023–2026

Examiner Intelligence

ABU ROUMI, MAHRAN Y

Grants 72% — above average

Career Allowance Rate

433 granted / 598 resolved

+14.4% vs TC avg

Strong +34% interview lift

+34.3%

Interview Lift (resolved cases with interview vs. without)

Typical timeline

3y 0m

Avg Prosecution

23 currently pending

Career history

622

Total Applications

across all art units

Statute-Specific Performance

§101

0.8%

-39.2% vs TC avg

§103

92.6%

+52.6% vs TC avg

§102

4.8%

-35.2% vs TC avg

§112

1.4%

-38.6% vs TC avg

Based on career data from 598 resolved cases

Office Action

§103 §112

DETAILED ACTION
This communication is responsive to Application 18/698,168 filed on 4/3/2024. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims:
		Claims 1-15 and 17-18 are presented for examination.

Information Disclosure Statement
3.	The Information Disclosure Statement (IDS) submitted on 4/3/2024 complies with 37 CFR 1.97 provisions.  Accordingly, the Examiner has considered the IDS.

Examiner’s Note
Examiner called the undersigned, Mr. Timothy Wall, on 4/13/2026 to move prosecution forward. Examiner has not yet heard back. 

Claims NOT Rejected Over the Prior Art
5.	Prior art rejections have not been provided for claims 4-5 as the claims include a combination of subject matter not disclosed by the prior art of record.  However, these claims stand rejected at least under 35 U.S.C. 112, see below, which must be overcome before the claims can be designated as allowable.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-15 and 17-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1 and 17-18 include limitations that lack antecedent basis. For example, the last limitation in each claim recites “…using the one or more stored experience sets in the buffer.” The limitations “the one or more stored experience sets” and “the buffer” lack antecedent basis. Claims 2-15 are also rejected for depending on claim 1.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 6-15 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over WO 2019/114959 A1 (hereinafter Ceccarelli) in view of “Resource Allocation in mmWave 5G IAB networks” (IDS filed 4/3/2024, entry 3, hereinafter Bibo) and further in view of Li et al. (hereinafter Li) US 2021/0111994 A1.

Regarding claim 1, Ceccarelli teaches a method for training a reinforcement learning system for optimising routing for a network including a plurality of Integrated Access and Backhaul (IAB) nodes connected to an IAB donor, the method comprising: 
acquiring observations characterising a current state of the plurality of IAB nodes (p. 4, lines 21-25; a reinforcement learning agent receives an observation from the environment in state S and selects an action to maximize the expected future reward r. Based on the expected future rewards, a value function V for each state can be calculated and an optimal policy p that maximizes the long term value function can be derived. 
Also, see p. 6, lines 17-25; the step of acquiring 102 may comprise measuring the one or more parameters relating to traffic flow between the first group of nodes. For example, measuring one or more parameters relating to traffic flow between one or more pairs of nodes in the first group of nodes. In some embodiments, the step of acquiring 102 may comprise receiving measurements of one or more parameters relating to traffic flow), wherein the observations comprise: routing information for routing packets in the network (see p. 6, lines 17-25; the step of acquiring 102 may comprise measuring the one or more parameters relating to traffic flow between the first group of nodes. For example, measuring one or more parameters relating to traffic flow between one or more pairs of nodes in the first group of nodes. In some embodiments, the step of acquiring 102 may comprise receiving measurements of one or more parameters relating to traffic flow), and traffic information indicative of data traffic performance of each of the plurality of IAB nodes (routing metrics and values of one or more parameters relating to traffic flow, see block 102 of method 100 & see p. 6, lines 17-25); 
and performing the following steps iteratively until a termination condition is met (p. 6, lines 26-33; the values of the one or more parameters may be acquired periodically. For example, in some embodiments, acquiring 102 may comprise periodically collecting performance and/or telemetry metrics from the network in order to detect congestion/failure situations (this means once a failure situation or congestion is detected, the collecting performance is reported in step 102 which implies the end of acquiring those metrics for that period of time). Acquiring periodic measurements also enables the first reinforcement learning agent to adjust the first routing metric (in step 104 as will be described below) based on real-time (or near real-time) information): determining an action to be performed from a predetermined set of actions (p. 4, lines 21-25; a reinforcement learning agent receives an observation from the environment in state S and selects an action to maximize the expected future reward r. Also in p. 7, lines 14-20; the first reinforcement learning agent may be configured to operate a policy optimisation process. As noted above, a policy may comprise a set of learnt rules or actions that the reinforcement learning agent has learnt and can therefore be used to produce a (more) predictable outcome) using a selection policy and based on latest acquired observations (p. 7, lines 14-21; the first reinforcement agent may operate according to one or more principles of a reinforcement learning concept and/or according to a related algorithm for policy optimization. A policy in this sense comprises a set of learnt rules or actions that the reinforcement learning agent has learnt produces a particular outcome.
Examples of reinforcement learning concepts include, for example, policy-gradient, REINFORCE, DQN (Deep Q Network), TRPO (Trust Region Policy Optimization), A3C and proximal policy optimization (PPO)), wherein the predetermined set of actions include adding an entry to the routing information and removing an entry to the routing information, wherein an entry is indicative of how packets are to be routed with respect to an IAB node of the plurality of IAB nodes (p. 7, lines 21-29; obvious because the first reinforcement learning agent dynamically adjusts the first routing metric, based on the values of the one or more parameters so as to alter the traffic flow through the network; this includes updating/adding/removing routing information. As will be familiar to the skilled person, the reinforcement learning agent may dynamically adjust the first routing metric periodically (e.g. at regular intervals) or in response to a change in conditions in the traffic flow through the first set of nodes (e.g. in response to detecting traffic congestion between first and second nodes or in response to detection of a possible congestion scenario developing between first and second nodes in the first group of nodes), adjusting here includes adding/removing/updating routing information); 
executing the action by initiating update of the routing information based on the determined action (implied from the above limitation in p. 7, lines 21-29 because first reinforcement learning agent dynamically adjusts the first routing metric based on the change in conditions in the network); 
acquiring observations characterising an updated state of the plurality of IAB nodes subsequent to execution of the action (implied from the above limitation in p. 7, lines 21-29 because first reinforcement learning agent dynamically adjusts routing metric and it is done periodically or in response to a change in conditions);  
determining a reward for the determined action, based on the updated state of the plurality of IAB nodes (p. 10, lines 21-34; the reinforcement learning agent receives a reward in response to a change in state caused by each action performed by the reinforcement learning agent. The skilled person will be familiar with rewards given to reinforcement learning agents. In some embodiments the reward is allocated by a reward function. A reward function may be configured, for example, by a network administrator according to an objective (or goal)); 
and training the reinforcement learning system to maximise reward with respect to an optimisation objective, using the one or more stored experience sets in the buffer (p. 7, lines 14-20; the first reinforcement learning agent may be configured to operate a policy optimisation process. As noted above, a policy may comprise a set of learnt rules or actions that the reinforcement learning agent has learnt and can therefore be used to produce a (more) predictable outcome. Policy optimisation comprises using the principles of reinforcement learning to improve (e.g. optimise) the rules/actions used to adjust the system. The skilled person will be familiar with policy optimisation processes, such as for example, the aforementioned Markov Decision Process).
Ceccarelli does not teach “…IAB nodes… energy information indicative of an energy performance of each of the plurality of IAB nodes… and storing an experience set including the determined action, the observations characterising the state of the plurality of IAB nodes prior to execution of the determined action, the observations characterising the state of the plurality of IAB nodes subsequent to execution of the determined action, and the determined reward;”
	Bibo teaches “…IAB nodes (abstract and Fig. 1. Also see section 5.3.1, see Action Space, last paragraph in p. 10 and first and third paragraphs in p. 11, where each pattern e contains links as sub-actions, wherein a pattern representing a certain routing environment is selected according to a policy and then activated, which means that the routing information is activated/added or removed)…and storing an experience set including the determined action, the observations characterising the state of the plurality of IAB nodes prior to execution of the determined action, the observations characterising the state of the plurality of IAB nodes subsequent to execution of the determined action, and the determined reward (Section 5.3.3 Training Algorithm, see last paragraph in p. 12, “data collection loop focuses on an experience of the different actions…observable case can be selected”)”.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Bibo into the system of Ceccarelli in order to tune IAB network resources (abstract). Utilizing such teachings enables the system to carefully tune a complex setting, including directional transmission, device heterogeneity and intermittent links with different levels of availability.
Ceccarelli in view of Bibo does not expressly teach “…energy information indicative of an energy performance of each of the plurality of IAB nodes…”
Li is analogous art because Li teaches energy-efficient and energy-aware routing and traffic scheduling.
Li teaches energy information indicative of an energy performance of each of the plurality of IAB nodes (¶0013-¶0020; observing energy efficient scheduling of data flows).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Li into the system of Ceccarelli in view of Bibo in order to minimize network energy consumption while ensuring the capability of bearing all network data flows (¶0013). Utilizing such teachings enables the system to dynamically adjust the working state of the virtual sub-topology in the upper layer according to current link utilization. A path with a minimum number of hops and the lowest maximum link utilization can also be found in the booted sub-topology to route the data flow, solving the problem that a “rich-connection” data center network has low energy resource utilization at low load (abstract).
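
For orientation only, the sequence of steps recited in claim 1 and mapped above (acquire observations, select an action with a selection policy, execute it, observe the updated state, determine a reward, store an experience set, train from the buffer) can be pictured with a minimal sketch. Everything in it is a hypothetical illustration: the route names, the energy and throughput figures, the reward formula, the fixed step budget used as the termination condition, and the tabular Q-learning update are assumptions, not taken from the application, Ceccarelli, Bibo, or Li.

```python
import random
from collections import deque

# Hypothetical toy model: all names and numbers are illustrative assumptions.
ROUTES = ["r0", "r1", "r2"]                      # candidate route entries
ACTIONS = [(op, r) for op in ("add", "remove") for r in ROUTES]
ENERGY_COST = {"r0": 1.0, "r1": 2.0, "r2": 4.0}  # made-up per-route energy use
THROUGHPUT = {"r0": 3.0, "r1": 3.0, "r2": 5.0}   # made-up per-route traffic gain


def observe(table):
    """Observation = routing, traffic, and energy information (toy encoding)."""
    routing = tuple(sorted(table))
    traffic = sum(THROUGHPUT[r] for r in table)
    energy = sum(ENERGY_COST[r] for r in table)
    return (routing, traffic, energy)


def execute(table, action):
    """Apply an add/remove action to the routing table and return a new set."""
    op, route = action
    return table | {route} if op == "add" else table - {route}


def reward_of(obs):
    """Assumed reward: traffic performance minus energy use."""
    _, traffic, energy = obs
    return traffic - energy


def train(steps=500, epsilon=0.2, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {}                         # Q-values keyed by (observation, action)
    buffer = deque(maxlen=100)     # experience buffer
    table = set()
    obs = observe(table)
    for _ in range(steps):         # assumed termination: fixed step budget
        # Selection policy: epsilon-greedy over the predetermined action set.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q.get((obs, a), 0.0))
        table = execute(table, action)
        next_obs = observe(table)
        reward = reward_of(next_obs)
        # Experience set: prior state, action, reward, subsequent state.
        buffer.append((obs, action, reward, next_obs))
        # Train on a small batch sampled from the stored experience sets.
        for s, a, r, s2 in rng.sample(list(buffer), min(8, len(buffer))):
            best_next = max(q.get((s2, b), 0.0) for b in ACTIONS)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
        obs = next_obs
    return q, buffer
```

Calling `train()` returns the learned Q-table and the buffer of stored experience sets. The sketch introduces the buffer before the training step uses it, which is the ordering the §112(b) antecedent-basis rejection above turns on.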

Regarding Claim 2, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 1, Ceccarelli further teaches wherein the termination condition is one of: the reward associated with the latest determined action being lower or equal in value to the reward associated with the determined action in the previous iteration, and the value of the reward associated with the latest determined action exceeding a predetermined threshold (p. 10, lines 21-35; obvious because the reward termination is configurable by a network administrator. For example, a reward function may be configured, for example, by a network administrator according to an objective (or goal). The reward function may be configured, for example, to allocate rewards so as to optimise one or more key performance indicators of the network. In some embodiments, a reward received by the first reinforcement learning agent with respect to an adjustment of the first routing metric is determined based on a change in distribution of traffic between different nodes in the first group of nodes. For example, where an action or adjustment of the first routing metric (such as an administrative link cost) performed by the first reinforcement learning agent results in a state change (e.g. new traffic flow distribution), the first reinforcement agent may receive a reward based on whether the state change produces a result that is closer or further away from the objective (e.g. goal) of the first reinforcement learning agent).
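
As a reading aid, the two termination alternatives recited in claim 2 can be written as a small predicate. This is a sketch under assumptions: the function name and parameters are hypothetical, and checking both alternatives in one function is an illustrative choice, since the claim recites the condition as "one of" the two.

```python
def should_terminate(previous_reward, latest_reward, threshold):
    """Return True if either termination alternative recited in claim 2 holds."""
    # Alternative 1: the latest reward is lower than or equal to the previous one.
    # On the first iteration there is no previous reward, so this cannot fire.
    no_improvement = previous_reward is not None and latest_reward <= previous_reward
    # Alternative 2: the latest reward exceeds a predetermined threshold.
    above_threshold = latest_reward > threshold
    return no_improvement or above_threshold
```

For instance, `should_terminate(2.0, 2.0, 5.0)` is true under the first alternative, while `should_terminate(1.0, 2.0, 5.0)` is false because the reward improved and stayed below the threshold.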

Regarding Claim 3, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 1, Ceccarelli further teaches wherein the optimisation objective comprises at least one of: optimising routing of packets in the plurality of IAB nodes to use high-performing data paths and optimising routing of packets in the plurality of IAB nodes (p. 7, lines 14-20; the first reinforcement learning agent may be configured to operate a policy optimisation process. As noted above, a policy may comprise a set of learnt rules or actions that the reinforcement learning agent has learnt and can therefore be used to produce a (more) predictable outcome. Policy optimisation comprises using the principles of reinforcement learning to improve (e.g. optimise) the rules/actions used to adjust the system. The skilled person will be familiar with policy optimisation processes, such as for example, the aforementioned Markov Decision Process. In some embodiments, the policy optimisation process of the first reinforcement learning agent may be configured to optimise a first aspect of the traffic flow through the first group of nodes. For example the first reinforcement learning agent may have a first objective (or goal). The policy optimisation may be based on one or more of the criteria above, for example, the optimisation task may comprise, for example, the first reinforcement learning agent being configured to optimise the distribution of traffic through the first group of nodes, move the distribution of traffic towards a predefined distribution, change the distribution to reduce overload on a node, change the distribution of traffic so that a load on a particular link moves towards a predefined threshold load and/or adjust the distribution of traffic so that a performance indicator changes to within a predefined range) to use green energy data paths (this part is interpreted to be intended use).

Regarding Claim 6, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 1, Bibo further teaches wherein an action is characterised by an action space, wherein the action space includes an operation and a set of route parameters, and wherein the set of route parameters include a type of packet to route, a destination of the respective route, and an interface to use for the respective route (section 5.3.1, see Action Space).

Regarding Claim 7, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 1, Ceccarelli further teaches wherein the routing information comprises a routing table, wherein the routing table comprises a plurality of route entries each associated with a route, and wherein each route is characterised by a IAB node to perform routing according to the route, a rule filter, a route direction, and a next node in the route (same citation as claim 1, this limitation is obvious because using a routing table is implied when the node is configured to acquire values of one or more parameters relating to traffic flow between a first group of nodes in the network. The node is configured to use a first reinforcement learning agent to dynamically adjust a first routing metric used to route traffic through the first group of nodes, based on the values of the one or more parameters, so as to alter the traffic flow through the first group of nodes).
Regarding Claim 8, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 1, Li further teaches wherein the energy information comprises an energy index table (see table 1 in ¶0096), wherein the energy index table comprises a historical list of energy entries for each of the plurality of IAB nodes, and wherein each energy index entry includes a timestamp and at least one of an energy efficiency value, a clean energy source percentage, and a carbon emissions value (obvious from table 1. See Bibo for IAB nodes).
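
To make the claim 8 data structure concrete, one possible shape for an energy index table is sketched here. It is an assumption-laden illustration: the class name, field names, units, and the dictionary-of-lists layout are hypothetical and are not drawn from the application or from Table 1 of Li. The sketch only encodes what the claim language states: a per-node historical list whose entries carry a timestamp and at least one of the three energy metrics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnergyIndexEntry:
    """One historical energy entry for a node (hypothetical field names)."""
    timestamp: float
    energy_efficiency: Optional[float] = None
    clean_energy_pct: Optional[float] = None
    carbon_emissions: Optional[float] = None

    def __post_init__(self):
        # The claim requires a timestamp plus at least one energy metric.
        metrics = (self.energy_efficiency, self.clean_energy_pct, self.carbon_emissions)
        if all(m is None for m in metrics):
            raise ValueError("an entry needs at least one energy metric")


def latest_entry(table, node_id):
    """Return the most recent entry for a node from its per-node history."""
    return max(table[node_id], key=lambda e: e.timestamp)
```

A table is then a mapping such as `{"iab-1": [EnergyIndexEntry(1.0, energy_efficiency=0.8), EnergyIndexEntry(2.0, clean_energy_pct=55.0)]}`, where `"iab-1"` is a made-up node identifier.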

Regarding Claim 9, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 1, Bibo further teaches wherein the traffic information comprises one of: a list of uplink and downlink throughput over a sampled period for each of the plurality of IAB nodes (see Figs. 1 & 2); and a data probability distribution type and a set of parameters for the data probability distribution type for each of the plurality of IAB nodes, wherein the data probability distribution type and the set of parameters characterise data traffic at a respective IAB node over a predetermined time period.

Regarding Claim 10, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 1, Ceccarelli further teaches wherein the observations characterising the updated state of the plurality of IAB nodes comprises only updated information with respect to a previous state (P. 4, lines 21-25; obvious because a reinforcement learning agent receives an observation from the environment in state S and selects an action to maximize the expected future reward r.).
Regarding Claim 11, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 1, Bibo further teaches wherein the selection policy is one of an epsilon-greedy policy and a softmax policy (see p. 1; different distributed greedy hop by hop or offline training).
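
For reference, the two selection policies named in claim 11 are standard reinforcement learning techniques and can be sketched briefly. The function names, the list-of-Q-values input, and the temperature parameter are assumptions made for illustration; neither the application nor the cited references is the source of this code.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def softmax_probs(q_values, temperature=1.0):
    """Convert action values to selection probabilities (Boltzmann distribution)."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_policy(q_values, temperature=1.0, rng=random):
    """Sample an action index in proportion to its softmax probability."""
    weights = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=weights)[0]
```

The difference that matters when reading the rejection is that epsilon-greedy explores uniformly at random, while softmax explores in proportion to estimated action values. A generic "greedy hop by hop" scheme performs neither kind of exploration.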

Regarding Claim 12, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 1, Bibo further teaches wherein the reinforcement learning system applies a deep Q-network or an actor-critic algorithm (p. 1; “The work in [13] proposes spectrum allocation algorithms based on Double Deep Q-network (DQN) and Actor Critic for IAB networks.”).

Regarding Claim 13, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 1, Bibo further teaches wherein the method is performed at the IAB donor (see introduction; IAB donor), or at a core network, or at a local cloud.

Regarding Claim 14, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 1, Bibo further teaches further comprising: using the trained reinforcement learning system to determine an action for the IAB node upon a condition trigger (see 5.1; when the agent selects an action based on a condition or state).

Regarding Claim 15, Ceccarelli in view of Bibo and further in view of Li teaches the method according to claim 14, Bibo further teaches wherein the condition trigger is one of: a new IAB node connecting to the network, wherein the new IAB node forms part of the plurality of IAB nodes; one of the plurality of IAB nodes receiving a hardware or software update; and an expiry of a predetermined periodic timer (obvious from section 5.3.2; when a new IAB node is added, the state of the network is changed and updated).

Claims 17-18 are substantially similar to claim 1, thus the same rationale applies. 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAHRAN ABU ROUMI whose telephone number is (469)295-9170. The examiner can normally be reached Monday-Thursday 6AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emmanuel Moise can be reached at 571-272-3865. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MAHRAN ABU ROUMI
Primary Examiner
Art Unit 2455



/MAHRAN Y ABU ROUMI/Primary Examiner, Art Unit 2455


Prosecution Timeline

Apr 03, 2024

Application Filed

Apr 23, 2026

Non-Final Rejection mailed — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/084,280

Patent 12640997

PROVIDING ACCESS ON-DEMAND TO CELLULAR WIRELESS TELECOMMUNICATION NETWORK FUNCTIONALITY

3y 5m to grant Granted May 26, 2026

17/796,001

Patent 12634679

METHOD FOR PERFORMING COMMUNICATION USING MULTIPLE USIMS AND DEVICE THEREFOR

3y 9m to grant Granted May 19, 2026

17/848,194

Patent 12621359

METHODS, SYSTEMS, ARTICLES OF MANUFACTURE, AND APPARATUS TO FACILITATE MULTI-PARTICIPANT CONVERSATION

3y 10m to grant Granted May 05, 2026

18/100,514

Patent 12621209

UTILIZATION OF NETWORK FUNCTION (NF) NODE GROUPS FOR COMPUTE OPTIMIZATION AND NF RESILIENCY IN A WIRELESS TELECOMMUNICATION NETWORK

3y 3m to grant Granted May 05, 2026

17/741,733

Patent 12615629

METHOD FOR SPATIAL RESOURCE OF IAB NODE FOR SIMULTANEOUS OPERATION AND APPARATUS USING THE METHOD

3y 11m to grant Granted Apr 28, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2

Expected OA Rounds

72%

Grant Probability

99%

With Interview (+34.3%)

3y 0m (~10m remaining)

Median Time to Grant

Low

PTA Risk

Based on 598 resolved cases by this examiner. Grant probability derived from career allowance rate.