Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
2. The rejections under 35 U.S.C. § 101 in the non-final Office action have been withdrawn in view of the amendments to claims 20 and 35.
With regard to claims 20 and 35, Applicant argues that the prior art Piovesan does not teach "each action in the set of feasible actions … includes at least one of the following: charging one or more energy storages in the plurality of energy storages and corresponding one or more charging rates, discharging one or more energy storages in the plurality of energy storages and corresponding one or more discharging rates, and adjusting a configuration of one or more energy sources in the plurality of energy storages," because, based on equation (6) of Piovesan, the only candidates for action a(t) are switching each small base station (SBS) on or off, and because Piovesan's RL agent receives rewards based on "load control" of turning SBSs on or off, not based on actions on the batteries at the respective SBS sites or on sharing the exceeding energy.
Applicant further argues that Piovesan teaches only that some outcomes of the "Bound policy" learned and used by Piovesan's RL agent include increased SBS traffic and energy consumption and less SBS sharing to the MBS, and that those outcomes are not candidate actions (see pages 13-17 of Applicant's response).
This argument is not found persuasive. Regarding the argument that Piovesan's RL agent receives rewards based on "load control" of turning SBSs on or off, rather than based on actions on the batteries at the respective SBS sites or on sharing the exceeding energy, the examiner disagrees, because when an SBS is turned on or off, that action includes at least one of the following, as recited in claim 20 and claim 36: charging one or more energy storages in the plurality of energy storages and corresponding one or more charging rates, discharging one or more energy storages in the plurality of energy storages and corresponding one or more discharging rates, and adjusting a configuration of one or more energy sources in the plurality of energy storages. Piovesan teaches SBSs that include batteries serving as energy storages (page 514, paragraph under III. Reference Scenario). When the SBS is turned on or off, the batteries in the SBS are charged by solar power, and the cost of solar power is 1.17 $/W (page 523, paragraph under H. Energy Saving and Cost Analysis), which teaches "charging one or more energy storages at corresponding one or more charging rates." Piovesan also teaches that when the SBS is switched on, the power in the batteries of the SBS can be discharged to the MBS (page 514, paragraph under III. Reference Scenario), and the battery is discharged at 131 $/kWh (page 523, paragraph under H. Energy Saving and Cost Analysis), which teaches that each action (the SBS switch-on action) includes discharging one or more energy storages in the plurality of energy storages and corresponding one or more discharging rates. In addition, Piovesan teaches that when the SBS is switched on, the batteries in one SBS can become an energy source for another SBS based on the energy sharing policy (page 513, first paragraph, and Renewable Energy Powered with Energy Sharing), which teaches "adjusting a configuration of one or more energy sources in the plurality of energy storages."
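As a purely illustrative aside, the examiner's reading above, under which a single SBS on/off decision carries battery-level consequences, can be sketched as follows; the class and field names are hypothetical illustrations by the examiner and are not taken from Piovesan.

    from dataclasses import dataclass

    # Illustrative sketch only: maps the single candidate action a(t)
    # (switching an SBS on or off) to the battery operations the examiner
    # reads it as entailing. Names are hypothetical, not Piovesan's notation.
    @dataclass
    class SBSAction:
        switch_on: bool  # candidate action a(t): SBS on (True) or off (False)

        def implied_battery_operations(self):
            ops = ["charge battery from solar power at a charging rate"]
            if self.switch_on:
                ops.append("discharge battery toward the MBS at a discharging rate")
                ops.append("share exceeding energy with other SBSs (energy source reconfiguration)")
            return ops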
Regarding the argument that Piovesan teaches only that some outcomes of the "Bound policy" learned and used by Piovesan's RL agent include increased SBS traffic and energy consumption and less SBS sharing to the MBS, that those outcomes are not candidate actions, and that the only candidate actions at each time interval are turning each of the SBSs on or off: the examiner disagrees. There is no limitation in claims 20 and 35 that excludes the actions that accompany turning an SBS on or off.
With regard to claim 24, Applicant argues that dependent claim 24 specifies that a duration of an episode associated with the first reinforcement learning system is determined by "a hyperparameter optimization technique," such as a grid search or a Bayes search, and that the prior art Pino, which merely selects a season of interest based on well-known climate data, does not teach a hyperparameter optimization technique.
Applicant's arguments with respect to claim 24 have been considered but are moot (the scope of claim 24 has changed because of the amendment to claim 20), because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
3. Claims 20, 23, 27, 29-30, 32-36, and 39 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Piovesan (Joint Load Control and Energy Sharing for Renewable Powered Small Base Stations: A Machine Learning Approach, IEEE Transactions on Green Communications and Networking, Vol. 5, No. 1, March 2021).
With regard to claim 20, Piovesan teaches a computer-implemented method for managing a plurality of energy storages (batteries, abstract) at a plurality of sites (base stations, abstract) in a network (micro-grid, abstract), the method comprising:
generating a first simulated environment of the network (IV. System Model, page 514) based on a first dataset including power consumption data of at least a subset of the plurality of energy storages over a predetermined amount of time (power consumption model, see equation (5): the baseline power consumption of the BS at zero load at a time t); and
training a first reinforcement learning system (page 517, B. Reinforcement Learning) by performing the following operations iteratively until a termination condition is met (page 517, paragraph under B. Reinforcement Learning: the optimal behavior of the system that maximizes a cumulative reward serves as the termination condition):
selecting an action from a set of feasible actions, wherein each action in the set of feasible actions is bounded by a set of constraints (different actions are taken by the agent to maximize the cumulative reward, and the best action is taken at each decision cycle, page 517, paragraph under B. Reinforcement Learning; the action is taken according to an ε-greedy policy, and this policy can constitute a set of constraints, page 518, paragraph under equation (13)) and includes at least one of the following:
charging one or more energy storages in the plurality of energy storages and corresponding one or more charging rates (the SBSs include batteries that serve as energy storages, page 514, paragraph under III. Reference Scenario; when the SBS is turned on/off, the batteries in the SBS are charged by solar power, and the cost of solar power is 1.17 $/W, page 523, paragraph under H. Energy Saving and Cost Analysis),
discharging one or more energy storages in the plurality of energy storages and corresponding one or more discharging rates (when the SBS is switched on, the power in the batteries of the SBS can be discharged to the MBS, page 514, paragraph under III. Reference Scenario, and the battery is discharged at 131 $/kWh, page 523, paragraph under H. Energy Saving and Cost Analysis), and
adjusting a configuration of one or more energy sources in the plurality of energy storages (when the SBS is switched on, the batteries in one SBS can become an energy source for another SBS based on the energy sharing policy, page 513, first paragraph, and Renewable Energy Powered with Energy Sharing);
calculating a reward of the selected action based on the generated first simulated environment of the network (see equation (11) at page 517, which calculates the reward); and
training the first reinforcement learning system to maximize reward for a given state of the network, based on the calculated reward for the selected action (page 517, paragraph under B. Reinforcement Learning: by trying different actions, the agent learns the optimal behaviour of the system that maximizes a cumulative reward).
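For illustration, the training loop recited above (iteratively selecting an ε-greedy action, computing a reward from the simulated environment, and updating the learner) can be sketched as minimal tabular Q-learning. This is the examiner's generic sketch under assumed interfaces (env.reset, env.step, the action list, and all parameter values are hypothetical); it is not Piovesan's DQL implementation.

    import random

    # Minimal tabular Q-learning sketch of the claimed loop; `env`, `actions`,
    # and all parameter values are hypothetical placeholders.
    def train(env, actions, episodes=100, T=24, alpha=0.1, gamma=0.9, eps=0.1):
        Q = {}  # Q[(state, action)] -> estimated cumulative reward
        for _ in range(episodes):
            state = env.reset()
            for t in range(T):  # T decision time windows per episode
                if random.random() < eps:  # epsilon-greedy: explore
                    action = random.choice(actions)
                else:  # exploit the best known action for this state
                    action = max(actions, key=lambda a: Q.get((state, a), 0.0))
                next_state, reward = env.step(action)  # reward from the simulated environment
                best_next = max(Q.get((next_state, a), 0.0) for a in actions)
                q = Q.get((state, action), 0.0)
                Q[(state, action)] = q + alpha * (reward + gamma * best_next - q)
                state = next_state
        return Q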
With regard to claim 23, Piovesan teaches all the limitations of claim 20 and further teaches
the first reinforcement learning system is an episodic reinforcement learning system (page 517, B. Reinforcement Learning; see page 518, Algorithm 4 table, episodes),
each episode associated with the first reinforcement learning system is divided into a plurality of decision time windows (see page 518, Algorithm 4 table, decision time windows t = 1, …, T), and each iteration in the training of the first reinforcement learning system corresponds to one of the decision time windows (page 517, B. Reinforcement Learning: the objective of RL is to learn how to map the experienced situation (i.e., the state of the system) into the best action to take at every decision cycle t; also see page 518, Algorithm 4 table, lines 5-12, which show decision time windows 1 to T, each having an iteration).
With regard to claim 27, Piovesan teaches all the limitations of claim 20 and further teaches wherein the action selected in a current iteration is different from the action selected in a previous iteration (different actions are tried at every decision cycle, page 517, B. Reinforcement Learning, and equation (11) shows the action a(t) at time step t).
With regard to claim 29, Piovesan teaches all the limitations of claim 20 and further teaches the reward is inversely proportional to a total cost of input power (see page 517, paragraph under B. Reinforcement Learning: reinforcement learning is used to maximize a cumulative reward; see pages 523 and 524, H. Energy Saving and Cost Analysis: the implementation of the DQL algorithm reduces the grid energy consumption and reduces the cost of the energy; together these passages show that the purpose of the algorithm is to maximize the reward while reducing the input energy cost, and therefore the reward is inversely proportional to a total cost of input power).
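As a worked illustration of "inversely proportional" in this context (generic notation by the examiner, not Piovesan's equations), a cost-driven reward may be written as

    \[ r(t) = \frac{k}{C_{\mathrm{in}}(t)}, \qquad k > 0, \]

where C_in(t) is the total cost of input power at decision cycle t; maximizing the cumulative reward then drives the input power cost down.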
With regard to claim 30, Piovesan teaches all the limitations of claim 29 and further teaches the total cost corresponds to at least one of a monetary cost (see page 524, Table IV: costs are calculated in $) and a carbon footprint cost.
With regard to claim 32, Piovesan teaches all the limitations of claim 20 and further teaches using the trained first reinforcement learning system to determine an action for a current state of the network (page 517, first paragraph under B. Reinforcement Learning: the reinforcement learning system determines (tries) an action that maximizes a cumulative reward).
With regard to claim 33, Piovesan teaches all the limitations of claim 32 and further teaches the trained first reinforcement learning system is deployed at each of the plurality of energy storages (see equations (1) and (2): B(t), the network model used for reinforcement learning, includes the energy stored in each battery).
With regard to claim 34, Piovesan teaches all the limitations of claim 32 and further teaches
monitoring at least one of (i.e., only the key performance indicator value or the average reward value is needed) a key performance indicator value of the network (e.g., the loss function, equation (13), page 518) and an average reward value achieved by the trained first reinforcement learning system; and
initiating retraining of the first reinforcement learning system (during the training of the ANN, stochastic gradient descent is used to minimize a sequence of loss functions that changes at every training iteration i, where yi is the target at iteration i, page 518, paragraphs before and after equation (13)), if the at least one of the key performance indicator value of the network and the average reward value achieved by the trained first reinforcement learning system does not satisfy a corresponding predetermined threshold (yi, the target of the iteration, page 518, paragraph after equation (13)) or a corresponding predetermined range.
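The monitor-then-retrain logic of claim 34 can be illustrated by the following sketch; the helper names and threshold values are the examiner's assumptions for illustration only, not a disclosed implementation.

    # Hypothetical monitoring check; `get_kpi`, `get_avg_reward`, and
    # `retrain` are assumed interfaces, and the thresholds are illustrative.
    def monitor_and_retrain(agent, kpi_threshold=0.95, reward_range=(0.0, 1.0)):
        kpi = agent.get_kpi()                # key performance indicator of the network
        avg_reward = agent.get_avg_reward()  # average reward achieved by the trained agent
        kpi_ok = kpi >= kpi_threshold
        reward_ok = reward_range[0] <= avg_reward <= reward_range[1]
        if not (kpi_ok and reward_ok):       # threshold or range not satisfied
            agent.retrain()                  # initiate retraining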
With regard to claim 35, Piovesan teaches a system configured to manage a plurality of energy storages (batteries, abstract) at a plurality of sites (base stations, abstract) in a network (micro-grid, abstract), the system comprising:
processing circuitry (page 519, paragraph after Table I: the simulation runs on a machine with an Intel Core i5-6300 CPU), and
memory (page 519, paragraph after Table I: RAM) operably coupled to the processing circuitry and storing computer readable instructions (the learning algorithm, page 518, last paragraph) that, when executed by the processing circuitry, cause the system to:
generate a first simulated environment of the network (IV. System Model, page 514) based on a first dataset including power consumption data of at least a subset of the plurality of energy storages over a predetermined amount of time (power consumption model, see equation (5): the baseline power consumption of the BS at zero load at a time t); and
train a first reinforcement learning system (page 517, B. Reinforcement Learning) by performing the following operations iteratively until a termination condition is met (page 517, paragraph under B. Reinforcement Learning: the optimal behavior of the system that maximizes a cumulative reward serves as the termination condition):
select an action from a set of feasible actions, wherein each action in the set of feasible actions is bounded by a set of constraints (different actions are taken by the agent to maximize the cumulative reward, and the best action is taken at each decision cycle, page 517, paragraph under B. Reinforcement Learning; the action is taken according to an ε-greedy policy, and this policy can constitute a set of constraints, page 518, paragraph under equation (13)) and includes at least one of the following:
charging one or more energy storages in the plurality of energy storages and corresponding one or more charging rates (the SBSs include batteries that serve as energy storages, page 514, paragraph under III. Reference Scenario; when the SBS is turned on/off, the batteries in the SBS are charged by solar power, and the cost of solar power is 1.17 $/W, page 523, paragraph under H. Energy Saving and Cost Analysis),
discharging one or more energy storages in the plurality of energy storages and corresponding one or more discharging rates (when the SBS is switched on, the power in the batteries of the SBS can be discharged to the MBS, page 514, paragraph under III. Reference Scenario, and the battery is discharged at 131 $/kWh, page 523, paragraph under H. Energy Saving and Cost Analysis), and
adjusting a configuration of one or more energy sources in the plurality of energy storages (when the SBS is switched on, the batteries in one SBS can become an energy source for another SBS based on the energy sharing policy, page 513, first paragraph, and Renewable Energy Powered with Energy Sharing);
calculate a reward of the selected action based on the generated first simulated environment of the network (see equation (11) at page 517, which calculates the reward); and
train the first reinforcement learning system to maximize reward for a given state of the network, based on the calculated reward for the selected action (page 517, paragraph under B. Reinforcement Learning: by trying different actions, the agent learns the optimal behaviour of the system that maximizes a cumulative reward).
With regard to claim 36, Piovesan teaches all the limitations of claim 35, and further teaches
each action in the set of feasible actions includes at least one of the following:
charging one or more energy storages in the plurality of energy storages and corresponding one or more charging rates (manage energy inflow and spending, sharing the exceeding energy at some sites, page 512, last paragraph, and page 513, first paragraph; the charging has charging rates, see also page 523, last paragraph: grid energy has a cost of 0.21 $/kWh, which is the charging cost),
discharging one or more energy storages in the plurality of energy storages and corresponding one or more discharging rates, and
adjusting a configuration of one or more energy sources in the plurality of energy storages (sharing the exceeding energy that may be available at some BS sites within the micro-grid, page 513, first paragraph).
With regard to claim 39, Piovesan teaches all the limitations of claim 35, and further teaches
execution of the instructions by the processing circuitry further configures the system to use the trained first reinforcement learning system to determine an action for a current state of the network (page 517, first paragraph under B. Reinforcement Learning: the reinforcement learning system determines (tries) an action that maximizes a cumulative reward).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
4. Claims 22 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Piovesan (Joint Load Control and Energy Sharing for Renewable Powered Small Base Stations: A Machine Learning Approach, IEEE Transactions on Green Communications and Networking, Vol. 5, No. 1, March 2021) in view of Zhang (US 2020/0124429 A1).
With regard to claim 22, Piovesan teaches all the limitations of claim 20, but does not teach wherein calculating a reward of the selected action comprises:
acquiring initial observations of an initial state of the first simulated environment prior to the selected action being executed, wherein the initial observations comprise at least one of the following: respective energy levels of the plurality of energy storages, respective current power outputs of the plurality of energy storages, respective current charging costs of the plurality of energy storages, and respective battery types of the plurality of energy storages;
executing the selected action in the first simulated environment;
acquiring updated observations of an updated state of the first simulated environment subsequent to the selected action being executed; and
calculating the reward of the selected action based on the initial observations and the updated observations.
However, Zhang teaches wherein calculating a reward of the selected action comprises:
acquiring initial observations of an initial state of the first simulated environment prior to the selected action being executed (cost before applying the action, [0093]), wherein the initial observations comprise at least one of the following: respective energy levels of the plurality of energy storages, respective current power outputs of the plurality of energy storages, respective current charging costs of the plurality of energy storages (cost, [0093]; and [0048]: "The expense may include power expense (such as electricity expense, other type of energy expenses, or any combination thereof)"), and respective battery types of the plurality of energy storages;
executing the selected action in the first simulated environment;
acquiring updated observations of an updated state of the first simulated environment subsequent to the selected action being executed (cost after applying the one or more actions, [0093]); and
calculating the reward of the selected action based on the initial observations and the updated observations (the reward is calculated based on the cost changes after applying the one or more actions, [0093]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Piovesan to acquire initial observations of an initial state of the first simulated environment prior to the selected action being executed, wherein the initial observations comprise at least one of the following: respective energy levels of the plurality of energy storages, respective current power outputs of the plurality of energy storages, respective current charging costs of the plurality of energy storages, and respective battery types of the plurality of energy storages; execute the selected action in the first simulated environment; acquire updated observations of an updated state of the first simulated environment subsequent to the selected action being executed; and calculate the reward of the selected action based on the initial observations and the updated observations, as taught by Zhang, in order to determine optimum solutions with a minimum cost [0092].
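The before/after reward computation relied on from Zhang [0093] can be illustrated by the following sketch; the observation fields, cost function, and environment interface are the examiner's assumptions for illustration, not Zhang's disclosed implementation.

    # Hypothetical before/after reward calculation; `env.observe`, `env.execute`,
    # and the "charging_costs" observation field are illustrative assumptions.
    def cost_of(obs):
        return sum(obs["charging_costs"])  # illustrative: total charging cost

    def reward_for(env, action):
        cost_before = cost_of(env.observe())  # initial observations, before the action
        env.execute(action)                   # execute the selected action in the simulation
        cost_after = cost_of(env.observe())   # updated observations, after the action
        return cost_before - cost_after       # positive when the action lowers cost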
With regard to claim 37, Piovesan teaches all the limitations of claim 35, but does not teach wherein calculating a reward of the selected action comprises:
acquiring initial observations of an initial state of the first simulated environment prior to the selected action being executed, wherein the initial observations comprise at least one of the following: respective energy levels of the plurality of energy storages, respective current power outputs of the plurality of energy storages, respective current charging costs of the plurality of energy storages, and respective battery types of the plurality of energy storages;
executing the selected action in the first simulated environment;
acquiring updated observations of an updated state of the first simulated environment subsequent to the selected action being executed; and
calculating the reward of the selected action based on the initial observations and the updated observations.
However, Zhang teaches wherein calculating a reward of the selected action comprises:
acquiring initial observations of an initial state of the first simulated environment prior to the selected action being executed (cost before applying the action, [0093]), wherein the initial observations comprise at least one of the following: respective energy levels of the plurality of energy storages, respective current power outputs of the plurality of energy storages, respective current charging costs of the plurality of energy storages (cost, [0093]; and [0048]: "The expense may include power expense (such as electricity expense, other type of energy expenses, or any combination thereof)"), and respective battery types of the plurality of energy storages;
executing the selected action in the first simulated environment;
acquiring updated observations of an updated state of the first simulated environment subsequent to the selected action being executed (cost after applying the one or more actions, [0093]); and
calculating the reward of the selected action based on the initial observations and the updated observations (the reward is calculated based on the cost changes after applying the one or more actions, [0093]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Piovesan to acquire initial observations of an initial state of the first simulated environment prior to the selected action being executed, wherein the initial observations comprise at least one of the following: respective energy levels of the plurality of energy storages, respective current power outputs of the plurality of energy storages, respective current charging costs of the plurality of energy storages, and respective battery types of the plurality of energy storages; execute the selected action in the first simulated environment; acquire updated observations of an updated state of the first simulated environment subsequent to the selected action being executed; and calculate the reward of the selected action based on the initial observations and the updated observations, as taught by Zhang, in order to determine optimum solutions with a minimum cost [0092].
5. Claims 24 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Piovesan (Joint Load Control and Energy Sharing for Renewable Powered Small Base Stations: A Machine Learning Approach, IEEE Transactions on Green Communications and Networking, Vol. 5, No. 1, March 2021) in view of Lindauer (BOAH: A Tool Suite for Multi-Fidelity Bayesian Optimization & Analysis of Hyperparameters, August 16, 2019).
With regard to claim 24, Piovesan teaches all the limitations of claim 23, but does not teach that at least one of the following is determined by a hyperparameter optimization technique: a duration of an episode associated with the first reinforcement learning system, and a duration of the decision time window corresponding to each iteration.
However, Lindauer teaches that at least one of the following is determined by a hyperparameter optimization technique: a duration of an episode associated with the first reinforcement learning system (hyperparameters of RL algorithms, with the episode length being the budget, page 3, 3.2 BOHB: Bayesian Optimization with Hyperband), and a duration of the decision time window corresponding to each iteration.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Piovesan to configure at least one of the following to be determined by a hyperparameter optimization technique: a duration of an episode associated with the first reinforcement learning system, and a duration of the decision time window corresponding to each iteration, as taught by Lindauer, in order to allow users not only to optimize their hyperparameters much more effectively, but also to automatically analyze the optimization process and the importance of the various hyperparameters (page 1, last paragraph), and to choose the important parameters that strongly affect the behavior of the control agent.
With regard to claim 25, the combination of Piovesan and Lindauer teaches all the limitations of claim 24, and Lindauer further teaches the hyperparameter optimization technique comprises at least one of a grid search and a Bayes search (Bayesian optimization, see title and abstract).
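For illustration, treating the episode duration as a tunable hyperparameter and selecting it by a simple grid search might look like the following sketch; the evaluation function and candidate durations are the examiner's assumptions, and Lindauer's BOAH tool itself uses Bayesian optimization (BOHB) rather than this grid search.

    # Hypothetical grid search over episode duration; `train_and_evaluate`
    # and the candidate values are illustrative assumptions.
    def grid_search_episode_length(train_and_evaluate, candidates=(12, 24, 48)):
        best_length, best_score = None, float("-inf")
        for T in candidates:  # try each candidate episode duration
            score = train_and_evaluate(episode_length=T)
            if score > best_score:
                best_length, best_score = T, score
        return best_length  # the duration chosen by the optimization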
6. Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Piovesan (Joint Load Control and Energy Sharing for Renewable Powered Small Base Stations: A Machine Learning Approach, IEEE Transactions on Green Communications and Networking, Vol. 5, No. 1, March 2021) in view of Mbuwir (Battery Energy Management in a Microgrid Using Batch Reinforcement Learning, Energies, November 2017, 10(11):1846).
With regard to claim 26, Piovesan teaches all the limitations of claim 23, but does not teach that the set of constraints includes one or more of the following:
a first constraint that, for each of the plurality of sites, a value of power input from one or more energy storages corresponding to the site is identical to a value of power output at the respective site;
a second constraint that, for each of the plurality of energy storages, a value of discharge during a time window should not exceed a value of an energy level at a beginning of the time window plus a value of charge limit for the time window;
a third constraint that, for each of the plurality of energy storages, a value of a current energy level is identical to the following: a value of a previous energy level, plus a value of charge during a time window, minus an amount of discharge during the time window;
a fourth constraint that, for each of the plurality of energy storages, a value of energy level at any time window should not exceed a capacity of the energy storage; and
a fifth constraint that, when charging an energy storage from a plurality of energy storages, the resultant energy level caused by the plurality of energy storages is below a predetermined threshold.
However, Mbuwir teaches that the set of constraints includes one or more of the following:
a first constraint that, for each of the plurality of sites, a value of power input from one or more energy storages corresponding to the site is identical to a value of power output at the respective site;
a second constraint that, for each of the plurality of energy storages, a value of discharge during a time window should not exceed a value of an energy level at a beginning of the time window plus a value of charge limit for the time window;
a third constraint that, for each of the plurality of energy storages, a value of a current energy level is identical to the following: a value of a previous energy level, plus a value of charge during a time window, minus an amount of discharge during the time window;
a fourth constraint that, for each of the plurality of energy storages, a value of energy level at any time window should not exceed a capacity of the energy storage (page 4, equation (2), the capacity constraint: the battery cannot be charged above Emax); and
a fifth constraint that, when charging an energy storage from a plurality of energy storages, the resultant energy level caused by the plurality of energy storages is below a predetermined threshold.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Piovesan's method of claim 23 to configure the set of constraints to include a fourth constraint that, for each of the plurality of energy storages, a value of energy level at any time window should not exceed a capacity of the energy storage, as taught by Mbuwir, in order to protect the batteries and avoid damaging them.
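For reference, the cited capacity constraint, together with the energy balance recited in the third constraint, can be written compactly in generic notation (the symbols below are the examiner's illustration, not reproduced verbatim from Mbuwir):

    \[ 0 \le E_b(t) \le E_{\max,b}, \qquad E_b(t) = E_b(t-1) + c_b(t) - d_b(t), \]

where E_b(t) is the energy level of storage b at time window t, c_b(t) the charge during the window, d_b(t) the discharge, and E_max,b the capacity of the storage.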
7. Claims 28 and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Piovesan (Joint Load Control and Energy Sharing for Renewable Powered Small Base Stations: A Machine Learning Approach, IEEE Transactions on Green Communications and Networking, Vol. 5, No. 1, March 2021) in view of Sui (A Multi-Agent Reinforcement Learning Framework for Lithium-ion Battery Scheduling Problems, Energies, published 17 April 2020).
With regard to claim 28, Piovesan teaches all the limitations of claim 20, but does not teach
the first dataset corresponds to a first modality including at least one of a first type of battery technology and a first type of network condition; and
the method further comprises:
generating a second simulated environment of the network based on a second dataset including power consumption data of at least a subset of the plurality of energy storages over a predetermined amount of time, wherein the second dataset corresponds to a second modality including at least one of a second type of battery technology and a second type of network condition; and
training a second reinforcement learning system by performing the following operations iteratively until a termination condition is met:
selecting an action from the set of feasible actions;
calculating a reward of the selected action based on the generated second simulated environment of the network; and
training the second reinforcement learning system to maximize reward for a given state of the network, based on the calculated reward for the selected action.
However, Sui teaches the first dataset corresponds to a first modality including at least one of a first type of battery technology and a first type of network condition (RL agents are trained using simulated (lithium-ion) battery data, see page 11, Conclusion and Future Work).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Piovesan's method of claim 20 to configure the first dataset to correspond to a first modality including at least one of a first type of battery technology and a first type of network condition, as taught by Sui, in order to solve the specific problems related to this type of battery technology. In this case, Sui teaches using multi-agent reinforcement learning frameworks to solve lithium-ion battery scheduling problems (RL agents are trained using simulated (lithium-ion) battery data, see page 11, Conclusion and Future Work).
Furthermore, it would have been obvious to duplicate the operation of the first simulated environment of Piovesan described in claim 20 and the first dataset of Sui, to generate a second simulated environment of the network based on a second dataset including power consumption data of at least a subset of the plurality of energy storages over a predetermined amount of time, wherein the second dataset corresponds to a second modality including at least one of a second type of battery technology and a second type of network condition; and to train a second reinforcement learning system by performing the following operations iteratively until a termination condition is met: selecting an action from the set of feasible actions; calculating a reward of the selected action based on the generated second simulated environment of the network; and training the second reinforcement learning system to maximize reward for a given state of the network, based on the calculated reward for the selected action, in order to generate an optimized reward for a different simulation environment with a different energy storage system. Further, absent any criticality, duplicating the first reinforcement learning system's operation as the second reinforcement learning system is merely an obvious modification of Piovesan, since it has been held that mere duplication of the essential working parts of a device involves only routine skill in the art. St. Regis Paper Co. v. Bemis Co., 193 USPQ 8. In this case, the more reinforcement learning systems there are, the more accurate the reward provided for different subsets of the network or energy storage structures, but the functionality of the system does not change.
With regard to claim 38, Piovesan teaches all the limitations of claim 35, but does not teach
the first dataset corresponds to a first modality including at least one of a first type of battery technology and a first type of network condition; and
execution of the instructions by the processing circuitry further configures the system to:
generate a second simulated environment of the network based on a second dataset including power consumption data of at least a subset of the plurality of energy storages over a predetermined amount of time, wherein the second dataset corresponds to a second modality including at least one of a second type of battery technology and a second type of network condition; and
train a second reinforcement learning system by performing the following operations iteratively until a termination condition is met:
selecting an action from the set of feasible actions;
calculate a reward of the selected action based on the generated second simulated environment of the network; and
train the second reinforcement learning system to maximize reward for a given state of the network, based on the calculated reward for the selected action
However, Sui teaches the first dataset corresponds to a first modality including at least one of a first type of battery technology and a first type of network condition (RL agents are trained using simulated (lithium-ion) battery data, see page 11, Conclusion and Future Work).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Piovesan's system of claim 35 to configure the first dataset to correspond to a first modality including at least one of a first type of battery technology and a first type of network condition, as taught by Sui, in order to solve the specific problems related to this type of battery technology. In this case, Sui teaches using multi-agent reinforcement learning frameworks to solve lithium-ion battery scheduling problems (RL agents are trained using simulated (lithium-ion) battery data, see page 11, Conclusion and Future Work).
Furthermore, it would have been obvious to duplicate the operation of the first simulated environment of Piovesan described in claim 35 and the first dataset of Sui, to generate a second simulated environment of the network based on a second dataset including power consumption data of at least a subset of the plurality of energy storages over a predetermined amount of time, wherein the second dataset corresponds to a second modality including at least one of a second type of battery technology and a second type of network condition; and to train a second reinforcement learning system by performing the following operations iteratively until a termination condition is met: selecting an action from the set of feasible actions; calculating a reward of the selected action based on the generated second simulated environment of the network; and training the second reinforcement learning system to maximize reward for a given state of the network, based on the calculated reward for the selected action, in order to generate an optimized reward for a different simulation environment with a different energy storage system. Further, absent any criticality, duplicating the first reinforcement learning system's operation as the second reinforcement learning system is merely an obvious modification of Piovesan, since it has been held that mere duplication of the essential working parts of a device involves only routine skill in the art. St. Regis Paper Co. v. Bemis Co., 193 USPQ 8. In this case, the more reinforcement learning systems there are, the more accurate the reward provided for different subsets of the network or energy storage structures, but the functionality of the system does not change.
8. Claim 31 is rejected under 35 U.S.C. 103 as being unpatentable over Piovesan (Joint Load Control and Energy Sharing for Renewable Powered Small Base Stations: A Machine Learning Approach, IEEE Transactions on Green Communications and Networking, Vol. 5, No. 1, March 2021) in view of HIWADA (JP 2020153571 A).
With regard to claim 31, Piovesan teaches all the limitations of claim 20, but does not teach wherein the termination condition is one of the following:
the reward associated with the latest selected action being lower or equal in value to the reward associated with the selected action in the previous iteration, and
the value of the reward associated with the latest selected action exceeding a predetermined threshold.
However, HIWADA teaches that the termination condition is one of the following:
the reward associated with the latest selected action being lower or equal in value to the reward associated with the selected action in the previous iteration, and
the value of the reward associated with the latest selected action exceeding a predetermined threshold ([0066]: the evaluation unit 303 determines whether or not the calculated reward is equal to or greater than a predetermined threshold value; if it is determined that the calculated reward is less than the predetermined threshold, the process proceeds to step 5, and if it is determined that the calculated reward is equal to or greater than the predetermined threshold, the reinforcement learning process is terminated).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Piovesan's method of claim 20 to configure the termination condition to be one of the following: the reward associated with the latest selected action being lower than or equal in value to the reward associated with the selected action in the previous iteration, and the value of the reward associated with the latest selected action exceeding a predetermined threshold, as taught by HIWADA, in order to obtain optimized results within an efficient time constraint and to utilize resources efficiently.
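The claimed termination alternatives can be illustrated by the following one-line check; the function and parameter names and the threshold value are the examiner's hypothetical illustrations, not HIWADA's implementation.

    # Hypothetical termination test mirroring the claimed alternatives;
    # the threshold value is an illustrative assumption.
    def should_terminate(latest_reward, previous_reward, threshold=1.0):
        # Stop when the reward no longer improves or exceeds the threshold.
        return latest_reward <= previous_reward or latest_reward > threshold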
Conclusion
9. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Jomaa (Hyp-RL: Hyperparameter Optimization by Reinforcement Learning, 27 June 2019) teaches hyperparameter tuning.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PINPING SUN whose telephone number is (571)270-1284. The examiner can normally be reached 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PINPING SUN/ Supervisory Patent Examiner, Art Unit 2872