DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-7 and 9-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
INDEFINITENESS – “FUSION RATIO OF THE OFFLINE CHARGING-DISCHARGING ACTION TO THE ONLINE CHARGING-DISCHARGING ACTION”
Claim 1 recites “acquiring a fusion ratio of the offline charging-discharging action to the online charging-discharging action” and “fusing … according to the fusion ratio.” The claim language is unclear as to what quantity constitutes the “fusion ratio” and what the “of … to …” construction denotes (e.g., offline:online, online:offline, an offline weighting factor, an online weighting factor, or another mapping).
The specification describes a fusion ratio “k” and provides an example fusion formula a2 = a·k + a1·(1−k), where “a” is the online action and “a1” is the offline action (see, e.g., [0077]). Under that disclosed example, “k” appears to be an online weighting factor (and (1−k) an offline weighting factor). However, claim 1 characterizes the ratio as “of the offline … to the online …,” which reasonably reads as an offline:online ratio, not necessarily the disclosed “k.” The scope is therefore unclear as to whether the claimed “fusion ratio” corresponds to “k,” to “(1−k),” to “(1−k):k,” or to another value/structure entirely.
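For purposes of illustration only, the following sketch shows how the competing readings diverge numerically under the disclosed example formula; all values are hypothetical:

```python
# Hypothetical values chosen only to expose the ambiguity.
a_online = 10.0   # "a"  - online charging-discharging action
a_offline = 2.0   # "a1" - offline charging-discharging action
k = 0.75          # disclosed fusion ratio "k" per the example in [0077]

# Reading 1: "k" is the online weighting factor (matches the disclosed formula).
a_fused_disclosed = a_online * k + a_offline * (1 - k)      # = 8.0

# Reading 2: the claimed "ratio of the offline ... to the online ..." read as
# an offline weighting factor, i.e., the roles of k and (1 - k) are swapped.
a_fused_claim_reading = a_offline * k + a_online * (1 - k)  # = 4.0

# The two readings yield materially different control outputs (8.0 vs. 4.0).
```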
INDEFINITENESS – “COMMUNICATION DELAY AMOUNT” AND “DELAY DEGREE”
Claim 1 recites “acquiring a fusion ratio … according to a communication delay amount and a delay degree.” Claims 6-7 further recite obtaining a “correspondence between any communication delay amount and delay degree and the fusion ratio through pre-training.”
The claim language is indefinite because “communication delay amount” and “delay degree” lack clear, objective boundaries and do not specify how either parameter is defined or measured. The specification discusses delay generally as including processing delays and transmission delay, and separately discusses packet loss (see, e.g., [0045]). However, the meaning of “delay degree” is not established with reasonable certainty (e.g., whether “delay degree” refers to packet loss rate, delay variance/jitter, a categorical label, a normalized delay index, or another metric), nor does the claim specify units/timebase for “delay amount” (e.g., milliseconds end-to-end, one-way delay, round-trip time, or application-layer latency).
Accordingly, one of ordinary skill in the art would not be informed, with reasonable certainty, what values fall within “delay degree,” how to determine them in operation, and how the claimed “correspondence” is keyed/indexed to those values.
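By way of illustration, the following sketch enumerates several plausible but mutually inconsistent readings of “delay degree”; every metric definition and value below is hypothetical and is not drawn from the claims or the specification:

```python
import statistics

# Hypothetical per-packet one-way delays (ms) observed over a control window.
delays_ms = [12.0, 15.0, 11.0, 40.0, 13.0]
packets_sent, packets_received = 6, 5

# Candidate readings of "delay degree" -- each yields a different value/type:
degree_as_loss_rate = 1 - packets_received / packets_sent      # packet loss rate
degree_as_jitter = statistics.pstdev(delays_ms)                # delay variance/jitter
degree_as_normalized = max(delays_ms) / 100.0                  # normalized delay index
degree_as_category = "high" if max(delays_ms) > 30 else "low"  # categorical label

# Because each candidate differs in type and magnitude, the claimed
# "correspondence" cannot be keyed or indexed with reasonable certainty.
```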
INDEFINITENESS – “OFFLINE CHARGING-DISCHARGING ACTION,” “ONLINE CHARGING-DISCHARGING ACTION,” AND “FUSION RESULT”
Claim 1 recites determining “offline charging-discharging action” and “online charging-discharging action,” then “fusing” them and “outputting a fusion result to the energy storage system.”
The claim is indefinite because it is unclear what the claimed “action” is in terms of control type and physical meaning at the energy storage system interface. For example, the specification alternates between describing charging/discharging “actions” as a “charging-discharging threshold” (see, e.g., [0044], [0048]) and discussing current instruction/control signals in the background figures discussion (e.g., IL* current instruction, PWM, etc., [0005]). The claim does not specify whether the “action” is (i) a voltage threshold, (ii) a current command, (iii) a power command, (iv) converter duty cycle(s), (v) an action-value/Q-value selection mapped to a physical command, or (vi) another quantity. As a result, the “fusion result” lacks reasonably certain scope because it depends on what the “action” is.
INDEFINITENESS – CLAIM 2 “USING THE OFFLINE CHARGING-DISCHARGING ACTION AS AN INITIAL VALUE OF A NEURAL NETWORK”
Claim 2 recites “using the offline charging-discharging action as an initial value of a neural network” and “training the neural network using training data.”
This limitation is indefinite because a “charging-discharging action” (as claimed) is not reasonably understood to be an “initial value of a neural network” without further detail, and the claim does not specify what “initial value” means in the context of the neural network (e.g., initial weights, initial biases, an initial Q-table/action-value estimate, initialization of an output layer for a specific action, or seeding a replay buffer). The specification describes, in one place, assigning a maximum value to an action-value function corresponding to the offline action and using that as an initial condition (see, e.g., [0048]), which is conceptually different from using the action itself as a neural network “initial value.” The claim therefore fails to provide reasonable certainty regarding the required structure/operation.
INDEFINITENESS – CLAIMS 4-5 “ACTION INTERVAL” AND “IMPACT … GREATER THAN A THRESHOLD VOLTAGE”
Claim 4 recites “acquiring an action interval of the energy storage system,” wherein the state includes a state of a substation, a state of a train, and a state of an energy storage apparatus “in the action interval.” Claim 5 recites determining whether “impact of the train at different positions on a terminal voltage of the central substation is greater than a threshold voltage,” and if so, determining that “the action interval comprises the central substation and a substation where the train is located.”
These limitations are indefinite for at least the following reasons.
First, “action interval” is unclear as to its type and boundaries. The term “interval” could reasonably denote a time interval (control period), a spatial interval/region along a rail line, or an index/set of substations and trains considered in a state vector. While the specification discusses “strong coupling interval” and searching left/right for an interval (see, e.g., [0068]), the claim language does not clearly define whether the “action interval” is spatial, temporal, or topological (set-based), nor how its endpoints are represented in the control method.
Second, “impact … on a terminal voltage” is indefinite because “impact” is not defined with an objective metric. It is unclear whether “impact” means absolute voltage deviation |U_mid − U_oc|, a percentage change, a maximum deviation over a window, a derivative/slope, or another measure. The claim also does not recite the evaluation conditions (e.g., train power fixed at maximum, as described in the specification, see [0068]), leaving uncertainty as to the testing scenario for determining “impact.”
INDEFINITENESS – CLAIM 7 “CHANGE RATIO OF THE FUSION RATIO REACHES A TERMINATION VALUE”
Claim 7 recites “repeating the step of updating the fusion ratio until a change ratio of the fusion ratio reaches a termination value.”
This limitation is indefinite because neither “change ratio” nor “termination value” is defined with reasonable certainty in the claims. It is unclear whether “change ratio” is an absolute change |k(i)−k(i−1)|, a relative change |Δk|/|k|, a moving-average change, or another statistic. It is also unclear whether “termination value” is a fixed constant, a learned parameter, user-selected, or derived from system constraints. While the specification gives an example value (e.g., 0.001) (see, e.g., [0080]), the claims do not recite an objective definition that would inform the scope.
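By way of illustration, the two most natural readings of “change ratio” can reach opposite termination decisions on the same iterate (all values hypothetical, except the example termination value from [0080]):

```python
k_prev, k_curr = 0.1000, 0.0992
termination_value = 0.001  # example value given in the specification, [0080]

abs_change = abs(k_curr - k_prev)                # 0.0008 -> would terminate
rel_change = abs(k_curr - k_prev) / abs(k_prev)  # 0.0080 -> would not terminate

# An absolute-change reading terminates here; a relative-change reading does
# not, so the claim scope turns on which "change ratio" is intended.
```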
PRIOR ART REFERENCES RELIED UPON
Reference 1: CN 105226790 B (Urban rail super capacitor energy-storage system capacity control method)
Reference 2: US 9,679,258 B2 (Methods and apparatus for reinforcement learning)
Reference 3: US 2018/0088576 A1 (System delay corrected control method for autonomous vehicles)
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-7 and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Reference 1 in view of Reference 2 and further in view of Reference 3.
──────────────────────────────────────── CLAIM 1 (Rejected under 35 U.S.C. § 103 over References 1-3) ────────────────────────────────────────
A method for controlling an energy storage system for rail transit, comprising: determining an offline charging-discharging action according to a state of an energy storage system based on an offline algorithm; determining an online charging-discharging action according to the state of the energy storage system based on a deep reinforcement learning algorithm; acquiring a fusion ratio of the offline charging-discharging action to the online charging-discharging action according to a communication delay amount and a delay degree; and fusing the offline charging-discharging action and the online charging-discharging action according to the fusion ratio and outputting a fusion result to the energy storage system.
CLAIM 1 – LIMITATION-BY-LIMITATION ANALYSIS
A. “A method for controlling an energy storage system for rail transit”
Reference 1 is directed to an “urban rail super capacitor energy-storage system capacity control method,” i.e., controlling an energy storage system for rail transit (urban rail). Reference 1 expressly operates in “each controlling cycle” and determines charge/discharge behavior of the supercapacitor energy-storage system based on measured system voltages, thereby constituting a rail-transit energy-storage control method.
B. “determining an offline charging-discharging action according to a state of an energy storage system based on an offline algorithm”
Reference 1 teaches determining a charging/discharging “action” (control decision) based on the “state” of the energy storage system using a predetermined algorithmic relationship (i.e., a fixed computational rule set / formula implemented by a threshold calculation module). Specifically:
Reference 1 discloses that, in each controlling cycle, a “discharge and recharge threshold calculation module” outputs a charge threshold U_char and a discharge threshold U_dis in real time based on supercapacitor module terminal voltage U_sc and constant parameters k1, k2, U_ref1, U_ref2, U_sc_min, and U_sc_max. These thresholds (U_char, U_dis) define the charging/discharging action to be applied.
Reference 1 further discloses using the magnitude relationship between the direct current supply net voltage U_dc (traction supply network voltage) and the thresholds U_char and U_dis to determine whether the system should be in a charged state, discharge state, or holding state.
Accordingly, Reference 1 teaches determining an actionable charge/discharge control output (thresholds and associated charge/discharge/hold decision) according to system state variables (at least U_sc and U_dc) using a predetermined algorithmic computation.
As applied in the combination, the predetermined threshold computation and state-to-action mapping of Reference 1 are reasonably characterized as an “offline algorithm” in the sense that the control law (including k1, k2, U_ref1, U_ref2 and related bounds) is designed and fixed prior to runtime (i.e., not learned online) and is then executed during operation to yield an “offline” charging-discharging action (e.g., a baseline, rule-based threshold action).
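For clarity of the mapping relied upon in this section, the following is a minimal sketch of the per-cycle state-to-action logic attributed to Reference 1. The linear threshold form and all numeric parameter values are assumptions for illustration; Reference 1 is cited only for computing U_char/U_dis from U_sc and the named constants, and for the magnitude comparison against U_dc:

```python
def offline_action(u_sc, u_dc, k1=0.5, k2=0.5,
                   u_ref1=850.0, u_ref2=900.0,
                   u_sc_min=400.0, u_sc_max=900.0):
    """Hypothetical sketch of Reference 1's per-cycle threshold logic."""
    # Clamp the supercapacitor terminal voltage into its operating band.
    u_sc = min(max(u_sc, u_sc_min), u_sc_max)

    # Assumed linear dependence of the thresholds on U_sc.
    u_char = u_ref1 + k1 * (u_sc - u_sc_min)   # charge threshold U_char
    u_dis = u_ref2 - k2 * (u_sc_max - u_sc)    # discharge threshold U_dis

    # State-to-action mapping via the magnitude relationship with U_dc.
    if u_dc >= u_char:
        return "charge"      # absorb regenerative braking energy
    if u_dc <= u_dis:
        return "discharge"   # support the traction supply net voltage
    return "hold"
```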
C. “determining an online charging-discharging action according to the state of the energy storage system based on a deep reinforcement learning algorithm”
Reference 2 teaches reinforcement learning using neural networks (including deep neural networks) for systems with multiple states and actions, and teaches training a neural network used to select actions based on training data (transitions including state data, action data, next state data) and reward data.
Reference 1 already provides a charge/discharge action space naturally expressed as selectable charge/discharge threshold actions (e.g., selecting or adjusting threshold(s) corresponding to a charge/discharge decision for the rail energy storage system). It would have been obvious to one of ordinary skill in the art to apply the deep reinforcement learning framework of Reference 2 to the rail energy storage control context of Reference 1 to determine an “online” charging-discharging action from the system state (e.g., the measured voltages U_sc and U_dc, and/or other sensed rail power variables) because doing so is a recognized technique to adapt action selection to time-varying system conditions via reward-driven learning rather than using only fixed thresholds.
Thus, the combination teaches determining an online charging-discharging action according to system state using a deep reinforcement learning algorithm (Reference 2) in the rail energy storage control setting of Reference 1.
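A minimal sketch of the combination’s online action selection, assuming a small Q-network over the Reference 1 state variables and a discretized charge/hold/discharge action set (the network shape, input scaling, and action set are illustrative assumptions, not taken from Reference 2):

```python
import torch
import torch.nn as nn

ACTIONS = ["charge", "hold", "discharge"]  # assumed discrete action space

q_net = nn.Sequential(                     # Q-network: state -> action values
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, len(ACTIONS)),
)

def online_action(u_sc: float, u_dc: float) -> str:
    state = torch.tensor([[u_sc / 1000.0, u_dc / 1000.0]])  # crude scaling
    with torch.no_grad():
        q_values = q_net(state)            # estimated action-value function
    return ACTIONS[int(q_values.argmax())]
```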
D. “acquiring a fusion ratio of the offline charging-discharging action to the online charging-discharging action according to a communication delay amount and a delay degree”
Reference 3 teaches explicitly obtaining/estimating communication delay and using weighting coefficients in a weighted algorithm tied to delay parameters. Specifically:
Reference 3 defines communication delay 324 as the delay/cost between a control system and a communication bus and responses from the vehicle.
Reference 3 discloses that a system delay determination module 304 calculates an overall/final system delay using a predetermined delay algorithm that is a weighted algorithm, where each of steering control delay 321, speed control delay 322, computational delay 323, or communication delay 324 is assigned a weight factor or coefficient.
Reference 3 further discloses a scenario/delay mapping table 150 created offline (by a data analytics system 103) and used later in real time via lookup, where mapping entries map scenarios to system delay and incorporate communication delay.
These teachings collectively evidence acquiring (i) a communication delay amount (communication delay 324) and (ii) a delay “degree,” in the sense of a characterized delay condition reflected by the scenario/delay mapping table 150 and/or the multi-factor delay model using multiple delay components and adjustable weights.
It would have been obvious to use Reference 3’s delay-derived weighting coefficient(s) as a “fusion ratio” to govern how much to rely on the baseline/offline action (Reference 1’s threshold-based action) versus the online deep-RL action (Reference 2 as applied to Reference 1), because Reference 3 expressly teaches assigning weights responsive to delay characteristics to adjust control computation, and the claimed fusion ratio is a predictable form of such weighting when blending two control sources under communication delay constraints.
E. “fusing the offline charging-discharging action and the online charging-discharging action according to the fusion ratio and outputting a fusion result to the energy storage system”
Reference 3 teaches performing weighted calculations using weight factors/coefficients within a predetermined weighted algorithm. Applying this teaching to the combined control approach (Reference 1 baseline + Reference 2 online RL) would have rendered it obvious to compute a weighted combination (fusion) of:
the offline charging-discharging action (baseline threshold action derived from U_sc/U_dc and fixed parameters in Reference 1), and
the online charging-discharging action (deep-RL selected action of Reference 2 as applied to Reference 1),
using a fusion ratio derived from the delay metrics (Reference 3 communication delay 324 and associated weighting coefficients), and then output the fused action as the applied control command to the energy storage system (i.e., implement the resulting charge/discharge threshold/action for the rail energy storage apparatus of Reference 1).
Therefore, the combined references teach (or at least render obvious) fusing the offline and online actions according to a delay-dependent fusion ratio and outputting the fusion result to the energy storage system.
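A minimal sketch of the delay-weighted blending the combination would implement, assuming scalar actions and treating the fusion ratio as the online weight per the specification’s example formula (an interpretive assumption; see the § 112(b) discussion above):

```python
def fuse_and_output(a_offline: float, a_online: float, k: float) -> float:
    """Weighted blend of the two candidate actions (a2 = a*k + a1*(1-k))."""
    assert 0.0 <= k <= 1.0
    return a_online * k + a_offline * (1.0 - k)

# Example: as delay degrades, a smaller k shifts the output toward the
# offline baseline (threshold values here are hypothetical).
fused_threshold = fuse_and_output(a_offline=860.0, a_online=880.0, k=0.3)  # 866.0
```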
CLAIM 1 – MOTIVATION TO COMBINE / OBVIOUSNESS RATIONALE
It would have been obvious to one of ordinary skill in the art to modify the urban rail supercapacitor threshold-based control of Reference 1 by incorporating deep reinforcement learning action selection of Reference 2 to improve adaptability to time-varying rail traction conditions (state-dependent, reward-driven selection of charge/discharge actions rather than solely fixed rule thresholds). It would have further been obvious to incorporate the delay-aware weighted control teachings of Reference 3 (communication delay 324, weighted algorithm with coefficients, and offline-created scenario/delay mapping table 150) to robustly govern how much to rely on learned online action versus baseline action when communication delay/quality degrades, because Reference 3 explicitly teaches delay-dependent weighting for control computations and offline-constructed mappings used in real time. The resulting “fusion ratio” and weighted blending of offline and online actions is a predictable use of known weighting techniques to achieve stable control output under communication delay, consistent with KSR-type reasoning (combining known elements according to known methods to yield predictable results).
──────────────────────────────────────── CLAIM 2 (Rejected under 35 U.S.C. § 103 over References 1-3) ────────────────────────────────────────
The method for controlling an energy storage system for rail transit according to claim 1, wherein the step of determining an online charging-discharging action according to the state of the energy storage system based on a deep reinforcement learning algorithm comprises: receiving the state of the energy storage system and the offline charging-discharging action; using the offline charging-discharging action as an initial value of a neural network and training the neural network using training data, wherein the neural network outputs an action-value function according to the state of the energy storage system; and acquiring the online charging-discharging action based on the action-value function and a greedy strategy.
CLAIM 2 – LIMITATION-BY-LIMITATION ANALYSIS
A. “receiving the state of the energy storage system and the offline charging-discharging action”
Reference 1 teaches obtaining/receiving measured system state values, including supercapacitor module terminal voltage U_sc and direct current supply net voltage U_dc (which Reference 1 discloses gathering via sensors), and computing the offline action represented by thresholds U_char and U_dis. Thus, Reference 1 teaches receiving state information and receiving/obtaining the baseline/offline action (thresholds and the corresponding charge/discharge/hold determination).
In the combined system, the deep reinforcement learning agent of Reference 2 necessarily receives state data as input (Reference 2 teaches transitions comprising starting state data, action data, next state data) and can also receive the offline action output as an input feature or supervisory signal (baseline action recommendation).
B. “using the offline charging-discharging action as an initial value of a neural network and training the neural network using training data”
Reference 2 teaches training a neural network to select actions using training data including transitions (starting state data, action data, next state data) and reward data, and deriving target values for training from another neural network instance whose parameters are periodically updated from the action-selecting network (target network concept).
While Reference 2 teaches the general mechanism of neural-network-based reinforcement learning training, Reference 1 provides a baseline offline action (U_char/U_dis threshold action) deterministically computed from state. It would have been obvious to use the baseline action output from Reference 1 to initialize, bias, or seed the neural network training process of Reference 2 (i.e., an initial value guiding action preference) because doing so reduces random exploration and accelerates convergence, especially in safety-critical control systems where a known-safe baseline policy is available.
Thus, the combination renders obvious using the offline action as an initial value (initial action preference / initial Q bias / initial policy bias) and training the neural network using training data.
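Under one plausible reading (biasing the network’s initial action preference toward the offline action, cf. the specification’s [0048]), a minimal sketch of such seeding is as follows; this reading, and all structure below, are assumptions:

```python
import torch
import torch.nn as nn

ACTIONS = ["charge", "hold", "discharge"]          # assumed action set
q_net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(),
                      nn.Linear(32, len(ACTIONS)))

def seed_with_offline_action(offline_action: str, bias: float = 1.0) -> None:
    # Raise the output-layer bias for the offline action so it starts with
    # the highest action value; training per Reference 2 then proceeds.
    with torch.no_grad():
        q_net[-1].bias[ACTIONS.index(offline_action)] += bias

seed_with_offline_action("hold")
```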
C. “wherein the neural network outputs an action-value function according to the state of the energy storage system”
Reference 2 explicitly describes Q-learning, in which the “Quality of a state-action combination” is an action-value function used to determine the expected utility of an action. Reference 2 teaches training a neural network used to select actions, which inherently corresponds to estimating/representing an action-value function over state-action pairs (i.e., the Q(s,a) concept as described by Reference 2). Accordingly, Reference 2 teaches a neural network outputting an action-value function responsive to state input.
Applied to the state variables of Reference 1 (U_sc, U_dc, etc.), the neural network outputs action-value information for charge/discharge threshold actions.
D. “acquiring the online charging-discharging action based on the action-value function and a greedy strategy”
Reference 2 teaches selecting actions using the learned action-value function (expected utility of actions in given states). A greedy strategy (choosing the action with the highest action-value, or otherwise prioritizing maximum-utility actions) is a well-known and predictable action selection approach in Q-learning systems. Therefore, it would have been obvious to select the online charging-discharging action as the action maximizing the neural-network-estimated action-value function for the present state (i.e., greedy selection), especially given Reference 2’s disclosure of using the action-value function to determine expected utility and thereby select actions.
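A minimal sketch of greedy (and, optionally, epsilon-greedy) selection over the estimated action values; whether the claimed “greedy strategy” admits any exploration is left open by the claim, so the epsilon parameter is an assumption:

```python
import random
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 3))  # 3 actions

def select_action(state: torch.Tensor, epsilon: float = 0.0) -> int:
    # Purely greedy when epsilon == 0; epsilon-greedy otherwise.
    if random.random() < epsilon:
        return random.randrange(3)                  # explore
    with torch.no_grad():
        return int(q_net(state).argmax())           # exploit: argmax_a Q(s, a)

action_index = select_action(torch.tensor([[0.65, 0.83]]))  # greedy selection
```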
CLAIM 2 – MOTIVATION TO COMBINE / OBVIOUSNESS RATIONALE
It would have been obvious to implement the deep reinforcement learning of claim 1 in a practical rail energy storage controller by (i) feeding the controller’s measured state (Reference 1: U_sc, U_dc) into the RL neural network (Reference 2 training transitions require state input), (ii) providing the baseline threshold action of Reference 1 (U_char/U_dis) to initialize or bias the RL network’s initial action preference to improve training speed and stability, and (iii) selecting actions greedily with respect to the action-value function because greedy action selection is a conventional, predictable technique for deploying Q-learning-based controllers to choose the highest-utility control action.
──────────────────────────────────────── CLAIM 3 (Rejected under 35 U.S.C. § 103 over References 1-3) ────────────────────────────────────────
The method for controlling an energy storage system for rail transit according to claim 2, wherein the step of determining an online charging-discharging action according to the state of the energy storage system based on a deep reinforcement learning algorithm further comprises: storing used training data, and randomly extracting training data from the used training data to train the neural network again.
CLAIM 3 – LIMITATION-BY-LIMITATION ANALYSIS
A. “storing used training data”
Reference 2 teaches “training data” comprising a plurality of transitions (starting state data, action data, next state data) and (per its system claims) reward data associated with actions. Such training data, to be used for neural network training, is inherently stored at least temporarily in memory for batch or iterative training.
Reference 3 likewise teaches a data processing system 110 and offline-created mapping artifacts (scenario/delay mapping table 150) created by a data analytics system 103 and later used, evidencing storage and retrieval of data structures used by control/learning functions. This reinforces the well-known implementation of storing historical data used for later computations.
B. “randomly extracting training data from the used training data to train the neural network again”
Reference 2’s reinforcement learning framework is implemented by training on transitions from prior interactions. It would have been obvious to randomly sample from stored transitions during training (rather than only using newest samples in sequence) because random sampling is a known technique to reduce temporal correlation in sequential control data and improve stability of neural network training in reinforcement learning. This is a predictable design choice for implementing neural-network RL systems in control contexts.
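The storage-and-random-sampling pattern described above is conventionally implemented as an experience replay buffer; a minimal sketch follows (the structure is a conventional implementation, not taken from Reference 2):

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores used transitions and samples them uniformly at random."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int = 32):
        # Uniform random extraction breaks the temporal correlation of
        # sequential rail-system data, stabilizing neural network training.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```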
CLAIM 3 – MOTIVATION TO COMBINE / OBVIOUSNESS RATIONALE
It would have been obvious to store and reuse training transitions (Reference 2: transitions including state/action/next state and reward) in a data processing system (Reference 3: data processing system 110; offline/online stored tables such as mapping table 150) and to randomly extract stored transitions for subsequent training iterations to improve stability and convergence of an RL controller applied to rail energy storage control, because sequential rail system data is time-correlated and random sampling is a conventional method to mitigate correlation and prevent unstable learning behavior.
──────────────────────────────────────── CLAIM 4 (Rejected under 35 U.S.C. § 103 over References 1-3) ────────────────────────────────────────
The method for controlling an energy storage system for rail transit according to claim 1, wherein before the step of determining an offline charging-discharging action according to a state of an energy storage system based on an offline algorithm, the method further comprises: acquiring an action interval of the energy storage system, wherein the state of the energy storage system comprises a state of a substation, a state of a train, and a state of an energy storage apparatus in the action interval.
CLAIM 4 – LIMITATION-BY-LIMITATION ANALYSIS
A. “before … determining an offline charging-discharging action … acquiring an action interval of the energy storage system”
Reference 1 teaches operating “in each controlling cycle,” i.e., a repeated control interval during which state is measured and thresholds U_char/U_dis are computed and applied. Thus, Reference 1 teaches acquiring/establishing an action interval (control cycle) before determining thresholds and corresponding charge/discharge/hold actions.
B. “wherein the state of the energy storage system comprises a state of a substation, a state of a train, and a state of an energy storage apparatus in the action interval”
Reference 1 teaches the system state used for control includes at least:
direct current supply net voltage U_dc (traction power supply network voltage). This is reasonably attributable to the substation/supply network condition providing DC traction power, and thus corresponds to a “state of a substation” (supply network output condition).
supercapacitor module terminal voltage U_sc, and associated bounds U_sc_min and U_sc_max. These correspond to the energy storage apparatus state.
Regarding “state of a train,” Reference 1’s U_dc traction supply net voltage reflects the instantaneous loading and regenerative conditions of the traction network driven by trains (traction/braking), and thus reflects train operating influence within the control interval. It would have been obvious to explicitly include available train-related state (e.g., traction/braking state, power demand/regen indication) as an input feature to the offline threshold algorithm of Reference 1 and/or the RL algorithm of Reference 2, because the rail traction network voltage U_dc is directly impacted by train behavior and controllers routinely incorporate train-derived measurements or estimates to improve charging/discharging decisions.
Therefore, the combined teachings render obvious defining the control state within each action interval as including substation-related state (U_dc), energy storage state (U_sc), and train-related state (train traction/braking influence), within each action interval.
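For illustration, the per-interval state vector contemplated by the combination could be composed as follows; the field names map the claim language onto Reference 1’s measurements and are assumptions:

```python
from dataclasses import dataclass

@dataclass
class IntervalState:
    u_dc: float            # substation state: traction supply net voltage
    u_sc: float            # energy storage apparatus state: terminal voltage
    train_power_kw: float  # train state: traction/braking power demand
```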
CLAIM 4 – MOTIVATION TO COMBINE / OBVIOUSNESS RATIONALE
It would have been obvious to formalize the control cycle of Reference 1 (“each controlling cycle”) as an “action interval” and to define the controller state vector to include the traction supply network/substation condition (Reference 1: U_dc), the energy storage condition (Reference 1: U_sc and related bounds), and train-related condition because train behavior is the dominant cause of traction network voltage variation and regenerative energy availability. Further, RL-based control per Reference 2 predictably benefits from including relevant environment state variables, and incorporating train state information is a routine, predictable enhancement to improve decision quality in rail energy storage control.
──────────────────────────────────────── CLAIM 5 (Rejected under 35 U.S.C. § 103 over References 1-3) ────────────────────────────────────────
The method for controlling an energy storage system for rail transit according to claim 4, wherein the step of acquiring an action interval of the energy storage system comprises: selecting a central substation; determining whether impact of the train at different positions on a terminal voltage of the central substation is greater than a threshold voltage; and when the impact is greater than the threshold voltage, determining that the action interval comprises the central substation and a substation where the train is located.
CLAIM 5 – LIMITATION-BY-LIMITATION ANALYSIS
A. “selecting a central substation”
Reference 1 is expressly based on the traction DC supply network voltage U_dc and thus contemplates one or more substations supplying the urban rail DC traction network (i.e., the “direct current supply net”). Selecting a particular substation (a “central” substation) as a control reference is an obvious implementation step when monitoring a DC supply network voltage and applying charge/discharge thresholds relative to that network’s operating condition.
B. “determining whether impact of the train at different positions on a terminal voltage of the central substation is greater than a threshold voltage”
Reference 1 already compares measured voltages to thresholds (U_dc compared to U_char and U_dis; and the thresholds are computed from U_sc using slopes k1 and k2 and bounds U_ref1 and U_ref2). Thus, Reference 1 teaches the general technique of threshold-based voltage comparison to classify control regimes.
Reference 3 likewise teaches scenario- and delay-dependent weighting adjustments, including adjusting weights in real time based on conditions (e.g., raising a weight coefficient when conditions exceed a predetermined threshold).
Applying these teachings, it would have been obvious to determine whether train-caused impact on a selected substation terminal voltage exceeds a threshold voltage, because threshold evaluation of voltage deviation is a predictable use of Reference 1’s voltage-threshold comparison approach when defining a control region/interval influenced by train position (i.e., if train influence causes voltage variation beyond threshold, include that region in the control interval).
C. “when the impact is greater than the threshold voltage, determining that the action interval comprises the central substation and a substation where the train is located”
It would have been obvious, once determining that a train’s impact on a selected substation terminal voltage exceeds a threshold, to define the action interval to include the relevant substations connected by the traction network path between the central substation and the train’s located substation, because those substations define the electrically coupled portion of the rail power supply network that materially influences U_dc and thus the effectiveness of charging/discharging actions. This is a predictable, common-sense partitioning of a distributed traction supply network into a region of influence for control.
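A minimal sketch of the interval-acquisition logic the claim recites, using absolute voltage deviation as the “impact” metric (that metric and the threshold value are assumptions; see the § 112(b) discussion of claim 5):

```python
def action_interval(central_idx, train_positions, impact_fn, threshold_v=50.0):
    """Return the set of substation indices in the action interval.

    impact_fn(position) is a hypothetical routine returning the train's
    effect on the central substation's terminal voltage at that position.
    """
    interval = {central_idx}                        # always include the center
    for position, substation_idx in train_positions:
        if abs(impact_fn(position)) > threshold_v:  # impact exceeds threshold
            interval.add(substation_idx)            # include train's substation
    return interval
```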
CLAIM 5 – MOTIVATION TO COMBINE / OBVIOUSNESS RATIONALE
It would have been obvious to implement action-interval acquisition by selecting a representative (“central”) substation reference point for measuring U_dc and then using threshold-based voltage-impact criteria to decide which substations fall within the effective region of control influence, because Reference 1 already teaches threshold-based voltage comparisons driving control decisions, and rail traction networks are known to be electrically coupled such that only sufficiently impactful train conditions justify expanding the control region. Defining the action interval to include the train’s substation when the voltage-impact exceeds a threshold is a predictable engineering choice to focus control computation on the region that materially affects energy transfer and voltage stabilization.
──────────────────────────────────────── CLAIM 6 (Rejected under 35 U.S.C. § 103 over References 1-3) ────────────────────────────────────────
The method for controlling an energy storage system for rail transit according to claim 1, wherein the step of acquiring a fusion ratio of the offline charging-discharging action to the online charging-discharging action according to a communication delay amount and a delay degree comprises: acquiring a correspondence between any communication delay amount and delay degree and the fusion ratio through pre-training; and based on the correspondence, acquiring the fusion ratio of the offline charging-discharging action to the online charging-discharging action according to the communication delay amount and the delay degree.
CLAIM 6 – LIMITATION-BY-LIMITATION ANALYSIS
A. “acquiring a correspondence between any communication delay amount and delay degree and the fusion ratio through pre-training”
Reference 3 teaches an offline-created mapping artifact tied to delay, namely scenario/delay mapping table 150, created offline by a data analytics system 103 and later used in real time. Reference 3 further teaches communication delay 324 as a component of overall delay, and a predetermined weighted algorithm where delay components are assigned weight factors/coefficients.
These teachings render obvious acquiring, through offline processing (“pre-training” in the sense of offline analysis/learning), a correspondence that maps delay conditions (communication delay 324 and/or scenario-classified delay conditions) to a weighting coefficient used during real-time control computations.
Adapting this to the combined control architecture, that weight coefficient is used as the “fusion ratio” between the offline action (Reference 1 thresholds) and the online action (Reference 2 learned action).
B. “based on the correspondence, acquiring the fusion ratio … according to the communication delay amount and the delay degree”
Reference 3 explicitly teaches performing a lookup operation in scenario/delay mapping table 150 in real time after identifying the applicable scenario, thereby acquiring delay information for subsequent control use. Additionally, Reference 3 teaches adjusting weight factors/coefficients for delays, including communication delay 324, potentially in real time.
Thus, Reference 3 teaches acquiring a delay-dependent coefficient using an offline-created correspondence and applying it in real time, which corresponds to acquiring the claimed fusion ratio based on delay amount and delay degree.
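A minimal sketch of the offline-built correspondence and real-time lookup contemplated by the combination (cf. Reference 3’s mapping table 150); the bins and ratio values are hypothetical placeholders of the kind a pre-training stage would produce:

```python
# Correspondence keyed by (binned delay amount, delay degree) -> fusion ratio.
FUSION_RATIO_TABLE = {
    (0, "low"): 0.9,    # negligible delay: rely mostly on the online action
    (0, "high"): 0.7,
    (50, "low"): 0.5,
    (50, "high"): 0.2,  # severe delay: fall back toward the offline action
}

def lookup_fusion_ratio(delay_ms: float, degree: str) -> float:
    delay_bin = 50 if delay_ms >= 50 else 0   # coarse binning (assumed)
    return FUSION_RATIO_TABLE[(delay_bin, degree)]
```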
CLAIM 6 – MOTIVATION TO COMBINE / OBVIOUSNESS RATIONALE
It would have been obvious to use the offline mapping and weighted-delay computation approach of Reference 3 (scenario/delay mapping table 150 created offline by data analytics system 103; communication delay 324; weight factors/coefficients in a weighted algorithm) to pre-establish a mapping from measured/estimated communication delay conditions to a coefficient controlling how much the system should rely on a learned online policy versus a baseline offline policy, because Reference 3 expressly teaches offline creation of delay mappings for later real-time control compensation and delay-dependent weighting. Using that coefficient as a fusion ratio to blend two candidate charge/discharge actions is a predictable application of known delay-dependent weighting techniques to achieve robust control output under varying communication delay.
──────────────────────────────────────── CLAIM 7 (Rejected under 35 U.S.C. § 103 over References 1-3) ────────────────────────────────────────
The method for controlling an energy storage system for rail transit according to claim 6, wherein the step of acquiring a correspondence between any communication delay amount and delay degree and the fusion ratio through pre-training comprises: initializing the fusion ratio; under any communication delay amount and delay degree, acquiring the online charging-discharging action according to the state of the energy storage system; acquiring the offline charging-discharging action according to the state of the energy storage system; calculating a fused charging-discharging action based on the online charging-discharging action, the offline charging-discharging action and the fusion ratio; performing the offline charging-discharging action and the fused charging-discharging action separately, to obtain a first reward signal that is based on the fused charging-discharging action and a second reward signal that is based on the offline charging-discharging action; updating the fusion ratio based on the first reward signal and the second reward signal, wherein when the first reward signal is greater than the second reward signal, the fusion ratio is increased, and when the first reward signal is less than the second reward signal, the fusion ratio is reduced; and repeating the step of updating the fusion ratio until a change ratio of the fusion ratio reaches a termination value.
CLAIM 7 – LIMITATION-BY-LIMITATION ANALYSIS
A. “initializing the fusion ratio”
Reference 3 teaches the use of weight factors/coefficients in a weighted algorithm for delay determination (weights assigned to delays such as steering control delay 321, speed control delay 322, computational delay 323, communication delay 324). Such weight factors are necessarily initialized to some starting values before adjustment. Thus, Reference 3 renders obvious initializing the fusion ratio (as a weight coefficient).
B. “under any communication delay amount and delay degree, acquiring the online charging-discharging action according to the state of the energy storage system”
Reference 3 teaches estimating communication delay 324 and using delay information in control computations. Reference 2 teaches selecting actions using a reinforcement learning agent and neural network based on state inputs. Thus, for a given delay condition, the controller can acquire the online action produced by the RL agent (Reference 2) from the system state (Reference 1 state variables such as U_sc and U_dc).
C. “acquiring the offline charging-discharging action according to the state of the energy storage system”
Reference 1 explicitly teaches computing thresholds U_char and U_dis from state variables such as U_sc (and using U_dc relative to thresholds to determine charge/discharge/hold). Therefore, Reference 1 teaches acquiring the offline charging-discharging action according to system state.
D. “calculating a fused charging-discharging action based on the online charging-discharging action, the offline charging-discharging action and the fusion ratio”
Reference 3 teaches a weighted algorithm using weight factors/coefficients applied to combine delay components in calculating overall system delay. This is an express teaching of weighted combination using coefficients. Applying this technique, it would have been obvious to compute a fused action as a weighted combination of two candidate actions (offline baseline action and online learned action) using a fusion ratio (a weight coefficient), because that is a predictable extension of weighted computation to combine multiple candidate values.
E. “performing the offline charging-discharging action and the fused charging-discharging action separately, to obtain a first reward signal … and a second reward signal …”
Reference 2 teaches reinforcement learning based on reward data defining reward values or costs resulting from actions. Thus, Reference 2 teaches that executing different actions yields different reward signals, and those reward signals are used to update learning.
In the combined system, it would have been obvious during offline development (“pre-training” stage) to evaluate candidate control strategies (baseline/offline-only vs fused action) by simulating/executing them and measuring resulting performance (reward/cost). This is a predictable and routine approach for selecting/tuning controller parameters in reinforcement learning and control design.
F. “updating the fusion ratio based on the first reward signal and the second reward signal … increasing when first reward > second reward … reducing when first reward < second reward”
Reference 2 teaches updating learning targets and network parameters based on reward data to increase expected utility of selected actions. Reference 3 teaches adjusting weight coefficients in real time depending on conditions. Together, these teachings render obvious adjusting a fusion coefficient in the direction that improves reward (i.e., if fused action yields higher reward than baseline, increase reliance on fused/online component; if lower, decrease it), because that is a straightforward optimization rule consistent with reward-driven adaptation and coefficient adjustment.
G. “repeating … until a change ratio of the fusion ratio reaches a termination value”
Reference 2 teaches iterative training of neural networks using repeated transitions and periodic updates. Iterative procedures require stopping criteria (termination) such as convergence. Therefore, it would have been obvious to repeat coefficient updates until the coefficient change falls below a threshold (termination value), which is a standard convergence criterion in iterative optimization and learning.
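Assembling the above, a minimal sketch of the claimed pre-training loop, assuming scalar actions, an additive update step, and an absolute-change stopping rule (each an assumption; see the § 112(b) discussion of claim 7). Here, env.rollout is a hypothetical evaluation routine returning a scalar reward:

```python
def pretrain_fusion_ratio(env, offline_policy, online_policy,
                          k=0.5, step=0.05, termination_value=0.001):
    """Reward-comparison update of the fusion ratio until convergence."""
    while True:
        # Fused action per the example formula a2 = a*k + a1*(1-k).
        fused = lambda s: online_policy(s) * k + offline_policy(s) * (1 - k)
        r_fused = env.rollout(fused)             # first reward signal
        r_offline = env.rollout(offline_policy)  # second reward signal

        k_prev = k
        if r_fused > r_offline:
            k = min(1.0, k + step)   # rely more on the fused/online action
        elif r_fused < r_offline:
            k = max(0.0, k - step)   # fall back toward the offline baseline

        if abs(k - k_prev) <= termination_value:  # change reaches termination
            return k
```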
CLAIM 7 – MOTIVATION TO COMBINE / OBVIOUSNESS RATIONALE
It would have been obvious to implement the delay-to-coefficient correspondence (Reference 3 mapping table 150 and weighted-delay coefficients including communication delay 324) through an offline tuning process that uses reward-driven evaluation (Reference 2 reward data tied to actions) to select or adjust the fusion coefficient, because (i) Reference 2 teaches using reward outcomes to drive parameter updates toward higher-utility behavior, (ii) Reference 3 teaches adjustable weighting coefficients in control computations and offline-created mappings later used online, and (iii) evaluating baseline-only versus blended control and updating a blending coefficient based on comparative performance is a predictable, routine parameter-tuning technique to improve robustness and performance under varying delay conditions.
──────────────────────────────────────── CLAIM 9 (Rejected under 35 U.S.C. § 103 over References 1-3) ────────────────────────────────────────
An electronic device, comprising: a memory and a processor, wherein the memory and the processor are in communication connection with each other, the memory stores computer instructions, and the processor is configured to execute the computer instructions to perform the method for controlling an energy storage system for rail transit according to claim 1.
CLAIM 9 – LIMITATION-BY-LIMITATION ANALYSIS
A. “An electronic device, comprising: a memory and a processor, wherein the memory and the processor are in communication connection with each other, the memory stores computer instructions”
Reference 3 teaches a data processing system 110 implementing planning module 301, control module 302, and system delay determination module 304, and further teaches offline-created scenario/delay mapping table 150 used via lookup operations. Such modules and stored tables evidence a processor executing instructions and a memory storing instructions/data in communication with the processor.
B. “the processor is configured to execute the computer instructions to perform the method … according to claim 1”
Reference 1 teaches the control method for the urban rail supercapacitor energy-storage system (computing thresholds U_char and U_dis and determining charge/discharge/hold). Reference 2 teaches implementing reinforcement learning action selection using neural networks based on training data transitions and reward data. Reference 3 teaches delay estimation (communication delay 324) and weighted algorithms with weight coefficients and stored mapping table 150 created offline and used online.
Thus, it would have been obvious to implement the combined method of claim 1 on an electronic device comprising a processor and memory storing instructions, as taught by Reference 3’s data processing system 110 and its executable modules.
CLAIM 9 – MOTIVATION TO COMBINE / OBVIOUSNESS RATIONALE
It would have been obvious to implement the combined rail energy storage control method on an electronic device with processor and memory because Reference 3 explicitly discloses computerized modules (planning module 301, control module 302, system delay determination module 304) and stored data structures (mapping table 150) that execute delay-aware control computations, and References 1-2 similarly require computation over sensed state and learned policy outputs. Implementing such methods as processor-executed instructions stored in memory is a conventional and predictable design choice.
──────────────────────────────────────── CLAIM 10 (Rejected under 35 U.S.C. § 103 over References 1-3) ────────────────────────────────────────
A non-transitory computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the computer instructions are used for enabling a computer to perform the method for controlling an energy storage system for rail transit according to claim 1.
CLAIM 10 – LIMITATION-BY-LIMITATION ANALYSIS
A. “A non-transitory computer-readable storage medium … stores computer instructions”
Reference 3 teaches stored data structures and software functionality in a data processing system 110, including scenario/delay mapping table 150 created offline and used by modules during operation. This evidences storage of instructions and control-related data on non-transitory media.
B. “the computer instructions are used for enabling a computer to perform the method … according to claim 1”
For the same reasons stated for claim 9, References 1-3 collectively teach or render obvious the method steps of claim 1 as implemented by a computer system. Therefore, it would have been obvious to provide instructions on a non-transitory computer-readable storage medium to enable performance of that method.
CLAIM 10 – MOTIVATION TO COMBINE / OBVIOUSNESS RATIONALE
It would have been obvious to provide the control method as computer instructions stored on non-transitory media because References 1-3 are directed to computational control methods (threshold computation, neural-network RL training/action selection, delay estimation and weighted control computation) that are conventionally embodied in stored program instructions executed by a processor (Reference 3: data processing system 110 with modules and stored mapping table 150).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON C SMITH whose telephone number is (703)756-4641. The examiner can normally be reached Monday - Friday 8:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Allen Shriver can be reached at (303) 297-4337. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Jason C Smith/ Primary Examiner, Art Unit 3613