DETAILED ACTION
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Objections
Claim 5 is objected to because of the following informalities: the limitation “one or more ambient variables around the building, the ambient temperature around the building” is unclear. Are the phrases separate (i.e., ambient variables and ambient temperature are different limitations) or connected (i.e., ambient variables include ambient temperature)? Appropriate correction is required.
Claim 6 is objected to because of the following informalities: the limitation “building, ;” includes a grammatical error. Appropriate correction is required.
Claim 7 is objected to because of the following informalities: the limitations “the maximum room temperature” and “the minimum temperature” lack antecedent basis. Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 13-14 are rejected under 35 U.S.C. 101 because the applicant has provided evidence that the applicant intends the terms “computer program product” and “computer program” to include non-statutory matter. The applicant describes the computer program product and computer program using open-ended language, and thus it is reasonable to interpret them to include all possible mediums, including non-statutory mediums (see lines 13-17, page 6). The words "computer" and/or "product" are insufficient to convey only statutory embodiments to one of ordinary skill in the art absent an explicit and deliberate limiting definition or a clear differentiation between storage media and transitory media in the disclosure. As such, the claims are drawn to a form of energy. Energy is not one of the four categories of invention and therefore these claims are not statutory. Energy is not a series of steps or acts and thus is not a process. Energy is not a physical article or object and as such is not a machine or manufacture. Energy is not a combination of substances and therefore is not a composition of matter.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-6 and 10-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by
U.S. Patent Application Publication No. 2021/0191342 (Lee) (cited by Applicant).
Claim 1:
The cited prior art describes a computer-implemented method for configuring a controller for a technical system, (Lee: “The present disclosure relates generally to HVAC control systems, and more particularly to training models used to control HVAC systems.” Paragraph 0001)
where the controller controls the technical system based on an output data set determined by the controller for an input data set, (Lee: see the hvac controller 914 as illustrated in figure 9; “Controller 914 may be configured to determine the current state of HVAC system 916 using sensor measurements from HVAC system 916. Controller 914 may be configured to use RL model 912 to determine the preferred action for HVAC system 916 to perform based on the output of RL model 912. Controller 914 may be configured to send one or more control signals or instructions to HVAC system 916 based on output of RL model 912. Controller 914 may be configured to generate real experience data and store the real experience data in real experience storage 918.” Paragraph 0220; “Referring now to FIG. 11, a RL model controller 1100 is shown, according to one embodiment.” Paragraph 0231)
where the output data set comprises respective future values of one or more control variables for one or more subsequent time points not before a current time point, (Lee: “Controller 914 may be configured to use RL model 912 to determine the preferred action for HVAC system 916 to perform based on the output of RL model 912. Controller 914 may be configured to send one or more control signals or instructions to HVAC system 916 based on output of RL model 912.” Paragraph 0220; “RL model 606 can define a plurality of states. In some embodiments, the plurality of states can each be defined to include one or more of a system setpoint, current temperature measurement, occupancy data, weather data, or any other endogenous or exogenous parameters discussed herein. A future state may be defined as the state of HVAC system 612 after a pre-determined time interval. In some embodiments, the time interval is based on a frequency to update control parameters (e.g., every minute, hour, day, etc.). In some embodiments, the time period is based on the time day (e.g., before working hours, during the work day, after the work day, etc.). In some embodiments, the pre-determined time interval is based on how often control signals are sent to the HVAC system 612 using the RL model 606.” Paragraph 0171; “At 1006, the HVAC system is controlled using the trained RL model. The HVAC system may be controlled by a controller using the RL model. The controller may determine a current state of the HVAC system using sensor and operational measurements from the HVAC system. The controller may determine a control action for the HVAC system to perform based on the output of the RL model. The controller may send one or more control signals or instructions to the HVAC system based on the output of the RL model.” Paragraph 0225)
where the input data set comprises (Lee: “The surrogate model may be trained using experience data generated by a calibrated simulation model. In some embodiments, the surrogate model may be trained using initial real experience data generated by the HVAC system during an offline period of time prior to an online operation, in addition to or alternatively to other training data. In some embodiments, the initial real experience data may be random experience data generated from random control actions of the HVAC system.” Paragraph 0223)
respective past values of one or more state variables for one or more subsequent time points not after the current time point and (Lee: “Sample input data is generally categorized either as an exogenous parameter or an endogenous parameter. Exogenous parameters are parameters pertaining to the environment or otherwise outside of the control of the HVAC system, such as time of day, weather and weather forecasts, occupancy schedules, and occupancy trends.” Paragraph 0121)
respective past values of one or more target variables for one or more subsequent time points not after the current time point and (Lee: “Herein, any training data, experience data, or measured data (such as that received from an HVAC system) can include timeseries data. A timeseries can include a series of values for the same point and a timestamp for each of the data values. For example, a timeseries for a point provided by a temperature sensor can include a series of temperature values measured by the temperature sensor and the corresponding times at which the temperature values were measured.” Paragraph 0162; “AHU controller 330 may also receive a measurement of the temperature of building zone 306 from a temperature sensor 364 located in building zone 306.” Paragraph 0084)
respective past values of the one or more control variables for one or more subsequent time points before the current time point, wherein the method comprises: (Lee: “Sample input data is generally categorized either as an exogenous parameter or an endogenous parameter. . . . Endogenous parameters are variables chosen by a control system, such as setpoints or operating conditions.” Paragraph 0121)
training a first data driven model with training data comprising several pre-known input data sets and corresponding pre-known output data sets for the respective pre-known input data sets, where the first data driven model predicts respective future values of the one or more target variables for one or more subsequent time points after the current time point; and (Lee: see the surrogate model trainer 904 and surrogate model 902 as illustrated in figure 9; see the future values up to time corresponding to the end of peak hours on the current day T as described in paragraphs 0139, 0140, 0141; “Surrogate model trainer 904 may generally initialize, train, and retain surrogate model 902. Surrogate model trainer 904 may initially train surrogate model 902 using sample data from a different HVAC system than the HVAC system 916 controlled by the RL model 910. In some embodiments, surrogate model trainer 904 may use a calibrated simulation model of the HVAC system to generate sample data, and initially train the surrogate model 902 with the sample data.” Paragraph 0217; “The surrogate model may be trained using experience data generated by a calibrated simulation model. In some embodiments, the surrogate model may be trained using initial real experience data generated by the HVAC system during an offline period of time prior to an online operation, in addition to or alternatively to other training data. In some embodiments, the initial real experience data may be random experience data generated from random control actions of the HVAC system.” Paragraph 0223)
training a second data driven model with the training data using reinforcement learning with a reward depending on the respective future values of the one or more target variables which are predicted by the trained first data driven model, where the trained second data driven model is configured to determine the output data set for the input data set within the controller. (Lee: see the RL trainer 910 and RL model 912 as illustrated in figure 9 and the train a RL model using the simulated experience data 1004 as illustrated in figure 10; “In some embodiments, RL trainer 910 is configured to only use simulated experience data to retrain the RL model 912. In some such embodiments, RL trainer 910 may be configured to sample the simulated experience data based on when the simulated experience data as generated, such that experience data generated by the retrained surrogate model is prioritized for RL model training. In some embodiments, RL trainer 910 is configured to use both real and simulated experience data to retrain the RL model 912.” Paragraph 0219; “RL model 606 may also define a reward function that defines a value for entering a state. The reward function may be based on the current state of the system. In some embodiments, the reward function may be based on both the current state of the system and the action performed in the current state. In some embodiments, the reward function is a look-up table of reward values for a state or state-action pair. In some embodiments, the reward function is an efficiency or cost function based on the current state or action parameters.” Paragraph 0173)
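For illustration only, the two-stage arrangement mapped above (a first data driven model fit on pre-known input/output pairs, then a second model trained by reinforcement learning with a reward computed from the first model's predictions) can be sketched as follows. The names, toy linear dynamics, and bandit-style update are illustrative assumptions, not code of record from Lee:

```python
# Illustrative sketch: (1) fit a surrogate (first data driven model) on
# pre-known input/output pairs; (2) train a controller (second data driven
# model) by reinforcement-style updates whose reward depends on the
# surrogate's predicted target values. All values are hypothetical.
import random

def train_surrogate(training_data):
    """Fit a trivial linear surrogate y = a*u + b by least squares."""
    n = len(training_data)
    su = sum(u for u, _ in training_data)
    sy = sum(y for _, y in training_data)
    suu = sum(u * u for u, _ in training_data)
    suy = sum(u * y for u, y in training_data)
    a = (n * suy - su * sy) / (n * suu - su * su)
    b = (sy - a * su) / n
    return lambda u: a * u + b

def reward(predicted_temp, setpoint=21.0):
    """Reward is higher the closer the predicted target is to the setpoint."""
    return -abs(predicted_temp - setpoint)

def train_controller(surrogate, actions, episodes=500, eps=0.2, lr=0.1):
    """Tabular bandit-style RL: learn which control action maximizes the
    reward computed from the surrogate's prediction."""
    q = {a: 0.0 for a in actions}
    rng = random.Random(0)
    for _ in range(episodes):
        a = rng.choice(actions) if rng.random() < eps else max(q, key=q.get)
        q[a] += lr * (reward(surrogate(a)) - q[a])
    return max(q, key=q.get)

# Pre-known data: heating power u -> room temperature y = 15 + 2*u.
data = [(u, 15.0 + 2.0 * u) for u in [0.0, 1.0, 2.0, 3.0, 4.0]]
surrogate = train_surrogate(data)
best = train_controller(surrogate, actions=[0.0, 1.0, 2.0, 3.0, 4.0])
```

Under these toy dynamics the controller converges to the action whose predicted temperature matches the setpoint, which corresponds to the claimed use of the first model's predictions inside the second model's reward.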
Claim 2:
The cited prior art describes the method according to claim 1, wherein the input data set further includes respective future values of at least one predetermined state variable out of the one or more state variables for one or more subsequent time points after the current time point. (Lee: “Sample input data is generally categorized either as an exogenous parameter or an endogenous parameter. Exogenous parameters are parameters pertaining to the environment or otherwise outside of the control of the HVAC system, such as time of day, weather and weather forecasts, occupancy schedules, and occupancy trends.” Paragraph 0121; see the future values up to time corresponding to the end of peak hours on the current day T as described in paragraphs 0139, 0140, 0141)
Claim 3:
The cited prior art describes the method according to claim 1, wherein the input data set includes one or more variables, each variable indicating a corresponding goal of optimization in the reward. (Lee: “For use with a RL model, experience data can generally include, but is not limited to, the current state of a system at a given time, one or more actions performed by the controller in response to the current state of the system, a future or resultant state of the system caused by the performed action, and/or a determined reward responsive to performing the action in the current state. Experience data may also include other values, such as other valuation metrics or error measurements.” Paragraph 0063; “The state, action, reward, and next state can be stored in simulated experience storage 604 as a single experience data point.” Paragraph 0142)
Claim 4:
The cited prior art describes the method according to claim 1, wherein the technical system is a building management system for a building. (Lee: “In some embodiments, the one or more processing circuits are integrated within a building management system.” Paragraph 0013; “Referring particularly to FIG. 1, a perspective view of a building 10 is shown. Building 10 is served by a BMS. A BMS is, in general, a system of devices configured to control, monitor, and manage equipment in or around a building or building area. A BMS can include, for example, a HVAC system, a security system, a lighting system, a fire alerting system, any other system that is capable of managing building functions or devices, or any combination thereof.” Paragraph 0065)
Claim 5:
The cited prior art describes the method according to claim 4, wherein the one or more state variables comprise at least one of the following variables:
the occupancy of at least one room in the building; (Lee: “Exogenous parameters are parameters pertaining to the environment or otherwise outside of the control of the HVAC system, such as time of day, weather and weather forecasts, occupancy schedules, and occupancy trends.” Paragraph 0121)
the solar radiation from outside the building; and
one or more ambient variables around the building, the ambient temperature around the building.
Claim 6:
The cited prior art describes the method according to claim 4, wherein the one or more target variables comprise at least one of the following variables:
one or more variables within at least one room in the building, ; (Lee: “Herein, any training data, experience data, or measured data (such as that received from an HVAC system) can include timeseries data. A timeseries can include a series of values for the same point and a timestamp for each of the data values. For example, a timeseries for a point provided by a temperature sensor can include a series of temperature values measured by the temperature sensor and the corresponding times at which the temperature values were measured.” Paragraph 0162; “AHU controller 330 may also receive a measurement of the temperature of building zone 306 from a temperature sensor 364 located in building zone 306.” Paragraph 0084)
the cooling power for cooling at least one room in the building; and
the heating power for heating at least one room in the building.
Claim 10:
The cited prior art describes the method according to claim 1, wherein the first data driven model is a neural network which includes one or more layers of LSTM cells and/or one or more layers with several multi-layer perceptrons. (Lee: “In some embodiments, both the zone sub-models and the AHU sub-models have the same structure. The structure of both the zone and the AHU sub-models may consist of two connected LSTMs that perform state estimation and prediction, respectively. The first LSTM, which is referred to as the “encoder”, may take the values of past u and y as inputs at each timestep. After iterating over all of the past timesteps, the internal states of the encoder may then be used as the initial states of the second LSTM, which is referred to as the “decoder”. For each timestep in the “decoder”, the LSTM may also receive u and y as inputs. However, the y values may now be the predictions made at the previous timestep. In some embodiments, filtering in the DNN model is completely in parallel, but during prediction, the zone sub-model outputs may be inputs to the AHU sub-model.” Paragraph 0151; “In some embodiments, the first experience data is generated using a simulation model or a surrogate model of the HVAC system, wherein the surrogate model can include a deep neural network.” Paragraph 0004; “In some embodiments, the dynamic model of predictive modeler 602 is a surrogate model, which may include a deep neural network (DNN) model of HVAC dynamics. Surrogate models are generally designed to simulate how a particular system may react to a given input. The surrogate model can include a DNN of any configuration, such as a convolution neural network (CNN), recurrent neural network (RNN), long short term memory (LSTM) architecture, LSTM sequence-to-sequence (LSTM-S2S) framework, or a gated recurrent unit (GRU). In some embodiments, the surrogate model may be trained by data generated by a calibrated simulation model so as to optimize the weights used in the edges and nodes of the neural network.” Paragraph 0168)
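The encoder-decoder arrangement quoted from Lee's paragraph 0151 can be illustrated schematically as follows. A plain recurrent cell stands in for the LSTM, and all weights, the readout, and the sample values are illustrative assumptions, not Lee's implementation:

```python
# Schematic sketch of Lee's encoder-decoder structure: an "encoder" consumes
# past (u, y) pairs; its final internal state seeds a "decoder" that rolls
# predictions forward, feeding each prediction back as the next step's y input.
import math

def cell(state, u, y, w=(0.5, 0.3, 0.2)):
    """Minimal recurrent cell: new state from old state and inputs u, y."""
    ws, wu, wy = w
    return math.tanh(ws * state + wu * u + wy * y)

def encode(past):                       # past: list of (u, y) pairs
    state = 0.0
    for u, y in past:
        state = cell(state, u, y)
    return state                        # final state seeds the decoder

def decode(state, future_u, readout=lambda s: 20.0 + 2.0 * s):
    preds = []
    y = readout(state)
    for u in future_u:                  # y fed back is the previous prediction
        state = cell(state, u, y)
        y = readout(state)
        preds.append(y)
    return preds

past = [(1.0, 20.5), (1.2, 20.8), (0.9, 21.0)]
preds = decode(encode(past), future_u=[1.0, 1.0, 1.0])
```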
Claim 11:
The cited prior art describes the method according to claim 1, wherein the second data driven model is a neural network which includes a multi-layer perceptron. (Lee: “For example, in embodiments wherein the RL model 606 comprises a neural network, the neural network can be used to generate control values for inputs not used to train the neural network, despite not having been trained using a particular input state or input action.” Paragraph 0176; “In some embodiments, RL model 606 is a Q-Learning model. A Q-Learning model generates a Q value for a given state and action (i.e., a state-action pair) that represents the short-term and long-term value of performing the action in the given state. In some embodiments, the Q-Learning model may use a lookup table to determine a Q value for the state-action pair. The Q table may be updated based on training data to adjust the Q values for each state-action pair. In some embodiments, the Q-Learning model is a deep Q-Learning model wherein the model uses a neural network (e.g., a DNN, CNN, etc.) to generate a Q value for the given state-action pair input. The neural network may contain one or more hidden layers. The weights of the neural network may be updated based on a backpropagation algorithm using training data.” Paragraph 0174)
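The deep Q-Learning structure quoted from Lee's paragraph 0174, a neural network that generates a Q value for a given state-action pair, can be illustrated with a small multi-layer perceptron. The fixed weights below are toy values standing in for weights that would be learned by backpropagation; all names are hypothetical:

```python
# Illustrative forward pass of a tiny MLP mapping a (state, action) pair to a
# Q value, as in deep Q-Learning. Weights are fixed toy values, not learned.
import math

def mlp_q(state, action):
    """One hidden layer of two tanh units, linear output: Q(state, action)."""
    x = (state, action)
    hidden_w = [((0.4, -0.2), 0.1),     # (weights per input, bias) per unit
                ((-0.3, 0.5), 0.0)]
    h = [math.tanh(w0 * x[0] + w1 * x[1] + b) for (w0, w1), b in hidden_w]
    out_w, out_b = (0.7, -0.6), 0.2
    return out_w[0] * h[0] + out_w[1] * h[1] + out_b

def greedy_action(state, actions):
    """Greedy selection: evaluate Q for each candidate action in a state."""
    return max(actions, key=lambda a: mlp_q(state, a))
```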
Claim 12:
The cited prior art describes a controller for a technical system, wherein the controller is configured to carry out a method according to claim 1. (Lee: “Referring now to FIG. 11, a RL model controller 1100 is shown, according to one embodiment. RL model controller 1100 may be configured within, or performed by a BMS, such as BMS 400. RL model controller 1100 may be configured as one or more distinct processing circuits 1102 that comprise at least a processor 1104 and memory 1106.” Paragraph 0231)
Claim 13:
The cited prior art describes a computer program product, comprising a computer readable hardware storage device having computer readable program code stored therein, said program code executable by a processor of a computer system to implement a method with program code stored on a machine-readable carrier for carrying out a method according to claim 1 when the program code is executed on a computer. (Lee: “The present disclosure contemplates methods, systems and program products on any machine-readable media for accomplishing various operations. The embodiments of the present disclosure can be implemented using existing computer processors, or by a special purpose computer processor for an appropriate system, incorporated for this or another purpose, or by a hardwired system. Embodiments within the scope of the present disclosure include program products comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon. Such machine-readable media can be any available media that can be accessed by a general purpose or special purpose computer or other machine with a processor.” Paragraph 0241; “Referring now to FIG. 11, a RL model controller 1100 is shown, according to one embodiment. RL model controller 1100 may be configured within, or performed by a BMS, such as BMS 400. RL model controller 1100 may be configured as one or more distinct processing circuits 1102 that comprise at least a processor 1104 and memory 1106.” Paragraph 0231)
Claim 14:
The cited prior art describes a computer program with program code for carrying out a method according to claim 1 when the program code is executed on a computer. (Lee: “The present disclosure contemplates methods, systems and program products on any machine-readable media for accomplishing various operations. The embodiments of the present disclosure can be implemented using existing computer processors, or by a special purpose computer processor for an appropriate system, incorporated for this or another purpose, or by a hardwired system. Embodiments within the scope of the present disclosure include program products comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon. Such machine-readable media can be any available media that can be accessed by a general purpose or special purpose computer or other machine with a processor.” Paragraph 0241; “Referring now to FIG. 11, a RL model controller 1100 is shown, according to one embodiment. RL model controller 1100 may be configured within, or performed by a BMS, such as BMS 400. RL model controller 1100 may be configured as one or more distinct processing circuits 1102 that comprise at least a processor 1104 and memory 1106.” Paragraph 0231)
Claim 15:
The cited prior art describes the method according to claim 6, wherein the one or more variables within at least one room in the building is the room temperature. (Lee: “Herein, any training data, experience data, or measured data (such as that received from an HVAC system) can include timeseries data. A timeseries can include a series of values for the same point and a timestamp for each of the data values. For example, a timeseries for a point provided by a temperature sensor can include a series of temperature values measured by the temperature sensor and the corresponding times at which the temperature values were measured.” Paragraph 0162; “AHU controller 330 may also receive a measurement of the temperature of building zone 306 from a temperature sensor 364 located in building zone 306.” Paragraph 0084)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over
U.S. Patent Application Publication No. 2021/0191342 (Lee) (cited by Applicant) in view of
U.S. Patent Application Publication No. 2022/0344937 (Hu).
Claim 7:
Lee does not explicitly describe the control variables as described below. However, Hu teaches the control variables as described below.
The cited prior art describes the method according to claim 4, wherein the one or more control variables comprise at least one of the following variables:
a cooling setpoint indicating the maximum room temperature allowed for at least one room in the building; and (Lee: “Endogenous parameters are variables chosen by a control system, such as setpoints or operating conditions.” Paragraph 0121) (Hu: “The technical solution can validate the script by determining whether the script achieves the desired objective, such as reducing the load of the site. The system can measure the load, energy use, or other parameters (e.g., heating, cooling, or ventilation) of the building prior to executing the script, and compare the load or energy consumption of the building subsequent to execution of the script to validate whether the script achieves the desired objective while satisfying the constraints or conditions (e.g., within minimum and maximum temperature setpoints).” Paragraph 0028)
a heating setpoint indicating the minimum temperature allowed for at least one room in the building. (Lee: “Endogenous parameters are variables chosen by a control system, such as setpoints or operating conditions.” Paragraph 0121) (Hu: “The technical solution can validate the script by determining whether the script achieves the desired objective, such as reducing the load of the site. The system can measure the load, energy use, or other parameters (e.g., heating, cooling, or ventilation) of the building prior to executing the script, and compare the load or energy consumption of the building subsequent to execution of the script to validate whether the script achieves the desired objective while satisfying the constraints or conditions (e.g., within minimum and maximum temperature setpoints).” Paragraph 0028)
One of ordinary skill in the art would have recognized that applying the known technique of Lee, namely, training an HVAC control model using a surrogate model, with the known techniques of Hu, namely, a load modification response system, would have yielded predictable results and resulted in an improved system. Accordingly, applying the teachings of Lee to train an HVAC control model based on various data with the teachings of Hu to control a load based on various data would have been recognized by those of ordinary skill in the art as resulting in an improved controller training system (i.e., the combination of the references provides for a control system training system using various data based on the teachings of an HVAC training system based on various data in Lee and the teachings of a load control system using various data in Hu).
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over
U.S. Patent Application Publication No. 2021/0191342 (Lee) (cited by Applicant) in view of
Brandi, Silvio, et al. "Deep reinforcement learning to optimise indoor temperature control and heating energy consumption in buildings." Energy and Buildings 224 (2020): 110225 (Brandi).
Claim 8:
Lee does not explicitly describe a reward as described below. However, Brandi teaches the reward as described below.
The cited prior art describes the method according to claim 6, wherein the reward is defined such that the reward is higher for predicted values of the room temperature lying between a predicted future value of the heating setpoint and a predicted future value of the cooling setpoint than for other values of room temperatures and that the reward raises with a decreasing predicted value of the cooling power and a decreasing predicted value of the heating power. (Brandi: see the reward function as illustrated in equation 5 and as described in section 6.4.2) (Lee: see the RL trainer 910 and RL model 912 as illustrated in figure 9 and the train a RL model using the simulated experience data 1004 as illustrated in figure 10; “In some embodiments, RL trainer 910 is configured to only use simulated experience data to retrain the RL model 912. In some such embodiments, RL trainer 910 may be configured to sample the simulated experience data based on when the simulated experience data as generated, such that experience data generated by the retrained surrogate model is prioritized for RL model training. In some embodiments, RL trainer 910 is configured to use both real and simulated experience data to retrain the RL model 912.” Paragraph 0219; “RL model 606 may also define a reward function that defines a value for entering a state. The reward function may be based on the current state of the system. In some embodiments, the reward function may be based on both the current state of the system and the action performed in the current state. In some embodiments, the reward function is a look-up table of reward values for a state or state-action pair. In some embodiments, the reward function is an efficiency or cost function based on the current state or action parameters.” Paragraph 0173)
One of ordinary skill in the art would have recognized that applying the known technique of Lee, namely, training an HVAC control model using a surrogate model, with the known techniques of Brandi, namely, deep reinforcement learning to optimize indoor temperature control and heating energy consumption in buildings, would have yielded predictable results and resulted in an improved system. Accordingly, applying the teachings of Lee to train an HVAC control model based on various data with the teachings of Brandi to use various rewards for temperature control would have been recognized by those of ordinary skill in the art as resulting in an improved controller training system (i.e., the combination of the references provides for a control system training system using various data and rewards based on the teachings of an HVAC training system based on various data in Lee and the teachings of temperature system control based on various rewards in Brandi).
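A reward with the shape claimed in claim 8 can be sketched as follows. This is not Brandi's equation 5; it is a minimal illustration in which a comfort term is highest when the predicted room temperature lies inside the band between the heating and cooling setpoints, and energy terms make the reward rise as the predicted heating and cooling power fall. The weights and penalty forms are illustrative assumptions:

```python
# Illustrative comfort-plus-energy reward: zero comfort penalty inside the
# [heating setpoint, cooling setpoint] band, linear penalty outside it, and
# a penalty proportional to total heating/cooling power.
def reward(temp, heat_sp, cool_sp, heat_power, cool_power,
           comfort_weight=1.0, energy_weight=0.1):
    if heat_sp <= temp <= cool_sp:
        comfort_penalty = 0.0                   # inside the band: no penalty
    else:
        comfort_penalty = min(abs(temp - heat_sp), abs(temp - cool_sp))
    energy_penalty = heat_power + cool_power    # less power -> higher reward
    return -(comfort_weight * comfort_penalty + energy_weight * energy_penalty)
```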
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over
U.S. Patent Application Publication No. 2021/0191342 (Lee) (cited by Applicant) in view of
U.S. Patent Application Publication No. 2020/0226430 (Ahuja).
Claim 9:
Lee does not explicitly describe a probabilistic model as described below. However, Ahuja teaches the probabilistic model as described below.
The cited prior art describes the method according to claim 1, wherein
the first data driven model is a probabilistic model providing predicted future values of the one or more target variables together with an uncertainty and (Ahuja: see the predictions with uncertainty from the probabilistic DNN models 403, 404 as illustrated in figure 4; “Probabilistic DNN model 403 is used to predict an object associated with data 401. Probabilistic DNN model 403 generates a prediction for observed data 401 and an uncertainty metric associated with that prediction.” Paragraph 0095)
the second data driven model incorporates the one or more uncertainties as one or more corresponding penalization terms in the reward. (Ahuja: “Newly annotated data 409 may be used to retrain Probabilistic DNN model 420. . . . Newly annotated data 410 may be used to retrain Probabilistic DNN model 420.” Paragraph 0095; “Collecting data associated with the object 1314. Annotating the data to more accurately identify of the object, according to the determined annotation method 1316. Retraining the model of the continuous learning device with the annotated data 1318.” Paragraph 0142) (Lee: see the RL trainer 910 and RL model 912 as illustrated in figure 9 and the train a RL model using the simulated experience data 1004 as illustrated in figure 10; “In some embodiments, RL trainer 910 is configured to only use simulated experience data to retrain the RL model 912. In some such embodiments, RL trainer 910 may be configured to sample the simulated experience data based on when the simulated experience data as generated, such that experience data generated by the retrained surrogate model is prioritized for RL model training. In some embodiments, RL trainer 910 is configured to use both real and simulated experience data to retrain the RL model 912.” Paragraph 0219; “RL model 606 may also define a reward function that defines a value for entering a state. The reward function may be based on the current state of the system. In some embodiments, the reward function may be based on both the current state of the system and the action performed in the current state. In some embodiments, the reward function is a look-up table of reward values for a state or state-action pair. In some embodiments, the reward function is an efficiency or cost function based on the current state or action parameters.” Paragraph 0173)
One of ordinary skill in the art would have recognized that applying the known technique of Lee, namely, training an HVAC control model using a surrogate model, with the known techniques of Ahuja, namely, an identification system, would have yielded predictable results and resulted in an improved system. Accordingly, applying the teachings of Lee to train an HVAC control model based on various data with the teachings of Ahuja to train an identification system using various data would have been recognized by those of ordinary skill in the art as resulting in an improved controller training system (i.e., the combination of the references provides for a control system training system using various data, based on the teachings of an HVAC training system using various data in Lee and the teachings of training an identification system using a probabilistic model with uncertainties in Ahuja).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Patent Application Publication No. 2021/0200163 describes reinforcement learning for controlling electrical equipment in buildings.
Ghane, Sara, et al., "Supply temperature control of a heating network with reinforcement learning," 2021 IEEE International Smart Cities Conference (ISC2), IEEE, 2021, describes supply temperature control of a heating network with reinforcement learning.
U.S. Patent Application Publication No. 2013/0325776 describes reinforcement learning for artificial neural networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER E EVERETT whose telephone number is (571)272-2851. The examiner can normally be reached Monday-Friday 8:00 am to 5:00 pm (Pacific).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Robert Fennema, can be reached at 571-272-2748. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Christopher E. Everett/Primary Examiner, Art Unit 2117