Last updated: April 19, 2026
Application No. 17/359,919
METHOD OF SHORT-TERM LOAD FORECASTING VIA ACTIVE DEEP MULTI-TASK LEARNING, AND AN APPARATUS FOR THE SAME

Non-Final OA §101 §103
Filed: Jun 28, 2021
Examiner: NGUYEN, TRI T
Art Unit: 2128
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Samsung Electronics Co., Ltd.
OA Round: 3 (Non-Final)
Interview Optional: +13.2% interview lift. This examiner has a relatively high allow rate; a written response may suffice.
Based on 183 resolved cases, 2023–2026
Examiner Intelligence

NGUYEN, TRI T
Career Allow Rate: 68% (125 granted / 183 resolved), above average; +13.3% vs TC avg
Interview Lift: +13.2% (moderate) for resolved cases with interview
Avg Prosecution (typical timeline): 3y 10m; 31 currently pending
Total Applications (career history): 214 across all art units
Statute-Specific Performance

§101: 15.7% (-24.3% vs TC avg)
§103: 57.5% (+17.5% vs TC avg)
§102: 7.2% (-32.8% vs TC avg)
§112: 14.2% (-25.8% vs TC avg)
Based on career data from 183 resolved cases
Office Action

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 07/22/2025 has been entered.

Response to Amendment
The amendment filed 07/22/2025 has been entered. Claims 1, 3-8, 10-13, 15-17 and 19-20 remain pending in the application. 

Response to Arguments
Applicant’s arguments, filed 07/22/2025, with respect to the rejections of claims 1, 8, 13 and 17 under 35 U.S.C. 103 have been fully considered and are persuasive in view of the amendments; the rejections have therefore been withdrawn. However, upon further consideration, new grounds of rejection are made over Cini et al. (Cluster-based Aggregate Load Forecasting with Deep Neural Networks) in view of Moon et al. (US Pub. 2021/0215370), Yildiz et al. (Household electricity load forecasting using historical smart meter data with clustering and classification techniques), Chen et al. (US Pub. 2022/0296930), and further in view of Li et al. (US Pub. 2017/0124466).
Applicant's arguments, filed 07/22/2025, with respect to the rejections of the claims under 101 have been fully considered and are not persuasive.
Applicant argues (pages 17-18)
Under the 2019 Revised Subject Matter Eligibility Guidance ("Revised Guidance"), a claim is patent eligible under Step 2A unless it (i) recites a judicial exception (Prong One) and (ii) fails to integrate that exception into a practical application (Prong Two). If no judicial exception is recited under Prong One, the eligibility analysis concludes in favor of eligibility. Even if a judicial exception is recited, eligibility is still found under Prong Two if the claim integrates that exception into a practical application. Only if the exception is not so integrated does the analysis proceed to Step 2B.
Applicant respectfully submits that independent claims 1 and 8 do not recite a judicial exception, such as an abstract idea involving a mental process or a mathematical concept.
In particular, the pending claims do not recite steps that can be performed mentally or that correspond to a mere mathematical formula. Rather, the claims are directed to a specific technological solution implemented by way of a particular LSTM-based multi-task deep learning model, which is structured with:
• an input layer configured to receive input data for multiple electricity consuming objects in a cluster,
• an LSTM block comprising a plurality of LSTM layers shared across prediction tasks,
• a dense layer shared across tasks, and
• an output layer producing respective predictions,
• followed by an aggregation layer that combines those outputs to generate a cluster-level forecast.
This architecture is not a mental process and not a mathematical concept, but rather a specific application of computer technology to solve a computer-centric problem: improving the accuracy and scalability of electricity load forecasting using cluster-based modeling and multitask LSTM networks. The claimed invention cannot practically be performed in the human mind or by mere paper-and-pencil calculation.
Thus, under Step 2A, Prong One, the claims are not directed to a judicial exception, and the eligibility inquiry should conclude.
In response
As stated in the 101 rejections section below and in the previous Office Action, the claim limitations clearly recite mental processes, since the claim recites the steps of “predicting, based on an output of the first cluster model, a future electricity consumption for each of the first electricity consuming objects of the first cluster” and “the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data …”. These steps are based on observations, evaluations, judgments or opinions that are performable in the human mind or with the aid of pencil and paper (see MPEP 2106.04(a)(2)(III)). For example, a user can predict that an air conditioner will consume more power in July and August, while a heater will consume less power in the same months. The user can also group the appliances or households together based on the amount of power consumption.
The claim also recites the model using the joint loss function to perform and optimize a task, thus reciting a mathematical concept.
Therefore, the claim recites an abstract idea.
The Applicant then argued that the claim recites the structure of the model, including multiple layers such as an input layer, LSTM layers, a dense layer and an output layer, and that this structure is neither a mental process nor a mathematical concept. However, a structure is not a process. To determine whether the claim recites a mental process, a process (not a structure) is analyzed to determine whether that process can be performed in the human mind. The structure of the model is an additional limitation, which is analyzed under Step 2A Prong Two and Step 2B if the claim recites an abstract idea.
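For orientation only, the layer structure described in the Applicant's argument above (input layer, shared LSTM block, shared dense layer, per-task output layer, aggregation layer) can be written as a forward pass. This is a sketch under stated assumptions, not the claimed or disclosed model: the symbols x_{1:T}, h_T, z, W_d, b_d, w_i, b_i, the activation \phi, and the aggregation function g are all hypothetical notation introduced here, and the record does not specify any of them.

```latex
\begin{aligned}
h_T &= \mathrm{LSTM}_{\theta}(x_{1:T})
    && \text{shared LSTM block applied to the input sequence} \\
z &= \phi(W_d\, h_T + b_d)
    && \text{dense layer shared across tasks} \\
\hat{y}_i &= w_i^{\top} z + b_i, \quad i = 1, \dots, N
    && \text{one output per electricity consuming object (task)} \\
\hat{y}_{\mathrm{cluster}} &= g(\hat{y}_1, \dots, \hat{y}_N)
    && \text{fully connected aggregation layer}
\end{aligned}
```

Here N is the number of electricity consuming objects in the cluster. Only the final two lines are task-specific; everything before them is shared, which is what "shared across prediction tasks" appears to mean in the argument.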

Applicant argues (pages 18-19)
However, even assuming, for the sake of argument, that the claims are viewed as reciting a judicial exception (which Applicant does not concede), the claims nonetheless integrate any such exception into a practical application under Step 2A, Prong Two. Specifically:
• The claims apply the alleged abstract idea (e.g., forecasting) in the context of a concrete, technical system using a structured neural network architecture configured for cluster-specific load prediction tasks.
• The claimed model is trained and applied in conjunction with real-world environmental and calendar data received from external electronic devices via a communication interface, further grounding the invention in a specific technological context.
• The output of the prediction tasks is not merely theoretical, but is processed through an aggregation neural network layer to generate actionable and interpretable forecasting results.
• These features impose a meaningful limit on the use of any nominal exception and are not simply a drafting effort to monopolize an abstract idea. Rather, they reflect a specific improvement to computer functionality and a practical implementation of advanced machine learning techniques to solve a problem in the field of smart energy management.
Furthermore, the claims have been amended to recite a step of "displaying content corresponding to the predicted future electricity consumption." This step is not merely a passive or generic output function, but rather represents a user-facing interface or system-level component that enables the results of the prediction model to be consumed, visualized, and acted upon by external systems or human users. For example, the predicted values may be shown in a time-series chart or dashboard used by utility providers or smart energy systems to optimize electricity distribution.
The display step is therefore not insignificant extra-solution activity, but rather a critical component that ties the forecasting logic into a real-world, actionable application. It reinforces the claim's grounding in practical use and technological implementation, thereby further integrating any nominal exception into a practical application under Step 2A, Prong Two.
Accordingly, the claims are patent-eligible under Step 2A, Prong Two.
In response
This judicial exception is not integrated into a practical application. As stated in the 101 section below, the claim recites additional elements that amount to insignificant extra-solution activities of data gathering and transmitting, generally linking the use of a judicial exception to a particular technological environment or field of use, insignificant extra-solution activities of data outputting, or mere instructions to apply the exception using generic computer components. These additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Also, the amended limitation of “displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster” amounts to insignificant extra-solution activity of data outputting (MPEP 2106.05(g)). The courts have similarly found limitations directed to [displaying, presenting, outputting] a result, recited at a high level of generality, to be well-understood, routine, and conventional. See MPEP 2106.05(d)(II) ("presenting offers and gathering statistics").
Further, the claim limitations do not show an improvement to the functioning of a computer or to any other technology or technical field. The claim does not recite how the model is trained or operated to implement the process such that the model or computer is improved. The claim only recites using machine learning technology as a tool to perform a mental process of “forecasting the loads”, wherein the model is trained using certain training data that may improve the forecasting process. However, an improvement in an abstract idea itself is not an improvement in technology. The claim must recite additional elements which provide the improvement.

Applicant argues (page 19)
Even under Step 2B, assuming arguendo that the claims are directed to a judicial exception and not integrated into a practical application (which Applicant respectfully disputes), the claims nonetheless recite significantly more than the judicial exception.
The particular combination of elements, including the LSTM-based multi-task learning architecture, joint loss optimization, clustering based on historical consumption and external data, and final aggregation through a neural network, is not well-understood, routine, or conventional. The prior art fails to disclose or suggest such a combination. Nor does it show or imply the specific system-level architecture recited in the claims. These technical features reflect a specific, inventive concept that improves the field of load forecasting and computer-based modeling itself.
In response
As stated in the 101 rejections below, Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a Long Short-Term Memory”, “a communication interface”, “external electronic devices” and “a first cluster model” to perform the predicting step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
The additional elements of “receiving, via a communication interface, present environmental data and present calendar data …” and “inputting, into a first cluster model, the present environmental data and the present calendar data” amount to insignificant extra-solution activity related to mere data gathering and transmitting (MPEP 2106.05(g)).  The courts have found limitations directed to receiving and transmitting information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”).
The additional elements of “wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the … the model including … wherein the present environmental data refers to … wherein the present calendar data refers to … and wherein the future electricity consumption refers to …” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that merely indicate a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the judicial exception (see MPEP 2106.05(h)).
The claim recites the additional element of “displaying, via display, a representation of the future electricity consumption …”. The courts have similarly found limitations directed to [displaying, presenting, outputting] a result, recited at a high level of generality, to be well-understood, routine, and conventional. See MPEP 2106.05(d)(II) ("presenting offers and gathering statistics").
The additional elements of “wherein the first cluster model is trained based on …”, and “a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster” amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 3-8, 10-13, 15-17 and 19-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a method which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “predicting, based on an output of the first cluster model, a future electricity consumption for each of the first electricity consuming objects of the first cluster”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “predicting” in the context of this claim encompasses the user, based on some given data such as historical data and/or weather data, predicting the air conditioner will consume more power in July, while the heater will consume less power in the same month.
The limitation of “wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “clustering” in the context of this claim encompasses the user grouping the appliances based on the amount of power consumed by each appliance.
The limitations of “the first cluster model comprises a first multi-task learning process having a first joint loss function, and each input of the first multi-task learning process corresponds to a respective electricity consuming object of the first cluster … wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster” recite mathematical concepts.
Therefore, the claim recites an abstract idea.
Step 2A (prong 2):
This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of “a Long Short-Term Memory”, “a communication interface”, “external electronic devices” and “a first cluster model”. The additional elements are recited at a high level of generality (i.e., as a generic device performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using generic computer components (MPEP 2106.05(f)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The additional elements of “receiving, via a communication interface, present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster … from external electronic devices” and “inputting, into a first cluster model, the present environmental data and the present calendar data” amount to insignificant extra-solution activities of data gathering and transmitting which do not amount to significantly more than the abstract idea (MPEP 2106.05(g)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The limitations of “wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster”, “the model including: an input layer … an LSTM block comprising a plurality of LSTM layers … a dense layer … and an output layer including outputs for respective prediction tasks”, “wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data”, “wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year” and “wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that merely indicate a field of use or technological environment in which to apply a judicial exception do not integrate the exception into a practical application (see MPEP 2106.05(h)).
The additional element of “displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster” amounts to insignificant extra-solution activity of data outputting (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim recites the additional elements of “wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects”, and “wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster”.  These limitations are recited at a high-level of generality (i.e., as a generic device performing the generic computer function of training) such that they amount to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a Long Short-Term Memory”, “a communication interface”, “external electronic devices” and “a first cluster model” to perform the predicting step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
The additional elements of “receiving, via a communication interface, present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster … from external electronic devices” and “inputting, into a first cluster model, the present environmental data and the present calendar data” are recited at a high level of generality and amount to insignificant extra-solution activity related to mere data gathering and transmitting (MPEP 2106.05(g)).  The courts have found limitations directed to receiving and transmitting information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”).
The additional elements of “wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster”, “the model including: an input layer … an LSTM block comprising a plurality of LSTM layers … a dense layer … and an output layer including outputs for respective prediction tasks”, “wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data”, “wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year” and “wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that merely indicate a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the judicial exception (see MPEP 2106.05(h)).
The claim recites the additional element of “displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster”. The courts have similarly found limitations directed to [displaying, presenting, outputting] a result, recited at a high level of generality, to be well-understood, routine, and conventional. See MPEP 2106.05(d)(II) ("presenting offers and gathering statistics").
The additional elements of “wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects”, and “wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster” amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.

Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a method which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “predicting, based on an output of the second cluster model, a future electricity consumption for each of the first electricity consuming objects of the second cluster”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “predicting” in the context of this claim encompasses the user, based on some given data such as historical data and/or weather data, predicting the air conditioner will consume more power in July, while the heater will consume less power in the same month.
The limitation of “obtaining a final forecast based on the predicted future electricity consumption of the first and the second electricity consuming objects of the first and the second clusters”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “obtaining” in the context of this claim encompasses the user combining two or more values to get a total value (for example, a total load).
The limitation of “wherein the second cluster model comprises a second multi-task learning process having a second joint loss function, and each input of the second multi-task learning process corresponds to a respective electricity consuming object of the second cluster” recites mathematical concepts.
Step 2A (prong 2):
This judicial exception is not integrated into a practical application.
The additional element of “inputting, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster among the plurality of clusters” amounts to insignificant extra-solution activities of data gathering or transmitting which does not amount to significantly more than the abstract idea (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim recites the additional element of “wherein the second cluster model is trained based on second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters”.  This limitation is recited at a high-level of generality (i.e., as a generic device performing the generic computer function of training) such that it amounts to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “inputting, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster among the plurality of clusters” is recited at a high level of generality and amounts to insignificant extra-solution activity related to mere data gathering or transmitting (MPEP 2106.05(g)). The courts have found limitations directed to receiving and transmitting information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”).
The additional element of “wherein the second cluster model is trained based on second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters” amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.

Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a method which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “the first and the second joint loss functions of the first and the second multi-task learning processes treat all learning tasks with equal importance” recites mathematical concepts.
Step 2A (prong 2), Step 2B:
The claim does not include any additional elements.
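To make the quoted claim 4 limitation concrete, one possible reading of a joint loss that treats "all learning tasks with equal importance" is an unweighted mean of per-task losses. The sketch below assumes that reading. The function names `mse` and `joint_loss_equal_weight`, and the choice of mean squared error as the per-task loss, are hypothetical and are not taken from the claims or the record.

```python
from typing import Sequence


def mse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error for one task (one electricity consuming object).

    Assumption: MSE is used only for illustration; the record does not
    say which per-task loss the application uses.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("predicted and actual must be non-empty and equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def joint_loss_equal_weight(
    per_task_predictions: Sequence[Sequence[float]],
    per_task_actuals: Sequence[Sequence[float]],
) -> float:
    """Unweighted mean of per-task losses, so every task counts equally."""
    if len(per_task_predictions) != len(per_task_actuals):
        raise ValueError("one prediction series is required per actual series")
    losses = [mse(p, a) for p, a in zip(per_task_predictions, per_task_actuals)]
    if not losses:
        raise ValueError("at least one task is required")
    return sum(losses) / len(losses)


# Two tasks: the first has errors 0 and 2 (MSE 2.0), the second has error 2 (MSE 4.0).
example = joint_loss_equal_weight([[1.0, 2.0], [3.0]], [[1.0, 4.0], [1.0]])
```

Averaging the per-task losses, rather than pooling all samples, keeps a task with more data points from dominating the total. A weighted variant would simply multiply each per-task loss by a task-specific coefficient before summing.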

Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a method which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “obtaining the final forecast comprises combining the predicted future electricity consumption of the electricity consuming objects of the first and the second clusters”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “obtaining” in the context of this claim encompasses the user combining two or more values to get a total value (for example a total electricity consumption).
Step 2A (prong 2):
This judicial exception is not integrated into a practical application.
The claim recites the additional element of “combining the predicted future electricity consumption of the electricity consuming objects of the first and the second clusters using a fully connected neural network output layer”.  This limitation is recited at a high-level of generality (i.e., as a generic device performing the generic computer function of combining) such that it amounts to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
The additional element of “combining the predicted future electricity consumption of the electricity consuming objects of the first and the second clusters using a fully connected neural network output layer” amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
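As a hedged illustration of the distinction drawn above, the sketch below shows a single-unit fully connected output layer combining per-cluster forecasts. The function name `dense_combine`, the weights, and the bias are hypothetical; the claim does not specify them. With unit weights and zero bias the layer reduces to the plain summation that the examiner describes as performable in the mind.

```python
def dense_combine(cluster_forecasts, weights, bias=0.0):
    """Single-unit fully connected layer: y = sum(w_i * x_i) + b."""
    if len(cluster_forecasts) != len(weights):
        raise ValueError("one weight is required per cluster forecast")
    return sum(w * x for w, x in zip(weights, cluster_forecasts)) + bias


# Predicted consumption of the first and second clusters (made-up values).
forecasts = [10.0, 5.0]

plain_total = dense_combine(forecasts, [1.0, 1.0])        # simple sum
weighted = dense_combine(forecasts, [0.5, 2.0], bias=1.0)  # assumed learned weights
```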

Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a method which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 2):
This judicial exception is not integrated into a practical application. The claim recites an additional element of “the present environmental data and present calendar data corresponding to the first electricity consuming objects of the first cluster and the present environmental data and present calendar data corresponding to the second electricity consuming objects of the second cluster comprise time series data sets in which a final time corresponds to the current time”. This limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, a limitation that amounts to merely indicating a field of use or technological environment in which to apply a judicial exception does not integrate the judicial exception into a practical application (see MPEP 2106.05(h)).
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “the present environmental data and present calendar data corresponding to the first electricity consuming objects of the first cluster and the present environmental data and present calendar data corresponding to the second electricity consuming objects of the second cluster comprise time series data sets in which a final time corresponds to the current time” amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, a limitation that amounts to merely indicating a field of use or technological environment in which to apply a judicial exception does not amount to significantly more than the judicial exception (see MPEP 2106.05(h)).
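For illustration only, the recited "time series data sets in which a final time corresponds to the current time" can be pictured as a fixed-length window whose last sample is stamped with the current time. The sketch below assumes hourly `(timestamp, value)` samples; the function name `window_ending_now`, the window length, and the sample values are hypothetical and are not taken from the claims.

```python
from datetime import datetime, timedelta


def window_ending_now(series, now, length):
    """Return the last `length` (timestamp, value) samples up to `now`.

    Raises ValueError if the most recent retained sample is not stamped
    with the current time, i.e. the final time is not the current time.
    """
    past = sorted((s for s in series if s[0] <= now), key=lambda s: s[0])
    window = past[-length:]
    if not window or window[-1][0] != now:
        raise ValueError("series has no sample at the current time")
    return window


# Hourly temperature readings (made-up values), including one future sample.
t0 = datetime(2021, 6, 28, 0, 0)
series = [(t0 + timedelta(hours=i), 20.0 + i) for i in range(6)]
now = t0 + timedelta(hours=4)

window = window_ending_now(series, now, length=3)
```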

Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a method which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “obtaining a final forecast based on predicted future electricity consumption of the electricity consuming objects of the first through the Nth clusters”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “obtaining” in the context of this claim encompasses the user combining two or more values to get a total value (for example, a total load).
The limitation of “wherein the first through the Nth joint loss functions of the multi-task learning processes treat all learning tasks with equal importance” recites mathematical concepts.
Step 2A (prong 2):
This judicial exception is not integrated into a practical application.
The limitations of “wherein the first through Nth cluster models comprise multi-task learning processes having the first joint loss function through an Nth joint loss function, respectively” and “wherein the inputs of the multi-task learning processes correspond to electricity consuming objects of corresponding clusters” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not integrate the judicial exception into a practical application (see MPEP 2106.05(h)).
The claim recites the additional element of “obtaining cluster models for each of the first through the Nth clusters”.  This limitation is recited at a high-level of generality (i.e., as a generic device performing the generic computer function) such that it amounts to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
The additional elements of “wherein the first through Nth cluster models comprise multi-task learning processes having the first joint loss function through an Nth joint loss function, respectively” and “wherein the inputs of the multi-task learning processes correspond to electricity consuming objects of corresponding clusters” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the judicial exception (see MPEP 2106.05(h)).
The additional element of “obtaining cluster models for each of the first through the Nth clusters” amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.

Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites an apparatus which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “predict, based on an output of the first cluster model, a future electricity consumption for each of the first electricity consuming objects of the first cluster”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “predict” in the context of this claim encompasses the user, based on some given data such as historical data and/or weather data, predicting the air conditioner will consume more power in July, while the heater will consume less power in the same month.
The limitation of “wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “clustering” in the context of this claim encompasses the user grouping the appliances based on the amount of power consumed by each appliance.
The limitations of “the first cluster model comprises a first multi-task learning process having a first joint loss function, and each input of the first multi-task learning process corresponds to a respective electricity consuming object of the first cluster … wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster” recite mathematical concepts.
Therefore, the claim recites an abstract idea.
Step 2A (prong 2):
This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of “an apparatus”, “a Long Short-Term Memory”, “a communication interface”, “a display”, “one memory”, “processor”, “external electronic devices” and “a first cluster model”. The additional elements are recited at a high-level of generality (i.e., as a generic device performing the generic computer functions) such that they amount to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The additional elements of “receive present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster … from external electronic devices” and “input, into a first cluster model, the present environmental data and the present calendar data” amount to insignificant extra-solution activities of data gathering and transmitting which do not amount to significantly more than the abstract idea (MPEP 2106.05(g)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The limitations of “wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster”, “the model including: an input layer … an LSTM block comprising a plurality of LSTM layers … a dense layer … and an output layer including outputs for respective prediction tasks”, “wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data”, “wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year” and “wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not integrate the judicial exception into a practical application (see MPEP 2106.05(h)).
The additional element of “display, via the display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster” amounts to the insignificant extra-solution activity of data outputting (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim recites the additional elements of “wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects”, and “wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster”.  These limitations are recited at a high-level of generality (i.e., as a generic device performing the generic computer function of training) such that they amount to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “an apparatus”, “a Long Short-Term Memory”, “a communication interface”, “a display”, “one memory”, “processor”, “external electronic devices” and “a first cluster model” to perform the predicting step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
The additional elements of “receive present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster … from external electronic devices” and “input, into a first cluster model, the present environmental data and the present calendar data” are recited at a high level of generality and amount to insignificant extra-solution activity related to mere data gathering and transmitting (MPEP 2106.05(g)).  The courts have found limitations directed to receiving and transmitting information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”).
The additional elements of “wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster”, “the model including: an input layer … an LSTM block comprising a plurality of LSTM layers … a dense layer … and an output layer including outputs for respective prediction tasks”, “wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data”, “wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year” and “wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the judicial exception (see MPEP 2106.05(h)).
The claim recites the additional element of “display, via the display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster”. The courts have similarly found limitations directed to displaying, presenting, or outputting a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), "presenting offers and gathering statistics").
The additional elements of “wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects”, and “wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster” amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.

Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites an apparatus which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “predict, based on an output of the second cluster model, a future electricity consumption for each of the first electricity consuming objects of the second cluster”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “predicting” in the context of this claim encompasses the user, based on some given data such as historical data and/or weather data, predicting the air conditioner will consume more power in July, while the heater will consume less power in the same month.
The limitation of “obtain a final forecast based on the predicted future electricity consumption of the first and the second electricity consuming objects of the first and the second clusters”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “obtaining” in the context of this claim encompasses the user combining two or more values to get a total value (for example, a total load).
The limitation of “wherein the second cluster model comprises a second multi-task learning process having a second joint loss function, and each input of the second multi-task learning process corresponds to a respective electricity consuming object of the second cluster” recites mathematical concepts.
Step 2A (prong 2):
This judicial exception is not integrated into a practical application.
The additional element of “input, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster among the plurality of clusters” amounts to insignificant extra-solution activities of data gathering or transmitting which does not amount to significantly more than the abstract idea (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim recites the additional element of “wherein the second cluster model is trained based on second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters”.  This limitation is recited at a high-level of generality (i.e., as a generic device performing the generic computer function of training) such that it amounts to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “input, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster among the plurality of clusters” is recited at a high level of generality and amounts to insignificant extra-solution activity related to mere data gathering or transmitting (MPEP 2106.05(g)). The courts have found limitations directed to receiving and transmitting information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”).
The additional element of “wherein the second cluster model is trained based on second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters” amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.

Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites an apparatus which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “the first and the second joint loss functions of the first and the second multi-task learning processes treat all learning tasks with equal importance” recites mathematical concepts.
Step 2A (prong 2), Step 2B:
The claim does not include any additional elements.

Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites an apparatus which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “obtain a final forecast based on predicted future electricity consumption of the electricity consuming objects of the first through the Nth clusters”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “obtaining” in the context of this claim encompasses the user combining two or more values to get a total value (for example, a total load).
The limitation of “wherein the first through the Nth joint loss functions of the multi-task learning processes treat all learning tasks with equal importance” recites mathematical concepts.
Step 2A (prong 2):
This judicial exception is not integrated into a practical application.
The limitations of “wherein the first through Nth cluster models comprise multi-task learning processes having the first joint loss function through an Nth joint loss function, respectively” and “wherein the inputs of the multi-task learning processes correspond to electricity consuming objects of corresponding clusters” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not integrate the judicial exception into a practical application (see MPEP 2106.05(h)).
The claim recites the additional element of “obtain cluster models for each of the first through the Nth clusters”.  This limitation is recited at a high-level of generality (i.e., as a generic device performing the generic computer function) such that it amounts to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
The additional elements of “wherein the first through Nth cluster models comprise multi-task learning processes having the first joint loss function through an Nth joint loss function, respectively” and “wherein the inputs of the multi-task learning processes correspond to electricity consuming objects of corresponding clusters” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the judicial exception (see MPEP 2106.05(h)).
The additional element of “obtain cluster models for each of the first through the Nth clusters” amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.

Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a non-transitory computer-readable medium which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “predict, based on an output of the first cluster model, a future electricity consumption for each of the first electricity consuming objects of the first cluster”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “predicting” in the context of this claim encompasses the user, based on some given data such as historical data and/or weather data, predicting the air conditioner will consume more power in July, while the heater will consume less power in the same month.
The limitation of “wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “clustering” in the context of this claim encompasses the user grouping the appliances based on the amount of power consumed by each appliance.
The limitations of “the first cluster model comprises a first multi-task learning process having a first joint loss function, and each input of the first multi-task learning process corresponds to a respective electricity consuming object of the first cluster … wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster” recite mathematical concepts.
Therefore, the claim recites an abstract idea.
Step 2A (prong 2):
This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of “A non-transitory computer-readable medium”, “processor”, “a Long Short-Term Memory”, “a communication interface”, “external electronic devices” and “a first cluster model”. The additional elements are recited at a high-level of generality (i.e., as a generic device performing the generic computer functions) such that they amount to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The additional elements of “receive, via a communication interface, present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster … from external electronic devices” and “input, into a first cluster model, the present environmental data and the present calendar data” amount to insignificant extra-solution activities of data gathering and transmitting which do not amount to significantly more than the abstract idea (MPEP 2106.05(g)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The limitations of “wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster”, “the model including: an input layer … an LSTM block comprising a plurality of LSTM layers … a dense layer … and an output layer including outputs for respective prediction tasks”, “wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data”, “wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year” and “wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not integrate the judicial exception into a practical application (see MPEP 2106.05(h)).
The additional element of “display, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster” amounts to insignificant extra-solution activities of data outputting (MPEP 2106.05(g)). Accordingly, these additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. 
The claim recites the additional elements of “wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects”, and “wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster”.  These limitations are recited at a high-level of generality (i.e., as a generic device performing the generic computer function of training) such that they amount to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “A non-transitory computer-readable medium”, “processor”, “a Long Short-Term Memory”, “a communication interface”, “external electronic devices” and “a first cluster model” to perform the predicting step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
The additional elements of “receive, via a communication interface, present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster … from external electronic devices” and “input, into a first cluster model, the present environmental data and the present calendar data” are recited at a high level of generality and amount to insignificant extra-solution activity related to mere data gathering and transmitting (MPEP 2106.05(g)).  The courts have found limitations directed to receiving and transmitting information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”).
The additional elements of “wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster”, “the model including: an input layer … an LSTM block comprising a plurality of LSTM layers … a dense layer … and an output layer including outputs for respective prediction tasks”, “wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data”, “wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year” and “wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the judicial exception (see MPEP 2106.05(h)).
The claim recites the additional element of “display, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster”. The courts have similarly found limitations directed to displaying, presenting, or outputting a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “presenting offers and gathering statistics”).
The additional elements of “wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects”, and “wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster” amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.

Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a non-transitory computer-readable medium which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “predict, based on an output of the second cluster model, a future electricity consumption for each of the first electricity consuming objects of the second cluster”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “predicting” in the context of this claim encompasses the user, based on some given data such as historical data and/or weather data, predicting the air conditioner will consume more power in July, while the heater will consume less power in the same month.
The limitation of “obtain a final forecast based on the predicted future electricity consumption of the first and the second electricity consuming objects of the first and the second clusters”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “obtaining” in the context of this claim encompasses the user combining two or more values to get a total value (for example, a total load).
The limitation of “wherein the second cluster model comprises a second multi-task learning process having a second joint loss function, and each input of the second multi-task learning process corresponds to a respective electricity consuming object of the second cluster” recites mathematical concepts.
Step 2A (prong 2):
This judicial exception is not integrated into a practical application.
The additional element of “input, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster among the plurality of clusters” amounts to insignificant extra-solution activities of data gathering or transmitting which does not amount to significantly more than the abstract idea (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim recites the additional element of “wherein the second cluster model is trained based on second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters”. This limitation is recited at a high level of generality (i.e., as a generic device performing the generic computer function of training) such that it amounts to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “input, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster among the plurality of clusters” is recited at a high level of generality and amounts to insignificant extra-solution activity related to mere data gathering or transmitting (MPEP 2106.05(g)). The courts have found limitations directed to receiving and transmitting information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”).
The additional element of “wherein the second cluster model is trained based on second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters” amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.

Claim 16 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a non-transitory computer-readable medium which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “the first and the second joint loss functions of the first and the second multi-task learning processes treat all learning tasks with equal importance” recites mathematical concepts.
Step 2A (prong 2) and Step 2B:
The claim does not include any additional elements.
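For illustration only, a joint loss that treats all learning tasks with equal importance can be sketched as an unweighted mean of the per-task losses. The sketch below is an assumption and is not taken from the claims, the specification, or any cited reference: the function names `mse` and `joint_loss_equal`, the choice of mean squared error as the per-task loss, and the sample values in the usage note are all hypothetical.

```python
from typing import Sequence


def mse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error for one task (assumed per-task loss)."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def joint_loss_equal(
    per_task_predicted: Sequence[Sequence[float]],
    per_task_actual: Sequence[Sequence[float]],
) -> float:
    """Joint loss in which every task carries the same weight (1 / number of tasks)."""
    if len(per_task_predicted) != len(per_task_actual) or len(per_task_predicted) == 0:
        raise ValueError("need the same non-zero number of tasks on both sides")
    # One loss per task; here a task stands for one electricity consuming object.
    losses = [mse(p, a) for p, a in zip(per_task_predicted, per_task_actual)]
    # Equal importance: a plain average with no task-specific weights.
    return sum(losses) / len(losses)
```

With two hypothetical tasks, `joint_loss_equal([[1.0, 2.0], [3.0, 5.0]], [[1.0, 4.0], [3.0, 3.0]])` averages two per-task errors of 2.0 each and returns 2.0.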

Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a method which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “obtaining reference electricity consumption data, reference environmental data, and reference calendar data for a plurality of electricity consuming objects over a period of time”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “obtaining” in the context of this claim encompasses the user collecting data such as weather data, time data and power consumption data of the appliances to analyze.
The limitation of “clustering the plurality of electricity consuming objects into a plurality of clusters based on the obtained reference electricity consumption data, the plurality of clusters comprising a first cluster and a second cluster”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “clustering” in the context of this claim encompasses the user grouping the appliances based on the amount of power consumption of each appliance.
The limitation of “predicting, based on an output of the first cluster model, a future electricity consumption for each of the first electricity consuming objects of the first cluster”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “predicting” in the context of this claim encompasses the user, based on some given data such as historical data and/or weather data, predicting the air conditioner will consume more power in July, while the heater will consume less power in the same month.
The limitation of “wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “clustering” in the context of this claim encompasses the user grouping the appliances based on the amount of power consumption of each appliance.
The limitations of “the first cluster model comprises a first multi-task learning process having a first joint loss function, and each input of the first multi-task learning process corresponds to a respective electricity consuming object of the first cluster … wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster” recite mathematical concepts.
Therefore, the claim is directed to an abstract idea.
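For illustration only, the grouping of appliances by amount of power consumption described in the examples above can be sketched as follows. The sketch uses a simple one-dimensional k-means-style grouping, which is a swapped-in technique not named in the claims or in this Office action; the function name `cluster_by_consumption`, its parameters, and the appliance names and values in the usage note are hypothetical.

```python
from typing import Dict, List


def cluster_by_consumption(
    consumption: Dict[str, float], k: int = 2, iterations: int = 10
) -> List[List[str]]:
    """Group objects into k clusters by a single consumption value each."""
    names = sorted(consumption, key=consumption.get)
    if not 1 <= k <= len(names):
        raise ValueError("k must be between 1 and the number of objects")
    values = [consumption[n] for n in names]
    # Deterministic start: centers spread evenly over the sorted values.
    if k == 1:
        centers = [values[0]]
    else:
        centers = [values[i * (len(values) - 1) // (k - 1)] for i in range(k)]
    groups: List[List[str]] = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        # Assign each object to the nearest center.
        for name in names:
            nearest = min(range(k), key=lambda c: abs(consumption[name] - centers[c]))
            groups[nearest].append(name)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            sum(consumption[n] for n in g) / len(g) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups
```

Given hypothetical readings such as `{"lamp": 0.1, "tv": 0.4, "fridge": 1.2, "air_conditioner": 8.5, "heater": 9.0}` and `k=2`, the low-consumption objects and the high-consumption objects end up in separate groups.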
Step 2A (prong 2):
This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of “a Long Short-Term Memory” and “a first cluster model”. The additional elements are recited at a high level of generality (i.e., as a generic device performing the generic computer functions) such that they amount to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The additional elements of “obtaining a first cluster model based on: first reference electricity consumption data, among the obtained reference electricity consumption data, corresponding to first electricity consuming objects of the first cluster; first reference environmental data, among the obtained reference environmental data, corresponding to the first electricity consuming objects of the first cluster; and first reference calendar data, among the obtained reference calendar data, corresponding to the first electricity consuming objects of the first cluster”, “wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster”, “the model including: an input layer … an LSTM block comprising a plurality of LSTM layers … a dense layer … and an output layer including outputs for respective prediction tasks”, “wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data … wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year … wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). 
Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not integrate the judicial exception into a practical application (see MPEP 2106.05(h)).
The additional element of “inputting, into a first cluster model, the present environmental data and the present calendar data” amounts to insignificant extra-solution activity of data gathering or transmitting, which does not amount to significantly more than the abstract idea (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The additional element of “displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster” amounts to insignificant extra-solution activity of data outputting (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim recites the additional element of “wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster”. This limitation is recited at a high level of generality (i.e., as a generic device performing the generic computer function of training) such that it amounts to no more than mere instructions to apply the exception using the generic computer components (MPEP 2106.05(f)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a Long Short-Term Memory” and “a first cluster model” to perform the predicting step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
The additional elements of “obtaining a first cluster model based on: first reference electricity consumption data, among the obtained reference electricity consumption data, corresponding to first electricity consuming objects of the first cluster; first reference environmental data, among the obtained reference environmental data, corresponding to the first electricity consuming objects of the first cluster; and first reference calendar data, among the obtained reference calendar data, corresponding to the first electricity consuming objects of the first cluster”, “wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster”, “the model including: an input layer … an LSTM block comprising a plurality of LSTM layers … a dense layer … and an output layer including outputs for respective prediction tasks” and “wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data … wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year … wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). 
Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the judicial exception (see MPEP 2106.05(h)).
The additional element of “inputting, into a first cluster model, the present environmental data and the present calendar data” is recited at a high level of generality and amounts to insignificant extra-solution activity related to mere data gathering or transmitting (MPEP 2106.05(g)). The courts have found limitations directed to receiving and transmitting information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”).
The claim recites the additional element of “displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster”. The courts have similarly found limitations directed to displaying, presenting, or outputting a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “presenting offers and gathering statistics”).
The additional element of “wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster” amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.

Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a method which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “predicting, based on an output of the second cluster model, a future electricity consumption for each of the first electricity consuming objects of the second cluster”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “predicting” in the context of this claim encompasses the user, based on some given data such as historical data and/or weather data, predicting the air conditioner will consume more power in July, while the heater will consume less power in the same month.
The limitation of “obtaining a final forecast based on the predicted future electricity consumption of the first and the second electricity consuming objects of the first and the second clusters”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “obtaining” in the context of this claim encompasses the user combining two or more values to get a total value (for example, a total load).
The limitation of “wherein the second cluster model comprises a second multi-task learning process having a second joint loss function, and each input of the second multi-task learning process corresponds to a respective electricity consuming object of the second cluster” recites mathematical concepts.
Step 2A (prong 2):
This judicial exception is not integrated into a practical application.
The additional elements of “obtaining a second cluster model based on: second reference electricity consumption data, among the obtained reference electricity consumption data, corresponding to second electricity consuming objects of the second cluster; second reference environmental data, among the obtained environmental data, corresponding to the second electricity consuming objects of the second cluster; and second reference calendar data, among the obtained reference calendar data, corresponding to the second electricity consuming objects of the second cluster” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not integrate the judicial exception into a practical application (see MPEP 2106.05(h)).
The additional element of “inputting, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster among the plurality of clusters” amounts to insignificant extra-solution activity of data gathering or transmitting, which does not amount to significantly more than the abstract idea (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “obtaining a second cluster model based on: second reference electricity consumption data, among the obtained reference electricity consumption data, corresponding to second electricity consuming objects of the second cluster; second reference environmental data, among the obtained environmental data, corresponding to the second electricity consuming objects of the second cluster; and second reference calendar data, among the obtained reference calendar data, corresponding to the second electricity consuming objects of the second cluster” amount to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the judicial exception (see MPEP 2106.05(h)).
The additional element of “inputting, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster among the plurality of clusters” is recited at a high level of generality and amounts to insignificant extra-solution activity related to mere data gathering and transmitting (MPEP 2106.05(g)). The courts have found limitations directed to receiving and transmitting information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”).

Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1:
The claim recites a method which falls within at least one of the four statutory categories of patent eligible subject matter.
Step 2:
Step 2A (prong 1):
The limitation of “the first and the second joint loss functions of the first and the second multi-task learning processes treat all learning tasks with equal importance” recites mathematical concepts.
Step 2A (prong 2) and Step 2B:
The claim does not include any additional elements.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-8, 10-13, 15-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Cini et al. (Cluster-based Aggregate Load Forecasting with Deep Neural Networks) in view of Moon et al. (US Pub. 2021/0215370) in view of Yildiz et al. (Household electricity load forecasting using historical smart meter data with clustering and classification techniques) in view of Chen et al. (US Pub. 2022/0296930) and further in view of Li et al. (US Pub. 2017/0124466).
As per claim 1, Cini teaches a method of load forecasting using a Long Short-Term Memory (LSTM) based multi-task deep learning [abstract, “predict the load requested by the grid”; Section I, Pg. 1, Column 2, Lines 27-40, "In this work we study CBAF in the context of deep learning; in particular, we present a novel CBAF architecture suitable for deep learning models and introduce a neural network implementing it. Taking inspiration from the Multitask Learning literature (MTL), we consider the cluster level aggregates as a single, multivariate, sequence and feed it in to a single deep recurrent model with multiple output modules predicting the cluster level energy demand. We also present a practical method to cluster long time-series for CBAF. Finally, we evaluate our approach on a publicly available benchmark dataset, showing the benefits of the proposed approach and test our method in a new challenging dataset from the Swiss town of Arbon, providing empirical evidence that CBAF is a viable technique to improve prediction accuracy in STLF (short-term load forecasting)"; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs)”], the method comprising: 
a first cluster model [Figs. 1-2, Section A, Pg. 3, Column 2, lines 9-13, “we propose CBAF in multi-task learning setting, studying predictors that require to train only a single model to forecast all the cluster-level loads at once. We focus on models that take as an input all the cluster-level aggregates as a single multivariate time-series”; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs)”, “This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)”];
wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster [Figs. 1-2, Section A, Pg. 3, Column 2, line 9 to Pg. 4 Column 1, line 7, “we propose CBAF in multi-task learning setting … the CBAF-CP architecture is inspired by the Multitask Learning literature [8], where the objective is to achieve better generalization sharing the representation of the input data across tasks”; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs)”, “This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)”];
predicting, based on an output of the first cluster model, a future electricity consumption for each of the first electricity consuming objects of the first cluster [Section IV, Pg. 4, Column 1, Lines 19-29, "This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input- Multi output (MIMO) prediction strategy which consists in predicting the values to be forecasted for each time lag all at once"; Section VI, Pg. 7, “electrical load forecasting based on a clustering algorithm that groups load profiles based on their correlation”], and
wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future [Section I, Pg. 1, Column 1, lines 24-35, “time-series describing the energy consumption of a specific customer as measured by a smart meter (SM) … We focus on the day-ahead prediction task, i.e., we aim at predicting the load on the grid for each time-step of the next 24 hours”],
wherein the first cluster model comprises a first multi-task learning process having a first joint loss function, and each input of the first multi-task learning process corresponds to a respective electricity consuming object of the first cluster [Section II, Pg. 2, Column 1, Lines 22-28, "Given a prediction horizon H (i.e., the number of steps ahead to predict) and a prediction window W (i.e., the number of previous steps used to make the prediction), this translates into minimizing a loss function, e.g., the prediction mean squared error (MSE):

[Equation image media_image1.png: prediction mean squared error (MSE) loss]

where s_t = (s[t-W], …, s[t]), u_t is a vector of exogenous variables, ϴ represents the model's parameter vector and Yh is the h-step-ahead prediction"], and
wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster [Fig. 1 discloses a process of clustering the loads of the consumers and inputting them into the predictors to generate the cluster-level forecasts; the forecasts are then used to generate a total load forecast using the fully connected neural network shown in Fig. 2; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs). This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input - Multi output (MIMO) prediction strategy [21], which consists in predicting the values to be forecasted for each time-lag all at once”].
Cini does not explicitly teach
receiving, via a communication interface, present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster, among a plurality of electricity consuming objects corresponding to a plurality of clusters, from external electronic devices corresponding to the first electricity consuming objects; 
inputting, into a first cluster model, the present environmental data and the present calendar data, the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data,
wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year,
displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster,
wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects,
wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time, and
wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster.
Moon teaches
Receiving, via a communication interface, present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster, among a plurality of electricity consuming objects corresponding to a plurality of clusters, from external electronic devices corresponding to the first electricity consuming objects [paragraphs 0009-0010, “An artificial intelligence (AI) based apparatus for forecasting energy usage includes a communication unit configured to receive a request for providing power usage forecasting information of a first period including a current month with respect to an electronic apparatus of a user, a memory configured to store power actual-use data indicating power usage of an electronic apparatus of each of a plurality of users, and a processor configured to load power actual-use data of the user during a predetermined second period before the current month … generate the power usage forecasting information of the first period with respect to the electronic apparatus of the user … the processor may calculate a daily forecasting weight given a weight based on weather information of the first period, divide a use period of the electronic apparatus into first to third sections, and classify the user into any one of a low user, a middle user or a high user according to the periods to generate the power usage forecasting information”; Fig. 4, paragraph 0140, “The communication unit 210a may receive a request for providing power usage forecasting information of a first period including a current month with respect to the electronic apparatus 100a of a user. 
The communication unit 210a may transmit and receive data to and from external apparatuses such as other AI devices 100a to 100e or an AI server 200”; It can be seen that the AI apparatus receives calendar data indicating time data (first period including a current month and second period) corresponding to the power usage data of an electronic apparatus of each of a plurality of users, and receives environmental data indicating weather information of the first period. The AI apparatus generates the power usage forecasting information based on the received data. It can also be seen that the communication unit 210a receives data from, and transmits data to, device 100a via a communication interface];
wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data [paragraphs 0009-0010, “An artificial intelligence (AI) based apparatus for forecasting energy usage includes a communication unit configured to receive a request for providing power usage forecasting information of a first period including a current month with respect to an electronic apparatus of a user … the processor may calculate a daily forecasting weight given a weight based on weather information of the first period, divide a use period of the electronic apparatus into first to third sections, and classify the user into any one of a low user, a middle user or a high user according to the periods to generate the power usage forecasting information”; Fig. 6, S22, “In: data on weather (temperature, fine dust) and weekday/holiday of current month”; paragraph 0171, “a monthly power usage temperature correction value may be calculated using a difference between a temperature measured on a weekday in the past and an average temperature”; It can be seen that the AI apparatus receives environmental data comprising weather and temperature data to in part generate the power usage forecasting information],
wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year [paragraphs 0009-0010, “An artificial intelligence (AI) based apparatus for forecasting energy usage includes a communication unit configured to receive a request for providing power usage forecasting information of a first period including a current month with respect to an electronic apparatus of a user, a memory configured to store power actual-use data indicating power usage of an electronic apparatus of each of a plurality of users, and a processor configured to load power actual-use data of the user during a predetermined second period before the current month … generate the power usage forecasting information of the first period with respect to the electronic apparatus of the user”; It can be seen that the AI apparatus receives calendar data indicating time data (first period including a current month and second period) corresponding to the power usage data of an electronic apparatus of each of a plurality of users to in part generate the power usage forecasting information],
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the processes of receiving present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster, among a plurality of electricity consuming objects corresponding to a plurality of clusters, from external electronic devices corresponding to the first electricity consuming objects of Moon. Doing so would help generate the power usage forecasting information of the first period with respect to the electronic apparatus of the user (Moon, 0009).
Cini and Moon do not explicitly teach
inputting, into a first cluster model, the present environmental data and the present calendar data, the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster,
wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects,
wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time, and
wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster.
Yildiz teaches 
inputting, into a first cluster model, the present environmental data and the present calendar data [Section II, Pg. 874, Column 1 Lines 49-55, Column 2 Lines 1-3, "In a SMBM, a chosen machine learning model learns the relationship between the target forecasted loads and input data, which consist of historical forms (lags) of smart meter measurements, weather data (temperature, humidity, wind speed etc.) and temporal information (calendar information such as hour of day, day of week, holidays etc.). An example SMBM schematic can be seen in Figure 1. In the first stage, the input data is cleaned and pre-processed followed by organizing lags of load, weather and calendar variables for each time step"]; and 
wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects [Section II, Pg. 874-875, Column 2 Lines 26-35, Column 1 Lines 1-3, "For this study we use three machine learning models in the training stage: Artificial Neural Network Ensembles (ANN-E), Support Vector Regression (SVR) [18] and Least-Squares Support Vector Regression (LSSVR). Each model is trained and 10-fold cross-validated. For all models, during the cross-validation stage, model parameters are optimized and tuned to improve the forecast performance. Due to the space limitations, we guide readers to the respective references above for more technical details and parameter tuning process of these models. Amongst these three models, the best forecast performance was obtained by SVR, hence it was chosen to be used for the proposed CCF method which is described next"],
wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time [Section IV, Pg. 877, Column 2, Lines 16-24, "Following the example demonstration of the CCF method, its implementation was extended to the entire 300 households. Considering the different clustering models, daily profile representations and different number of clusters, thirty CCF combinations were tested on the data-set. Figure 7 shows the NRMSE (%) box plots for the top 8 best performing CCF with two simpler SMBMs for reference. It can be seen that CCF methods result in smaller NRMSE (%) and most have narrower error band than simpler SMBMs"], 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the first cluster model being trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects, wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time of Yildiz. Doing so would help train the model for the particular cluster to predict the load at each time step (Yildiz, Pg. 4, Column 1, lines 29-31).
Yildiz also teaches 
wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster [Section II, Pg. 876, Column 1 Lines 35-45, Column 2 Lines 1-6, "In order to measure the forecast performance of the models, the following metrics are used: normalized root mean squared error (NRMSE), normalized mean absolute error (NMAE) and normalized mean bias error (NMBE); where the household mean load is used to normalize these metrics. Moreover, in order to assess the error on forecast accuracy, two metrics are defined: AElarge for loads larger than 500W and AEsmall for loads smaller than 500W [8]. The metrics are described in the equations below:

[Equation images media_image2.png, media_image3.png and media_image4.png: definitions of the NRMSE, NMAE, NMBE, AElarge and AEsmall metrics]"].

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the first joint loss function of the first multi-task learning process optimizing a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster of Yildiz. Doing so would help measure the forecast performance of the model using the loss function (Yildiz, Pg. 4, Column 1, lines 37-40).
Cini, Moon and Yildiz do not teach
the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster.
Chen teaches
the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster [Fig. 7C, paragraphs 0117-0118, “an RNN can be constructed and used in the process 700A … RNN suitable to process sequences of inputs. In an example, a long short-term memory (LSTM) network can be used. The LSTM is a type of RNN architecture … an LSTM network 700C that can include one input layer 761”];
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster [Fig. 7C, paragraphs 0117-0118, “The intermediate learning layers 762 can include two to four LSTM layers (e.g., three LSTM layers as shown in FIG. 7C)”]; 
a dense layer shared across the prediction tasks [Fig. 7C, paragraphs 0117-0118, “The dense layer 753”]; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects [Fig. 7C, paragraphs 0117-0118, “The dense layer 753 can be a fully connected output layer”];
Cini, in Figs. 1-2, Section A, Pg. 3, Column 2, line 9 to Pg. 4 Column 1, line 7, and Section B, Pg. 4, Column 1, lines 13-28, teaches an MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that processes input to generate load forecasts for consumers: “we propose CBAF in multi-task learning setting … the CBAF-CP architecture is inspired by the Multitask Learning literature [8], where the objective is to achieve better generalization sharing the representation of the input data across tasks … We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs). This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)”. Chen teaches that the model comprises an input layer, an LSTM block comprising a plurality of LSTM layers, a dense layer and an output layer. Therefore, the combination of Cini and Chen teaches the above claim limitations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include a model comprising an input layer, an LSTM block comprising a plurality of LSTM layers, a dense layer and an output layer of Chen. Doing so would help train the model to predict the aggregated energy consumption at each time step of the prediction horizon (Cini, Pg. 4, Col. 1).
Cini, Moon, Yildiz and Chen do not teach
displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster.
Li teaches
displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster [abstract, “a method for predicting energy consumption data based on a time series of consumption data”, paragraph 0007, “a prediction module configure to estimate predicted energy consumption data based on the trained prediction function and a type of the day for which the prediction is performed, and an output module configure to output the estimated predicted energy consumption data to at least one of a user interface”; claim 9, “output the estimated predicted energy consumption data to at least one of a user interface display”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include displaying, via display, a representation of the future electricity consumption of Li. Doing so would help monitor and compare a predicted energy consumption with a true energy consumption of the consumer in real-time (Li, 0004).
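
For illustration only, the layer arrangement recited in claim 1 (an input layer, a stack of LSTM layers shared across prediction tasks, a shared dense layer, and one output per electricity consuming object) can be sketched as a forward pass in plain Python. This is a minimal sketch under stated assumptions and is not taken from Cini, Chen, or the application: the names `LSTMLayer` and `SharedLSTMMultiTask`, the layer sizes, the ReLU activation on the dense layer, and the random untrained weights are all hypothetical, and no training step is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMLayer:
    """One LSTM layer; gates i, f, o and candidate c act on [x; h_prev]."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        self.weights = {k: rand_matrix(n_hidden, n_in + n_hidden, rng) for k in "ifoc"}

    def run(self, sequence):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in sequence:
            z = list(x) + h
            i_gate = [sigmoid(v) for v in matvec(self.weights["i"], z)]
            f_gate = [sigmoid(v) for v in matvec(self.weights["f"], z)]
            o_gate = [sigmoid(v) for v in matvec(self.weights["o"], z)]
            cand = [math.tanh(v) for v in matvec(self.weights["c"], z)]
            c = [f * cp + i * g for f, cp, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(cv) for o, cv in zip(o_gate, c)]
            outputs.append(h)
        return outputs


class SharedLSTMMultiTask:
    """Hypothetical model: shared LSTM stack, shared dense layer, one head per task."""

    def __init__(self, n_features, n_hidden, n_dense, n_tasks, horizon, n_layers=2, seed=0):
        rng = random.Random(seed)
        sizes = [n_features] + [n_hidden] * n_layers
        self.lstm_layers = [LSTMLayer(sizes[k], sizes[k + 1], rng) for k in range(n_layers)]
        self.dense = rand_matrix(n_dense, n_hidden, rng)  # shared across all tasks
        self.heads = [rand_matrix(horizon, n_dense, rng) for _ in range(n_tasks)]

    def forward(self, sequence):
        # Every task sees the same LSTM stack and the same dense layer.
        for layer in self.lstm_layers:
            sequence = layer.run(sequence)
        features = [max(0.0, v) for v in matvec(self.dense, sequence[-1])]
        # One output vector (length = horizon) per electricity consuming object.
        return [matvec(head, features) for head in self.heads]


# Assumed input: 6 time steps, 5 features per step (e.g. 3 loads, temperature, hour).
model = SharedLSTMMultiTask(n_features=5, n_hidden=4, n_dense=3, n_tasks=3, horizon=2)
window = [[0.1 * k, 0.2, 0.3, 0.4, 0.5] for k in range(6)]
forecasts = model.forward(window)
print(len(forecasts), len(forecasts[0]))  # 3 tasks, 2 forecast steps each
```

In this sketch only the final per-task heads differ between electricity consuming objects; everything upstream of them is shared, which is the sense of "shared across prediction tasks" assumed here.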

As per claim 3, Cini, Moon, Yildiz, Chen and Li teach the method of claim 1.
Cini further teaches
predicting, based on an output of the second cluster model, a future electricity consumption for each of the second electricity consuming object of the second cluster [Figs. 1-2, Section IV, Pg. 4, Column 1, Lines 19-29, "This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feed forward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input - Multi output (MIMO) prediction strategy which consists in predicting the values to be forecasted for each time lag all at once"];
obtaining a final forecast based on the predicted future electricity consumption of the first and the second electricity consuming objects of the first and the second clusters [Figs. 1-2, Section V, Pg. 6, Column 2, Lines 10-21, "Results of the empirical evaluation are shown in Table II for the CER dataset and in Table III for the Arbon dataset. For both datasets, using CBAF with spectral clustering results in a substantial improvement in prediction accuracy over a naïve aggregation strategy. CBAF, as we could expect, appears to be more beneficial for the more complex and noisier dataset, where the SM data are more heterogeneous. In particular CBAF-CP achieves a 23% improvement in MAE over the standard STLF predictor on the Arbon dataset. However, the improvement in prediction accuracy is remarkable also in the CER - simpler - dataset case, confirming the results of the analyses conducted in previous works"],
wherein the second cluster model comprises a second multi-task learning process having a second joint loss function, and each input of the second multi-task learning process corresponds to a respective electricity consuming object of the second cluster [Figs. 1-2, Section II, Pg. 2, Column 1, Lines 22-28," Given a prediction horizon H (i.e., the number of steps ahead to predict) and a prediction window W (i.e., the number of previous steps used to make the prediction), this translates into minimizing a loss function, e.g., the prediction mean squared error (MSE):

[Equation image media_image1.png: prediction mean squared error (MSE) loss]

where s_t = (s[t-W], …, s[t]), u_t is a vector of exogenous variables, ϴ represents the model's parameter vector and Yh is the h-step-ahead prediction"]. 
Yildiz further teaches
inputting, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster among the plurality of clusters [Section II, Pg. 874, Column 1 Lines 49-55, Column 2 Lines 1-3, "In a SMBM, a chosen machine learning model learns the relationship between the target forecasted loads and input data, which consist of historical forms (lags) of smart meter measurements, weather data (temperature, humidity, wind speed etc.) and temporal information (calendar information such as hour of day, day of week, holidays etc.). An example SMBM schematic can be seen in Figure 1. In the first stage, the input data is cleaned and pre-processed followed by organizing lags of load, weather and calendar variables for each time step"; Cini in Figs. 1-2 teaches that the load data are input into multiple models (including the second model) to perform load forecasting, while Yildiz teaches that the present environmental data and present calendar data are input into the model; therefore, the combination of Cini and Yildiz teaches the above claim limitation];
wherein the second cluster model is trained based on second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters [Section II, Pg. 874-875, Column 2 Lines 26-35, Column 1 Lines 1-3, "For this study we use three machine learning models in the training stage: Artificial Neural Network Ensembles (ANN-E), Support Vector Regression (SVR) [18] and Least-Squares Support Vector Regression (LSSVR). Each model is trained and 10-fold cross-validated. For all models, during the cross-validation stage, model parameters are optimized and tuned to improve the forecast performance. Due to the space limitations, we guide readers to the respective references above for more technical details and parameter tuning process of these models. Amongst these three models, the best forecast performance was obtained by SVR, hence it was chosen to be used for the proposed CCF method which is described next"],
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include training the second cluster model using second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters of Yildiz. Doing so would help train the model for the particular cluster to predict the load at each time step (Yildiz, Pg. 4, Column 1, lines 29-31).

As per claim 4, Cini, Moon, Yildiz, Chen and Li teach the method of claim 3.
Yildiz further teaches
the first and the second joint loss functions of the first and the second multi-task learning processes treat all learning tasks with equal importance [Section V, Pg. 877, Column 1, Lines 13-26, "Following the clustering step, the CART was trained with the household's respective inputs and cluster labels of the training daily profiles. After the optimization and 10-fold cross-validation of the model, the classification accuracy was obtained as 71%. This can be considered as a fairly good classification performance considering that there were four clusters and the training daily profiles were distributed almost evenly across the clusters (a random predictor would achieve 25 % accuracy). The following figure demonstrates the importance of the classification input variables in terms of their predictive capabilities, when assigning daily profiles into one of the four clusters. For this specific household, the forecasted day's expected mean temperature and current day's mean load value are the two most important predictors"].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the first and the second joint loss functions of the first and the second multi-task learning processes treating all learning tasks with equal importance of Yildiz. Doing so would help train the models to forecast the total load (Cini, Figs. 1-2, Pg. 2).
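
For illustration only, one common way to express a joint loss that treats all learning tasks with equal importance is a uniformly weighted sum of per-task losses. The sketch below assumes the per-task loss is the mean squared error (the example loss named in the Cini passage cited above); the function names `mse` and `joint_loss` and the `weights` parameter are hypothetical and are not drawn from any cited reference or from the application.

```python
def mse(predicted, actual):
    """Mean squared error over the forecast horizon for one task."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def joint_loss(predictions, targets, weights=None):
    """Weighted sum of per-task MSE values.

    With weights=None every task receives the same weight 1/n, which is
    the "equal importance" case. Passing explicit weights shifts the
    emphasis toward particular tasks.
    """
    n_tasks = len(predictions)
    if weights is None:
        weights = [1.0 / n_tasks] * n_tasks
    return sum(w * mse(p, t) for w, p, t in zip(weights, predictions, targets))


# Two tasks (two electricity consuming objects), horizon of two steps.
predictions = [[1.0, 2.0], [3.0, 5.0]]
targets = [[1.0, 2.0], [3.0, 3.0]]

print(joint_loss(predictions, targets))                      # 1.0 (equal weights)
print(joint_loss(predictions, targets, weights=[0.0, 1.0]))  # 2.0 (second task only)
```

The second call shows the contrasting case in which the weights place all emphasis on a single task, so the joint loss reduces to that task's own error.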

As per claim 5, Cini, Moon, Yildiz, Chen and Li teach the method of claim 3.
Cini (as modified) further teaches
the obtaining the final forecast comprises combining the predicted future electricity consumption of the electricity consuming objects of the first and the second clusters using a fully connected neural network output layer [Figs. 1-2 show the load forecasts associated with the clusters are combined to generate a total forecast using the fully connected neural network].  

As per claim 6, Cini, Moon, Yildiz, Chen and Li teach the method of claim 3.
Yildiz further teaches
the present environmental data and present calendar data corresponding to the first electricity consuming objects of the first cluster and the present environmental data and present calendar data corresponding to the second electricity consuming objects of the second cluster comprise time series data sets in which a final time corresponds to the current time [Section II, Pg. 874, Column 1 Lines 49-55, Column 2 Lines 1-3, "In a SMBM, a chosen machine learning model learns the relationship between the target forecasted loads and input data, which consist of historical forms (lags) of smart meter measurements, weather data (temperature, humidity, wind speed etc.) and temporal information (calendar information such as hour of day, day of week, holidays etc.). An example SMBM schematic can be seen in Figure 1. In the first stage, the input data is cleaned and pre-processed followed by organizing lags of load, weather and calendar variables for each time step"].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini so that the present environmental data and present calendar data corresponding to the first electricity consuming objects of the first cluster and the present environmental data and present calendar data corresponding to the second electricity consuming objects of the second cluster comprise time series data sets in which a final time corresponds to the current time, as taught by Yildiz. Doing so would help train the model to predict the load at each time step using the load data corresponding to the particular clusters (Yildiz, Pg. 4, Column 1).
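The claim 6 limitation, a time series whose final time corresponds to the current time, can be pictured with the minimal sketch below. It selects the last `window` records ending exactly at the current time and attaches calendar fields derived from each timestamp. The function name `build_window`, the record layout, and the temperature values are hypothetical and are not taken from Yildiz or from the application.

```python
from datetime import datetime, timedelta


def build_window(records, current_time, window):
    """Return the last `window` records ending exactly at `current_time`.

    records: list of (timestamp, temperature) tuples sorted by timestamp.
    Each output row carries environmental data (temperature) and calendar
    data (hour of day, day of week) derived from the timestamp.
    """
    upto = [r for r in records if r[0] <= current_time]
    if len(upto) < window or upto[-1][0] != current_time:
        raise ValueError("window must end at the current time")
    return [
        {"time": ts, "temperature": temp, "hour": ts.hour, "weekday": ts.weekday()}
        for ts, temp in upto[-window:]
    ]


# Hypothetical hourly readings; the fifth reading is treated as "now".
start = datetime(2021, 6, 28, 0, 0)
records = [(start + timedelta(hours=i), 20.0 + i) for i in range(6)]
now = start + timedelta(hours=4)
rows = build_window(records, now, window=3)
```

Records later than `now` are ignored, so the final row of the window always corresponds to the current time.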

As per claim 7, Cini, Moon, Yildiz, Chen and Li teach the method of claim 1.
Cini further teaches
the plurality of clusters comprises the first cluster through an Nth cluster [Figs. 1-2, Section IV, Pg. 5, Column 1, Lines 31-37, "Building the similarity matrix can be computationally expensive if the number of load profiles is large, but the process can be efficiently parallelized and it needs to be executed only once since the hyperparameters (i.e., the number of neighbors and clusters) can be tuned independently afterwards. In practice, using the K-neighbors graph greatly simplifies the clustering problem"], 
the method further comprising: 
obtaining cluster models for each of the first through the Nth clusters [Figs. 1-2, Section IV, Pg. 5, Column 1, Lines 24-37, "For each pair of load profiles we compute a similarity measure as the average Pearson correlation coefficient between the subsequences obtained at the previous step 1. We use the similarity matrix obtained at the previous step to build a K-Nearest neighbor graph of the dataset. We perform spectral clustering using the graph representation obtained at the previous step. Building the similarity matrix can be computationally expensive if the number of load profiles is large, but the process can be efficiently parallelized and it needs to be executed only once since the hyperparameters (i.e., the number of neighbors and clusters) can be tuned independently afterwards. In practice, using the K-neighbors graph greatly simplifies the clustering problem"];
obtaining a final forecast based on predicted future electricity consumption of the electricity consuming objects of the first through the Nth clusters [Figs. 1-2, Section V, Pg. 6, Column 2, Lines 10-21, "Results of the empirical evaluation are shown in Table II for the CER dataset and in Table III for the Arbon dataset. For both datasets, using CBAF with spectral clustering results in a substantial improvement in prediction accuracy over a naive aggregation strategy. CBAF, as we could expect, appears to be more beneficial for the more complex and noisier dataset, where the SM data are more heterogeneous. In particular CBAF-CP achieves a 23% improvement in MAE over the standard STLF predictor on the Arbon dataset. However, the improvement in prediction accuracy is remarkable also in the CER- simpler- dataset case, confirming the results of the analyses conducted in previous works"], 
wherein the first through Nth cluster models comprise multi-task learning processes having the first joint loss function through an Nth joint loss function [Section II, Pg. 2, Column 1, Lines 22-28, "Given a prediction horizon H (i.e., the number of steps ahead to predict) and a prediction window W (i.e., the number of previous steps used to make the prediction), this translates into minimizing a loss function, e.g., the prediction mean squared error (MSE):

    [Equation image (media_image1.png): prediction mean squared error (MSE) loss]

where s_t = (s[t-W], …, s[t]), u_t is a vector of exogenous variables, ϴ represents the model's parameter vector and Yh is the h-step-ahead prediction"].
Yildiz further teaches
wherein the inputs of the multi-task learning processes correspond to electricity consuming objects of corresponding clusters [Section I, Pg. 2, Column 1, Lines 18-33, "This approach was extended by which used wavelet based neural networks, where the input was based on historical days which had similar calendar and weather characteristics with the forecasted day. In the individual household load forecast context, a similar approach was applied in; however, unlike the previous studies, finding historical load profiles similar to the forecasted day was accomplished by using clustering and classification models. Our study extends the work presented in, by combining the predictive inputs used in the SMBM forecast approach with the predictive inputs used in the 'pattern similarity' based forecast approach. Furthermore, we investigate the impact of different clustering models, data transformation techniques, number of clusters and clustering performance validity indices which influence forecast performance"], and 
wherein the first through the Nth joint loss functions of the multi-task learning processes treat all learning tasks with equal importance [Section V, Pg. 877, Column 1, Lines 13-26, "Following the clustering step, the CART was trained with the household's respective inputs and cluster labels of the training daily profiles. After the optimization and 10-fold cross-validation of the model, the classification accuracy was obtained as 71%. This can be considered as a fairly good classification performance considering that there were four clusters and the training daily profiles were distributed almost evenly across the clusters (a random predictor would achieve 25% accuracy). The following figure demonstrates the importance of the classification input variables in terms of their predictive capabilities, when assigning daily profiles into one of the four clusters. For this specific household, the forecasted day's expected mean temperature and current day's mean load value are the two most important predictors"].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini so that the inputs of the multi-task learning processes correspond to electricity consuming objects of corresponding clusters, and the first through the Nth joint loss functions of the multi-task learning processes treat all learning tasks with equal importance, as taught by Yildiz. Doing so would help train the models to forecast the total load (Cini, Figs. 1-2, Pg. 2).
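For orientation only, the sketch below shows one possible reading of a joint loss that treats all learning tasks with equal importance: a per-task mean squared error, as in the MSE objective quoted from Cini above, averaged with identical weights across tasks. The function names `mse` and `joint_loss`, the equal-weight default, and the sample values are assumptions made for illustration; neither Cini nor Yildiz is asserted to define the loss in this form.

```python
def mse(predicted, actual):
    """Mean squared error over one task's forecast horizon."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def joint_loss(task_predictions, task_targets, task_weights=None):
    """Weighted sum of per-task MSE values.

    With task_weights=None every task receives weight 1/N, which is the
    hypothetical 'equal importance' setting illustrated here.
    """
    n_tasks = len(task_predictions)
    if n_tasks == 0 or n_tasks != len(task_targets):
        raise ValueError("need matching, non-empty task lists")
    if task_weights is None:
        task_weights = [1.0 / n_tasks] * n_tasks
    return sum(
        w * mse(p, t)
        for w, p, t in zip(task_weights, task_predictions, task_targets)
    )


# Two hypothetical tasks, each with a two-step horizon.
preds = [[1.0, 2.0], [3.0, 5.0]]
targets = [[1.0, 4.0], [3.0, 3.0]]
loss = joint_loss(preds, targets)
```

Passing unequal `task_weights` would instead emphasize selected tasks, which is the contrasting case to the equal-importance limitation.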

As per claim 8, Cini teaches an apparatus for forecasting load using a Long Short-Term Memory (LSTM) based multi-task deep learning [abstract, “predict the load requested by the grid”; Section I, Pg. 1, Column 2, Lines 27-40, "In this work we study CBAF in the context of deep learning; in particular, we present a novel CBAF architecture suitable for deep learning models and introduce a neural network implementing it. Taking inspiration from the Multitask Learning literature (MTL), we consider the cluster level aggregates as a single, multivariate, sequence and feed it in to a single deep recurrent model with multiple output modules predicting the cluster level energy demand. We also present a practical method to cluster long time-series for CBAF. Finally, we evaluate our approach on a publicly available benchmark dataset, showing the benefits of the proposed approach and test our method in a new challenging dataset from the Swiss town of Arbon, providing empirical evidence that CBAF is a viable technique to improve prediction accuracy in STLF (short-term load forecasting)"; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs)”], the apparatus comprising: 
a first cluster model [Figs. 1-2, Section A, Pg. 3, Column 2, lines 9-13, “we propose CBAF in multi-task learning setting, studying predictors that require to train only a single model to forecast all the cluster-level loads at once. We focus on models that take as an input all the cluster-level aggregates as a single multivariate time-series”; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs)”. This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)];
wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster [Figs. 1-2, Section A, Pg. 3, Column 2, line 9 to Pg. 4 Column 1, line 7, “we propose CBAF in multi-task learning setting … the CBAF-CP architecture is inspired by the Multitask Learning literature [8], where the objective is to achieve better generalization sharing the representation of the input data across tasks”; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs)”. This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)];
predicting, based on an output of the first cluster model, a future electricity consumption for each of the first electricity consuming objects of the first cluster [Section IV, Pg. 4, Column 1, Lines 19-29, "This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input- Multi output (MIMO) prediction strategy which consists in predicting the values to be forecasted for each time lag all at once"; Section VI, Pg. 7, “electrical load forecasting based on a clustering algorithm that groups load profiles based on their correlation”], and
wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future [Section I, Pg. 1, Column 1, lines 24-35, “time-series describing the energy consumption of a specific customer as measured by a smart meter (SM) … We focus on the day-ahead prediction task, i.e., we aim at predicting the load on the grid for each time-step of the next 24 hours”],
wherein the first cluster model comprises a first multi-task learning process having a first joint loss function, and each input of the first multi-task learning process corresponds to a respective electricity consuming object of the first cluster [Section II, Pg. 2, Column 1, Lines 22-28, "Given a prediction horizon H (i.e., the number of steps ahead to predict) and a prediction window W (i.e., the number of previous steps used to make the prediction), this translates into minimizing a loss function, e.g., the prediction mean squared error (MSE):

    [Equation image (media_image1.png): prediction mean squared error (MSE) loss]

where s_t = (s[t-W], …, s[t]), u_t is a vector of exogenous variables, ϴ represents the model's parameter vector and Yh is the h-step-ahead prediction"], and
wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster [Fig. 1 discloses a process of clustering the loads of the consumers and inputting into the predictors to generate the cluster-level forecasts, the forecasts are then used to generate a total load forecast using the fully connected neural network which is shown in Fig. 2; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs)”. This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input - Multi output (MIMO) prediction strategy [21], which consists in predicting the values to be forecasted for each time-lag all at once]. 
Cini does not explicitly teach
a communication interface; 
a display; 
at least one memory storing instructions; and 
at least one processor configured to execute the instructions to: 
receive present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster, among a plurality of electricity consuming objects corresponding to a plurality of clusters, through the communication interface from external electronic devices corresponding to the first electricity consuming objects; 
input, into a first cluster model, the present environmental data and the present calendar data, the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
wherein the present environmental data refers to information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data, 
wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year, 
display, via the display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster,
wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects, 
wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time, and
wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster.
Moon teaches
a communication interface [Fig. 4, paragraph 0140, “The communication unit 210a may receive a request for providing power usage forecasting information of a first period including a current month with respect to the electronic apparatus 100a of a user. The communication unit 210a may transmit and receive data to and from external apparatuses such as other AI devices 100a to 100e or an AI server 200”; It can be seen that the communication unit 210a comprises an interface to transmit and receive data to and from external apparatuses such as other AI devices 100a to 100e or an AI server 200];
at least one memory storing instructions [paragraph 0086, “one or more instructions that constitute the learning model may be stored in memory 230”]; and 
at least one processor configured to execute the instructions to [Fig. 2, processor 260]:
receive present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster, among a plurality of electricity consuming objects corresponding to a plurality of clusters, through the communication interface from external electronic devices corresponding to the first electricity consuming objects [paragraphs 0009-0010, “An artificial intelligence (AI) based apparatus for forecasting energy usage includes a communication unit configured to receive a request for providing power usage forecasting information of a first period including a current month with respect to an electronic apparatus of a user, a memory configured to store power actual-use data indicating power usage of an electronic apparatus of each of a plurality of users, and a processor configured to load power actual-use data of the user during a predetermined second period before the current month … generate the power usage forecasting information of the first period with respect to the electronic apparatus of the user … the processor may calculate a daily forecasting weight given a weight based on weather information of the first period, divide a use period of the electronic apparatus into first to third sections, and classify the user into any one of a low user, a middle user or a high user according to the periods to generate the power usage forecasting information”; Fig. 4, paragraph 0140, “The communication unit 210a may receive a request for providing power usage forecasting information of a first period including a current month with respect to the electronic apparatus 100a of a user. 
The communication unit 210a (comprising an interface) may transmit and receive data to and from external apparatuses such as other AI devices 100a to 100e or an AI server 200”; It can be seen that the AI apparatus receives calendar data indicating time data (first period including a current month and second period) corresponding to the power usage data of an electronic apparatus of each of a plurality of users, and receives environmental data indicating weather information of the first period, the AI apparatus generates the power usage forecasting information based on the received data];
wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data [paragraphs 0009-0010, “An artificial intelligence (AI) based apparatus for forecasting energy usage includes a communication unit configured to receive a request for providing power usage forecasting information of a first period including a current month with respect to an electronic apparatus of a user … the processor may calculate a daily forecasting weight given a weight based on weather information of the first period, divide a use period of the electronic apparatus into first to third sections, and classify the user into any one of a low user, a middle user or a high user according to the periods to generate the power usage forecasting information”; Fig. 6, S22, “In: data on weather (temperature, fine dust) and weekday/holiday of current month”; paragraph 0171, “a monthly power usage temperature correction value may be calculated using a difference between a temperature measured on a weekday in the past and an average temperature”; It can be seen that the AI apparatus receives environmental data comprising weather and temperature data to in part generate the power usage forecasting information],
wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year [paragraphs 0009-0010, “An artificial intelligence (AI) based apparatus for forecasting energy usage includes a communication unit configured to receive a request for providing power usage forecasting information of a first period including a current month with respect to an electronic apparatus of a user, a memory configured to store power actual-use data indicating power usage of an electronic apparatus of each of a plurality of users, and a processor configured to load power actual-use data of the user during a predetermined second period before the current month … generate the power usage forecasting information of the first period with respect to the electronic apparatus of the user”; It can be seen that the AI apparatus receives calendar data indicating time data (first period including a current month and second period) corresponding to the power usage data of an electronic apparatus of each of a plurality of users to in part generate the power usage forecasting information],
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the processes of receiving present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster, among a plurality of electricity consuming objects corresponding to a plurality of clusters, from external electronic devices corresponding to the first electricity consuming objects, as taught by Moon. Doing so would help generate the power usage forecasting information of the first period with respect to the electronic apparatus of the user (Moon, 0009).
Cini and Moon do not explicitly teach
a display;
input, into a first cluster model, the present environmental data and the present calendar data, the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
display, via the display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster,
wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects, 
wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time,
wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster.
Yildiz teaches 
input, into a first cluster model, the present environmental data and the present calendar data [Section II, Pg. 874, Column 1 Lines 49-55, Column 2 Lines 1-3, "In a SMBM, a chosen machine learning model learns the relationship between the target forecasted loads and input data, which consist of historical forms (lags) of smart meter measurements, weather data (temperature, humidity, wind speed etc.) and temporal information (calendar information such as hour of day, day of week, holidays etc.). An example SMBM schematic can be seen in Figure 1. In the first stage, the input data is cleaned and pre-processed followed by organizing lags of load, weather and calendar variables for each time step"]; and 
wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects [Section II, Pg. 874-875, Column 2 Lines 26-35, Column 1 Lines 1-3, "For this study we use three machine learning models in the training stage: Artificial Neural Network Ensembles (ANN-E), Support Vector Regression (SVR) [18] and Least-Squares Support Vector Regression (LSSVR). Each model is trained and 10-fold cross-validated. For all models, during the cross-validation stage, model parameters are optimized and tuned to improve the forecast performance. Due to the space limitations, we guide readers to the respective references above for more technical details and parameter tuning process of these models. Amongst these three models, the best forecast performance was obtained by SVR, hence it was chosen to be used for the proposed CCF method which is described next"],
wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time [Section IV, Pg. 877, Column 2, Lines 16-24, "Following the example demonstration of the CCF method, its implementation was extended to the entire 300 households. Considering the different clustering models, daily profile representations and different number of clusters, thirty CCF combinations were tested on the data-set. Figure 7 shows the NRMSE (%) box plots for the top 8 best performing CCF with two simpler SMBMs for reference. It can be seen that CCF methods result in smaller NRMSE (%) and most have narrower error band than simpler SMBMs"], 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini so that the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects, and so that the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time, as taught by Yildiz. Doing so would help train the model for the particular cluster to predict the load at each time step (Yildiz, Pg. 4, Column 1, lines 29-31).
Yildiz also teaches 
wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster [Section II, Pg. 876, Column 1 Lines 35-45, Column 2 Lines 1-6, "In order to measure the forecast performance of the models, the following metrics are used: normalized root mean squared error (NRMSE), normalized mean absolute error (NMAE) and normalized mean bias error (NMBE); where the household mean load is used to normalize these metrics. Moreover, in order to assess the error on forecast accuracy, two metrics are defined: AElarge for loads larger than 500W and AEsmall for loads smaller than 500W [8]. The metrics are described in the equations below:

    [Equation images (media_image2.png, media_image3.png, media_image4.png): definitions of the forecast error metrics named above]"].

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini so that the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster, as taught by Yildiz. Doing so would help measure the forecast performance of the model using the loss function (Yildiz, Pg. 4, Column 1, lines 37-40).
Cini, Moon and Yildiz do not teach
a display; 
the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
display, via the display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster,
Chen teaches
the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster [Fig. 7C, paragraphs 0117-0118, “an RNN can be constructed and used in the process 700A … RNN suitable to process sequences of inputs. In an example, a long short-term memory (LSTM) network can be used. The LSTM is a type of RNN architecture … an LSTM network 700C that can include one input layer 761”];
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster [Fig. 7C, paragraphs 0117-0118, “The intermediate learning layers 762 can include two to four LSTM layers (e.g., three LSTM layers as shown in FIG. 7C)”]; 
a dense layer shared across the prediction tasks [Fig. 7C, paragraphs 0117-0118, “The dense layer 753”]; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects [Fig. 7C, paragraphs 0117-0118, “The dense layer 753 can be a fully connected output layer”];
Cini, in Figs. 1-2, Section A, Pg. 3, Column 2, line 9 to Pg. 4 Column 1, line 7, and Section B, Pg. 4, Column 1, lines 13-28, teaches an MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that processes input to generate load forecasting for consumers: “we propose CBAF in multi-task learning setting … the CBAF-CP architecture is inspired by the Multitask Learning literature [8], where the objective is to achieve better generalization sharing the representation of the input data across tasks … We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs). This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)”. Chen teaches that the model comprises an input layer, an LSTM block comprising a plurality of LSTM layers, a dense layer and an output layer. Therefore, the combination of Cini and Chen teaches the above claim limitations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the model of Chen, which comprises an input layer, an LSTM block comprising a plurality of LSTM layers, a dense layer and an output layer. Doing so would help train the model to predict the aggregated energy consumption at each time step of the prediction horizon (Cini, Pg. 4, Col. 1).
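For illustration only, the layer arrangement recited in the limitations above (an input layer covering several objects, stacked LSTM layers shared across tasks, a shared dense layer, and an output layer with one set of outputs per object) can be sketched as a forward pass. This is a minimal sketch and not the implementation of Cini, Chen, or the applicant: the class names `LSTMLayer` and `SharedLSTMMultiTask`, the layer sizes, the ReLU activation, and the random initialization are all assumptions, and training is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMLayer:
    """One LSTM layer, forward pass only (hypothetical helper)."""
    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # A single matrix holds the input, forget, cell and output gate weights.
        self.W = rng.normal(0.0, 0.1, size=(n_in + n_hidden, 4 * n_hidden))
        self.b = np.zeros(4 * n_hidden)

    def forward(self, xs):
        # xs has shape (T, n_in); returns the hidden states, shape (T, n_hidden).
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        states = []
        for x in xs:
            z = np.concatenate([x, h]) @ self.W + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            states.append(h)
        return np.stack(states)

class SharedLSTMMultiTask:
    """Input layer -> shared LSTM block -> shared dense layer -> per-object outputs."""
    def __init__(self, n_objects, n_features, n_hidden=8, n_lstm_layers=3,
                 horizon=4, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [n_objects * n_features] + [n_hidden] * n_lstm_layers
        self.lstm_layers = [LSTMLayer(a, b, rng)
                            for a, b in zip(sizes[:-1], sizes[1:])]
        self.dense_W = rng.normal(0.0, 0.1, size=(n_hidden, n_hidden))
        self.dense_b = np.zeros(n_hidden)
        self.out_W = rng.normal(0.0, 0.1, size=(n_hidden, n_objects * horizon))
        self.out_b = np.zeros(n_objects * horizon)
        self.n_objects, self.horizon = n_objects, horizon

    def forward(self, xs):
        # xs has shape (T, n_objects * n_features): one row per time step.
        for layer in self.lstm_layers:
            xs = layer.forward(xs)
        # The last hidden state feeds the shared dense layer (ReLU is assumed).
        shared = np.maximum(0.0, xs[-1] @ self.dense_W + self.dense_b)
        out = shared @ self.out_W + self.out_b
        # One row of forecasts per object, i.e. per prediction task.
        return out.reshape(self.n_objects, self.horizon)

model = SharedLSTMMultiTask(n_objects=5, n_features=3)
forecasts = model.forward(np.ones((10, 15)))
```

In this sketch every weight up to and including the dense layer is shared by all objects, and only the slicing of the output layer distinguishes one prediction task from another.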
Cini, Moon, Yildiz and Chen do not teach
a display;
display, via the display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster.
Li teaches
a display [claim 9, user interface display];
displaying, via the display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster [abstract, “a method for predicting energy consumption data based on a time series of consumption data”, paragraph 0007, “a prediction module configure to estimate predicted energy consumption data based on the trained prediction function and a type of the day for which the prediction is performed, and an output module configure to output the estimated predicted energy consumption data to at least one of a user interface”; claim 9, “output the estimated predicted energy consumption data to at least one of a user interface display”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include displaying, via a display, a representation of the future electricity consumption, as taught by Li. Doing so would help monitor and compare a predicted energy consumption with a true energy consumption of the consumer in real time (Li, 0004).

As per claim 10, Cini, Moon, Yildiz, Chen and Li teach the apparatus of claim 8.
Cini further teaches
predict, based on an output of the second cluster model, a future electricity consumption for each of the second electricity consuming object of the second cluster [Figs. 1-2, Section IV, Pg. 4, Column 1, Lines 19-29, "This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feed forward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input - Multi output (MIMO) prediction strategy which consists in predicting the values to be forecasted for each time lag all at once"];
obtain a final forecast based on the predicted future electricity consumption of the first and the second electricity consuming objects of the first and the second clusters [Figs. 1-2, Section V, Pg. 6, Column 2, Lines 10-21, "Results of the empirical evaluation are shown in Table II for the CER dataset and in Table III for the Arbon dataset. For both datasets, using CBAF with spectral clustering results in a substantial improvement in prediction accuracy over a naïve aggregation strategy. CBAF, as we could expect, appears to be more beneficial for the more complex and noisier dataset, where the SM data are more heterogeneous. In particular CBAF-CP achieves a 23% improvement in MAE over the standard STLF predictor on the Arbon dataset. However, the improvement in prediction accuracy is remarkable also in the CER - simpler - dataset case, confirming the results of the analyses conducted in previous works"],
wherein the second cluster model comprises a second multi-task learning process having a second joint loss function, and each input of the second multi-task learning process corresponds to a respective electricity consuming object of the second cluster [Figs. 1-2, Section II, Pg. 2, Column 1, Lines 22-28," Given a prediction horizon H (i.e., the number of steps ahead to predict) and a prediction window W (i.e., the number of previous steps used to make the prediction), this translates into minimizing a loss function, e.g., the prediction mean squared error (MSE):

    (equation image: media_image1.png)

where s_t = (s[t-W], …, s[t]), u_t is a vector of exogenous variables, ϴ represents the model's parameter vector and Yh is the h-step-ahead prediction"]. 
Yildiz further teaches
input, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster among the plurality of clusters [Section II, Pg. 874, Column 1 Lines 49-55, Column 2 Lines 1-3, "In a SMBM, a chosen machine learning model learns the relationship between the target forecasted loads and input data, which consist of historical forms (lags) of smart meter measurements, weather data (temperature, humidity, wind speed etc.) and temporal information (calendar information such as hour of day, day of week, holidays etc.). An example SMBM schematic can be seen in Figure 1. In the first stage, the input data is cleaned and pre-processed followed by organizing lags of load, weather and calendar variables for each time step"; Cini in Figs. 1-2 teaches that the load data are input into multiple models (including the second model) to perform load forecasting, while Yildiz teaches that the present environmental data and present calendar data are input into the model; therefore, the combination of Cini and Yildiz teaches the above claim limitation];
wherein the second cluster model is trained based on second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters [Section II, Pg. 874-875, Column 2 Lines 26-35, Column 1 Lines 1-3, "For this study we use three machine learning models in the training stage: Artificial Neural Network Ensembles (ANN-E), Support Vector Regression (SVR) [18] and Least-Squares Support Vector Regression (LSSVR). Each model is trained and 10-fold cross-validated. For all models, during the cross-validation stage, model parameters are optimized and tuned to improve the forecast performance. Due to the space limitations, we guide readers to the respective references above for more technical details and parameter tuning process of these models. Amongst these three models, the best forecast performance was obtained by SVR, hence it was chosen to be used for the proposed CCF method which is described next"],
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include training the second cluster model using second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters of Yildiz. Doing so would help training the model for the particular cluster to predict the load at each time step (Yildiz, Pg. 4, Column 1, lines 29-31).
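As an illustration of the loss passage quoted from Cini above, the following sketch builds (window, target) pairs from a load series using a window s_t = (s[t-W], …, s[t]) and a horizon of H steps ahead, and scores a forecast with the mean squared error. It is a sketch under stated assumptions: the function names `make_windows`, `mse`, and `persistence_forecast` are hypothetical, the persistence predictor is only a stand-in for a trained model, and exogenous variables u_t are omitted.

```python
def make_windows(series, W, H):
    """Pair each window s[t-W..t] with the next H values s[t+1..t+H]."""
    samples = []
    for t in range(W, len(series) - H):
        window = series[t - W:t + 1]      # W + 1 past values, ending at t
        target = series[t + 1:t + 1 + H]  # the H values to be forecast
        samples.append((window, target))
    return samples

def mse(pred, target):
    """Mean squared error over the H steps of the horizon."""
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target)

def persistence_forecast(window, H):
    """Stand-in predictor (an assumption): repeat the last observed value."""
    return [window[-1]] * H

samples = make_windows(list(range(10)), W=2, H=3)
window, target = samples[0]
error = mse(persistence_forecast(window, 3), target)
```

All H values are produced at once from a single window, which corresponds to the multi-input, multi-output strategy described in the Cini passages cited above.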

As per claim 11, Cini, Moon, Yildiz, Chen and Li teach the apparatus of claim 10.
Yildiz further teaches
the first and the second joint loss functions of the first and the second multi-task learning processes treat all learning tasks with equal importance [Section V, Pg. 877, Column 1, Lines 13-26, "Following the clustering step, the CART was trained with the household's respective inputs and cluster labels of the training daily profiles. After the optimization and 10-fold cross-validation of the model, the classification accuracy was obtained as 71%. This can be considered as a fairly good classification performance considering that there were four clusters and the training daily profiles were distributed almost evenly across the clusters (a random predictor would achieve 25% accuracy). The following figure demonstrates the importance of the classification input variables in terms of their predictive capabilities, when assigning daily profiles into one of the four clusters. For this specific household, the forecasted day's expected mean temperature and current day's mean load value are the two most important predictors"].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the first and the second joint loss functions of the first and the second multi-task learning processes treat all learning tasks with equal importance of Yildiz. Doing so would help training the models to forecast the total load (Cini, Figs. 1-2, Pg. 2).
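For reference, one common way to express a joint loss that treats all learning tasks with equal importance is a uniformly weighted sum of per-task losses. The sketch below assumes that reading; the function name `joint_loss` and the weighting scheme are illustrative assumptions and are not taken from the cited references or from the application.

```python
def joint_loss(task_losses, weights=None):
    """Weighted sum of per-task losses; equal weights 1/K when none are given."""
    if weights is None:
        weights = [1.0 / len(task_losses)] * len(task_losses)
    return sum(w * loss for w, loss in zip(weights, task_losses))

# Equal importance: both per-object losses contribute the same share.
equal = joint_loss([1.0, 3.0])

# Contrast: all weight on one task, so only that object's loss is optimized.
single = joint_loss([1.0, 3.0], weights=[1.0, 0.0])
```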

As per claim 12, Cini, Moon, Yildiz, Chen and Li teach the apparatus of claim 8.
Cini further teaches
the plurality of clusters comprises the first cluster through an Nth cluster [Figs. 1-2, Section IV, Pg. 5, Column 1, Lines 31-37, "Building the similarity matrix can be computationally expensive if the number of load profiles is large, but the process can be efficiently parallelized and it needs to be executed only once since the hyperparameters (i.e., the number of neighbors and clusters) can be tuned independently afterwards. In practice, using the K-neighbors graph greatly simplifies the clustering problem"], 
the at least one processor is further configured to:
obtain cluster models for each of the first through the Nth clusters [Figs. 1-2, Section IV, Pg. 5, Column 1, Lines 24-37, "For each pair of load profiles we compute a similarity measure as the average Pearson correlation coefficient between the subsequences obtained at the previous step 1. We use the similarity matrix obtained at the previous step to build a K-Nearest neighbor graph of the dataset. We perform spectral clustering using the graph representation obtained at the previous step. Building the similarity matrix can be computationally expensive if the number of load profiles is large, but the process can be efficiently parallelized and it needs to be executed only once since the hyperparameters (i.e., the number of neighbors and clusters) can be tuned independently afterwards. In practice, using the K-neighbors graph greatly simplifies the clustering problem"];
obtain a final forecast based on predicted future electricity consumption of the electricity consuming objects of the first through the Nth clusters [Figs. 1-2, Section V, Pg. 6, Column 2, Lines 10-21, "Results of the empirical evaluation are shown in Table II for the CER dataset and in Table III for the Arbon dataset. For both datasets, using CBAF with spectral clustering results in a substantial improvement in prediction accuracy over a naive aggregation strategy. CBAF, as we could expect, appears to be more beneficial for the more complex and noisier dataset, where the SM data are more heterogeneous. In particular CBAF-CP achieves a 23% improvement in MAE over the standard STLF predictor on the Arbon dataset. However, the improvement in prediction accuracy is remarkable also in the CER- simpler- dataset case, confirming the results of the analyses conducted in previous works"], 
wherein the first through Nth cluster models comprise multi-task learning processes having the first joint loss function through an Nth joint loss function [Section II, Pg. 2, Column 1, Lines 22-28, "Given a prediction horizon H (i.e., the number of steps ahead to predict) and a prediction window W (i.e., the number of previous steps used to make the prediction), this translates into minimizing a loss function, e.g., the prediction mean squared error (MSE):

    (equation image: media_image1.png)

where s_t = (s[t-W], …, s[t]), u_t is a vector of exogenous variables, ϴ represents the model's parameter vector and Yh is the h-step-ahead prediction"].
Yildiz further teaches
wherein the inputs of the multi-task learning processes correspond to electricity consuming objects of corresponding clusters [Section I, Pg. 2, Column 1, Lines 18-33, "This approach was extended by which used wavelet based neural networks, where the input was based on historical days which had similar calendar and weather characteristics with the forecasted day. In the individual household load forecast context, a similar approach was applied in; however, unlike the previous studies, finding historical load profiles similar to the forecasted day was accomplished by using clustering and classification models. Our study extends the work presented in, by combining the predictive inputs used in the SMBM forecast approach with the predictive inputs used in the 'pattern similarity' based forecast approach. Furthermore, we investigate the impact of different clustering models, data transformation techniques, number of clusters and clustering performance validity indices which influence forecast performance"], and 
wherein the first through the Nth joint loss functions of the multi-task learning processes treat all learning tasks with equal importance [Section V, Pg. 877, Column 1, Lines 13-26, "Following the clustering step, the CART was trained with the household's respective inputs and cluster labels of the training daily profiles. After the optimization and 10-fold cross-validation of the model, the classification accuracy was obtained as 71%. This can be considered as a fairly good classification performance considering that there were four clusters and the training daily profiles were distributed almost evenly across the clusters (a random predictor would achieve 25% accuracy). The following figure demonstrates the importance of the classification input variables in terms of their predictive capabilities, when assigning daily profiles into one of the four clusters. For this specific household, the forecasted day's expected mean temperature and current day's mean load value are the two most important predictors"].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the inputs of the multi-task learning processes correspond to electricity consuming objects of corresponding clusters, and the first through the Nth joint loss functions of the multi-task learning processes treat all learning tasks with equal importance of Yildiz. Doing so would help training the models to forecast the total load (Cini, Figs. 1-2, Pg. 2).
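The clustering passage quoted from Cini above describes computing a correlation-based similarity between load profiles and building a K-nearest-neighbor graph from it before spectral clustering. The sketch below illustrates only those first two steps. It is a simplification under stated assumptions: the function name `knn_graph` is hypothetical, the Pearson correlation is computed over whole profiles rather than averaged over subsequences, and the spectral clustering step itself is omitted.

```python
import numpy as np

def knn_graph(profiles, k):
    """Symmetric k-nearest-neighbor adjacency from Pearson correlation."""
    X = np.asarray(profiles, dtype=float)
    sim = np.corrcoef(X)            # (n, n) correlation between profile rows
    np.fill_diagonal(sim, -np.inf)  # a profile is never its own neighbor
    n = sim.shape[0]
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        nearest = np.argsort(sim[i])[-k:]  # indices of the k most similar
        adj[i, nearest] = 1
    # Keep an edge if either endpoint selected the other.
    return np.maximum(adj, adj.T)

profiles = [
    [1.0, 2.0, 3.0, 4.0],   # rising load
    [2.0, 4.0, 6.0, 8.5],   # rising load, larger scale
    [4.0, 3.0, 2.0, 1.0],   # falling load
    [8.0, 6.0, 4.0, 1.5],   # falling load, larger scale
]
adjacency = knn_graph(profiles, k=1)
```

With k=1 the two rising profiles are linked to each other and the two falling profiles are linked to each other, which is the grouping a subsequent clustering step would be expected to recover.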

As per claim 13, Cini teaches
a first cluster model [Figs. 1-2, Section A, Pg. 3, Column 2, lines 9-13, “we propose CBAF in multi-task learning setting, studying predictors that require to train only a single model to forecast all the cluster-level loads at once. We focus on models that take as an input all the cluster-level aggregates as a single multivariate time-series”; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs). This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)”];
wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster [Figs. 1-2, Section A, Pg. 3, Column 2, line 9 to Pg. 4 Column 1, line 7, “we propose CBAF in multi-task learning setting … the CBAF-CP architecture is inspired by the Multitask Learning literature [8], where the objective is to achieve better generalization sharing the representation of the input data across tasks”; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs). This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)”];
predicting, based on an output of the first cluster model, a future electricity consumption for each of the first electricity consuming objects of the first cluster [Section IV, Pg. 4, Column 1, Lines 19-29, "This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input- Multi output (MIMO) prediction strategy which consists in predicting the values to be forecasted for each time lag all at once"; Section VI, Pg. 7, “electrical load forecasting based on a clustering algorithm that groups load profiles based on their correlation”], and
wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future [Section I, Pg. 1, Column 1, lines 24-35, “time-series describing the energy consumption of a specific customer as measured by a smart meter (SM) … We focus on the day-ahead prediction task, i.e., we aim at predicting the load on the grid for each time-step of the next 24 hours”],
wherein the first cluster model comprises a first multi-task learning process having a first joint loss function, and each input of the first multi-task learning process corresponds to a respective electricity consuming object of the first cluster [Section II, Pg. 2, Column 1, Lines 22-28, "Given a prediction horizon H (i.e., the number of steps ahead to predict) and a prediction window W (i.e., the number of previous steps used to make the prediction), this translates into minimizing a loss function, e.g., the prediction mean squared error (MSE):

    (equation image: media_image1.png)

where s_t = (s[t-W], …, s[t]), u_t is a vector of exogenous variables, ϴ represents the model's parameter vector and Yh is the h-step-ahead prediction"], and
wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster [Fig. 1 discloses a process of clustering the loads of the consumers and inputting them into the predictors to generate the cluster-level forecasts; the forecasts are then used to generate a total load forecast using the fully connected neural network shown in Fig. 2; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs). This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input - Multi output (MIMO) prediction strategy [21], which consists in predicting the values to be forecasted for each time-lag all at once”].
Cini does not explicitly teach
A non-transitory computer-readable medium storing instructions, the instructions comprising: one or more instructions that, when executed by one or more processors, cause the one or more processors to: 
receive, via a communication interface, present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster, among a plurality of electricity consuming objects corresponding to a plurality of clusters, from external electronic devices corresponding to the first electricity consuming objects;
input, into a first cluster model, the present environmental data and the present calendar data, the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
wherein the present environmental data refers to information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data,
wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year,
display, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster,
wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects,
wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time, and
wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster.
Moon teaches
A non-transitory computer-readable medium storing instructions, the instructions comprising: one or more instructions that [paragraph 0086, “one or more instructions that constitute the learning model may be stored in memory 230”], when executed by one or more processors [Fig. 2, processor 260], cause the one or more processors to: 
receive, via a communication interface, present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster, among a plurality of electricity consuming objects corresponding to a plurality of clusters, from external electronic devices corresponding to the first electricity consuming objects [paragraphs 0009-0010, “An artificial intelligence (AI) based apparatus for forecasting energy usage includes a communication unit configured to receive a request for providing power usage forecasting information of a first period including a current month with respect to an electronic apparatus of a user, a memory configured to store power actual-use data indicating power usage of an electronic apparatus of each of a plurality of users, and a processor configured to load power actual-use data of the user during a predetermined second period before the current month … generate the power usage forecasting information of the first period with respect to the electronic apparatus of the user … the processor may calculate a daily forecasting weight given a weight based on weather information of the first period, divide a use period of the electronic apparatus into first to third sections, and classify the user into any one of a low user, a middle user or a high user according to the periods to generate the power usage forecasting information”; Fig. 4, paragraph 0140, “The communication unit 210a may receive a request for providing power usage forecasting information of a first period including a current month with respect to the electronic apparatus 100a of a user. 
The communication unit 210a may transmit and receive data to and from external apparatuses such as other AI devices 100a to 100e or an AI server 200”; It can be seen that the AI apparatus receives calendar data indicating time data (first period including a current month and second period) corresponding to the power usage data of an electronic apparatus of each of a plurality of users, and receives environmental data indicating weather information of the first period. The AI apparatus generates the power usage forecasting information based on the received data. It can also be seen that the communication unit 210a receives data from and transmits data to device 100a via a communication interface];
wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data [paragraphs 0009-0010, “An artificial intelligence (AI) based apparatus for forecasting energy usage includes a communication unit configured to receive a request for providing power usage forecasting information of a first period including a current month with respect to an electronic apparatus of a user … the processor may calculate a daily forecasting weight given a weight based on weather information of the first period, divide a use period of the electronic apparatus into first to third sections, and classify the user into any one of a low user, a middle user or a high user according to the periods to generate the power usage forecasting information”; Fig. 6, S22, “In: data on weather (temperature, fine dust) and weekday/holiday of current month”; paragraph 0171, “a monthly power usage temperature correction value may be calculated using a difference between a temperature measured on a weekday in the past and an average temperature”; It can be seen that the AI apparatus receives environmental data comprising weather and temperature data to in part generate the power usage forecasting information],
wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year [paragraphs 0009-0010, “An artificial intelligence (AI) based apparatus for forecasting energy usage includes a communication unit configured to receive a request for providing power usage forecasting information of a first period including a current month with respect to an electronic apparatus of a user, a memory configured to store power actual-use data indicating power usage of an electronic apparatus of each of a plurality of users, and a processor configured to load power actual-use data of the user during a predetermined second period before the current month … generate the power usage forecasting information of the first period with respect to the electronic apparatus of the user”; It can be seen that the AI apparatus receives calendar data indicating time data (first period including a current month and second period) corresponding to the power usage data of an electronic apparatus of each of a plurality of users to in part generate the power usage forecasting information],
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the processes of receiving present environmental data and present calendar data corresponding to first electricity consuming objects of a first cluster, among a plurality of electricity consuming objects corresponding to a plurality of clusters, from external electronic devices corresponding to the first electricity consuming objects of Moon. Doing so would help generating the power usage forecasting information of the first period with respect to the electronic apparatus of the user (Moon, 0009).
Cini and Moon do not explicitly teach
input, into a first cluster model, the present environmental data and the present calendar data, the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
display, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster,
wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects,
wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time, and
wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster.
Yildiz teaches 
input, into a first cluster model, the present environmental data and the present calendar data [Section II, Pg. 874, Column 1 Lines 49-55, Column 2 Lines 1-3, "In a SMBM, a chosen machine learning model learns the relationship between the target forecasted loads and input data, which consist of historical forms (lags) of smart meter measurements, weather data (temperature, humidity, wind speed etc.) and temporal information (calendar information such as hour of day, day of week, holidays etc.). An example SMBM schematic can be seen in Figure 1. In the first stage, the input data is cleaned and pre-processed followed by organizing lags of load, weather and calendar variables for each time step"]; and 
wherein the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects [Section II, Pg. 874-875, Column 2 Lines 26-35, Column 1 Lines 1-3, "For this study we use three machine learning models in the training stage: Artificial Neural Network Ensembles (ANN-E), Support Vector Regression (SVR) [18] and Least-Squares Support Vector Regression (LSSVR). Each model is trained and 10-fold cross-validated. For all models, during the cross-validation stage, model parameters are optimized and tuned to improve the forecast performance. Due to the space limitations, we guide readers to the respective references above for more technical details and parameter tuning process of these models. Amongst these three models, the best forecast performance was obtained by SVR, hence it was chosen to be used for the proposed CCF method which is described next"],
wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time [Section IV, Pg. 877, Column 2, Lines 16-24, "Following the example demonstration of the CCF method, its implementation was extended to the entire 300 households. Considering the different clustering models, daily profile representations and different number of clusters, thirty CCF combinations were tested on the data-set. Figure 7 shows the NRMSE (%) box plots for the top 8 best performing CCF with two simpler SMBMs for reference. It can be seen that CCF methods result in smaller NRMSE (%) and most have narrower error band than simpler SMBMs"], 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the teachings of Yildiz that the first cluster model is trained based on first reference electricity consumption data, first reference environmental data, and first reference calendar data, with regard to the first cluster corresponding to the first electricity consuming objects, wherein the plurality of electricity consuming objects are clustered into the plurality of clusters based on reference electricity consumption data, reference environmental data, and reference calendar data that are obtained over a period of time. Doing so would help train the model for the particular cluster to predict the load at each time step (Yildiz, Pg. 4, Column 1, lines 29-31).
Yildiz also teaches 
wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster [Section II, Pg. 876, Column 1 Lines 35-45, Column 2 Lines 1-6, "In order to measure the forecast performance of the models, the following metrics are used: normalized root mean squared error (NRMSE), normalized mean absolute error (NMAE) and normalized mean bias error (NMBE); where the household mean load is used to normalize these metrics. Moreover, in order to assess the error on forecast accuracy, two metrics are defined: AElarge for loads larger than 500W and AEsmall for loads smaller than 500W [8]. The metrics are described in the equations below:
(equations shown in media_image2.png, media_image3.png and media_image4.png)"].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the teaching of Yildiz that the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster. Doing so would help measure the forecast performance of the model using the loss function (Yildiz, Pg. 4, Column 1, lines 37-40).
Cini, Moon and Yildiz do not teach
the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
display, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster,
Chen teaches
the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster [Fig. 7C, paragraphs 0117-0118, “an RNN can be constructed and used in the process 700A … RNN suitable to process sequences of inputs. In an example, a long short-term memory (LSTM) network can be used. The LSTM is a type of RNN architecture … an LSTM network 700C that can include one input layer 761”];
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster [Fig. 7C, paragraphs 0117-0118, “The intermediate learning layers 762 can include two to four LSTM layers (e.g., three LSTM layers as shown in FIG. 7C)”]; 
a dense layer shared across the prediction tasks [Fig. 7C, paragraphs 0117-0118, “The dense layer 753”]; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects [Fig. 7C, paragraphs 0117-0118, “The dense layer 753 can be a fully connected output layer”];
Cini, in Figs. 1-2, Section A, Pg. 3, Column 2, line 9 to Pg. 4 Column 1, line 7, and Section B, Pg. 4, Column 1, lines 13-28, teaches an MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that processes input to generate load forecasts for consumers: “we propose CBAF in multi-task learning setting … the CBAF-CP architecture is inspired by the Multitask Learning literature [8], where the objective is to achieve better generalization sharing the representation of the input data across tasks … We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs). This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)”. Chen teaches that the model comprises an input layer, an LSTM block comprising a plurality of LSTM layers, a dense layer and an output layer. Therefore, the combination of Cini and Chen teaches the above claim limitations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the model of Chen comprising an input layer, an LSTM block comprising a plurality of LSTM layers, a dense layer and an output layer. Doing so would help train the model to predict the aggregated energy consumption at each time step of the prediction horizon (Cini, Pg. 4, Col. 1).
Cini, Moon, Yildiz and Chen do not teach
display, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster.
Li teaches
display, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster [abstract, “a method for predicting energy consumption data based on a time series of consumption data”, paragraph 0007, “a prediction module configure to estimate predicted energy consumption data based on the trained prediction function and a type of the day for which the prediction is performed, and an output module configure to output the estimated predicted energy consumption data to at least one of a user interface”; claim 9, “output the estimated predicted energy consumption data to at least one of a user interface display”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include displaying, via display, a representation of the future electricity consumption of Li. Doing so would help monitor and compare a predicted energy consumption with a true energy consumption of the consumer in real-time (Li, 0004).

As per claim 15, Cini, Moon, Yildiz, Chen and Li teach the non-transitory computer-readable medium of claim 13.
Cini further teaches
predict, based on an output of the second cluster model, a future electricity consumption for each of the second electricity consuming objects of the second cluster [Section IV, Pg. 4, Column 1, Lines 19-29, "This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input - Multi output (MIMO) prediction strategy which consists in predicting the values to be forecasted for each time lag all at once"];
obtain a final forecast based on the predicted future electricity consumption of the first and the second electricity consuming objects of the first and the second clusters [Section V, Pg. 6, Column 2, Lines 10-21, "Results of the empirical evaluation are shown in Table II for the CER dataset and in Table III for the Arbon dataset. For both datasets, using CBAF with spectral clustering results in a substantial improvement in prediction accuracy over a naïve aggregation strategy. CBAF, as we could expect, appears to be more beneficial for the more complex and noisier dataset, where the SM data are more heterogeneous. In particular CBAF-CP achieves a 23% improvement in MAE over the standard STLF predictor on the Arbon dataset. However, the improvement in prediction accuracy is remarkable also in the CER - simpler - dataset case, confirming the results of the analyses conducted in previous works"],
wherein the second cluster model comprises a second multi-task learning process having a second joint loss function, and each input of the second multi-task learning process corresponds to a respective electricity consuming object of the second cluster [Section II, Pg. 2, Column 1, Lines 22-28, "Given a prediction horizon H (i.e., the number of steps ahead to predict) and a prediction window W (i.e., the number of previous steps used to make the prediction), this translates into minimizing a loss function, e.g., the prediction mean squared error (MSE):
(equation shown in media_image1.png)
where s_t = (s[t-W], …, s[t]), u_t is a vector of exogenous variables, ϴ represents the model's parameter vector and Yh is the h-step-ahead prediction"]. 
Yildiz further teaches
input, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster among the plurality of clusters [Section II, Pg. 874, Column 1 Lines 49-55, Column 2 Lines 1-3, "In a SMBM, a chosen machine learning model learns the relationship between the target forecasted loads and input data, which consist of historical forms (lags) of smart meter measurements, weather data (temperature, humidity, wind speed etc.) and temporal information (calendar information such as hour of day, day of week, holidays etc.). An example SMBM schematic can be seen in Figure 1. In the first stage, the input data is cleaned and pre-processed followed by organizing lags of load, weather and calendar variables for each time step"; because Cini in Figs. 1-2 teaches that the load data are input into multiple models (including the second model) to perform load forecasting, and Yildiz teaches that the present environmental data and present calendar data are input into the model, the combination of Cini and Yildiz teaches the above claim limitation];
wherein the second cluster model is trained based on second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters [Section II, Pg. 874-875, Column 2 Lines 26-35, Column 1 Lines 1-3, "For this study we use three machine learning models in the training stage: Artificial Neural Network Ensembles (ANN-E), Support Vector Regression (SVR) [18] and Least-Squares Support Vector Regression (LSSVR). Each model is trained and 10-fold cross-validated. For all models, during the cross-validation stage, model parameters are optimized and tuned to improve the forecast performance. Due to the space limitations, we guide readers to the respective references above for more technical details and parameter tuning process of these models. Amongst these three models, the best forecast performance was obtained by SVR, hence it was chosen to be used for the proposed CCF method which is described next"],
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include training the second cluster model using second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters of Yildiz. Doing so would help train the model for the particular cluster to predict the load at each time step (Yildiz, Pg. 4, Column 1, lines 29-31).

As per claim 16, Cini, Moon, Yildiz, Chen and Li teach the non-transitory computer-readable medium of claim 15.
Yildiz further teaches
the first and the second joint loss functions of the first and the second multi-task learning processes treat all learning tasks with equal importance [Section V, Pg. 877, Column 1, Lines 13-26, "Following the clustering step, the CART was trained with the household's respective inputs and cluster labels of the training daily profiles. After the optimization and 10-fold cross-validation of the model, the classification accuracy was obtained as 71%. This can be considered as a fairly good classification performance considering that there were four clusters and the training daily profiles were distributed almost evenly across the clusters (a random predictor would achieve 25% accuracy). The following figure demonstrates the importance of the classification input variables in terms of their predictive capabilities, when assigning daily profiles into one of the four clusters. For this specific household, the forecasted day's expected mean temperature and current day's mean load value are the two most important predictors"].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the teaching of Yildiz that the first and the second joint loss functions of the first and the second multi-task learning processes treat all learning tasks with equal importance. Doing so would help train the models to forecast the total load (Cini, Figs. 1-2, Pg. 2).

As per claim 17, Cini teaches a method of load forecasting using a Long Short-Term Memory (LSTM) based multi-task deep learning [abstract, “predict the load requested by the grid”; Section I, Pg. 1, Column 2, Lines 27-40, “In this work we study CBAF in the context of deep learning; in particular, we present a novel CBAF architecture suitable for deep learning models and introduce a neural network implementing it. Taking inspiration from the Multitask Learning literature (MTL), we consider the cluster level aggregates as a single, multivariate, sequence and feed it into a single deep recurrent model with multiple output modules predicting the cluster level energy demand. We also present a practical method to cluster long time-series for CBAF. Finally, we evaluate our approach on a publicly available benchmark dataset, showing the benefits of the proposed approach and test our method in a new challenging dataset from the Swiss town of Arbon, providing empirical evidence that CBAF is a viable technique to improve prediction accuracy in STLF (short-term load forecasting)”; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs)”], the method comprising: 
clustering the plurality of electricity consuming objects into a plurality of clusters based on the obtained reference electricity consumption data, the plurality of clusters comprising a first cluster and a second cluster [Figs. 1-2, disclose how the loads from the load consumers are clustered; Section IV, Pg. 5, Column 1, Lines 24-30, "For each pair of load profiles we compute a similarity measure as the average Pearson correlation coefficient between the subsequences obtained at the previous step 1. We use the similarity matrix obtained at the previous step to build a K-Nearest neighbor graph of the dataset. We perform spectral clustering using the graph representation obtained at the previous step"]; 
a first cluster model [Figs. 1-2, Section A, Pg. 3, Column 2, lines 9-13, “we propose CBAF in multi-task learning setting, studying predictors that require to train only a single model to forecast all the cluster-level loads at once. We focus on models that take as an input all the cluster-level aggregates as a single multivariate time-series”; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs). This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)”];
wherein the first cluster model comprises MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that is specific to the first cluster [Figs. 1-2, Section A, Pg. 3, Column 2, line 9 to Pg. 4 Column 1, line 7, “we propose CBAF in multi-task learning setting … the CBAF-CP architecture is inspired by the Multitask Learning literature [8], where the objective is to achieve better generalization sharing the representation of the input data across tasks”; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs). This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)”];
predicting, based on an output of the first cluster model, a future electricity consumption for each of the first electricity consuming objects of the first cluster [Section IV, Pg. 4, Column 1, Lines 19-29, "This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input - Multi output (MIMO) prediction strategy which consists in predicting the values to be forecasted for each time lag all at once"; Section VI, Pg. 7, “electrical load forecasting based on a clustering algorithm that groups load profiles based on their correlation”], and
wherein the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future [Section I, Pg. 1, Column 1, lines 24-35, “time-series describing the energy consumption of a specific customer as measured by a smart meter (SM) … We focus on the day-ahead prediction task, i.e., we aim at predicting the load on the grid for each time-step of the next 24 hours”],
wherein the first cluster model comprises a first multi-task learning process having a first joint loss function, and each input of the first multi-task learning process corresponds to a respective electricity consuming object of the first cluster [Section II, Pg. 2, Column 1, Lines 22-28, "Given a prediction horizon H (i.e., the number of steps ahead to predict) and a prediction window W (i.e., the number of previous steps used to make the prediction), this translates into minimizing a loss function, e.g., the prediction mean squared error (MSE):
(equation shown in media_image1.png)
where s_t = (s[t-W], …, s[t]), u_t is a vector of exogenous variables, ϴ represents the model's parameter vector and Yh is the h-step-ahead prediction"], and
wherein predicted future electricity consumption values for the electricity consuming objects are input into a fully connected neural network aggregation layer configured to generate an aggregated load forecast for the first cluster [Fig. 1 discloses a process of clustering the loads of the consumers and inputting into the predictors to generate the cluster-level forecasts, the forecasts are then used to generate a total load forecast using the fully connected neural network which is shown in Fig. 2; Section B, Pg. 4, Column 1, lines 13-28, “We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs). This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input - Multi output (MIMO) prediction strategy [21], which consists in predicting the values to be forecasted for each time-lag all at once”]. 
Cini does not explicitly teach
obtaining reference electricity consumption data, reference environmental data, and reference calendar data for a plurality of electricity consuming objects over a period of time; 
obtaining a first cluster model based on: 
first reference electricity consumption data, among the obtained reference electricity consumption data, corresponding to first electricity consuming objects of the first cluster; 
first reference environmental data, among the obtained reference environmental data, corresponding to the first electricity consuming objects of the first cluster; and 
first reference calendar data, among the obtained reference calendar data, corresponding to the first electricity consuming objects of the first cluster; 
inputting, into the first cluster model, present environmental data and present calendar data corresponding to the first electricity consuming objects of the first cluster, the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
wherein the present environmental data refers to information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data, 
wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year, 
displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster,
wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster,
Yildiz teaches 
obtaining reference electricity consumption data, reference environmental data, and reference calendar data for a plurality of electricity consuming objects over a period of time [Section IV, Pg. 877, Column 2, Lines 16-24, "Following the example demonstration of the CCF method, its implementation was extended to the entire 300 households. Considering the different clustering models, daily profile representations and different number of clusters, thirty CCF combinations were tested on the data-set. Figure 7 shows the NRMSE (%) box plots for the top 8 best performing CCF with two simpler SMBMs for reference. It can be seen that CCF methods result in smaller NRMSE (%) and most have narrower error band than simpler SMBMs"]; 
first reference electricity consumption data, among the obtained reference electricity consumption data, corresponding to first electricity consuming objects of the first cluster [Section III, Pg. 877, Column 1, Lines 1-8, "In order to demonstrate how the CCF method works, an example household was randomly chosen (house ID=17). Figure 4 demonstrates the clustering results obtained with SOM and using statistical parameters as daily representations (987 daily profiles were used for the training phase). The optimum number of clusters, which was four for this example, was decided by the use of both clustering validity indices and manual inspection"];
first reference environmental data, among the obtained reference environmental data, corresponding to the first electricity consuming objects of the first cluster [Section III, Pg. 876, Column 2, Lines 9-26, "The used data-set, 'Solar home electricity data', consists of 300 households with rooftop Photovoltaic (PV) systems randomly selected from the Ausgrid network, an electricity distribution network provider in the Greater Sydney area of New South Wales (NSW), Australia. According to Koppen Climate Classification, the region's climate is classified as 'Cfa' which represents humid subtropical climates. Each household's electricity load was measured for 1096 days between 1 July 2010 to 30 June 2013 in half hour resolution. Most of the average daily load profile of each customer exhibits a typical Australian household profile with distinguishable morning and evening peaks which is shown in Figure 3. The average daily consumption is around 18 kWh for the household stock, while the minimum and maximum average daily consumption are 5 kWh to 35.5 kWh respectively. For some households, the average daily load standard deviation is around 0.3 kWh, but in some cases the daily standard deviation is as high as 2.8 kWh"]; and
first reference calendar data, among the obtained reference calendar data, corresponding to the first electricity consuming objects of the first cluster [Section III, Pg. 877, Column 1 Lines 26-27, Column 2 Lines 28-42, "The obtained clusters were then used to build specific SMBMs. In particular, four different SMBMs were trained by using the cluster specific profile information in addition to the lags of load, weather and other temporal predictors. An independent test day, '01/06/2013' was randomly chosen and shown here for the demonstration of predictions. The CART classifies this day into the cluster#2 according to the expected weather variables and temporal variables of '01/06/2013' and the load information from '31/05/2013'. The SMBM built for cluster#2 is used to predict the daily profile. Figure 6 shows the daily profile loads predicted by the CCF and the simpler SMBM approach. It can be seen that the CCF method has superior ability in forecasting the overall shape of the load profile. For this particular household, the obtained NRMSE and NMAE are 38% and 19% respectively whereas the simpler SMBM obtains an NRMSE and NMAE of 52% and 26% respectively"];
inputting, into the first cluster model, present environmental data and present calendar data corresponding to the first electricity consuming objects of the first cluster [Section II, Pg. 874, Column 1 Lines 49-55, Column 2 Lines 1-3, "In a SMBM, a chosen machine learning model learns the relationship between the target forecasted loads and input data, which consist of historical forms (lags) of smart meter measurements, weather data (temperature, humidity, wind speed etc.) and temporal information (calendar information such as hour of day, day of week, holidays etc.). An example SMBM schematic can be seen in Figure 1. In the first stage, the input data is cleaned and pre-processed followed by organizing lags of load, weather and calendar variables for each time step"];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include obtaining reference electricity consumption data, reference environmental data, and reference calendar data for a plurality of electricity consuming objects over a period of time of Yildiz. Doing so would help train the model for the particular cluster to predict the load at each time step (Yildiz, Pg. 4, Column 1, lines 29-31).
Yildiz also teaches 
wherein the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster [Section II, Pg. 876, Column 1 Lines 35-45, Column 2 Lines 1-6, "In order to measure the forecast performance of the models, the following metrics are used: normalized root mean squared error (NRMSE), normalized mean absolute error (NMAE) and normalized mean bias error (NMBE); where the household mean load is used to normalize these metrics. Moreover, in order to assess the error on forecast accuracy, two metrics are defined: AElarge for loads larger than 500W and AEsmall for loads smaller than 500W [8]. The metrics are described in the equations below:
(equations shown in media_image2.png, media_image3.png and media_image4.png)"].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the teaching of Yildiz that the first joint loss function of the first multi-task learning process optimizes a task corresponding to a single electricity consuming object among the first electricity consuming objects of the first cluster. Doing so would help measure the forecast performance of the model using the loss function (Yildiz, Pg. 4, Column 1, lines 37-40).
Cini and Yildiz do not teach
the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
wherein the present environmental data refers to information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data, 
wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year, 
displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster,
Moon teaches
wherein the present environmental data refers to current information about an environment corresponding to the electricity consuming objects, comprising temperature data and weather data [paragraphs 0009-0010, “An artificial intelligence (AI) based apparatus for forecasting energy usage includes a communication unit configured to receive a request for providing power usage forecasting information of a first period including a current month with respect to an electronic apparatus of a user … the processor may calculate a daily forecasting weight given a weight based on weather information of the first period, divide a use period of the electronic apparatus into first to third sections, and classify the user into any one of a low user, a middle user or a high user according to the periods to generate the power usage forecasting information”; Fig. 6, S22, “In: data on weather (temperature, fine dust) and weekday/holiday of current month”; paragraph 0171, “a monthly power usage temperature correction value may be calculated using a difference between a temperature measured on a weekday in the past and an average temperature”; It can be seen that the AI apparatus receives environmental data comprising weather and temperature data to in part generate the power usage forecasting information],
wherein the present calendar data refers to current time-related information corresponding to the electricity consuming objects, comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year [paragraphs 0009-0010, “An artificial intelligence (AI) based apparatus for forecasting energy usage includes a communication unit configured to receive a request for providing power usage forecasting information of a first period including a current month with respect to an electronic apparatus of a user, a memory configured to store power actual-use data indicating power usage of an electronic apparatus of each of a plurality of users, and a processor configured to load power actual-use data of the user during a predetermined second period before the current month … generate the power usage forecasting information of the first period with respect to the electronic apparatus of the user”; It can be seen that the AI apparatus receives calendar data indicating time data (first period including a current month and second period) corresponding to the power usage data of an electronic apparatus of each of a plurality of users to in part generate the power usage forecasting information],
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the present environmental data comprising temperature data and weather data, the present calendar data comprising at least one of a current time, a present day of a week, the present day of a month, and the present day of a year, the future electricity consumption refers to an amount of consumption of the electricity over a predetermined period of time in the future of Moon. Doing so would help generating the power usage forecasting information of the first period with respect to the electronic apparatus of the user using the environmental data and calendar data (Moon, 0009).
Cini, Yildiz and Moon do not teach
the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster;
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster;
a dense layer shared across the prediction tasks; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects;
displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster,
Chen teaches
the model including:
an input layer configured to receive input data for multiple electricity consuming objects in the first cluster [Fig. 7C, paragraphs 0117-0118, “an RNN can be constructed and used in the process 700A … RNN suitable to process sequences of inputs. In an example, a long short-term memory (LSTM) network can be used. The LSTM is a type of RNN architecture … an LSTM network 700C that can include one input layer 761”];
an LSTM block comprising a plurality of LSTM layers shared across prediction tasks for respective electricity consuming objects in the first cluster [Fig. 7C, paragraphs 0117-0118, “The intermediate learning layers 762 can include two to four LSTM layers (e.g., three LSTM layers as shown in FIG. 7C)”]; 
a dense layer shared across the prediction tasks [Fig. 7C, paragraphs 0117-0118, “The dense layer 753”]; and
an output layer including outputs for respective prediction tasks corresponding to each of the electricity consuming objects [Fig. 7C, paragraphs 0117-0118, “The dense layer 753 can be a fully connected output layer”];
Since Cini in Figs. 1-2, Section A, Pg. 3, Column 2, line 9 to Pg. 4 Column 1, line 7, and Section B, Pg. 4, Column 1, lines 13-28, teaches an MTL-LSTM model (Long Short-Term Memory based multi-task learning model) that processes input to generate load forecasts for consumers (“we propose CBAF in multi-task learning setting … the CBAF-CP architecture is inspired by the Multitask Learning literature [8], where the objective is to achieve better generalization sharing the representation of the input data across tasks … We introduce a specific neural network architecture, shown in Figure 2, for CBAF with deep models (it follows that the predictor box of Figure 1 is that of Figure 2). We use a shared recurrent backbone, that can be implemented using any recurrent (multilayer) neural network, such as Long Short-Term Memory networks (LSTMs). This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state - here acting as a feature vector - is then fed into K feedforward fully-connected network heads (FCNNs)”), while Chen teaches that the model comprises an input layer, an LSTM block comprising a plurality of LSTM layers, a dense layer and an output layer, the combination of Cini and Chen teaches the above claim limitations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the model comprises an input layer, an LSTM block comprising a plurality of LSTM layers, a dense layer and an output layer of Chen. Doing so would help training the model to predict the aggregated energy consumption at each time step of the prediction horizon (Cini, Pg. 4, Col. 1).
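The claimed layer arrangement discussed above (an input layer, a stack of LSTM layers shared across tasks, a shared dense layer, and one output per electricity consuming object) can be illustrated with a small forward-pass sketch. This is not the implementation of Cini, Chen, or the application. It uses untrained random weights, omits bias terms for brevity, and all class names, layer sizes and the `tanh` activation on the dense layer are assumptions chosen only to show the data flow.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMLayer:
    """One LSTM layer; gate weights act on the concatenation [x, h]."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # i = input gate, f = forget gate, o = output gate, g = candidate
        self.w = {g: rand_matrix(n_hidden, n_in + n_hidden, rng) for g in "ifog"}

    def run(self, sequence):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in sequence:
            z = list(x) + h
            i = [sigmoid(a) for a in matvec(self.w["i"], z)]
            f = [sigmoid(a) for a in matvec(self.w["f"], z)]
            o = [sigmoid(a) for a in matvec(self.w["o"], z)]
            g = [math.tanh(a) for a in matvec(self.w["g"], z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class SharedLSTMMultiTask:
    """Shared LSTM stack and dense layer, one output per consuming object."""

    def __init__(self, n_objects, n_hidden=4, n_lstm_layers=3, n_dense=5, seed=0):
        rng = random.Random(seed)
        sizes = [n_objects] + [n_hidden] * n_lstm_layers
        self.lstm_layers = [LSTMLayer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.dense = rand_matrix(n_dense, n_hidden, rng)    # shared across tasks
        self.heads = rand_matrix(n_objects, n_dense, rng)   # one row per task

    def predict(self, sequence):
        for layer in self.lstm_layers:
            sequence = layer.run(sequence)
        shared = [math.tanh(a) for a in matvec(self.dense, sequence[-1])]
        return matvec(self.heads, shared)


# Two time steps of hypothetical load readings for three objects in a cluster.
model = SharedLSTMMultiTask(n_objects=3)
forecast = model.predict([[0.5, 1.0, 0.2], [0.6, 0.9, 0.3]])
```

Every task reads the same hidden representation, and only the final output row differs per object, which is the sense of "shared across prediction tasks" illustrated here.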
Cini, Moon, Yildiz and Chen do not teach
displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster.
Li teaches
displaying, via display, a representation of the future electricity consumption for each of the first electricity consuming objects of the first cluster [abstract, “a method for predicting energy consumption data based on a time series of consumption data”, paragraph 0007, “a prediction module configure to estimate predicted energy consumption data based on the trained prediction function and a type of the day for which the prediction is performed, and an output module configure to output the estimated predicted energy consumption data to at least one of a user interface”; claim 9, “output the estimated predicted energy consumption data to at least one of a user interface display”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include displaying, via display, a representation of the future electricity consumption of Li. Doing so would help monitoring and comparing a predicted energy consumption with a true energy consumption of the consumer in real-time (Li, 0004).

As per claim 19, Cini, Moon, Yildiz, Chen and Li teach the method of claim 17.
Cini further teaches
predicting, based on an output of the second cluster model, a future electricity consumption for each of the second electricity consuming object of the second cluster [Figs. 1-2, Section IV, Pg. 4, Column 1, Lines 19-29, "This module takes as input a multivariate time-series of length W representing the load profiles aggregated at a cluster level, the last hidden state- here acting as a feature vector - is then fed into K feed forward fully-connected network heads (FCNNs). Each head, one per cluster, can include multiple hidden layers and is trained to predict the aggregated energy consumption at each time-step of the prediction horizon H. In particular, we use the Multi input - Multi output (MIMO) prediction strategy which consists in predicting the values to be forecasted for each time lag all at once"];
obtaining a final forecast based on the predicted future electricity consumption of the first and the second electricity consuming objects of the first and the second clusters [Figs. 1-2, Section V, Pg. 6, Column 2, Lines 10-21, "Results of the empirical evaluation are shown in Table II for the CER dataset and in Table III for the Arbon dataset. For both datasets, using CBAF with spectral clustering results in a substantial improvement in prediction accuracy over a naïve aggregation strategy. CBAF, as we could expect, appears to be more beneficial for the more complex and noisier dataset, where the SM data are more heterogeneous. In particular CBAF-CP achieves a 23% improvement in MAE over the standard STLF predictor on the Arbon dataset. However, the improvement in prediction accuracy is remarkable also in the CER - simpler - dataset case, confirming the results of the analyses conducted in previous works"],
wherein the second cluster model comprises a second multi-task learning process having a second joint loss function, and each input of the second multi-task learning process corresponds to a respective electricity consuming object of the second cluster [Figs. 1-2, Section II, Pg. 2, Column 1, Lines 22-28, "Given a prediction horizon H (i.e., the number of steps ahead to predict) and a prediction window W (i.e., the number of previous steps used to make the prediction), this translates into minimizing a loss function, e.g., the prediction mean squared error (MSE):

[Equation image not reproduced: media_image1.png (prediction MSE loss)]

where s_t = (s[t-W], …, s[t]), u_t is a vector of exogenous variables, ϴ represents the model's parameter vector and Yh is the h-step-ahead prediction"]. 
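The quoted Cini passage describes a window of past values s_t = (s[t-W], …, s[t]) and a mean squared error over an H-step prediction horizon, but the equation image is missing from this record. The sketch below shows one plausible reading of that setup, not Cini's exact formula. The helper names `make_window` and `horizon_mse` and the sample series are hypothetical.

```python
def make_window(series, t, window):
    """Return (s[t-W], ..., s[t]), i.e. W + 1 values ending at index t."""
    if t - window < 0 or t >= len(series):
        raise ValueError("window does not fit inside the series")
    return series[t - window : t + 1]


def horizon_mse(targets, predictions):
    """Mean squared error averaged over the H steps of the horizon."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("targets and predictions must be non-empty and equal length")
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / len(targets)


# Hypothetical load series; W = 2 previous steps, H = 3 steps ahead.
series = [10.0, 11.0, 12.0, 13.0, 14.0]
window = make_window(series, t=3, window=2)
loss = horizon_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```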
Yildiz further teaches
second reference electricity consumption data, among the obtained reference electricity consumption data, corresponding to second electricity consuming objects of the second cluster [Section III, Pg. 877, Column 1, Lines 1-8, "In order to demonstrate how the CCF method works, an example household was randomly chosen (house ID=17). Figure 4 demonstrates the clustering results obtained with SOM and using statistical parameters as daily representations (987 daily profiles were used for the training phase). The optimum number of clusters, which was four for this example, was decided by the use of both clustering validity indices and manual inspection"]; 
second reference environmental data, among the obtained environmental data, corresponding to the second electricity consuming objects of the second cluster [Section III, Pg. 876, Column 2, Lines 9-26, "The used data-set, 'Solar home electricity data', consists of 300 households with rooftop Photovoltaic (PV) systems randomly selected from the Ausgrid network, an electricity distribution network provider in the Greater Sydney area of New South Wales (NSW), Australia. According to Koppen Climate Classification, the region's climate is classified as 'Cfa' which represents humid subtropical climates. Each household's electricity load was measured for 1096 days between 1 July 2010 to 30 June 2013 in half hour resolution. Most of the average daily load profile of each customer exhibits a typical Australian household profile with distinguishable morning and evening peaks which is shown in Figure 3. The average daily consumption is around 18 kWh for the household stock, while the minimum and maximum average daily consumption are 5 kWh to 35.5 kWh respectively. For some households, the average daily load standard deviation is around 0.3 kWh, but in some cases the daily standard deviation is as high as 2.8 kWh"]; and
second reference calendar data, among the obtained reference calendar data, corresponding to the second electricity consuming objects of the second cluster [Section III, Pg. 877, Column 1 Lines 26-27, Column 2 Lines 28-42, "The obtained clusters were then used to build specific SMBMs. In particular, four different SMBMs were trained by using the cluster specific profile information in addition to the lags of load, weather and other temporal predictors. An independent test day, '01/06/2013' was randomly chosen and shown here for the demonstration of predictions. The CART classifies this day into the cluster#2 according to the expected weather variables and temporal variables of '01/06/2013' and the load information from '31/05/2013'. The SMBM built for cluster#2 is used to predict the daily profile. Figure 6 shows the daily profile loads predicted by the CCF and the simpler SMBM approach. It can be seen that the CCF method has superior ability in forecasting the overall shape of the load profile. For this particular household, the obtained NRMSE and NMAE are 38% and 19% respectively whereas the simpler SMBM obtains an NRMSE and NMAE of 52% and 26% respectively"];
inputting, into a second cluster model, present environmental data and present calendar data corresponding to second electricity consuming objects of a second cluster [Section II, Pg. 874, Column 1 Lines 49-55, Column 2 Lines 1-3, "In a SMBM, a chosen machine learning model learns the relationship between the target forecasted loads and input data, which consist of historical forms (lags) of smart meter measurements, weather data (temperature, humidity, wind speed etc.) and temporal information (calendar information such as hour of day, day of week, holidays etc.). An example SMBM schematic can be seen in Figure 1. In the first stage, the input data is cleaned and pre-processed followed by organizing lags of load, weather and calendar variables for each time step"; Since Cini in Figs. 1-2 teaches the load data are input into multiple models (including the second model) to perform load forecast, while Yildiz teaches the present environmental data and present calendar data are input into the model, therefore, the combination of Cini and Yildiz teaches the above claim limitation];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include training the second cluster model using second reference electricity consumption data, second reference environmental data, and second reference calendar data, with regard to the second cluster among the plurality of clusters of Yildiz. Doing so would help training the model for the particular cluster to predict the load at each time step (Yildiz, Pg. 4, Column 1, lines 29-31).
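The Yildiz passage quoted above describes organizing lags of load together with weather and calendar variables for each time step. A minimal sketch of that kind of feature row follows, assuming half-hourly readings. It is not the pre-processing used by Yildiz, and the function name `build_feature_row`, the chosen fields, and the sample values are hypothetical.

```python
from datetime import datetime, timedelta


def build_feature_row(loads, temperatures, timestamps, t, n_lags):
    """Collect lagged loads plus weather and calendar values for time step t."""
    if t < n_lags or t >= len(loads):
        raise ValueError("not enough history for the requested lags")
    ts = timestamps[t]
    return {
        # Most recent lag first: load at t-1, t-2, ...
        "load_lags": [loads[t - k] for k in range(1, n_lags + 1)],
        "temperature": temperatures[t],
        "hour_of_day": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
    }


# Hypothetical half-hourly readings starting at midnight on 1 June 2013.
start = datetime(2013, 6, 1, 0, 0)
timestamps = [start + timedelta(minutes=30 * i) for i in range(4)]
loads = [1.0, 1.5, 2.0, 2.5]
temperatures = [15.0, 15.5, 16.0, 16.5]
row = build_feature_row(loads, temperatures, timestamps, t=3, n_lags=2)
```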

As per claim 20, Cini, Moon, Yildiz, Chen and Li teach the method of claim 19.
Yildiz further teaches
the first and the second joint loss functions of the first and the second multi-task learning processes treat all learning tasks with equal importance [Section V, Pg. 877, Column 1, Lines 13-26, "Following the clustering step, the CART was trained with the household's respective inputs and cluster labels of the training daily profiles. After the optimization and 10-fold cross-validation of the model, the classification accuracy was obtained as 71%. This can be considered as a fairly good classification performance considering that there were four clusters and the training daily profiles were distributed almost evenly across the clusters (a random predictor would achieve 25% accuracy). The following figure demonstrates the importance of the classification input variables in terms of their predictive capabilities, when assigning daily profiles into one of the four clusters. For this specific household, the forecasted day's expected mean temperature and current day's mean load value are the two most important predictors"].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of Load Forecasting with Deep Neural Networks of Cini to include the first and the second joint loss functions of the first and the second multi-task learning processes treat all learning tasks with equal importance of Yildiz. Doing so would help training the models to forecast the total load (Cini, Figs. 1-2, Pg. 2).
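The claim 20 limitation recites joint loss functions that treat all learning tasks with equal importance. One common way to express that idea is a uniformly weighted sum of per-task losses, sketched below. This illustrates the general concept only and is not taken from the application or the cited references. The function name `joint_loss` and the sample loss values are hypothetical.

```python
def joint_loss(task_losses, weights=None):
    """Combine per-task losses; with no weights given, every task counts equally."""
    if not task_losses:
        raise ValueError("at least one task loss is required")
    if weights is None:
        weights = [1.0 / len(task_losses)] * len(task_losses)
    if len(weights) != len(task_losses):
        raise ValueError("one weight per task is required")
    return sum(w * loss for w, loss in zip(weights, task_losses))


# Hypothetical per-object losses for three consuming objects in one cluster.
equal = joint_loss([0.2, 0.4, 0.6])
# Contrast: a non-uniform weighting that favors the first task.
skewed = joint_loss([0.2, 0.4, 0.6], weights=[0.8, 0.1, 0.1])
```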

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Lal et al. (US Pub. 2019/0205939) describes an LSTM network comprising an input layer, LSTM layers, a dense layer and an output layer.
Samuni et al. (US Pub. 2020/0355387) describes methods for monitoring a plurality of heating, ventilation, and air conditioning systems and predicting inefficient HVAC operation.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRI T NGUYEN whose telephone number is 571-272-0103. The examiner can normally be reached M-F, 8 AM-5 PM, (CT).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/TRI T NGUYEN/Examiner, Art Unit 2128    

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128
Prosecution Timeline

Jun 28, 2021: Application Filed
Sep 24, 2024: Non-Final Rejection — §101, §103
Dec 27, 2024: Response Filed
May 11, 2025: Final Rejection — §101, §103
Jul 22, 2025: Request for Continued Examination
Jul 30, 2025: Response after Non-Final Action
Jan 02, 2026: Non-Final Rejection — §101, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/011,734 (Patent 12572820): METHODS AND SYSTEMS FOR GENERATING KNOWLEDGE GRAPHS FROM PROGRAM SOURCE CODE. 2y 5m to grant; granted Mar 10, 2026.
16/976,398 (Patent 12536418): PERTURBATIVE NEURAL NETWORK. 2y 5m to grant; granted Jan 27, 2026.
17/083,367 (Patent 12524662): BLOCKCHAIN FOR ARTIFICIAL INTELLIGENCE TRAINING. 2y 5m to grant; granted Jan 13, 2026.
17/277,118 (Patent 12493963): JOINT UNSUPERVISED OBJECT SEGMENTATION AND INPAINTING. 2y 5m to grant; granted Dec 09, 2025.
16/972,222 (Patent 12468974): QUANTUM CONTROL DEVELOPMENT AND IMPLEMENTATION INTERFACE. 2y 5m to grant; granted Nov 11, 2025.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 68%
With Interview (+13.2%): 82%
Median Time to Grant: 3y 10m
PTA Risk: High
Based on 183 resolved cases by this examiner. Grant probability derived from career allow rate.
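The projection figures above appear consistent with simple arithmetic on the examiner's career record (125 granted out of 183 resolved) plus the stated interview lift. The sketch below reproduces the displayed percentages under that assumption. How the page actually computes them is not stated, so the additive treatment of the lift and the function name `grant_projection` are assumptions.

```python
def grant_projection(granted, resolved, interview_lift_points):
    """Return (base %, with-interview %) rounded to whole percentages.

    Assumption: the interview lift is added to the career allow rate
    as percentage points.
    """
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    base = 100.0 * granted / resolved
    return round(base), round(base + interview_lift_points)


base_pct, with_interview_pct = grant_projection(125, 183, 13.2)
```

With these inputs the sketch yields 68 and 82, matching the Grant Probability and With Interview values shown.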