Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Note
The Examiner encourages Applicant to schedule an interview to discuss the issues raised below, for example the rejections under 35 U.S.C. § 112, § 101, and § 103, in order to move the application forward toward allowance.
Applicant is strongly requested to provide, in the Remarks, supporting paragraph(s) for each limitation of any amended or new claim(s), so that the Examiner can arrive at a clear and definite claim interpretation.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. CN202010458444.5, filed on 05/27/2020.
Allowable Subject Matter
Claim 3 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) and claim objections set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for allowance:
The prior art of record, Saputra et al. (Energy Demand Prediction with Federated Learning for Electric Vehicle Networks), teaches a federated energy demand learning (FEDL) approach which allows the charging stations to share their information without revealing their real datasets. Specifically, the charging stations only need to send their trained models to the charging station provider for processing, in order to significantly reduce the communication overhead and effectively protect data privacy for the electric vehicle users.
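The exchange Saputra describes (clients share trained models rather than raw data, and the server aggregates them into a global model) can be illustrated with a minimal federated-averaging sketch. This is an illustrative sketch only, not the reference's algorithm; the one-parameter linear model, learning rate, round count, and station datasets are all invented for illustration.

```python
# Minimal federated-averaging sketch: each station fits a local model on its
# own data and shares only the resulting model weight; the server averages
# the weights into an updated global model (raw datasets never leave a station).
def local_train(weight, data, lr=0.01, epochs=20):
    # Toy one-parameter linear model y = w * x, trained by gradient descent.
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

def federated_round(global_weight, station_datasets):
    # Stations train locally; only the updated weights travel to the server.
    local_weights = [local_train(global_weight, d) for d in station_datasets]
    return sum(local_weights) / len(local_weights)  # server-side averaging

# Three hypothetical stations whose data follow y ≈ 2x with local variation.
stations = [
    [(1.0, 2.1), (2.0, 4.0)],
    [(1.0, 1.9), (3.0, 6.1)],
    [(2.0, 3.9), (4.0, 8.2)],
]
w = 0.0
for _ in range(5):            # five communication rounds
    w = federated_round(w, stations)
print(round(w, 2))            # converges near the shared slope of ~2
```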
Yang et al. (LSTM-Attention-Embedding Model-Based Day-Ahead Prediction of Photovoltaic Power Output Using Bayesian Optimization) teaches an LSTM-attention-embedding model based on Bayesian optimization to predict the day-ahead photovoltaic power output. The statistical features at multiple time scales, combined features, time features and wind speed categorical features are explored for photovoltaic related meteorological factors. A deep learning model is constructed based on an LSTM block and an embedding block with the connection of a merge layer.
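The LSTM block at the core of Yang's model can be illustrated with a single-cell forward step. This is a minimal scalar sketch with invented weights; the attention block, embedding block, merge layer, and Bayesian hyperparameter optimization of the reference are omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    # One forward step of a scalar LSTM cell: input, forget, and output gates
    # control how the cell state c carries information across time steps.
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate state
    c = f * c_prev + i * g           # new cell state
    h = o * math.tanh(c)             # new hidden state (the block's output)
    return h, c

# Invented uniform weights; a real model learns these during training.
weights = {k: 0.5 for k in
           ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.6]:            # a short input sequence, e.g. hourly power
    h, c = lstm_step(x, h, c, weights)
print(h)                             # bounded in (-1, 1) by the tanh output
```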
Patro et al. (Normalization: A Preprocessing Stage) teaches a data normalization technique which scales or transforms a dataset into the range 0 to 1 so that the dataset becomes well-structured. Normalization is a scaling technique, a mapping technique, or a preprocessing stage; it can be very helpful for prediction or forecasting purposes and can narrow the large variation among predictions and forecasts.
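The scaling into [0, 1] that Patro describes is ordinarily the min-max transform x' = (x − min) / (max − min); a minimal sketch on invented sample readings:

```python
def min_max_normalize(values):
    # Scale values linearly into [0, 1]: the minimum maps to 0, the maximum to 1.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

power = [3.0, 7.5, 12.0, 4.5]        # e.g. raw photovoltaic power readings
scaled = min_max_normalize(power)
print(scaled)                        # all results lie within [0, 1]
```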
Moorthy et al. (US 2020/0042920 A1) teaches replacing an outlier value with an interpolation between two or more adjacent data points, a weighted average, and/or a moving average. Since machine learning algorithms tend to be sensitive to the range and distribution of attribute values, addressing outlier data can be important in avoiding situations where the training process is unduly extended or altered due to outlier data.
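The replacement strategies Moorthy lists (interpolation between adjacent data points, a moving average) can be sketched as follows; the sample series and the flagged outlier index are invented for illustration:

```python
def replace_with_interpolation(series, i):
    # Replace the point at index i with the linear interpolation (midpoint)
    # of its two adjacent neighbours.
    out = list(series)
    out[i] = (series[i - 1] + series[i + 1]) / 2
    return out

def replace_with_moving_average(series, i, window=3):
    # Replace the point at index i with the trailing moving average of the
    # `window` points preceding it.
    out = list(series)
    out[i] = sum(series[i - window:i]) / window
    return out

readings = [10.0, 11.0, 12.0, 95.0, 13.0]        # 95.0 is the flagged outlier
print(replace_with_interpolation(readings, 3))   # outlier becomes 12.5
print(replace_with_moving_average(readings, 3))  # outlier becomes 11.0
```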
NOOKALA et al. (US 20200226205 A1) teaches one-hot encoded vector representations for a neural network, capturing a spatial distribution of source and destination for a given set of data. For example, hour is the 24-dimensional hour of the day, day is the 7-dimensional day of the week categorical variable representing Monday to Sunday, and isWeekend is the 1-dimensional Boolean that denotes weekend or weekday.
ZHAO et al. (Vehicle Accident Risk Prediction Based on AdaBoost-SO in VANETs) teaches a One-Hot encoding method which uses an N-bit status register to encode N states. Each state has a separate register bit, and only one bit is valid at any time. That is, for each feature, if it has m possible values, it will become m binary features after being One-Hot encoded.
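The one-hot scheme described by Nookala and Zhao (m possible values become m binary features, with exactly one bit set at any time) reduces to a few lines; the example category values are taken from the features named above:

```python
def one_hot(value, categories):
    # m possible values become m binary features; exactly one bit is set.
    return [1 if value == c else 0 for c in categories]

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # 7-dimensional day of week
print(one_hot("Wed", days))          # [0, 0, 1, 0, 0, 0, 0]
hours = list(range(24))              # 24-dimensional hour of the day
print(sum(one_hot(13, hours)))       # exactly one bit is valid: 1
```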
However, the claim(s) in the application is/are deemed to be directed to a nonobvious improvement over the prior art of record.
Claim Objections
Claim(s) 1-10 is/are objected to because of the following informalities.
Claim(s) 1 is/are objected to because of the following informalities:
“the processed sample dataset” (step 3) needs to read “the pre-processed sample dataset” or something else. Appropriate correction is required.
“the updated global model” (step 12) needs to read “the updated global forecasting model” or something else. Appropriate correction is required.
“the global model” (step 13) needs to read “the global forecasting model” or something else. Appropriate correction is required.
Claim(s) 9 is/are objected to because of the following informalities: “the global model” (line 3) needs to read “the global forecasting model” or something else. Appropriate correction is required.
Claim(s) 1, 9 each recite(s) limitations that raise the issues set forth above, and their dependent claims are objected to at least based on their direct and/or indirect dependency from the claims listed above. Appropriate explanation and/or amendment is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim(s) 1-10 is/are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim(s) 1 recite(s) the limitation “the local training set and testing set” (step 9). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a local training set and a local testing set”, or something else. For the purposes of examination, “a local training set and a local testing set” is used.
The term “apparently” (claim 3, line 3) is a relative term which renders the claim indefinite. The term “apparently” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
The term “recent” (claim 3, lines 3-4) is a relative term which renders the claim indefinite. The term “recent” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
Claim(s) 3 recite(s) the limitation “the average value” (line 3). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “an average value”, or something else. For the purposes of examination, “an average value” is used.
Claim(s) 3 recite(s) the limitation “the global irradiances” (line 5). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “global irradiances”, or something else. For the purposes of examination, “global irradiances” is used.
Claim(s) 3 recite(s) “N” (2nd last line). However, it is not clear what it means. The specification and/or the claims do not define it, and persons skilled in the art cannot clearly and precisely determine the metes and bounds of the claimed invention. It appears that the claim may need to include “wherein N denotes an integer greater than 0” or something else. For the purposes of examination, “wherein N is an integer greater than 0” is used.
Claim(s) 5 recite(s) the limitation “the original numerical value” (line 6). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “an original numerical value”, or something else. For the purposes of examination, “an original numerical value” is used.
Claim(s) 5 recite(s) the limitation “the mean value of variable A” (line 7). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a mean value of variable A”, or something else. For the purposes of examination, “a mean value of variable A” is used.
Claim(s) 5 recite(s) the limitation “the standard deviation” (line 7). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a standard deviation”, or something else. For the purposes of examination, “a standard deviation” is used.
Claim(s) 5 recite(s) the limitation “the variable” (line 7). There is insufficient antecedent basis for this limitation in the claim. It is not clear if it indicates “variable A” or one of “variables” (claim 1, line 5) or something else. It appears it may need to read “the variable A”, or something else. For the purposes of examination, “the variable A” is used.
Claim(s) 6 recite(s) the limitation “the central server” (line 3). There is insufficient antecedent basis for this limitation in the claim. It is not clear if it indicates “a central server” (claim 6, line 2) or “a central server” (claim 1, step 6) or something else. It appears “a central server” (claim 6, line 2) may need to read “the central server” or something else. For the purposes of examination, “the central server” (claim 6, line 2) is used.
The term “mainly” (claim 7, line 4) is a relative term which renders the claim indefinite. The term “mainly” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
Claim(s) 8 recite(s) the limitation “the model training error function” (line 2). There is insufficient antecedent basis for this limitation in the claim. It is not clear if it means “a model training error function” or indicates “a training error function” (claim 1, step 7) or something else. It appears it may need to read “the training error function” or something else. For the purposes of examination, “the training error function” is used.
Claim(s) 8 recite(s) the limitation “the ith measured photovoltaic power” (line 5). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “an ith measured photovoltaic power” or something else. For the purposes of examination, “an ith measured photovoltaic power” is used.
Claim(s) 8 recite(s) the limitation “the dataset” (line 5). There is insufficient antecedent basis for this limitation in the claim. It is not clear if it means “a dataset” or indicates “a sample dataset” (claim 1), or something else. It appears it may need to read “the sample dataset” or something else. For the purposes of examination, “the sample dataset” is used.
Claim(s) 8 recite(s) the limitation “the number of pieces of data” (line 7). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a number of pieces of data” or something else. For the purposes of examination, “a number of pieces of data” is used.
Claim(s) 9 recite(s) the limitation “the model” (line 2). There is insufficient antecedent basis for this limitation in the claim. It is not clear if it indicates “a global forecasting model” (claim 1), or one of “local forecasting models” (claim 1), or something else. It appears it may need to read “a model” or something else. For the purposes of examination, “a model” is used.
Claim(s) 9 recite(s) the limitation “the training model” (line 3). There is insufficient antecedent basis for this limitation in the claim. It is not clear if it indicates “a global forecasting model” (claim 1), or one of “local forecasting models” (claim 1), or “the model” (claim 9, line 2), or something else. It appears it may need to read “the model” or something else. For the purposes of examination, “the model” is used.
Claim(s) 9 recite(s) the limitation “the last round” (line 4). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a last round” or something else. For the purposes of examination, “a last round” is used.
Claim(s) 10 recite(s) the limitation “the central server” (line 6). There is insufficient antecedent basis for this limitation in the claim. It is not clear if it indicates “a central server” (claim 1), or “a central server” (claim 10, line 4), or something else. It appears that “a central server” (claim 10, line 4) may need to read “a particular central server”, and “the central server” (claim 10, line 6) may need to read “the particular central server”, or something else. For the purposes of examination, “a particular central server” and “the particular central server” are used. In addition, claim(s) 10 (last line) is/are rejected for the same reason.
Claim(s) 1, 3, 5-10 each recite(s) limitations that raise issues of indefiniteness as set forth above, and their dependent claims are rejected at least based on their direct and/or indirect dependency from the claims listed above. Appropriate explanation and/or amendment is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 4-6, 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Saputra et al. (Energy Demand Prediction with Federated Learning for Electric Vehicle Networks) in view of Yang et al. (LSTM-Attention-Embedding Model-Based Day-Ahead Prediction of Photovoltaic Power Output Using Bayesian Optimization) and further in view of Patro et al. (Normalization: A Preprocessing Stage).
Regarding claim 1
(Note: Hereinafter, if a limitation has bold brackets (i.e., [·]) around claim language, the bracketed claim language indicates that it has not yet been taught by the current prior art reference but will be taught by another prior art reference afterwards.)
Saputra teaches
A federated learning-based regional [photovoltaic] power probabilistic forecasting method, comprising steps of:
(Saputra [sec(s) I] “In this paper, we introduce state-of-the-art machine learning-based approaches which can not only significantly improve the accuracy of energy demand prediction, but also remarkably reduce the communication overhead for EV networks. In particular, we first introduce a communication model using the CSP as a centralized node to gather all information from the CSs in a considered area. We then develop an energy demand learning (EDL)-based solution utilizing deep learning method to help the CSP accurately predict energy demands for the CSs in this area. However, this approach requires the CSs to share their local data with the CSP, and thus it may suffer from serious overhead and privacy issues. To address these issues, we propose a novel federated energy demand learning (FEDL) approach in which the CSs only need to share their trained models obtained from their datasets instead of sharing their real datasets. To further improve the prediction accuracy, we develop the clustering-based EDL approach which can classify the CSs into several clusters before the learning process is performed. In this way, we can reduce the dimensionality of the dataset based on the useful feature classification [10], and thus the biased prediction can be minimized [11].”;)
step 1: pinpointing all [photovoltaic] power stations within a region which participate in a federated learning framework for probabilistic forecasting, collecting [weather] information and corresponding [photovoltaic] power variables within a time step, and grouping the variables according to time order into a sample dataset;
(Saputra [sec(s) I] “In particular, we first introduce a communication model using the CSP as a centralized node to gather all information from the CSs in a considered area. We then develop an energy demand learning (EDL)-based solution utilizing deep learning method to help the CSP accurately predict energy demands for the CSs in this area. However, this approach requires the CSs to share their local data with the CSP, and thus it may suffer from serious overhead and privacy issues. To address these issues, we propose a novel federated energy demand learning (FEDL) approach in which the CSs only need to share their trained models obtained from their datasets instead of sharing their real datasets. … We conduct extensive experimental results to evaluate the efficiency of the proposed methods using the real CS session dataset in Dundee city, the United Kingdom.” [sec(s) IV] “To evaluate the performance of the proposed learning methods, we use the real data obtained from charging stations in Dundee city, the United Kingdom between 2017 and 2018 [12]. In particular, the dataset has 65,601 transactions which include CS ID from 58 CSs, transaction ID for each CS, EV charging date, EV charging time, and consumed energy (in kWh) for each transaction. We use the first four information as the learning features, and the consumed energy as the learning label. Then, we classify CS ID, charging date, and charging time as categorical features. Specifically, we convert the charging date and charging time information into 7-day (i.e., 1, 2, . . . , 7) and 24-hour (i.e., 0, 1, . . . , 23) categories, respectively. In addition, each CS has the latitude and longitude information which will be used for clustering.”;)
step 2: pre-processing the sample dataset obtained in step 1;
(Saputra [sec(s) IV] “To evaluate the performance of the proposed learning methods, we use the real data obtained from charging stations in Dundee city, the United Kingdom between 2017 and 2018 [12]. In particular, the dataset has 65,601 transactions which include CS ID from 58 CSs, transaction ID for each CS, EV charging date, EV charging time, and consumed energy (in kWh) for each transaction. We use the first four information as the learning features, and the consumed energy as the learning label. Then, we classify CS ID, charging date, and charging time as categorical features. Specifically, we convert the charging date and charging time information into 7-day (i.e., 1, 2, . . . , 7) and 24-hour (i.e., 0, 1, . . . , 23) categories, respectively. In addition, each CS has the latitude and longitude information which will be used for clustering. … We split the dataset into 80%, 70%, 60%, as well as 50% training dataset, and the rest of the portions for testing dataset. From the training dataset, we divide the number of transactions by J training subsets, when FEDL is implemented.”;)
step 3: splitting the processed sample dataset of the [photovoltaic] power stations resulting from step 2 into a training set and a testing set according to a predetermined proportion;
(Saputra [sec(s) IV] “To evaluate the performance of the proposed learning methods, we use the real data obtained from charging stations in Dundee city, the United Kingdom between 2017 and 2018 [12]. In particular, the dataset has 65,601 transactions which include CS ID from 58 CSs, transaction ID for each CS, EV charging date, EV charging time, and consumed energy (in kWh) for each transaction. We use the first four information as the learning features, and the consumed energy as the learning label. Then, we classify CS ID, charging date, and charging time as categorical features. Specifically, we convert the charging date and charging time information into 7-day (i.e., 1, 2, . . . , 7) and 24-hour (i.e., 0, 1, . . . , 23) categories, respectively. In addition, each CS has the latitude and longitude information which will be used for clustering. … We split the dataset into 80%, 70%, 60%, as well as 50% training dataset, and the rest of the portions for testing dataset. From the training dataset, we divide the number of transactions by J training subsets, when FEDL is implemented.”;)
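The proportional split cited above (80%, 70%, 60%, or 50% for training, with the remainder for testing) amounts to cutting the time-ordered samples at a fixed index; a minimal sketch on invented data:

```python
def chronological_split(samples, train_fraction=0.8):
    # Keep time order intact: the first train_fraction of samples trains the
    # model, and the remainder tests it.
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

data = list(range(10))                  # ten time-ordered samples
train, test = chronological_split(data, 0.8)
print(len(train), len(test))            # 8 2
```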
step 4: [normalizing] the training set and the testing set resulting from step 3;
(Saputra [sec(s) IV] “To evaluate the performance of the proposed learning methods, we use the real data obtained from charging stations in Dundee city, the United Kingdom between 2017 and 2018 [12]. In particular, the dataset has 65,601 transactions which include CS ID from 58 CSs, transaction ID for each CS, EV charging date, EV charging time, and consumed energy (in kWh) for each transaction. We use the first four information as the learning features, and the consumed energy as the learning label. Then, we classify CS ID, charging date, and charging time as categorical features. Specifically, we convert the charging date and charging time information into 7-day (i.e., 1, 2, . . . , 7) and 24-hour (i.e., 0, 1, . . . , 23) categories, respectively. In addition, each CS has the latitude and longitude information which will be used for clustering. … We split the dataset into 80%, 70%, 60%, as well as 50% training dataset, and the rest of the portions for testing dataset. From the training dataset, we divide the number of transactions by J training subsets, when FEDL is implemented.”;)
step 5: constructing the federated learning framework;
(Saputra [fig(s) 2] [sec(s) III] “Since the CSP needs to collect data from CSs, this centralized learning may lead to the communication overhead and data privacy concerns. To deal with these issues, we develop a framework using federated energy demand learning (FEDL) method. Particularly, the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2.”;)
step 6: building, by a central server based on a forecast requirement, a global forecasting model;
(Saputra [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [fig(s) 2] “global model” [sec(s) III] “Since the CSP needs to collect data from CSs, this centralized learning may lead to the communication overhead and data privacy concerns. To deal with these issues, we develop a framework using federated energy demand learning (FEDL) method. Particularly, the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2.” [sec(s) I] “Generally, the power grid supplies the energy for charging stations (CSs) once receiving requests from EVs [2]. However, this approach experiences a serious energy transfer congestion when a huge number of EVs charging the energy simultaneously [3] and lead to high energy transfer cost for the charging station provider (CSP)”;)
step 7: defining a training error function, an optimizer, and a learning rate of the global forecasting model built in step 6, and distributing network architecture and initialized parameters to each [photovoltaic] power station;
(Saputra [algorithm 1] “
[Algorithm 1 reproduced as greyscale image media_image1.png]
” [algorithm 2] “Update and send υ(φ) back to J CSs” [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [fig(s) 2] “global model” [sec(s) III.A] “After ∇υ(φ) is obtained, the CSP updates the global model υ(φ) to minimize the prediction error, i.e.,
[equation reproduced as greyscale image media_image2.png]
, using adaptive learning rate optimizer Adam [15] which produces fast convergence and significant robustness to the model. … The learning process repeats and then terminates when the prediction error converges, or a certain number of epoch time T is reached. In this case, the final global model υ∗ in the CSP is obtained to predict Yˆcsp of training dataset Xcsp and new dataset Xˆcsp using Eq. (1). The algorithm for energy demand prediction using EDL is summarized in Algorithm 1. The processes between Lines 4 and 11 are implemented in the CSP.” [sec(s) IV.B] “We also apply the adaptive learning rate Adam optimizer with initial step size 0.01 and tanh function as the activation function.” [sec(s) III.B] “Since the CSP needs to collect data from CSs, this centralized learning may lead to the communication overhead and data privacy concerns. To deal with these issues, we develop a framework using federated energy demand learning (FEDL) method. Particularly, the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2.” [sec(s) IV.A] “Then, we adopt the RMSE to show the prediction accuracy, i.e., prediction error, because we deal with the prediction of energy demand which is categorized as a regression prediction model, i.e., when the mapping function yields the continuous prediction outputs. Given S transactions, the RMSE can be computed as follows:
[equation reproduced as greyscale image media_image3.png]
, (20) where ωs and ωˆs are the actual and predicted energy demand for transaction s.”;)
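The RMSE of Eq. (20) quoted above, taken over S transactions with actual demand ωs and predicted demand ω̂s, reads directly in code; the sample values below are invented:

```python
import math

def rmse(actual, predicted):
    # Root-mean-square error over S paired samples, as in Eq. (20):
    # sqrt of the mean squared difference between actual and predicted values.
    s = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / s)

measured = [5.0, 7.0, 9.0]       # invented actual energy demands
forecast = [5.5, 6.5, 9.5]       # invented predicted demands
print(rmse(measured, forecast))  # 0.5
```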
step 8: selecting, by the central server based on its communication status with each photovoltaic power station, a plurality of [photovoltaic] power stations to perform forecasting model training and feedback;
(Saputra [algorithm 1] [algorithm 2] “Update and send υ(φ) back to J CSs” [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [fig(s) 2] “global model” [sec(s) III.A] “After ∇υ(φ) is obtained, the CSP updates the global model υ(φ) to minimize the prediction error, i.e.,
[equation reproduced as greyscale image media_image2.png]
, using adaptive learning rate optimizer Adam [15] which produces fast convergence and significant robustness to the model. … The learning process repeats and then terminates when the prediction error converges, or a certain number of epoch time T is reached. In this case, the final global model υ∗ in the CSP is obtained to predict Yˆcsp of training dataset Xcsp and new dataset Xˆcsp using Eq. (1). The algorithm for energy demand prediction using EDL is summarized in Algorithm 1. The processes between Lines 4 and 11 are implemented in the CSP.” [sec(s) IV.B] “We also apply the adaptive learning rate Adam optimizer with initial step size 0.01 and tanh function as the activation function.” [sec(s) III.B] “Since the CSP needs to collect data from CSs, this centralized learning may lead to the communication overhead and data privacy concerns. To deal with these issues, we develop a framework using federated energy demand learning (FEDL) method. Particularly, the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2.”;)
step 9: performing model training and testing using the local training set and testing set prepared in step 4 to each photovoltaic power station selected in step 8, respectively, and updating local forecasting models;
(Saputra [algorithm 1] [algorithm 2] “Update and send υ(φ) back to J CSs” [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [fig(s) 2] “global model” [sec(s) III.A] “After ∇υ(φ) is obtained, the CSP updates the global model υ(φ) to minimize the prediction error, i.e.,
[equation reproduced as greyscale image media_image2.png]
, using adaptive learning rate optimizer Adam [15] which produces fast convergence and significant robustness to the model. … The learning process repeats and then terminates when the prediction error converges, or a certain number of epoch time T is reached. In this case, the final global model υ∗ in the CSP is obtained to predict Yˆcsp of training dataset Xcsp and new dataset Xˆcsp using Eq. (1). The algorithm for energy demand prediction using EDL is summarized in Algorithm 1. The processes between Lines 4 and 11 are implemented in the CSP.” [sec(s) IV.B] “We also apply the adaptive learning rate Adam optimizer with initial step size 0.01 and tanh function as the activation function.” [sec(s) III.B] “Since the CSP needs to collect data from CSs, this centralized learning may lead to the communication overhead and data privacy concerns. To deal with these issues, we develop a framework using federated energy demand learning (FEDL) method. Particularly, the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2. … This υ(φ+1) is then pushed back to the CS-j, ∀j ∈ J for the next local learning process.”; e.g., fig 2 along with “local learning process” read(s) on “updating local forecasting models”.)
step 10: performing [photovoltaic] power probabilistic forecasting to each of the selected photovoltaic power stations;
(Saputra [algorithm 1] [algorithm 2] “Update and send υ(φ) back to J CSs” [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [fig(s) 2] “global model” [sec(s) III.A] “After ∇υ(φ) is obtained, the CSP updates the global model υ(φ) to minimize the prediction error, i.e.,
[Equation image: media_image2.png]
, using adaptive learning rate optimizer Adam [15] which produces fast convergence and significant robustness to the model. … The learning process repeats and then terminates when the prediction error converges, or a certain number of epoch time T is reached. In this case, the final global model υ∗ in the CSP is obtained to predict Yˆcsp of training dataset Xcsp and new dataset Xˆcsp using Eq. (1). The algorithm for energy demand prediction using EDL is summarized in Algorithm 1. The processes between Lines 4 and 11 are implemented in the CSP.” [sec(s) IV.B] “We also apply the adaptive learning rate Adam optimizer with initial step size 0.01 and tanh function as the activation function.” [sec(s) III.B] “the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2. … This υ(φ+1) is then pushed back to the CS-j, ∀j ∈ J for the next local learning process.”;)
step 11: receiving, by the central server, the local forecasting models in step 9 which pass testing, and updating the global forecasting model;
(Saputra [algorithm 1] [algorithm 2] “Send ∇υ(φ)j to the CSP for global model update” [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [fig(s) 2] “global model” [sec(s) III.A] “After ∇υ(φ) is obtained, the CSP updates the global model υ(φ) to minimize the prediction error, i.e.,
[Equation image: media_image2.png]
, using adaptive learning rate optimizer Adam [15] which produces fast convergence and significant robustness to the model. … The learning process repeats and then terminates when the prediction error converges, or a certain number of epoch time T is reached. In this case, the final global model υ∗ in the CSP is obtained to predict Yˆcsp of training dataset Xcsp and new dataset Xˆcsp using Eq. (1). The algorithm for energy demand prediction using EDL is summarized in Algorithm 1. The processes between Lines 4 and 11 are implemented in the CSP.” [sec(s) IV.B] “We also apply the adaptive learning rate Adam optimizer with initial step size 0.01 and tanh function as the activation function.” [sec(s) III.B] “the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2. … This υ(φ+1) is then pushed back to the CS-j, ∀j ∈ J for the next local learning process.”; e.g., fig 2 read(s) on “receiving, by the central server, the local forecasting models in step 9 which pass testing, and updating the global forecasting model”.)
step 12: distributing, by the central server, the updated global model to all photovoltaic power stations;
(Saputra [algorithm 1] [algorithm 2] “Update and send υ(φ) back to J CSs” [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [fig(s) 2] “global model” [sec(s) III.A] “After ∇υ(φ) is obtained, the CSP updates the global model υ(φ) to minimize the prediction error, i.e.,
[Equation image: media_image2.png]
, using adaptive learning rate optimizer Adam [15] which produces fast convergence and significant robustness to the model. … The learning process repeats and then terminates when the prediction error converges, or a certain number of epoch time T is reached. In this case, the final global model υ∗ in the CSP is obtained to predict Yˆcsp of training dataset Xcsp and new dataset Xˆcsp using Eq. (1). The algorithm for energy demand prediction using EDL is summarized in Algorithm 1. The processes between Lines 4 and 11 are implemented in the CSP.” [sec(s) IV.B] “We also apply the adaptive learning rate Adam optimizer with initial step size 0.01 and tanh function as the activation function.” [sec(s) III.B] “the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2. … This υ(φ+1) is then pushed back to the CS-j, ∀j ∈ J for the next local learning process.”; e.g., fig 2 read(s) on “distributing, by the central server, the updated global model to all photovoltaic power stations”.)
step 13: repeating steps 8 to 12 to rolling update the global model.
(Saputra [algorithm 1] [algorithm 2] “Send ∇υ(φ)j to the CSP for global model update” [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [fig(s) 2] “global model” [sec(s) III.A] “After ∇υ(φ) is obtained, the CSP updates the global model υ(φ) to minimize the prediction error, i.e.,
[Equation image: media_image2.png]
, using adaptive learning rate optimizer Adam [15] which produces fast convergence and significant robustness to the model. … The learning process repeats and then terminates when the prediction error converges, or a certain number of epoch time T is reached. In this case, the final global model υ∗ in the CSP is obtained to predict Yˆcsp of training dataset Xcsp and new dataset Xˆcsp using Eq. (1). The algorithm for energy demand prediction using EDL is summarized in Algorithm 1. The processes between Lines 4 and 11 are implemented in the CSP.” [sec(s) IV.B] “We also apply the adaptive learning rate Adam optimizer with initial step size 0.01 and tanh function as the activation function.” [sec(s) III.B] “the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2. … This υ(φ+1) is then pushed back to the CS-j, ∀j ∈ J for the next local learning process.”;)
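For illustration, the federated learning cycle described in the cited passages of Saputra (local training at each CS, upload of gradient information to the CSP, global model update, and redistribution to the CSs) can be sketched as follows. The function and variable names, the linear model, and the plain gradient-step update (a stand-in for Saputra's Adam optimizer) are illustrative assumptions, not the reference's implementation.

```python
import numpy as np

def local_gradient(weights, X, y):
    """One worker's gradient of squared error for a linear model
    (an illustrative stand-in for each CS's local deep-learning step)."""
    pred = X @ weights
    return 2 * X.T @ (pred - y) / len(y)

def federated_round(weights, stations, lr=0.05):
    """One global round: collect gradient information from all workers,
    average it, and update the shared global model."""
    grads = [local_gradient(weights, X, y) for X, y in stations]
    return weights - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
stations = []
for _ in range(3):                 # three workers, each holding private data
    X = rng.normal(size=(50, 2))
    stations.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(200):               # repeat until convergence or epoch limit
    w = federated_round(w, stations)
```

After the rounds complete, the global model `w` approximates the underlying relationship without any worker sharing its raw dataset, mirroring the privacy rationale in the cited passages.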
However, Saputra does not appear to explicitly teach:
A federated learning-based regional [photovoltaic] power probabilistic forecasting method, comprising steps of:
step 1: pinpointing all [photovoltaic] power stations within a region which participate in a federated learning framework for probabilistic forecasting, collecting [weather] information and corresponding [photovoltaic] power variables within a time step;
step 3: splitting the processed sample dataset of the [photovoltaic] power stations resulting from step 2 into a training set and a testing set according to a predetermined proportion;
step 4: [normalizing] the training set and the testing set resulting from step 3;
step 7: defining a training error function, an optimizer, and a learning rate of the global forecasting model built in step 6, and distributing network architecture and initialized parameters to each [photovoltaic] power station;
step 8: selecting, by the central server based on its communication status with each photovoltaic power station, a plurality of [photovoltaic] power stations to perform forecasting model training and feedback;
step 10: performing [photovoltaic] power probabilistic forecasting to each of the selected photovoltaic power stations;
(Note: Hereinafter, if a limitation has one or more bold underlines, the one or more underlined claim languages indicate that they are taught by the current prior art reference, while the one or more non-underlined claim languages indicate that they have been taught already by one or more previous art references.)
Yang teaches
A federated learning-based regional photovoltaic power probabilistic forecasting method, comprising steps of:
(Yang [sec(s) III] “The flowchart of the PV prediction is shown in FIGURE 5. According to the meteorological factors, the statistical features and the combined features are constructed, and the time features are also extracted. Then, the LSTM-attention-embedding model is built up. To reduce the redundancy of the features, the Bayesian optimization is used to optimize the parameters. With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data.” [sec(s) Abs] “Photovoltaic (PV) output is susceptible to meteorological factors, resulting in intermittency and randomness of power generation. Accurate prediction of PV power output can not only reduce the impact of PV power generation on the grid but also provide a reference for grid dispatching. Therefore, this paper proposes an LSTM-attention-embedding model based on Bayesian optimization to predict the day-ahead PV power output. The statistical features at multiple time scales, combined features, time features and wind speed categorical features are explored for PV related meteorological factors. A deep learning model is constructed based on an LSTM block and an embedding block with the connection of a merge layer.”;)
step 1: pinpointing all photovoltaic power stations within a region which participate in a federated learning framework for probabilistic forecasting, collecting weather information and corresponding photovoltaic power variables within a time step;
(Yang [sec(s) III] “The flowchart of the PV prediction is shown in FIGURE 5. According to the meteorological factors, the statistical features and the combined features are constructed, and the time features are also extracted. Then, the LSTM-attention-embedding model is built up. To reduce the redundancy of the features, the Bayesian optimization is used to optimize the parameters. With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data.” [sec(s) Abs] “Photovoltaic (PV) output is susceptible to meteorological factors, resulting in intermittency and randomness of power generation. Accurate prediction of PV power output can not only reduce the impact of PV power generation on the grid but also provide a reference for grid dispatching. Therefore, this paper proposes an LSTM-attention-embedding model based on Bayesian optimization to predict the day-ahead PV power output.” [sec(s) IV] “The dataset used for the experiment was collected from a PV power output dataset within 25 months (from 1st April 2016 to 30th April 2018) from two PV stations located in one area of China, and the sampling frequency of the data is 1.1×10−3 Hz (every 15min for one point). From the PV stations, we can get day-ahead historical meteorological data, which is same as NWP data. The information of each PV station is shown in TABLE 1, and the sampling data of each station includes time, irradiance, wind direction, temperature, pressure and humidity, with the resolution of 15 mins.”;)
step 3: splitting the processed sample dataset of the photovoltaic power stations resulting from step 2 into a training set and a testing set according to a predetermined proportion;
(Yang [sec(s) III] “With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data.” and “The experimental dataset of PV station 1 is divided into a training dataset and a test dataset in chronological order of 8:2, and the test error is selected as the target of Bayesian optimization.” [sec(s) IV] “The dataset used for the experiment was collected from a PV power output dataset within 25 months (from 1st April 2016 to 30th April 2018) from two PV stations located in one area of China, and the sampling frequency of the data is 1.1×10−3 Hz (every 15min for one point). From the PV stations, we can get day-ahead historical meteorological data, which is same as NWP data. The information of each PV station is shown in TABLE 1, and the sampling data of each station includes time, irradiance, wind direction, temperature, pressure and humidity, with the resolution of 15 mins.”;)
step 7: defining a training error function, an optimizer, and a learning rate of the global forecasting model built in step 6, and distributing network architecture and initialized parameters to each photovoltaic power station;
(Yang [table(s) 3] “
[Table image: media_image4.png (Yang TABLE 3, training parameters)]
” [sec(s) III] “With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data.” and “The experimental dataset of PV station 1 is divided into a training dataset and a test dataset in chronological order of 8:2, and the test error is selected as the target of Bayesian optimization. The iteration steps are 200, and the optimal parameters appear at the step of 199, and the optimal parameters are {t =18, num1 = 3, num2 = 12} with MSE of 0.66MW. … The detailed training parameters for the improved deep learning model is shown in TABLE 3. The method for the parameter initialization can be found in [32].” [sec(s) IV] “The dataset used for the experiment was collected from a PV power output dataset within 25 months (from 1st April 2016 to 30th April 2018) from two PV stations located in one area of China, and the sampling frequency of the data is 1.1×10−3 Hz (every 15min for one point). From the PV stations, we can get day-ahead historical meteorological data, which is same as NWP data. The information of each PV station is shown in TABLE 1, and the sampling data of each station includes time, irradiance, wind direction, temperature, pressure and humidity, with the resolution of 15 mins.”;)
step 8: selecting, by the central server based on its communication status with each photovoltaic power station, a plurality of photovoltaic power stations to perform forecasting model training and feedback;
(Yang [table(s) 3] [sec(s) III] “With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data.” and “The experimental dataset of PV station 1 is divided into a training dataset and a test dataset in chronological order of 8:2, and the test error is selected as the target of Bayesian optimization. The iteration steps are 200, and the optimal parameters appear at the step of 199, and the optimal parameters are {t =18, num1 = 3, num2 = 12} with MSE of 0.66MW. … The detailed training parameters for the improved deep learning model is shown in TABLE 3. The method for the parameter initialization can be found in [32].” [sec(s) IV] “The dataset used for the experiment was collected from a PV power output dataset within 25 months (from 1st April 2016 to 30th April 2018) from two PV stations located in one area of China, and the sampling frequency of the data is 1.1×10−3 Hz (every 15min for one point). From the PV stations, we can get day-ahead historical meteorological data, which is same as NWP data. The information of each PV station is shown in TABLE 1, and the sampling data of each station includes time, irradiance, wind direction, temperature, pressure and humidity, with the resolution of 15 mins.”;)
step 10: performing photovoltaic power probabilistic forecasting to each of the selected photovoltaic power stations;
(Yang [table(s) 3] [sec(s) III] “With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data.” and “The experimental dataset of PV station 1 is divided into a training dataset and a test dataset in chronological order of 8:2, and the test error is selected as the target of Bayesian optimization. The iteration steps are 200, and the optimal parameters appear at the step of 199, and the optimal parameters are {t =18, num1 = 3, num2 = 12} with MSE of 0.66MW. … The detailed training parameters for the improved deep learning model is shown in TABLE 3. The method for the parameter initialization can be found in [32].” [sec(s) IV] “The dataset used for the experiment was collected from a PV power output dataset within 25 months (from 1st April 2016 to 30th April 2018) from two PV stations located in one area of China, and the sampling frequency of the data is 1.1×10−3 Hz (every 15min for one point). From the PV stations, we can get day-ahead historical meteorological data, which is same as NWP data. The information of each PV station is shown in TABLE 1, and the sampling data of each station includes time, irradiance, wind direction, temperature, pressure and humidity, with the resolution of 15 mins.”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Saputra with the photovoltaic power station of Yang.
One of ordinary skill in the art would have been motivated to combine in order to significantly improve the performance of photovoltaic (PV) output prediction compared to LSTM neural networks, BPNN, SVR model and persistence model.
(Yang [sec(s) Abs] “The comparative experimental results show that the performance of the proposed model has been significantly improved compared to LSTM neural networks, BPNN, SVR model and persistence model.”)
However, the combination of Saputra, Yang does not appear to explicitly teach:
step 4: [normalizing] the training set and the testing set resulting from step 3;
Patro teaches
step 4: normalizing the training set and the testing set resulting from step 3;
(Patro [sec(s) II] “Parameter is called as Z-score Normalization [3-6]. So the unstructured data can be normalized using z-score parameter, as per given formulae:
[Equation image: media_image5.png (z-score formula)]
Where, vi’ is Z-score normalized one values. vi is value of the row E of ith column
[Equation image: media_image6.png]
[Equation image: media_image7.png]
or mean value In this technique, suppose we are having five rows namely X Y, Z, U, and V with different variables or columns that are ‘n’ in each row. So in each row above z-score technique can be applied to calculate the normalized ones. If suppose some row having all the values are identical, so the standard deviation of that row is equal to zero then all values for that row are set to zero. Like that Min-Max normalization the z-score also gives the range of values between 0 and 1.”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Saputra, Yang with the data normalization of Patro.
One of ordinary skill in the art would have been motivated to combine in order to aid prediction or forecasting by using data normalization to bring widely varying prediction and forecasting results closer together, since different prediction or forecasting approaches can vary from one another considerably.
(Patro [sec(s) I] “Normalization is scaling technique or a mapping technique or a pre processing stage [1]. Where, we can find new range from an existing one range. It can be helpful for the prediction or forecasting purpose a lot [2]. As we know there are so many ways to predict or forecast but all can vary with each other a lot. So to maintain the large variation of prediction and forecasting the Normalization technique is required to make them closer. But there is some existing normalization techniques as mentioned in my abstract section namely Min-Max, Zscore & Decimal scaling excluding these technique we are presenting new one technique called Integer Scaling technique. This technique comes from the AMZD (Advanced on Min-Max Z-score Decimal scaling) [3-6].”)
Regarding claim 4
The combination of Saputra, Yang, Patro teaches claim 1.
Saputra further teaches
wherein the step 3 further comprises: the splitting the sample dataset refers to splitting the sample dataset into a training set and a testing set without shuffling in accordance with an 8:2 or 7:3 proportion.
(Saputra [sec(s) IV] “To evaluate the performance of the proposed learning methods, we use the real data obtained from charging stations in Dundee city, the United Kingdom between 2017 and 2018 [12]. In particular, the dataset has 65,601 transactions which include CS ID from 58 CSs, transaction ID for each CS, EV charging date, EV charging time, and consumed energy (in kWh) for each transaction. We use the first four information as the learning features, and the consumed energy as the learning label. Then, we classify CS ID, charging date, and charging time as categorical features. Specifically, we convert the charging date and charging time information into 7-day (i.e., 1, 2, . . . , 7) and 24-hour (i.e., 0, 1, . . . , 23) categories, respectively. In addition, each CS has the latitude and longitude information which will be used for clustering. … We split the dataset into 80%, 70%, 60%, as well as 50% training dataset, and the rest of the portions for testing dataset. From the training dataset, we divide the number of transactions by J training subsets, when FEDL is implemented.”;)
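For illustration, the chronological (unshuffled) split recited in the claim and reflected in Saputra's 80%/20% partition amounts to the following; the helper name is illustrative, not from the reference.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split time-ordered samples into training and testing sets without
    shuffling, so every test sample is later in time than the training set."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

data = list(range(10))                            # ten time-ordered samples
train, test = chronological_split(data)           # 8:2 proportion
train73, test73 = chronological_split(data, 0.7)  # 7:3 proportion
```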
Regarding claim 5
The combination of Saputra, Yang, Patro teaches claim 1.
Patro further teaches
wherein in step 4, the normalizing enables different dimensional data other than time information to be transformed into dimensionless data with a range of [0, 1] according to an equation of:
xi’ = (xi − µA) / σA  (media_image8.png)
where xi denotes the original numerical value, xi’ denotes normalized data, µA denotes the mean value of variable A, and σA denotes the standard deviation of the variable.
(Patro [sec(s) II] “Parameter is called as Z-score Normalization [3-6]. So the unstructured data can be normalized using z-score parameter, as per given formulae:
[Equation image: media_image5.png (z-score formula)]
Where, vi’ is Z-score normalized one values. vi is value of the row E of ith column
[Equation image: media_image6.png]
[Equation image: media_image7.png]
or mean value In this technique, suppose we are having five rows namely X Y, Z, U, and V with different variables or columns that are ‘n’ in each row. So in each row above z-score technique can be applied to calculate the normalized ones. If suppose some row having all the values are identical, so the standard deviation of that row is equal to zero then all values for that row are set to zero. Like that Min-Max normalization the z-score also gives the range of values between 0 and 1.”;)
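For illustration, the z-score normalization quoted from Patro, including the zeroing of a column whose values are all identical (zero standard deviation), can be sketched as follows; the function name is illustrative.

```python
import numpy as np

def z_score(column):
    """Z-score normalization: (v_i - mean) / std, with an all-identical
    column (zero standard deviation) mapped to zeros, per Patro."""
    column = np.asarray(column, dtype=float)
    std = column.std()
    if std == 0.0:
        return np.zeros_like(column)
    return (column - column.mean()) / std

x = z_score([1.0, 2.0, 3.0, 4.0])  # normalized to zero mean, unit std
c = z_score([5.0, 5.0, 5.0])       # identical values -> all zeros
```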
The combination of Saputra, Yang, Patro is combinable with Patro for the same rationale as set forth above with respect to claim 1.
Regarding claim 6
The combination of Saputra, Yang, Patro teaches claim 1.
Saputra further teaches
wherein in step 5, the federated learning framework comprises a central server and respective [photovoltaic] power stations, the central server being responsible for coordinating a forecasting model training process, and the respective [photovoltaic] power stations participating in updating the forecasting model and computing forecast values.
(Saputra [algorithm 1] [algorithm 2] “Update and send υ(φ) back to J CSs” [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [sec(s) I] “In this paper, we introduce state-of-the-art machine learning-based approaches which can not only significantly improve the accuracy of energy demand prediction, but also remarkably reduce the communication overhead for EV networks. In particular, we first introduce a communication model using the CSP as a centralized node to gather all information from the CSs in a considered area.” [fig(s) 2] “global model” [sec(s) III.A] “After ∇υ(φ) is obtained, the CSP updates the global model υ(φ) to minimize the prediction error, i.e.,
[Equation image: media_image2.png]
, using adaptive learning rate optimizer Adam [15] which produces fast convergence and significant robustness to the model. … The learning process repeats and then terminates when the prediction error converges, or a certain number of epoch time T is reached. In this case, the final global model υ∗ in the CSP is obtained to predict Yˆcsp of training dataset Xcsp and new dataset Xˆcsp using Eq. (1). The algorithm for energy demand prediction using EDL is summarized in Algorithm 1. The processes between Lines 4 and 11 are implemented in the CSP.” [sec(s) IV.B] “We also apply the adaptive learning rate Adam optimizer with initial step size 0.01 and tanh function as the activation function.” [sec(s) III.B] “the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2. … This υ(φ+1) is then pushed back to the CS-j, ∀j ∈ J for the next local learning process.”;)
Yang further teaches
wherein in step 5, the federated learning framework comprises a central server and respective photovoltaic power stations, …, and the respective photovoltaic power stations participating in updating the forecasting model and computing forecast values.
(Yang [sec(s) III] “The flowchart of the PV prediction is shown in FIGURE 5. According to the meteorological factors, the statistical features and the combined features are constructed, and the time features are also extracted. Then, the LSTM-attention-embedding model is built up. To reduce the redundancy of the features, the Bayesian optimization is used to optimize the parameters. With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data.” [sec(s) Abs] “Photovoltaic (PV) output is susceptible to meteorological factors, resulting in intermittency and randomness of power generation. Accurate prediction of PV power output can not only reduce the impact of PV power generation on the grid but also provide a reference for grid dispatching. Therefore, this paper proposes an LSTM-attention-embedding model based on Bayesian optimization to predict the day-ahead PV power output.” [sec(s) IV] “The dataset used for the experiment was collected from a PV power output dataset within 25 months (from 1st April 2016 to 30th April 2018) from two PV stations located in one area of China, and the sampling frequency of the data is 1.1×10−3 Hz (every 15min for one point). From the PV stations, we can get day-ahead historical meteorological data, which is same as NWP data. The information of each PV station is shown in TABLE 1, and the sampling data of each station includes time, irradiance, wind direction, temperature, pressure and humidity, with the resolution of 15 mins.”;)
The combination of Saputra, Yang, Patro is combinable with Yang for the same rationale as set forth above with respect to claim 1.
Regarding claim 10
The combination of Saputra, Yang, Patro teaches claim 1.
Saputra further teaches
A regional energy coordinated control system adapted to implement the federated learning-based regional [photovoltaic] power probability forecasting method of claim 1, wherein the regional energy coordinated control system comprising:
a central server;
(Saputra [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [fig(s) 2] “global model” [sec(s) III] “Since the CSP needs to collect data from CSs, this centralized learning may lead to the communication overhead and data privacy concerns. To deal with these issues, we develop a framework using federated energy demand learning (FEDL) method. Particularly, the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2.” [sec(s) I] “In this paper, we introduce state-of-the-art machine learning-based approaches which can not only significantly improve the accuracy of energy demand prediction, but also remarkably reduce the communication overhead for EV networks. In particular, we first introduce a communication model using the CSP as a centralized node to gather all information from the CSs in a considered area. We then develop an energy demand learning (EDL)-based solution utilizing deep learning method to help the CSP accurately predict energy demands for the CSs in this area. However, this approach requires the CSs to share their local data with the CSP, and thus it may suffer from serious overhead and privacy issues. To address these issues, we propose a novel federated energy demand learning (FEDL) approach in which the CSs only need to share their trained models obtained from their datasets instead of sharing their real datasets.”;)
edge computing nodes of each plant; and
(Saputra [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [fig(s) 2] “global model” [sec(s) III] “Since the CSP needs to collect data from CSs, this centralized learning may lead to the communication overhead and data privacy concerns. To deal with these issues, we develop a framework using federated energy demand learning (FEDL) method. Particularly, the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2.” [sec(s) I] “Generally, the power grid supplies the energy for charging stations (CSs) once receiving requests from EVs [2]. However, this approach experiences a serious energy transfer congestion when a huge number of EVs charging the energy simultaneously [3] and lead to high energy transfer cost for the charging station provider (CSP)”;)
communication lines, through which the central server communicates with the edge computing nodes of different [photovoltaic] plants belonging to different entities;
(Saputra [algorithm 1] [algorithm 2] “Update and send υ(φ) back to J CSs” [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [fig(s) 2] “global model” [sec(s) III.A] “After ∇υ(φ) is obtained, the CSP updates the global model υ(φ) to minimize the prediction error, i.e.,
[equation image: media_image2.png]
, using adaptive learning rate optimizer Adam [15] which produces fast convergence and significant robustness to the model. … The learning process repeats and then terminates when the prediction error converges, or a certain number of epoch time T is reached. In this case, the final global model υ∗ in the CSP is obtained to predict Yˆcsp of training dataset Xcsp and new dataset Xˆcsp using Eq. (1). The algorithm for energy demand prediction using EDL is summarized in Algorithm 1. The processes between Lines 4 and 11 are implemented in the CSP.” [sec(s) IV.B] “We also apply the adaptive learning rate Adam optimizer with initial step size 0.01 and tanh function as the activation function.” [sec(s) III.B] “Since the CSP needs to collect data from CSs, this centralized learning may lead to the communication overhead and data privacy concerns. To deal with these issues, we develop a framework using federated energy demand learning (FEDL) method. Particularly, the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2.”;)
wherein the central server generates a probabilistic [photovoltaic] power forecast result.
(Saputra [algorithm 1] [algorithm 2] “Update and send υ(φ) back to J CSs” [fig(s) 1] “Charging Stations (CSs)” and “Charging Station Provider (CSP)” [sec(s) I] “In this paper, we introduce state-of-the-art machine learning-based approaches which can not only significantly improve the accuracy of energy demand prediction, but also remarkably reduce the communication overhead for EV networks. In particular, we first introduce a communication model using the CSP as a centralized node to gather all information from the CSs in a considered area.” [fig(s) 2] “global model” [sec(s) III.A] “After ∇υ(φ) is obtained, the CSP updates the global model υ(φ) to minimize the prediction error, i.e.,
[equation image: media_image2.png]
, using adaptive learning rate optimizer Adam [15] which produces fast convergence and significant robustness to the model. … The learning process repeats and then terminates when the prediction error converges, or a certain number of epoch time T is reached. In this case, the final global model υ∗ in the CSP is obtained to predict Yˆcsp of training dataset Xcsp and new dataset Xˆcsp using Eq. (1). The algorithm for energy demand prediction using EDL is summarized in Algorithm 1. The processes between Lines 4 and 11 are implemented in the CSP.” [sec(s) IV.B] “We also apply the adaptive learning rate Adam optimizer with initial step size 0.01 and tanh function as the activation function.” [sec(s) III.B] “the CSP only requires to collect the trained models, i.e., gradient information, from the set of CSs, and then updates the global model efficiently before sending back to the CSs [16]. After that, the CSs can use this information to learn by themselves using deep learning method. Given that J = {1, . . . , j, . . . , J} as the set of CSs which acts as workers to implement the EDL algorithms using their Xj locally as illustrated in Fig. 2. … This υ(φ+1) is then pushed back to the CS-j, ∀j ∈ J for the next local learning process.”;)
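The aggregation step quoted above (workers send their trained gradients; the central provider updates the global model and pushes it back for the next local learning round) can be sketched in a few lines. The function and variable names below are illustrative assumptions, and plain gradient descent stands in for the Adam optimizer that the reference uses.

```python
import numpy as np

def fedl_round(global_weights, local_gradients, lr=0.01):
    """One federated round: the central node collects the gradients
    reported by the workers, averages them, and applies a gradient
    step to the global model before sending it back to every worker.
    Plain gradient descent stands in for the reference's Adam optimizer."""
    avg_grad = np.mean(local_gradients, axis=0)  # aggregate worker gradients
    return global_weights - lr * avg_grad        # updated global model

# Toy example: three workers reporting gradients for a 2-parameter model.
w = np.array([1.0, -0.5])
grads = [np.array([0.2, 0.4]), np.array([0.0, 0.2]), np.array([0.4, 0.0])]
print(fedl_round(w, grads, lr=0.1))  # ≈ [0.98, -0.52]
```

The loop repeats until the prediction error converges or a fixed number of epochs is reached, as described in the quoted passage.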
Yang further teaches
A regional energy coordinated control system adapted to implement the federated learning-based regional photovoltaic power probability forecasting method of claim 1,
(Yang [sec(s) III] “The flowchart of the PV prediction is shown in FIGURE 5. According to the meteorological factors, the statistical features and the combined features are constructed, and the time features are also extracted. Then, the LSTM-attention-embedding model is built up. To reduce the redundancy of the features, the Bayesian optimization is used to optimize the parameters. With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data.” [sec(s) Abs] “Photovoltaic (PV) output is susceptible to meteorological factors, resulting in intermittency and randomness of power generation. Accurate prediction of PV power output can not only reduce the impact of PV power generation on the grid but also provide a reference for grid dispatching. Therefore, this paper proposes an LSTM-attention-embedding model based on Bayesian optimization to predict the day-ahead PV power output.” [sec(s) IV] “The dataset used for the experiment was collected from a PV power output dataset within 25 months (from 1st April 2016 to 30th April 2018) from two PV stations located in one area of China, and the sampling frequency of the data is 1.1×10−3 Hz (every 15min for one point). From the PV stations, we can get day-ahead historical meteorological data, which is same as NWP data. The information of each PV station is shown in TABLE 1, and the sampling data of each station includes time, irradiance, wind direction, temperature, pressure and humidity, with the resolution of 15 mins.”;)
communication lines, through which the central server communicates with the edge computing nodes of different photovoltaic plants belonging to different entities;
(Yang [sec(s) III] “The flowchart of the PV prediction is shown in FIGURE 5. According to the meteorological factors, the statistical features and the combined features are constructed, and the time features are also extracted. Then, the LSTM-attention-embedding model is built up. To reduce the redundancy of the features, the Bayesian optimization is used to optimize the parameters. With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data.” [sec(s) Abs] “Photovoltaic (PV) output is susceptible to meteorological factors, resulting in intermittency and randomness of power generation. Accurate prediction of PV power output can not only reduce the impact of PV power generation on the grid but also provide a reference for grid dispatching. Therefore, this paper proposes an LSTM-attention-embedding model based on Bayesian optimization to predict the day-ahead PV power output.” [sec(s) IV] “The dataset used for the experiment was collected from a PV power output dataset within 25 months (from 1st April 2016 to 30th April 2018) from two PV stations located in one area of China, and the sampling frequency of the data is 1.1×10−3 Hz (every 15min for one point). From the PV stations, we can get day-ahead historical meteorological data, which is same as NWP data. The information of each PV station is shown in TABLE 1, and the sampling data of each station includes time, irradiance, wind direction, temperature, pressure and humidity, with the resolution of 15 mins.”;)
wherein the central server generates a probabilistic photovoltaic power forecast result.
(Yang [sec(s) III] “The flowchart of the PV prediction is shown in FIGURE 5. According to the meteorological factors, the statistical features and the combined features are constructed, and the time features are also extracted. Then, the LSTM-attention-embedding model is built up. To reduce the redundancy of the features, the Bayesian optimization is used to optimize the parameters. With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data.” [sec(s) Abs] “Photovoltaic (PV) output is susceptible to meteorological factors, resulting in intermittency and randomness of power generation. Accurate prediction of PV power output can not only reduce the impact of PV power generation on the grid but also provide a reference for grid dispatching. Therefore, this paper proposes an LSTM-attention-embedding model based on Bayesian optimization to predict the day-ahead PV power output.” [sec(s) IV] “The dataset used for the experiment was collected from a PV power output dataset within 25 months (from 1st April 2016 to 30th April 2018) from two PV stations located in one area of China, and the sampling frequency of the data is 1.1×10−3 Hz (every 15min for one point). From the PV stations, we can get day-ahead historical meteorological data, which is same as NWP data. The information of each PV station is shown in TABLE 1, and the sampling data of each station includes time, irradiance, wind direction, temperature, pressure and humidity, with the resolution of 15 mins.”;)
The combination of Saputra, Yang, Patro is combinable with Yang for the same rationale as set forth above with respect to claim 1.
Claim(s) 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over Saputra et al. (Energy Demand Prediction with Federated Learning for Electric Vehicle Networks) in view of Yang et al. (LSTM-Attention-Embedding Model-Based Day-Ahead Prediction of Photovoltaic Power Output Using Bayesian Optimization), further in view of Patro et al. (Normalization: A Preprocessing Stage), and further in view of Thielke et al. (US 11733427 B1).
Regarding claim 2
The combination of Saputra, Yang, Patro teaches claim 1.
However, the combination of Saputra, Yang, Patro does not appear to explicitly teach:
wherein the weather information includes global irradiances, direct irradiances, diffuse irradiance data, atmospheric temperatures, atmospheric pressures, wind speeds, wind directions, and relative humidity.
Thielke teaches
wherein the weather information includes global irradiances, direct irradiances, diffuse irradiance data, atmospheric temperatures, atmospheric pressures, wind speeds, wind directions, and relative humidity.
(Thielke [col 3 ln 66– col 5 ln 46] ““Weather variables” and “weather indicators,” as used herein, includes any and all weather features, weather indicators, atmospheric conditions, etc., including but not limited to wind speed and direction, air temperature, solar irradiance, cloud cover, cloud type, atmospheric pressure, relative humidity, absolute humidity, dew point, precipitation (e.g., rain, snow, sleet, freezing rain, hail, etc.) intensity, precipitation amounts, precipitable water, direct normal irradiance (DNI), global horizontal irradiance (GHI), diffuse horizontal irradiance (DIF), and more.”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Saputra, Yang, Patro with the weather information of Thielke.
One of ordinary skill in the art would have been motivated to combine in order to provide a weather forecasting system of greater accuracy, and less susceptible to initialization errors, than existing systems so that the improved weather forecasts are usable by multiple industries, including power generation, insurance, retail operations, airlines, etc.
(Thielke [col 1 ln 21– col 1 ln 62] “There remains a need for a weather forecasting system of greater accuracy, and less susceptible to initialization errors, than existing systems. It would be beneficial if such system would improve the forecasting accuracy of specific weather variables, such as cloud cover, wind speed and direction, air temperature, solar irradiance, atmospheric pressure, relative humidity, etc.” [col 12 ln 32– col 12 ln 45] “The improved weather forecasts provided by the disclosed system are usable by multiple industries, including power generation, insurance, retail operations, airlines, etc. For example, weather-dependent power generators, e.g., wind farms and solar power plants, can use the disclosed methods to create more accurate short-term power output forecasts. With historical weather data, particularly winds for wind farms and solar radiation for solar power plants, and historical power generation output, an algorithm, e.g., machine learning algorithm, regression algorithm, or other algorithm as described herein, can be trained to predict power output from weather input data. The trained model can then be used to forecast short-term power plant production output from a weather forecast.”)
Claim(s) 7-8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Saputra et al. (Energy Demand Prediction with Federated Learning for Electric Vehicle Networks) in view of Yang et al. (LSTM-Attention-Embedding Model-Based Day-Ahead Prediction of Photovoltaic Power Output Using Bayesian Optimization), further in view of Patro et al. (Normalization: A Preprocessing Stage), and further in view of Vandal et al. (Prediction and Uncertainty Quantification of Daily Airport Flight Delays).
Regarding claim 7
The combination of Saputra, Yang, Patro teaches claim 1.
Yang further teaches
wherein the step 6 further comprises: computing a photovoltaic power station power forecast value using a Bayesian long short term memory neural network model, wherein the neural network model mainly comprises a long short-term memory network architecture and [a variational inference architecture, wherein the variational inference architecture is implemented using Monte Carlo Dropout technique]; and finally, forecasts in consideration of uncertainty are subjected to multiple times of forward propagation training to obtain different results, wherein the photovoltaic power station power forecast value is characterized by variance of the different results.
(Yang [table(s) 3] “
[table image: media_image4.png]
” [sec(s) III] “The flowchart of the PV prediction is shown in FIGURE 5. According to the meteorological factors, the statistical features and the combined features are constructed, and the time features are also extracted. Then, the LSTM-attention-embedding model is built up. To reduce the redundancy of the features, the Bayesian optimization is used to optimize the parameters. With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data.” and “The experimental dataset of PV station 1 is divided into a training dataset and a test dataset in chronological order of 8:2, and the test error is selected as the target of Bayesian optimization. The iteration steps are 200, and the optimal parameters appear at the step of 199, and the optimal parameters are {t =18, num1 = 3, num2 = 12} with MSE of 0.66MW.” [sec(s) Abs] “Photovoltaic (PV) output is susceptible to meteorological factors, resulting in intermittency and randomness of power generation. Accurate prediction of PV power output can not only reduce the impact of PV power generation on the grid but also provide a reference for grid dispatching. Therefore, this paper proposes an LSTM-attention-embedding model based on Bayesian optimization to predict the day-ahead PV power output.” [sec(s) IV] “The dataset used for the experiment was collected from a PV power output dataset within 25 months (from 1st April 2016 to 30th April 2018) from two PV stations located in one area of China, and the sampling frequency of the data is 1.1×10−3 Hz (every 15min for one point).”;)
The combination of Saputra, Yang, Patro is combinable with Yang for the same rationale as set forth above with respect to claim 1.
However, the combination of Saputra, Yang, Patro does not appear to explicitly teach:
wherein the neural network model mainly comprises a long short-term memory network architecture and [a variational inference architecture, wherein the variational inference architecture is implemented using Monte Carlo Dropout technique];
Vandal teaches
wherein the neural network model mainly comprises a long short-term memory network architecture and a variational inference architecture, wherein the variational inference architecture is implemented using Monte Carlo Dropout technique;
(Vandal [sec(s) 3] “In this section we describe the Bayesian LSTM architecture we developed to predict daily average flight delays per airport using both continuous and categorical features. Including an airport indicator variable allows us to leverage similar delay effects between airports, similar to a multi-task model. The model consists of 3-Layer LSTM network where data from the day of departure and the four previous days is used to predict average delay for the day of departure. First, we embed the categorical variables from 7 days to 3 features, 12 months to 3 features, and 123 airports to 5 features. The weights for embedding are learned during training. Using a grid search with a range of dimensions, we found that these embedding dimensions provided a good trade-off between complexity and over-fitting. Categorical variables embedded to dimensions 3, 3, and 5 are concatenated with the 11 continuous features, including weather and airport congestion, resulting in 22 total features that are fed into the LSTM. Following (Gal and Ghahramani, 2016; Gal, 2016), we assume y ~ N µ(x), σ2(x) such that [µ(x), σ(x)] = fw(x) where f is an LSTM network with two hidden layers of 128 units each. Approximate variational inference is applied over all the weights, w, using Monte Carlo Dropout. As is crucial in Variational LSTMs, the same dropout mask is used at each step and for all weights (rather than dropping different weights at each time step). The corresponding negative log-likelihood is then written as:”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Saputra, Yang, Patro with the variational inference architecture of Vandal.
One of ordinary skill in the art would have been motivated to combine in order to provide a model with robust uncertainty metrics, not just point estimates, and the ability to accurately predict delays in order to make travelers’ lives easier and save airlines money.
(Vandal [sec(s) 1] “Because Freebird is interested in using these models in a risk management setting, we require a model that provides robust uncertainty metrics, not just point estimates. This requirement has limited the application of deep learning to the problem thus far. We leverage recent advances from (Gal and Ghahramani, 2016; Gal, 2016; Gal et al., 2017), using Monte Carlo dropout to obtain parameter variance estimates. To our knowledge, we are the first to apply Bayesian deep learning to the problem of predicting and quantifying flight delays in the U.S. airspace and the first to apply a Variational LSTM to any industrial problem.” [sec(s) Abs] “One in four commercial airline flights is delayed, inconveniencing travelers and causing large financial losses for carriers. The ability to accurately predict delays would make travelers’ lives easier and save airlines money.”;)
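The Monte Carlo Dropout procedure quoted from Vandal (dropout kept active at inference, with the spread of repeated stochastic forward passes quantifying model uncertainty) can be illustrated with a minimal sketch. A tiny tanh network stands in for the LSTM, and all names, shapes, and rates below are illustrative assumptions, not the reference's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_with_dropout(x, w1, w2, p_drop=0.5):
    """One stochastic forward pass: dropout stays ON at inference,
    so each pass samples a different sub-network (MC Dropout)."""
    h = np.tanh(x @ w1)
    mask = rng.random(h.shape) > p_drop  # sample a fresh dropout mask
    h = h * mask / (1.0 - p_drop)        # inverted-dropout scaling
    return h @ w2

def mc_dropout_predict(x, w1, w2, n_samples=200):
    """Repeat the stochastic pass: the mean is the point forecast and
    the variance across passes characterizes the uncertainty."""
    preds = np.array([forward_with_dropout(x, w1, w2) for _ in range(n_samples)])
    return preds.mean(), preds.var()

x = np.array([0.3, -0.1, 0.8])
w1 = rng.normal(size=(3, 16))
w2 = rng.normal(size=(16, 1))
mean, var = mc_dropout_predict(x, w1, w2)
print(mean, var)  # point forecast and its uncertainty estimate
```

As in the quoted passage, the same dropout mechanism that regularizes training is reused at prediction time to approximate variational inference over the network weights.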
Regarding claim 8
The combination of Saputra, Yang, Patro teaches claim 7.
Saputra further teaches
wherein in step 7, the model training error function selects mean square error (MSE), expressed as:
MSE = (1/K) Σ_{i=1}^{K} (yi − ŷi)^2  [equation image: media_image9.png]
where yi and
ŷi
denote the ith measured [photovoltaic] power in the dataset and corresponding [Bayesian long short term memory neural network] forecast value, respectively, and K denotes the number of pieces of data in use.
(Saputra [sec(s) IV] “Then, we adopt the RMSE to show the prediction accuracy, i.e., prediction error, because we deal with the prediction of energy demand which is categorized as a regression prediction model, i.e., when the mapping function yields the continuous prediction outputs. Given S transactions, the RMSE can be computed as follows:
[equation image: media_image3.png]
, (20) where ωs and ωˆs are the actual and predicted energy demand for transaction s.”;)
Yang further teaches
wherein in step 7, the model training error function selects mean square error (MSE), expressed as:
MSE = (1/K) Σ_{i=1}^{K} (yi − ŷi)^2  [equation image: media_image9.png]
where yi and
ŷi
denote the ith measured photovoltaic power in the dataset and corresponding Bayesian long short term memory neural network forecast value, respectively, and K denotes the number of pieces of data in use.
(Yang [sec(s) III] “The flowchart of the PV prediction is shown in FIGURE 5. According to the meteorological factors, the statistical features and the combined features are constructed, and the time features are also extracted. Then, the LSTM-attention-embedding model is built up. To reduce the redundancy of the features, the Bayesian optimization is used to optimize the parameters. With the optimization method, the optimal time window, the number of statistical features and the number of the combined features are obtained. Finally, the prediction model of the PV power output is trained based on the available data. … The iteration steps are 200, and the optimal parameters appear at the step of 199, and the optimal parameters are {t =18, num1 = 3, num2 = 12} with MSE of 0.66MW. Compared with the model {t =48, num1 = 5, num2 = 15} without any optimization, the MSE is decreased by 0.142MW, and the features dimension is also effectively reduced.” [sec(s) Abs] “Photovoltaic (PV) output is susceptible to meteorological factors, resulting in intermittency and randomness of power generation. Accurate prediction of PV power output can not only reduce the impact of PV power generation on the grid but also provide a reference for grid dispatching. Therefore, this paper proposes an LSTM-attention-embedding model based on Bayesian optimization to predict the day-ahead PV power output.” [sec(s) IV] “The dataset used for the experiment was collected from a PV power output dataset within 25 months (from 1st April 2016 to 30th April 2018) from two PV stations located in one area of China, and the sampling frequency of the data is 1.1×10−3 Hz (every 15min for one point). From the PV stations, we can get day-ahead historical meteorological data, which is same as NWP data. The information of each PV station is shown in TABLE 1, and the sampling data of each station includes time, irradiance, wind direction, temperature, pressure and humidity, with the resolution of 15 mins.”;)
The combination of Saputra, Yang, Patro is combinable with Yang for the same rationale as set forth above with respect to claim 1.
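The claimed training-error function is the standard mean square error over K samples, MSE = (1/K) Σ (yi − ŷi)^2, with yi the measured values and ŷi the forecasts. A short sketch makes the computation concrete; the sample values are illustrative only.

```python
def mse(y_true, y_pred):
    """Mean square error: average squared gap between the K measured
    values and the corresponding model forecasts."""
    k = len(y_true)
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / k

measured = [3.0, 2.5, 0.0, 1.5]  # illustrative measured power values
forecast = [2.8, 2.5, 0.4, 1.1]  # illustrative forecast values
print(mse(measured, forecast))   # ≈ 0.09
```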
Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Saputra et al. (Energy Demand Prediction with Federated Learning for Electric Vehicle Networks) in view of Yang et al. (LSTM-Attention-Embedding Model-Based Day-Ahead Prediction of Photovoltaic Power Output Using Bayesian Optimization), further in view of Patro et al. (Normalization: A Preprocessing Stage), and further in view of ALABBASI et al. (US 20230019669 A1).
Regarding claim 9
The combination of Saputra, Yang, Patro teaches claim 1.
However, the combination of Saputra, Yang, Patro does not appear to explicitly teach:
wherein the step 9 further comprises: if a testing result error of the model is less than a set threshold, using the training model to perform forecasting; otherwise, using the global model of the last round to perform forecasting.
ALABBASI teaches
wherein the step 9 further comprises: if a testing result error of the model is less than a set threshold, using the training model to perform forecasting; otherwise, using the global model of the last round to perform forecasting.
(ALABBASI [par(s) 73] “The client devices 304 then update their local ML models 316 based on the received global ML model 308 (e.g., the global ML 308 is stored as the new local ML models 316). In one embodiment, this is done by storing the received global ML model 308 as the local ML model 316 at each fo the client devices 304. For the next training epoch, the cascaded federated ML client function 314 at each client device 304 then performs training of its (new) local ML model 316 based on its local data 318 and the feedback information received from the server 302 and sends the resulting local ML model 316 (or an update relative to the last version of the local ML model 316 sent) and value(s) of the output parameter(s) of the local ML model 316 for this training epoch to the server 302. The training process continues in this manner until some predefined stopping criteria is reached. The stopping criteria may be, for example, reaching a predefined maximum number of training epochs, reaching a desired performance criterion (e.g., accuracy is greater than a predefined threshold), or the like”; e.g., “The training process continues in this manner until some predefined stopping criteria is reached” along with “accuracy is greater than a predefined threshold” read(s) on “if a testing result error of the model is less than a set threshold”.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Saputra, Yang, Patro with the forecasting based on a threshold of ALABBASI.
One of ordinary skill in the art would have been motivated to combine in order to improve performance while also avoiding the need to share the global data with the client devices.
(ALABBASI [par(s) 12] “One benefit of this cascaded federated machine learning approach relative to conventional federated machine learning is that the local ML models 216 are trained based on the value(s) of the output parameter(s) of the network ML model 210. As a result, performance is improved while also avoiding the need to share the global data 212 with the client devices 204.”;)
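The threshold test mapped to ALABBASI in the preceding rejection (use the newly trained model only when its testing error is below a set threshold; otherwise fall back to the previous round's global model) reduces to a simple guard. The function name and stand-in model objects below are illustrative assumptions.

```python
def select_forecast_model(new_model, last_global_model, test_error, threshold):
    """Keep the newly trained model only when its testing error beats
    the set threshold; otherwise forecast with the previous round's
    global model."""
    return new_model if test_error < threshold else last_global_model

# Illustrative check with stand-in model objects.
print(select_forecast_model("trained", "last_global", test_error=0.05, threshold=0.1))
```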
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Saputra et al. (Federated Learning Meets Contract Theory: Energy-Efficient Framework for Electric Vehicle Networks) teaches clustering-based decentralized federated energy learning (DFEL) approaches.
Sun et al. (Using Bayesian Deep Learning to Capture Uncertainty for Residential Net Load Forecasting) teaches Bayesian deep learning (BDL).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Fri 9:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SEHWAN KIM/Examiner, Art Unit 2129
12/12/2025