DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/06/2026 has been entered.
Examiner’s Note
The rejection under 35 USC § 101 is withdrawn in view of the instant amendments. Specifically, the amended claims recite technical detail on selecting machine learning models based on defined performance metrics and training a new machine learning model to achieve improved performance metrics. These additional limitations reflect a technical improvement in the selection and training of machine learning models.
Response to Arguments
Applicant’s arguments with respect to the amended claims (pgs. 13 – 15) have been considered but are moot, because the arguments/remarks are directed to amended claim limitations that were not previously examined; see the 35 USC § 103 section below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Galicia et al., "Multi-step forecasting for big data time series based on ensemble learning," in view of ZARE et al., Pub. No.: US20210174192A1, GUELMAN, Pub. No.: US20190279109A1, Wang et al., Pub. No.: US11734614B1, and Chen et al., Pub. No.: US20180300176A1.
Regarding claim 1, Galicia teaches: A system for weighting training data
(Galicia, page: 830, “Advances in technology have led to an increasing generation and storage of massive data in recent years [1,2]. These data need to be efficiently processed in order to extract useful and valuable knowledge. Thus, the development of new tools for dealing with big data has become a critical issue. An essential component of the nature of big data is that information is usually captured over time at different points, resulting in big data time series [3]. This information can be analysed for various purposes: to predict the future values, to establish relations among variables, to detect anomalous values, or to discover patterns [A system for weighting training data].”)
to increase efficiency of performance of a machine learning model, the system comprising:
(Galicia, page: 831, “(4) We conduct a comprehensive evaluation using Spanish electricity data for 10 years, measured at 10-min intervals, demonstrating that both ensemble members performed, outperforming [to increase efficiency of performance of a machine learning model] the base models they combine, and particularly showing the potential of dynamic ensembles for big data forecasting.”)
one or more processors and computer program instructions that, when executed, cause the one or more processors to perform operations comprising:
(Galicia, page: 835, “The experimentation was conducted using high-performance computing resources on the Open Telekom Cloud Platform with five machines, one master and four slave nodes. Each node has 60 GB of main memory and 8 logical cores from an Intel Xeon E5-2658 v3 @ 2.20 GHz processor [one or more processors and computer program instructions that, when executed, cause the one or more processors to perform operations] that has 30 MB L3 cache. The cluster works with Apache Spark 2.1.2 and Hadoop 2.6.”)
training a plurality of machine learning models,
(Galicia, page: 832, “Ensembles of prediction models are one of the most successful methods used in practical applications [34]. An ensemble usually improves the results of the single base models it combines. In this paper we focus on ensembles for time series forecasting, and especially for big data. We propose to combine DT, GBT and RF in an ensemble [training a plurality of machine learning models], due to the good results obtained by each of these algorithms when applied to big data time series forecasting [15].”)
wherein each training dataset corresponds to a different time period;
(Galicia, page: 833, “The time series used in this work is related to the total electrical energy consumption in Spain, from January 1st 2007 at midnight to June 21st 2016 at 11:40 pm. In short, it is a time series of nine and a half years with a high sampling frequency, namely 10 min intervals, including 49,832 measurements in total. When using the proposed methodology with a prediction horizon of 4 h (h is set to 24 values), the dataset consists of 20,742 instances and 144 attributes, corresponding to 5.70 MiB of storage size. These 144 attributes correspond to a window w of 144 past values (24 h). For the static ensemble, this dataset is divided into a training set and a test set consisting of 60% and 40% of the data, respectively. The training set has 298,752 measurements; it includes data from January 1st, 2007 at midnight to September 8th, 2012 at 10:30 am [wherein each training dataset corresponds to a different time period]. The test set contains the remaining data, namely 199,080 measurements from September 8th, 2012 at 10:40 am to June 21st, 2016 at 11:40 pm.”)
wherein the testing dataset corresponds to a time period that is subsequent to the different time periods corresponding to the training datasets,
(Galicia, page: 833, “The time series used in this work is related to the total electrical energy consumption in Spain, from January 1st 2007 at midnight to June 21st 2016 at 11:40 pm. In short, it is a time series of nine and a half years with a high sampling frequency, namely 10 min intervals, including 49,832 measurements in total. When using the proposed methodology with a prediction horizon of 4 h (h is set to 24 values), the dataset consists of 20,742 instances and 144 attributes, corresponding to 5.70 MiB of storage size. These 144 attributes correspond to a window w of 144 past values (24 h). For the static ensemble, this dataset is divided into a training set and a test set consisting of 60% and 40% of the data, respectively. The training set has 298,752 measurements; it includes data from January 1st, 2007 at midnight to September 8th, 2012 at 10:30 am. The test set contains the remaining data, namely 199,080 measurements from September 8th, 2012 at 10:40 am to June 21st, 2016 at 11:40 pm. [wherein the testing dataset corresponds to a time period that is subsequent to the different time periods corresponding to the training datasets].”)
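For illustration only (hypothetical code, not part of the record or of Galicia's method): the chronological 60/40 partition Galicia describes, in which the test set follows the training set in time, can be sketched as:

```python
# Illustrative sketch of a chronological train/test split; the
# function name and the 60% default are invented for illustration.

def chronological_split(series, train_fraction=0.6):
    """Split an ordered series so the test set is strictly later in time."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

train, test = chronological_split(list(range(10)))
# train holds the earlier observations, test the later ones
```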
Galicia does not teach:
wherein each machine learning model is trained using a different training dataset from a plurality of training datasets,
inputting a testing dataset into each machine learning model of the plurality of machine learning models to obtain a plurality of performance metrics for the plurality of machine learning models,
wherein the plurality of performance metrics include a respective recall score and one or more of a respective F1 score or a respective log loss of each machine learning model;
selecting, based on the plurality of performance metrics, a first dataset of the plurality of training datasets and a second dataset of the plurality of training datasets;
training, using an aggregated weighted dataset comprising the first dataset weighted using a first weight for a first machine learning model of the plurality of machine learning models and the second dataset weighted using a second weight for a second machine learning model of the plurality of machine learning models,
a new machine learning model with improved performance metrics compared to training the plurality of machine learning models using the plurality of training datasets due to the aggregated weighted dataset representing an optimal combination of a subset of the plurality of training datasets;
predicting, using the new machine learning model, whether an action will be completed by a deadline.
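For illustration only (hypothetical code, not the applicant's disclosed implementation): the selection step recited above — ranking per-dataset models by a performance metric and picking the best-scoring training datasets — can be sketched as:

```python
# Hypothetical sketch: rank training datasets best-first by the metric
# score of the model trained on each. Names and scores are invented.

def rank_datasets(datasets, score_fn):
    """Indices of datasets ordered best-first by score_fn."""
    return sorted(range(len(datasets)),
                  key=lambda i: score_fn(datasets[i]), reverse=True)

scores = {"period_1": 0.61, "period_2": 0.74, "period_3": 0.58}
order = rank_datasets(list(scores), scores.get)
first, second = order[0], order[1]  # the two datasets to be weighted
```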
ZARE teaches:
wherein each machine learning model is trained using a different training dataset from a plurality of training datasets,
(ZARE, “[0050] FIG. 5 depicts an exemplary method for identifying predictive machine learning parameters using multiple machine learning models, consistent with disclosed embodiments. In the non-limiting example depicted in FIG. 5, the at least one computer of process 100 can be configured to generate three models [wherein each machine learning model] based on three collections of input data (input data 502, input data 504, and input data 506) [is trained using a different training dataset from a plurality of training datasets]. These three collections of input data can each comprise a training dataset and a validation dataset. In some embodiments, a seed may be associated with each collection of input data.”)
inputting a testing dataset into each machine learning model of the plurality of machine learning models to obtain a plurality of performance metrics for the plurality of machine learning models,
(ZARE, Fig. 5)
[Fig. 5 annotation: inputting a testing dataset into each machine learning model of the plurality of machine learning models to obtain a plurality of performance metrics for the plurality of machine learning models]
ZARE and Galicia are related to the same field of endeavor (i.e., training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of ZARE with the teachings of Galicia to calculate accuracy metrics for each model and perform a statistical analysis of these metrics to rank the independent variables, enhancing the ensemble’s ability to select the most relevant features and improving overall model accuracy (ZARE, ¶[0004]).
Galicia in view of ZARE do not teach:
wherein the plurality of performance metrics include a respective recall score and one or more of a respective F1 score or a respective log loss of each machine learning model;
selecting, based on the plurality of performance metrics, a first dataset of the plurality of training datasets and a second dataset of the plurality of training datasets;
training, using an aggregated weighted dataset comprising the first dataset weighted using a first weight for a first machine learning model of the plurality of machine learning models and the second dataset weighted using a second weight for a second machine learning model of the plurality of machine learning models,
a new machine learning model with improved performance metrics compared to training the plurality of machine learning models using the plurality of training datasets due to the aggregated weighted dataset representing an optimal combination of a subset of the plurality of training datasets;
predicting, using the new machine learning model, whether an action will be completed by a deadline.
GUELMAN teaches:
wherein the plurality of performance metrics include a respective recall score and one or more of a respective F1 score or a respective log loss of each machine learning model;
(GUELMAN, “[0072] In some embodiments, performance metrics unit 315 [the plurality of performance metrics include] may be configured to output one or more metrics, which may include one or more of: auc variable, which represents an area under the curve for a binary classification model; precision variable, which is determined by true positives divided by the sum of true positives and false positives; recall variable [a respective recall score], which is determined by true positives divided by the sum of true positives and false negatives; specificity variable, which is determined by true negatives divided by the sum of true negatives and false negatives; f1Score variable [and one or more of a respective F1 score], which represents the f1 score; ks variable, which represents a Kolmogorov-Smirnov statistic; ce variable, which represents a classification error; logLoss variable [or a respective log loss of each machine learning model], which represents a log loss or entropy loss for a binary outcome; brier variable, which represents a Brier score; mse variable, which represents a mean square error; rmse variable, which represents a root mean square error; and mae variable, which represents a mean absolute error, and so on.”)
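For illustration only (hypothetical code, not from GUELMAN): the three metrics named in the claim — recall, F1 score, and log loss — computed from scratch for binary labels, consistent with the formulas GUELMAN recites (recall as true positives over the sum of true positives and false negatives, etc.):

```python
import math

def recall(y_true, y_pred):
    # true positives / (true positives + false negatives)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

def f1_score(y_true, y_pred):
    # harmonic mean of precision and recall
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    precision = tp / (tp + fp)
    rec = recall(y_true, y_pred)
    return 2 * precision * rec / (precision + rec)

def log_loss(y_true, y_prob):
    # mean negative log-likelihood of predicted probabilities for a binary outcome
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_prob)) / len(y_true)
```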
selecting, based on the plurality of performance metrics, a first dataset of the plurality of training datasets
(GUELMAN, “[0016] In other aspects, there is provided a computer-implemented method for monitoring and improving a performance of a machine learning model, the method comprising: receiving or storing one or more model data sets representative of the machine learning model, wherein the machine learning model has being trained with a first set of training data [selecting … a first dataset of the plurality of training datasets]; analyzing at least one of the first set of training data and the one or more model data sets, based on one or more performance parameters for the machine learning model [based on the plurality of performance metrics], to generate one or more performance data sets; and process the one or more performance data sets to determine one or more values representing a performance of the machine learning model.”)
and a second dataset of the plurality of training datasets;
(GUELMAN, “[0017] In some embodiments, the method includes selecting a second set of training data based on the performance data [a second dataset of the plurality of training datasets] and re-training the machine learning model using the second set of training data.”)
GUELMAN, Galicia and ZARE are related to the same field of endeavor (i.e., training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of GUELMAN with the teachings of Galicia and ZARE to enable continuous assessment and quantification of model performance (GUELMAN, Abstract).
Galicia in view of ZARE and GUELMAN do not teach:
training, using an aggregated weighted dataset comprising the first dataset weighted using a first weight for a first machine learning model of the plurality of machine learning models and the second dataset weighted using a second weight for a second machine learning model of the plurality of machine learning models,
a new machine learning model with improved performance metrics compared to training the plurality of machine learning models using the plurality of training datasets due to the aggregated weighted dataset representing an optimal combination of a subset of the plurality of training datasets;
predicting, using the new machine learning model, whether an action will be completed by a deadline.
Wang teaches:
training, using an aggregated weighted dataset comprising the first dataset weighted using a first weight for a first machine learning model of the plurality of machine learning models and the second dataset weighted using a second weight for a second machine learning model of the plurality of machine learning models,
(Wang, (col. 10 line [18 – 25]), “FIG. 5 is a flow diagram illustrating operations 500 of a method for generating an aggregated machine learning model from a first machine learning model and a second machine learning model [training, using an aggregated weighted dataset comprising the first dataset weighted using a first weight for a first machine learning model of the plurality of machine learning models and the second dataset weighted using a second weight for a second machine learning model of the plurality of machine learning models], selecting the first machine learning model, the second machine learning model, or the aggregated machine learning model for usage, and performing an inference with the selected machine learning model according to some embodiments.”)
a new machine learning model with improved performance metrics compared to training the plurality of machine learning models using the plurality of training datasets due to the aggregated weighted dataset representing an optimal combination of a subset of the plurality of training datasets;
(Wang, (col. 9 line [1 – 13]), “Using only one machine learning model does not necessarily reveal significant information from the training data. An aggregated model (e.g., ensemble of models) can be used to improve the performance of a model [a new machine learning model with improved performance metrics compared to training the plurality of machine learning models using the plurality of training datasets due to the aggregated weighted dataset representing an optimal combination of a subset of the plurality of training datasets], e.g., for forecasting. By combing different individual models, certain embodiments herein generate a plurality of aggregated models, e.g., C(N, 1) + C(N, 2) + C(N, 3) + ... + C(N, N). In certain embodiments, validation is performed on each of the aggregated models to generate each’s accuracy and error metrics. Thus, in certain embodiments, the selection of the number of individual models (N) controls how many aggregated models are generated. In one embodiment, N is 2, 3, 4, or 5 (or any other positive integer).”)
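For illustration only (hypothetical code, not Wang's implementation): the combination count Wang describes — C(N, 1) + C(N, 2) + ... + C(N, N) candidate aggregated models from N individual models, i.e. every non-empty subset, totaling 2^N − 1 — can be enumerated as:

```python
# Sketch: enumerate every non-empty subset of the individual models as a
# candidate aggregated model. Model names are invented for illustration.
from itertools import combinations

def candidate_ensembles(models):
    """All non-empty subsets of models: C(N,1) + ... + C(N,N) = 2**N - 1."""
    return [subset for r in range(1, len(models) + 1)
            for subset in combinations(models, r)]

ensembles = candidate_ensembles(["DT", "GBT", "RF"])
# 2**3 - 1 = 7 candidate aggregated models
```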
Wang, Galicia, ZARE and GUELMAN are related to the same field of endeavor (i.e., training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Wang with the teachings of Galicia, ZARE and GUELMAN to add a model aggregation and selection mechanism enabling the combination of multiple trained models (Wang, Abstract).
Galicia in view of ZARE, GUELMAN and Wang do not teach:
predicting, using the new machine learning model, whether an action will be completed by a deadline.
Chen teaches:
predicting, using the new machine learning model, whether an action will be completed by a deadline.
(Chen, “[0026] Additionally, some jobs may be highly parallel and iterative in nature. For example, a job may be run in a repeatable pattern over time. The performance of such a job is sensitive to resource availability. In an example, the job is a machine learning job that predicts an amount of resources and the type of resources (e.g., how many processors or virtual machines) that should be used to satisfy a deadline or the criteria for the successful completion of a job [predicting, using the new machine learning model, whether an action will be completed by a deadline], per iteration. The execution of a job may produce intermediate results. Processing such a job may include processing multiple iterations of a task, with different intermediary results being produced per iteration. Each of the intermediate results may be used to produce a final result of the job. The time that it takes to complete a first iteration of the task may be noted, and determined to be too long. In this example, it may be advantageous to accelerate this process for subsequent iterations of the task and thus allocate more resources. A resource claim for a batch job is programmable, but typically before the job starts executing.”)
Chen, Galicia, ZARE, GUELMAN and Wang are related to the same field of endeavor (i.e., training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Chen with the teachings of Galicia, ZARE, GUELMAN and Wang to add runtime optimization of system efficiency and scheduling based on observed model performance or intermediate outputs (Chen, Abstract).
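For illustration only (hypothetical code, not the claimed invention or any reference's implementation): the recited aggregation step — attaching a first weight to the first selected dataset and a second weight to the second, then combining them into one training set — can be sketched as:

```python
# Hypothetical sketch: attach a per-dataset sample weight to each record
# and concatenate into one aggregated weighted dataset. The weights and
# record values below are invented for illustration.

def aggregate_weighted(dataset1, weight1, dataset2, weight2):
    """Pair each record with its dataset's weight and concatenate."""
    return ([(x, weight1) for x in dataset1]
            + [(x, weight2) for x in dataset2])

combined = aggregate_weighted([1, 2], 0.7, [3], 0.3)
# combined feeds the retraining of a single new model
```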
Claim(s) 5, 9, 12 – 13, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Galicia in view of ZARE, GUELMAN and Wang.
Regarding claim 5, Galicia teaches: training a plurality of machine learning models, using a plurality of datasets,
(Galicia, page: 832, “Ensembles of prediction models are one of the most successful methods used in practical applications [34]. An ensemble usually improves the results of the single base models it combines. In this paper we focus on ensembles for time series forecasting, and especially for big data. We propose to combine DT, GBT and RF in an ensemble [training a plurality of machine learning models, using a plurality of datasets], due to the good results obtained by each of these algorithms when applied to big data time series forecasting [15].”)
wherein each training dataset corresponds to a different time period;
(Galicia, page: 833, “The time series used in this work is related to the total electrical energy consumption in Spain, from January 1st 2007 at midnight to June 21st 2016 at 11:40 pm. In short, it is a time series of nine and a half years with a high sampling frequency, namely 10 min intervals, including 49,832 measurements in total. When using the proposed methodology with a prediction horizon of 4 h (h is set to 24 values), the dataset consists of 20,742 instances and 144 attributes, corresponding to 5.70 MiB of storage size. These 144 attributes correspond to a window w of 144 past values (24 h). For the static ensemble, this dataset is divided into a training set and a test set consisting of 60% and 40% of the data, respectively. The training set has 298,752 measurements; it includes data from January 1st, 2007 at midnight to September 8th, 2012 at 10:30 am [wherein each training dataset corresponds to a different time period]. The test set contains the remaining data, namely 199,080 measurements from September 8th, 2012 at 10:40 am to June 21st, 2016 at 11:40 pm.”)
inputting a new dataset into each machine learning model of the plurality of machine learning models
(Galicia, Fig. 2, annotated: [inputting a new dataset into each model of the plurality of machine learning models])
Galicia does not teach:
to obtain a plurality of performance metrics for the plurality of machine learning models
selecting, based on the plurality of performance metrics, a first dataset of the plurality of datasets and a second dataset of the plurality of datasets;
training, using an aggregated weighted dataset comprising the first dataset weighted using a first weight for a first machine learning model of the plurality of machine learning models and the second dataset weighted using a second weight for a second machine learning model of the plurality of machine learning models, a new machine learning model with improved performance metrics compared to training the plurality of machine learning models using the plurality of datasets due to the aggregated weighted dataset representing an optimal combination of a subset of the plurality of datasets
ZARE teaches:
to obtain a plurality of performance metrics for the plurality of machine learning models
(ZARE, “[0041] … In some embodiments, training of a model may terminate when a training criterion is satisfied. Training criteria may include number of epochs, training time, performance metric values (e.g., an estimate of accuracy in reproducing test data) [to obtain a plurality of performance metrics for the plurality of machine learning models], or the like. Machine learning framework 238 may be configured to adjust model parameters and/or hyperparameters during training. For example, machine learning framework 238 may be configured to modify model parameters and/or hyperparameters (i.e., hyperparameter tuning) using an optimization technique during training, consistent with disclosed embodiments…”)
ZARE and Galicia are related to the same field of endeavor (i.e., training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of ZARE with the teachings of Galicia to calculate accuracy metrics for each model and perform a statistical analysis of these metrics to rank the independent variables, enhancing the ensemble’s ability to select the most relevant features and improving overall model accuracy (ZARE, ¶[0004]).
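For illustration only (hypothetical code, not from ZARE): the training-termination criteria ZARE ¶[0041] lists — an epoch limit, a training-time budget, or a target performance-metric value — amount to a disjunctive stopping check; the threshold values below are invented:

```python
# Hypothetical sketch: stop training when any one criterion is satisfied
# (epoch count, elapsed time, or performance metric). Defaults invented.

def should_stop(epoch, elapsed_s, metric,
                max_epochs=100, time_budget_s=3600.0, target_metric=0.95):
    """True once any single training criterion is met."""
    return (epoch >= max_epochs
            or elapsed_s >= time_budget_s
            or metric >= target_metric)
```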
Galicia in view of ZARE do not teach:
selecting, based on the plurality of performance metrics, a first dataset of the plurality of datasets and a second dataset of the plurality of datasets;
training, using an aggregated weighted dataset comprising the first dataset weighted using a first weight for a first machine learning model of the plurality of machine learning models and the second dataset weighted using a second weight for a second machine learning model of the plurality of machine learning models, a new machine learning model with improved performance metrics compared to training the plurality of machine learning models using the plurality of datasets due to the aggregated weighted dataset representing an optimal combination of a subset of the plurality of datasets
GUELMAN teaches:
selecting, based on the plurality of performance metrics, a first dataset of the plurality of training datasets
(GUELMAN, “[0016] In other aspects, there is provided a computer-implemented method for monitoring and improving a performance of a machine learning model, the method comprising: receiving or storing one or more model data sets representative of the machine learning model, wherein the machine learning model has being trained with a first set of training data [selecting … a first dataset of the plurality of training datasets]; analyzing at least one of the first set of training data and the one or more model data sets, based on one or more performance parameters for the machine learning model [based on the plurality of performance metrics], to generate one or more performance data sets; and process the one or more performance data sets to determine one or more values representing a performance of the machine learning model.”)
and a second dataset of the plurality of training datasets;
(GUELMAN, “[0017] In some embodiments, the method includes selecting a second set of training data based on the performance data [a second dataset of the plurality of training datasets] and re-training the machine learning model using the second set of training data.”)
GUELMAN, Galicia and ZARE are related to the same field of endeavor (i.e., training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of GUELMAN with the teachings of Galicia and ZARE to enable continuous assessment and quantification of model performance (GUELMAN, Abstract).
Galicia in view of ZARE and GUELMAN do not teach:
training, using an aggregated weighted dataset comprising the first dataset weighted using a first weight for a first machine learning model of the plurality of machine learning models and the second dataset weighted using a second weight for a second machine learning model of the plurality of machine learning models, a new machine learning model with improved performance metrics compared to training the plurality of machine learning models using the plurality of datasets due to the aggregated weighted dataset representing an optimal combination of a subset of the plurality of datasets
Wang teaches:
training, using an aggregated weighted dataset comprising the first dataset weighted using a first weight for a first machine learning model of the plurality of machine learning models and the second dataset weighted using a second weight for a second machine learning model of the plurality of machine learning models,
(Wang, (col. 10 line [18 – 25]), “FIG. 5 is a flow diagram illustrating operations 500 of a method for generating an aggregated machine learning model from a first machine learning model and a second machine learning model [training, using an aggregated weighted dataset comprising the first dataset weighted using a first weight for a first machine learning model of the plurality of machine learning models and the second dataset weighted using a second weight for a second machine learning model of the plurality of machine learning models], selecting the first machine learning model, the second machine learning model, or the aggregated machine learning model for usage, and performing an inference with the selected machine learning model according to some embodiments.”)
a new machine learning model with improved performance metrics compared to training the plurality of machine learning models using the plurality of training datasets due to the aggregated weighted dataset representing an optimal combination of a subset of the plurality of training datasets;
(Wang, (col. 9 line [1 – 13]), “Using only one machine learning model does not necessarily reveal significant information from the training data. An aggregated model (e.g., ensemble of models) can be used to improve the performance of a model [a new machine learning model with improved performance metrics compared to training the plurality of machine learning models using the plurality of training datasets due to the aggregated weighted dataset representing an optimal combination of a subset of the plurality of training datasets], e.g., for forecasting. By combing different individual models, certain embodiments herein generate a plurality of aggregated models, e.g., C(N, 1) + C(N, 2) + C(N, 3) + ... + C(N, N). In certain embodiments, validation is performed on each of the aggregated models to generate each’s accuracy and error metrics. Thus, in certain embodiments, the selection of the number of individual models (N) controls how many aggregated models are generated. In one embodiment, N is 2, 3, 4, or 5 (or any other positive integer).”)
Wang, Galicia, ZARE and GUELMAN are related to the same field of endeavor (i.e., training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Wang with the teachings of Galicia, ZARE and GUELMAN to add a model aggregation and selection mechanism enabling the combination of multiple trained models (Wang, Abstract).
Regarding claim 9, Galicia in view of ZARE, GUELMAN and Wang teach the method of claim 5.
Galicia further teaches: wherein the new dataset corresponds to a time period that is later than the different time periods corresponding to the plurality of datasets.
(Galicia, page: 833, “The time series used in this work is related to the total electrical energy consumption in Spain, from January 1st 2007 at midnight to June 21st 2016 at 11:40 pm. In short, it is a time series of nine and a half years with a high sampling frequency, namely 10 min intervals, including 49,832 measurements in total. When using the proposed methodology with a prediction horizon of 4 h (h is set to 24 values), the dataset consists of 20,742 instances and 144 attributes, corresponding to 5.70 MiB of storage size. These 144 attributes correspond to a window w of 144 past values (24 h). For the static ensemble, this dataset is divided into a training set and a test set consisting of 60% and 40% of the data, respectively. The training set has 298,752 measurements; it includes data from January 1st, 2007 at midnight to September 8th, 2012 at 10:30 am. [wherein the new dataset corresponds to a time period that is later than the different time periods corresponding to the plurality of datasets] The test set contains the remaining data, namely 199,080 measurements from September 8th, 2012 at 10:40 am to June 21st, 2016 at 11:40 pm.”)
Claim 17 recites limitations analogous to those of claim 9, so it is rejected under a similar rationale.
Regarding claim 13, ZARE teaches: One or more tangible, non-transitory, computer-readable media storing instructions that when executed by one or more processors effectuate operations comprising:
(ZARE, “[0039] Programs 236 may include one or more programs (e.g., modules, code, scripts, or functions) used to perform methods consistent with disclosed embodiments. Programs may include operating systems (not shown) that perform known operating system functions when executed by one or more processors [that when executed by one or more processors effectuate operations comprising]. Disclosed embodiments may operate and function with computer systems running any type of operating system. Programs 236 may be written in one or more programming or scripting languages. One or more of such software sections or modules of memory 230 [One or more tangible, non-transitory, computer-readable media instructions] may be integrated into a computer system, non-transitory computer-readable media, or existing communications software. Programs 236 may also be implemented or replicated as firmware or circuit logic.”)
The remaining limitations are analogous to those of claim 5, so they are rejected under a similar rationale.
Regarding claim 12, Galicia in view of ZARE, GUELMAN and Wang teach the method of claim 5.
ZARE further teaches: wherein the plurality of performance metrics comprises one or more of speed, accuracy and precision.
(ZARE, “[0029] In step 116 of process 100, the one or more computing devices can calculate accuracy metric values for the models using the validation sets associated with the training sets used to create respective models, consistent with disclosed embodiments. For example, after a model is created using a training set, the model accuracy is evaluated using the validation set that was associated with the training set at step 112. An accuracy metric value [wherein the plurality of performance metrics comprises one or more of speed, accuracy and precision] may be calculated for each model, or a portion of the models. The accuracy metric value can reflect how accurately each model predicts the validation data provided to it. Any accuracy metric may be used in step 116, such as a true positive rate or a true negative rate. In some embodiments, the Area Under Curve (AUC), also known as the Area Under the Receiver Operating Characteristics (AUROC), may be used. In the non-limiting decision tree example provided above, XGBoost can be used to determine the AUC for each model. As would be appreciated by those of skill in the art, other suitable training packages may be used to calculate the AUC, or another suitable accuracy metric, for machine learning models generated using such a training package.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of ZARE with the teachings of Galicia, GUELMAN and Wang for the same reasons disclosed for claim 5.
Claim 20 recites limitations analogous to those of claim 12, so it is rejected under a similar rationale.
Claim(s) 2, 6 – 8 and 14 – 16 are rejected under 35 U.S.C. 103 as being unpatentable over Galicia in view of ZARE, GUELMAN, Wang, Chen and in further view of Nookula et al., Pub. No.: US11677634B1 and Nagpal et al., Pub. No.: US20200034745A1.
Regarding claim 2, Galicia in view of ZARE, GUELMAN, Wang and Chen teach the method of claim 1.
Galicia in view of ZARE, GUELMAN, Wang and Chen do not teach:
selecting a subset of the plurality of machine learning models;
grouping each machine learning model of the plurality of machine learning models into a plurality of groups based on time periods corresponding to training data of each machine learning model; and selecting, based on comparing the plurality of performance metrics, a machine learning model from each group of the plurality of groups to add to the subset of the plurality of machine learning models.
Nookula teaches:
selecting a subset of the plurality of machine learning models;
(Nookula, (col. 7 lines (59 – 67) – col. 8 lines (1 – 5)) “FIG. 4 is a flow diagram that illustrates selecting and deploying models based on sensor availability, according to some embodiments. At block 402, a model selection and deployment service determines whether an indication of sensor availability at a remote client network has been received. For example, the indication may be a message or request originating from a hub device. If not, then the process may wait for at least a threshold period of time before determining again whether any other indications have been received. If the indication of sensor availability has been received, then at block 404, based on the sensor availability, the model selection and deployment service selects a data processing model from a group of models [selecting a subset of the plurality of machine learning models] that are available for deployment.”)
Nookula, Galicia, ZARE, GUELMAN, Wang and Chen are related to the same field of endeavor (i.e.: training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Nookula with the teachings of Galicia, ZARE, GUELMAN, Wang and Chen to add dynamic model selection and deployment, improving the ensemble by adapting to real-time performance, enhancing accuracy, optimizing resource use, and ensuring faster deployment (Nookula, Abstract).
Galicia in view of ZARE, GUELMAN, Wang, Chen and Nookula do not teach:
grouping each machine learning model of the plurality of machine learning models into a plurality of groups based on time periods corresponding to training data of each machine learning model; and
selecting, based on comparing the plurality of performance metrics, a machine learning model from each group of the plurality of groups
Nagpal teaches:
grouping each machine learning model of the plurality of machine learning models into a plurality of groups based on time periods corresponding to training data of each machine learning model; and
(Nagpal, “[0093] Specifically, the present technique uses a tournament to evaluate individual prediction models on one or more past time windows of data, and then uses one or more map reduce functions to select the one or a plurality of prediction models that best predict a second time window. This allows one or a plurality of prediction models to be selected and aggregated, or reconciled, to create an individual prediction model that may contain elements of a plurality of models. Furthermore, the past time window may be sampled at different frequencies such as at a short interval for selecting a prediction model for a short term time period, or at a medium term interval for selecting a prediction model for a medium term time period, or at a long term interval for selecting a prediction model for a long term time period. Additionally, individual prediction models for differing time periods [based on time periods corresponding to training data of each machine learning model] may be analyzed in relation to each other such as when a model is selected because it does the best overall for all intervals, or selecting the model that does the best for each interval and aggregating those models together to create a single hybrid model. Such techniques provide prediction models that can be dynamically selected using actual usage data, and can be selected from a group of models [grouping each machine learning model of the plurality of machine learning models into a plurality of groups] or aggregation of models as circumstances dictate. Further details regarding the trainer are provided below at least in association with the discussion of FIG. 3 and FIG. 5.”)
selecting, based on comparing the plurality of performance metrics, a machine learning model from each group of the plurality of groups
(Nagpal, “[0093] Specifically, the present technique uses a tournament to evaluate individual prediction models on one or more past time windows of data, and then uses one or more map reduce functions to select the one or a plurality of prediction models that best predict a second time window. This allows one or a plurality of prediction models to be selected and aggregated, or reconciled, to create an individual prediction model that may contain elements of a plurality of models. Furthermore, the past time window may be sampled at different frequencies such as at a short interval for selecting a prediction model for a short term time period, or at a medium term interval for selecting a prediction model for a medium term time period, or at a long term interval for selecting a prediction model for a long term time period. Additionally, individual prediction models for differing time periods may be analyzed in relation to each other such as when a model is selected because it does the best overall for all intervals, or selecting the model that does the best for each interval and aggregating those models together to create a single hybrid model. Such techniques provide prediction models that can be dynamically selected using actual usage data, and can be selected from a group of models [selecting, based on comparing the plurality of performance metrics, a machine learning model from each group of the plurality of groups] or aggregation of models as circumstances dictate. Further details regarding the trainer are provided below at least in association with the discussion of FIG. 3 and FIG. 5.”)
to add to the subset of the plurality of machine learning models.
(Nagpal, Fig. 1D, annotated: [to add to the subset of the plurality of machine learning models])
Nagpal, Galicia, ZARE, GUELMAN, Wang, Chen and Nookula are related to the same field of endeavor (i.e.: training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Nagpal with the teachings of Galicia, ZARE, GUELMAN, Wang, Chen and Nookula to enhance system efficiency by optimizing forecasting accuracy, distributing memory and processing demands, and reducing inter-component communication (Nagpal, ¶[0010]).
Claims 6 and 14 recite limitations analogous to those of claim 2, so they are rejected under a similar rationale.
Regarding claim 7, Galicia in view of ZARE, GUELMAN, Wang, Chen, Nookula and Nagpal teach the method of claim 6.
Nagpal further teaches: wherein selecting the machine learning model from each group of the plurality of groups comprises selecting the machine learning model associated with a highest performance metric from each group of the plurality of groups.
(Nagpal, “[0093] Specifically, the present technique uses a tournament to evaluate individual prediction models on one or more past time windows of data, and then uses one or more map reduce functions to select the one or a plurality of prediction models that best predict a second time window. This allows one or a plurality of prediction models [from each group of the plurality of groups] to be selected and aggregated, or reconciled, to create an individual prediction model that may contain elements of a plurality of models. Furthermore, the past time window may be sampled at different frequencies such as at a short interval for selecting a prediction model for a short term time period, or at a medium term interval for selecting a prediction model for a medium term time period, or at a long term interval for selecting a prediction model for a long term time period. Additionally, individual prediction models for differing time periods may be analyzed in relation to each other such as when a model is selected because it does the best overall for all intervals, or selecting the model that does the best [selecting a machine learning model associated with the highest performance metric] for each interval and aggregating those models together to create a single hybrid model. Such techniques provide prediction models that can be dynamically selected using actual usage data, and can be selected from a group of models or aggregation of models as circumstances dictate. Further details regarding the trainer are provided below at least in association with the discussion of FIG. 3 and FIG. 5.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Nagpal with the teachings of Galicia, ZARE, GUELMAN, Wang, Chen and Nookula for the same reasons disclosed for claim 2.
Claim 15 recites limitations analogous to those of claim 7, so it is rejected under a similar rationale.
Regarding claim 8, Galicia in view of ZARE, GUELMAN, Wang, Chen, Nookula and Nagpal teach the method of claim 6.
ZARE further teaches: selecting the machine learning model associated with a median performance metric from each group of the plurality of groups.
(ZARE, “[0031] In some embodiments, the one or more computing devices can determine whether the plurality of machine learning models [selecting the machine learning model] satisfies an accuracy criterion [performance metric]. For example, the one or more computing devices can determine one or more statistics (e.g., mean, median, [associated with a median from each group of the plurality of groups] standard deviation, or the like) of the accuracy metric values for the plurality of machine learning models. The one or more computing devices can be configured to end the analysis without ranking the model parameters, or not provide or return the ranked model parameters, when the one or more statistics fails to satisfy the accuracy criterion. For example, the accuracy criterion may be a threshold value (e.g., a predetermined value between 0.5 and 0.8) and the one or more computing devices can be configured to end the analysis without ranking the model parameters when an average of the accuracy metric values for the plurality of machine learning models does not exceed this threshold value. In some embodiments, the one or more computing devices can perform the ranking using only the machine learning models having accuracy metric values satisfying the accuracy criterion. For example, when two of three machine learning models have accuracy metric values satisfying the accuracy criterion (e.g., exceeding a predetermined threshold), the one or more computing devices can perform the ranking using only these two machine learning models.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of ZARE with the teachings of Galicia, GUELMAN, Wang, Chen, Nookula and Nagpal for the same reasons disclosed for claim 2.
Claim 16 recites limitations analogous to those of claim 8, so it is rejected under a similar rationale.
Claim(s) 3, 10 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Galicia in view of ZARE, GUELMAN, Wang, Chen and in further view of Sivakumar et al., Pub. No.: US20210287084A1.
Regarding claim 3, Galicia in view of ZARE, GUELMAN, Wang and Chen teach the method of claim 1.
Galicia in view of ZARE, GUELMAN, Wang and Chen do not teach:
comparing a first performance metric of the plurality of performance metrics with a second performance metric of the plurality of performance metrics; and
based on a determination that the first performance metric is greater than the second performance metric,
assigning the first weight to the first machine learning model that corresponds to the first performance metric and the second weight to the second machine learning model that corresponds to the second performance metric, wherein the first weight is greater than the second weight.
Sivakumar teaches:
comparing a first performance metric of the plurality of performance metrics with a second performance metric of the plurality of performance metrics; and
(Sivakumar, “[0073] Also, in one embodiment, the first output and the second output may be compared to labels associated with the additional data to determine an accuracy of the first output and an accuracy of the second output [comparing a first performance metric of the plurality of performance metrics with a second performance metric of the plurality of metrics]. For example, the accuracy may be associated with a validation loss for the trained model. In another example, a first trained model with a lower validation loss than a second trained model may have a higher accuracy than the second trained model.”)
based on a determination that the first performance metric is greater than the second performance metric,
(Sivakumar, “[0073] Also, in one embodiment, the first output and the second output may be compared to labels associated with the additional data to determine an accuracy of the first output and an accuracy of the second output. For example, the accuracy may be associated with a validation loss for the trained model. In another example, a first trained model with a lower validation loss than a second trained model may have a higher accuracy than the second trained model [based on a determination that the first performance metric is greater than the second performance metric].” (i.e.: the accuracy is associated with validation loss, where a lower validation loss indicates higher accuracy))
assigning the first weight to the first machine learning model that corresponds to the first performance metric and the second weight to the second machine learning model that corresponds to the second performance metric, wherein the first weight is greater than the second weight.
(Sivakumar, “[0074] Additionally, in one embodiment, the weight assigned to the predetermined augmentation for the training data set may be determined based on the comparison of the first output to the second output [assigning the first weight to the first machine learning model that corresponds to the first performance metric and the second weight to the second machine learning model that corresponds to the second performance metric]. For example, a value of the weight may be proportional to an improvement of an accuracy of the first output (associated with the model trained with the augmented sample set) over an accuracy of the second output (associated with the model trained with the non-augmented sample set). In another example, if a first improvement in accuracy (associated with a first augmentation) is greater than a second improvement in accuracy (associated with a second augmentation), the first augmentation may be assigned a greater weight than the second [wherein the first weight is greater than the second weight] augmentation.”)
Sivakumar, Galicia, ZARE, GUELMAN, Wang and Chen are related to the same field of endeavor (i.e.: training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Sivakumar with the teachings of Galicia, ZARE, GUELMAN, Wang and Chen to select and test various augmentations on a sample set, determine the most effective augmentations based on model performance, assign weights to those augmentations, and select the best ones to apply to the full training dataset (Sivakumar, ¶[0010]).
Claims 10 and 18 recite limitations analogous to those of claim 3, so they are rejected under a similar rationale.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Galicia in view of ZARE, GUELMAN, Wang, Chen and in further view of DENG et al., Pub. No.: US20190180184A1.
Regarding claim 4, Galicia in view of ZARE, GUELMAN, Wang and Chen teach the method of claim 1.
ZARE further teaches: determining a first performance metric of the plurality of performance metrics that fails to satisfy a threshold; and
(ZARE, “[0031] In some embodiments, the one or more computing devices can determine whether the plurality of machine learning models satisfies an accuracy criterion. For example, the one or more computing devices can determine one or more statistics (e.g., mean, median, standard deviation, or the like) of the accuracy metric values for the plurality of machine learning models. The one or more computing devices can be configured to end the analysis without ranking the model parameters, or not provide or return the ranked model parameters, when the one or more statistics fails to satisfy the accuracy criterion [determining a first performance metric of the plurality of performance metrics that fails to satisfy a threshold]. For example, the accuracy criterion may be a threshold value (e.g., a predetermined value between 0.5 and 0.8) and the one or more computing devices can be configured to end the analysis without ranking the model parameters when an average of the accuracy metric values for the plurality of machine learning models does not exceed this threshold value. In some embodiments, the one or more computing devices can perform the ranking using only the machine learning models having accuracy metric values satisfying the accuracy criterion. For example, when two of three machine learning models have accuracy metric values satisfying the accuracy criterion (e.g., exceeding a predetermined threshold), the one or more computing devices can perform the ranking using only these two machine learning models.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of ZARE with the teachings of Galicia, GUELMAN, Wang and Chen for the same reasons disclosed for claim 1.
Galicia, ZARE, GUELMAN, Wang and Chen do not teach:
in response to determining that the first performance metric fails to satisfy the threshold, assigning zero weight to the first machine learning model
DENG teaches:
in response to determining that the first performance metric fails to satisfy the threshold, assigning zero weight to the first machine learning model
(DENG, “[0023] In one embodiment, a weight-pruning technique uses an analytic threshold function that optimally reduces the number of weights, thereby increasing performance of a neural network [in response to determining that the first performance metric fails to satisfy the threshold]. The analytic threshold function may be applied to the weights of the various layers of the neural network so that weights having magnitudes that are less than a threshold are set to zero [assigning zero weight to the first machine learning model] and weights that are greater than the threshold are not affected. Additionally, the threshold function is differentiable and parameters of the threshold function may be optimized during back-propagation. The analytic threshold function may be trained concurrently with network weights during back-propagation, thereby avoiding a time-consuming iterative process.”)
DENG, Galicia, ZARE, GUELMAN, Wang and Chen are related to the same field of endeavor (i.e.: training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of DENG with the teachings of Galicia, ZARE, GUELMAN, Wang and Chen to fine-tune the balance between model accuracy and speed, leading to a more efficient neural network that retains or even improves accuracy while being less resource-intensive (DENG, ¶[0007]).
Claim(s) 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Galicia in view of ZARE, GUELMAN, Wang and in further view of DENG.
Regarding claim 11, Galicia in view of ZARE, GUELMAN and Wang teach the method of claim 5.
ZARE further teaches: determining a first performance metric of the plurality of performance metrics that fails to satisfy a threshold; and
(ZARE, “[0031] In some embodiments, the one or more computing devices can determine whether the plurality of machine learning models satisfies an accuracy criterion. For example, the one or more computing devices can determine one or more statistics (e.g., mean, median, standard deviation, or the like) of the accuracy metric values for the plurality of machine learning models. The one or more computing devices can be configured to end the analysis without ranking the model parameters, or not provide or return the ranked model parameters, when the one or more statistics fails to satisfy the accuracy criterion [determining a first performance metric of the plurality of performance metrics that fails to satisfy a threshold]. For example, the accuracy criterion may be a threshold value (e.g., a predetermined value between 0.5 and 0.8) and the one or more computing devices can be configured to end the analysis without ranking the model parameters when an average of the accuracy metric values for the plurality of machine learning models does not exceed this threshold value. In some embodiments, the one or more computing devices can perform the ranking using only the machine learning models having accuracy metric values satisfying the accuracy criterion. For example, when two of three machine learning models have accuracy metric values satisfying the accuracy criterion (e.g., exceeding a predetermined threshold), the one or more computing devices can perform the ranking using only these two machine learning models.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of ZARE with the teachings of Galicia, GUELMAN and Wang for the same reasons disclosed for claim 5.
Galicia in view of ZARE, GUELMAN and Wang do not teach:
in response to determining that the first performance metric fails to satisfy the threshold, assigning zero weight to the first machine learning model of a subset, of the plurality of machine learning models, that corresponds to the first performance metric.
DENG teaches:
in response to determining that the first performance metric fails to satisfy the threshold, assigning zero weight to the first machine learning model of a subset, of the plurality of machine learning models,
(DENG, “[0023] In one embodiment, a weight-pruning technique uses an analytic threshold function that optimally reduces the number of weights, thereby increasing performance of a neural network [in response to determining that the first performance metric fails to satisfy the threshold]. The analytic threshold function may be applied to the weights of the various layers of the neural network so that weights having magnitudes that are less than a threshold are set to zero [assigning zero weight to the first machine learning model of the subset, of the plurality of machine learning models,] and weights that are greater than the threshold are not affected. Additionally, the threshold function is differentiable and parameters of the threshold function may be optimized during back-propagation. The analytic threshold function may be trained concurrently with network weights during back-propagation, thereby avoiding a time-consuming iterative process.”)
that corresponds to the first performance metric.
(DENG “[0045] … At 705, a difference between the output and the training data is minimized to determine a set of weights w that enhance a speed performance of the neural network, an accuracy of the neural network, or a combination thereof, by minimizing a cost function C based on a derivative of the cost function C with respect to first parameter α of the threshold function h(w) and based on a derivative of the cost function C with respect to a second parameter β of the threshold function h(w). At 706, inputting training data to the neural network to generate an output based on the training data, back-propagating the output through the neural network, and minimizing a difference between the output and the training data are done repeatedly to determine the set of weights w that optimize the speed performance of the neural network [that corresponding to the first performance metric], the accuracy of the neural network, or a combination thereof. At 707, the process ends.”)
DENG, Galicia, ZARE, GUELMAN and Wang are related to the same field of endeavor (i.e.: training ensemble models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of DENG with the teachings of Galicia, ZARE, GUELMAN and Wang to fine-tune the balance between model accuracy and speed, leading to a more efficient neural network that retains or even improves accuracy while being less resource-intensive (DENG, ¶[0007]).
Claim 19 recites limitations analogous to those of claim 11, so it is rejected under a similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Qureshi et al., US10902341, (2016).
Qureshi teaches associating profile data and/or list item data with a service provider action that may lead to a particular user action with respect to a list item (e.g., a task item of a to-do list, a content item of a content item queue, etc.).
Gombolay et al., Pub. No.: US20170293844, (2017).
Gombolay discusses a computational framework for a task scheduling system. One or more classifiers are trained to predict (i) whether a first action should be scheduled instead of a second action, using pairwise comparisons between actions scheduled by a demonstrator at particular times and actions not scheduled by the demonstrator at those times, and (ii) whether a particular action should be scheduled for a particular agent at a particular time.
Any inquiry concerning this communication or earlier communications from the examiner
should be directed to MATIYAS T MARU whose telephone number is (571)270-0902 or via email: matiyas.maru@uspto.gov. The examiner can normally be reached Monday 8:00am - Friday 4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a
USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to
use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor,
Michelle Bechtold, can be reached on (571)431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from
Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit
https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and
https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional
questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like
assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA)
or 571-272-1000.
/M.T.M./ Examiner, Art Unit 2148
/MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148