Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This office action is in response to the claims filed on 12/05/2025.
Claims 1-20 are presented for examination.
Response to Arguments
In reference to applicant’s argument regarding the Double Patenting Rejection.
Applicant’s Argument:
Claims 1-20 stand provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of United States Patent No. 11,809,968 in view of United States Patent Application Publication No. 2018/0240041 to Koch et al. (hereinafter "Koch"). As demonstrated by the above amendment and below arguments, independent claims 1, 8, and 15 have been amended to be patentably distinct over US 11,809,968 in view of Koch. Accordingly, Applicant respectfully requests that the non-statutory double patenting rejection be withdrawn. Applicant reserves the right to file one or more Terminal Disclaimers at a later time.
Examiner’s Response:
The Examiner maintains the outstanding double patenting rejections, which are updated below to reflect the Applicant's latest claim amendments.
In reference to applicant’s argument regarding the rejection under 35 U.S.C. § 103.
Applicant’s Argument:
Applicant respectfully submits that Koch in view of Wubbels fails to disclose at least the features "wherein the request includes an indication of an initial set of hyperparameters that define a starting point within a single specified search space portion of the multiple search space portions within the hyperparameter search space, the specified search space portion specified in the request," "train, using machine learning, a generation model to determine whether a search space portion is likely to provide a set of hyperparameters that improves a success metric by which success of the hyperparameter tuning is evaluated," and "based at least on the determination of whether the at least one set of hyperparameters improved the success metric, apply the generation model to determine whether the search space portion is likely to provide another set of hyperparameters that improves the success metric" of amended claim 1.
Examiner’s Response:
The Examiner respectfully disagrees with Applicant's argument because Koch still teaches "wherein the request includes an indication of an initial set of hyperparameters that define a starting point within a single specified search space portion of the multiple search space portions within the hyperparameter search space." In [Par. 0053], Koch discloses that the selection manager application requests generation of a model for each selected hyperparameter configuration. Further, in [Par. 0167-0169], each model is associated with a particular set of hyperparameter values, and the request to generate a model includes the hyperparameter set selected by the iterative tuning search method. Therefore, the request to select the hyperparameter set includes determining the first set of hyperparameter values to train the specific selected model type within the first tuning search space of the multiple search spaces; that is, the first tuning search step is included in the request to select the first set of hyperparameters to train on the specified selected model. Accordingly, the first set of hyperparameter values is selected during the first tuning search (the specified search space portion specified in the request) of the iterative tuning search methods, which corresponds to the initial set of hyperparameters that define a starting point within a single specified search space portion of the multiple search space portions within the hyperparameter search space. Applicant's argument is therefore not persuasive, and the rejection is maintained.
Double Patenting
The non-statutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 11,809,968 in view of Koch et al. (Pub. No. US 2018/0240041; hereinafter, Koch), as follows.
This is a non-provisional non-statutory double patenting rejection.
Instant Application No. 17/173,970: Claims 1, 8, 15
Patent No. 11,809,968: Claims 1, 3, 7
However, Patent No. 11,809,968 does not teach "wherein the request includes an indication of an initial set of hyperparameters that define a starting point within a specified search space portion of the multiple search space portions within the hyperparameter search space, the specified search space portion specified in the request;" or "wherein, for each search space portion of the subset that is sequentially selected, the generation of at least one set of hyperparameters from the search space portion comprises generation of a batch of sets of hyperparameters comprising a predetermined quantity of sets of hyperparameters."
On the other hand, Koch teaches wherein the request includes an indication of an initial set of hyperparameters that define a starting point within a specified search space portion of the multiple search space portions within the hyperparameter search space, the specified search space portion specified in the request (Koch, [Par. 0053, 0167-0169]);
wherein, for each search space portion of the subset that is sequentially selected, the generation of at least one set of hyperparameters from the search space portion (Koch, [Par. 0098]) comprises generation of a batch of sets of hyperparameters comprising a predetermined quantity of sets of hyperparameters (Koch, [Par. 0139-0142]).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Patent No. 11,809,968 by incorporating a request that includes an indication of an initial set of hyperparameters that define a starting point within a single search space portion of the multiple search space portions within the hyperparameter search space, and wherein, for each search space portion of the subset that is sequentially selected, the generation of at least one set of hyperparameters from the search space portion comprises generation of a batch of sets of hyperparameters comprising a predetermined quantity of sets of hyperparameters, as taught by Koch, for the purpose of identifying/selecting the best set of hyperparameters during the tuning process (Koch, [Par. 0130]).
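As an illustrative aid only (an Examiner's sketch, not part of the record of either reference; all identifiers and value ranges are hypothetical), the claimed generation of a batch comprising a predetermined quantity of hyperparameter sets from a single search space portion may be sketched as follows:

```python
import random

# Hypothetical sketch: from one selected search space portion (a sub-range
# of each hyperparameter), generate a batch containing a predetermined
# quantity of hyperparameter sets. Names and ranges are illustrative only.

def generate_batch(portion, batch_size, rng=random):
    """Sample `batch_size` hyperparameter sets from one search space portion.

    `portion` maps each hyperparameter name to its (low, high) sub-range
    within that portion of the overall search space.
    """
    batch = []
    for _ in range(batch_size):
        hp_set = {name: rng.uniform(low, high)
                  for name, (low, high) in portion.items()}
        batch.append(hp_set)
    return batch

# Example: one portion of a hypothetical two-dimensional search space.
portion = {"learning_rate": (0.001, 0.01), "dropout": (0.1, 0.3)}
batch = generate_batch(portion, batch_size=5)
```

Here the predetermined quantity is the fixed `batch_size`, and every generated set lies inside the single selected portion.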
Claims 2, 9, 16
Claim 1
Claims 3, 10, 17
However, Patent No. 11,809,968 does not teach claims 3, 10, and 17 of the instant application.
Specifically, Patent No. 11,809,968 does not teach "the medium of claim 2, further storing instructions that cause the processor, for at least one search space portion of the subset that is sequentially selected, to: apply the prediction model during the prediction mode to generate a prediction of whether the use of the processing and storage resources to perform the hyperparameter tuning with the at least one set of hyperparameters generated from the at least one search space portion as an input will improve the success metric; and in response to a prediction that the success metric will be improved, perform operations comprising: use the processing and storage resources to perform the hyperparameter tuning with the at least one set of hyperparameters generated from the at least one search space portion as input to generate an output; evaluate the output to determine whether the success metric is improved; and further train, by machine learning, the generation model using the at least one set of hyperparameters and the evaluation of the output."
On the other hand, Koch teaches the medium of claim 2, further storing instructions that cause the processor, for at least one search space portion of the subset that is sequentially selected, to (Koch, [Par. 0167]): apply the prediction model during the prediction mode to generate a prediction of whether the use of the processing and storage resources (Koch, [Par. 0005]) to perform the hyperparameter tuning with the at least one set of hyperparameters generated from the at least one search space portion as an input will improve the success metric (Koch, [Fig. 17, Par. 0240]); and in response to a prediction that the success metric will be improved, perform operations comprising: use the processing and storage resources to perform the hyperparameter tuning with the at least one set of hyperparameters generated from the at least one search space portion as input to generate an output (Koch, [Fig. 17, Par. 0240]); evaluate the output to determine whether the success metric is improved (Koch, [Par. 0005]); and further train, by machine learning, the generation model using the at least one set of hyperparameters and the evaluation of the output (Koch, [Fig. 17, Par. 0240]).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Patent No. 11,809,968 by incorporating the medium of claim 2, further storing instructions that cause the processor, for at least one search space portion of the subset that is sequentially selected, to: apply the prediction model during the prediction mode to generate a prediction of whether the use of the processing and storage resources to perform the hyperparameter tuning with the at least one set of hyperparameters generated from the at least one search space portion as an input will improve the success metric; and in response to a prediction that the success metric will be improved, perform operations comprising: use the processing and storage resources to perform the hyperparameter tuning with the at least one set of hyperparameters generated from the at least one search space portion as input to generate an output; evaluate the output to determine whether the success metric is improved; and further train, by machine learning, the generation model using the at least one set of hyperparameters and the evaluation of the output, as taught by Koch, for the purpose of improving the hyperparameter tuning process to increase accuracy and reduce error (Koch, [Par. 0240]).
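As a further illustrative aid only (an Examiner's sketch with hypothetical names, found in neither reference), the gating step recited above, in which a prediction model decides whether to spend processing and storage resources on a tuning run and the result is fed back for further training, may be sketched as:

```python
# Hypothetical sketch: tuning runs only when the prediction model predicts
# the success metric will improve; the evaluation result then becomes a
# new training example for further training of the model.

def tune_if_promising(batch, predict_improves, run_tuning, evaluate, history):
    """Run tuning on `batch` only if the prediction model says it may help.

    predict_improves(batch) -> bool   (the prediction model)
    run_tuning(batch)       -> output (uses processing/storage resources)
    evaluate(output)        -> bool   (did the success metric improve?)
    `history` accumulates (batch, improved) pairs for further training.
    """
    if not predict_improves(batch):
        return None                      # resources are not spent
    output = run_tuning(batch)
    improved = evaluate(output)
    history.append((batch, improved))    # further train the model later
    return improved

# Stub example with hypothetical callables standing in for the real steps.
history = []
result = tune_if_promising(
    batch=[{"lr": 0.01}],
    predict_improves=lambda b: True,     # stub prediction model
    run_tuning=lambda b: 0.92,           # stub tuning run (metric value)
    evaluate=lambda out: out > 0.9,
    history=history,
)
```

The stubs stand in for the claimed prediction mode, tuning execution, and evaluation; only the control flow is the point of the sketch.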
Claims 4, 11, 18
Claim 2. However, Patent No. 11,809,968 does not teach "determine the accuracy of the prediction model based at least on the evaluation of the output."
On the other hand, Koch teaches "determine the accuracy of the prediction model based at least on the evaluation of the output" (Koch, [Fig. 17, Par. 0240]).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Patent No. 11,809,968 by incorporating determining the accuracy of the prediction model based at least on the evaluation of the output, as taught by Koch, for the purpose of improving the hyperparameter tuning process to increase accuracy and reduce error (Koch, [Par. 0240]).
Claims 5, 12, 19
Claims 2 and 3
Claims 6, 13, 20
However, Patent No. 11,809,968 does not teach claims 6, 13, and 20 of instant application 17/173,970.
Specifically, Patent No. 11,809,968 does not teach "the medium of claim 1, wherein: and the medium further stores instructions that cause the processor to begin the sequential selection of at least the subset of the multiple search space portions with the single search space portion that includes the starting point."
On the other hand, Koch teaches "the medium of claim 1, wherein: and the medium further stores instructions that cause the processor to begin the sequential selection of at least the subset of the multiple search space portions with the single search space portion that includes the starting point" (Koch, [Par. 0155]),
and wherein the batch of sets of hyperparameters includes points that all exist within a same portion (Koch, [Par. 0167-0169]).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Patent No. 11,809,968 by incorporating the medium further storing instructions that cause the processor to begin the sequential selection of at least the subset of the multiple search space portions with the single search space portion that includes the starting point, and wherein the batch of sets of hyperparameters includes points that all exist within a same portion, as taught by Koch, for the purpose of identifying/selecting the best set of hyperparameters during the tuning process (Koch, [Par. 0130]).
Claims 7, 14
Claims 3, 7
However, Patent No. 11,809,968 does not teach "and wherein the batch of sets of hyperparameters includes points that are located within multiple portions."
On the other hand, Koch teaches "and wherein the batch of sets of hyperparameters includes points that are located within multiple portions" (Koch, [Par. 0155]).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Patent No. 11,809,968 by incorporating the batch of sets of hyperparameters including points that are located within multiple portions, as taught by Koch, for the purpose of identifying/selecting the best set of hyperparameters during the tuning process (Koch, [Par. 0130]).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 6, 7, 8, 13, 14, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Koch et al. (Pub. No. US 2018/0240041; hereinafter, Koch) in view of Wubbels et al. (Patent No. US 10,599,984; hereinafter, Wubbels).
Regarding claim 1, Koch teaches a non-transitory computer-readable medium storing instructions configured to cause a processor to (Koch, [Par.0006], “The computer-readable medium has instructions stored thereon that, when executed by the processor, cause the computing device to automatically select hyperparameter values based on objective criteria for training a predictive model.”):
receive, from a requesting device, a request to perform hyperparameter tuning of hyperparameters of an artificial intelligence (AI) model (Koch, [Par.0053], “Selection manager application 312 performs operations associated with selecting sets of hyperparameter configurations to evaluate for the model type based on inputs provided from user device 200. Selection manager application 312 requests that the computing devices of worker system 106 generate a model for each hyperparameter configuration in the selected sets of hyperparameter configurations.”);
divide a hyperparameter search space into multiple search space portions (Koch, [Par. 0152], "In an operation 602, iteration manager 314 is instantiated. Iteration manager 314 determines a configuration list 322 that includes a set of hyperparameter configurations to evaluate as described further below. Again, a hyperparameter configuration includes a value for each hyperparameter to evaluate based on the selected model type. A set of hyperparameter configurations includes a plurality of hyperparameter configurations selected for evaluation before a next set of hyperparameter configurations is selected for evaluation based on the tuning search method(s) and the objective function values computed for each hyperparameter configuration." Examiner's note: the hyperparameter search system iteratively generates the hyperparameter tuning with a plurality of sets of hyperparameters; therefore, the hyperparameter search divides the search into multiple search portions (the plurality of sets of hyperparameters).),
wherein the request includes an indication of an initial set of hyperparameters that define a starting point within a specified search space portion of the multiple search space portions within the hyperparameter search space, the specified search space portion specified in the request (Koch, [Par. 0053], "Selection manager application 312 requests that the computing devices of worker system 106 generate a model for each hyperparameter configuration in the selected sets of hyperparameter configurations." And further in [Par. 0167-0169], "[0167] In an operation 624, the results are provided to iteration manager 314. Based on the baseline results and hyperparameters, iteration manager 314 determines a first set of hyperparameter configurations to evaluate in a first iteration. Again, each hyperparameter configuration includes a specific value for each hyperparameter based on the selected model type. For example, iteration manager 314 executes a first tuning search method of the tuning search method specified in operation 522…[0169] Referring to FIG. 6B, in an operation 628, each hyperparameter configuration is selected from configuration list 322 and assigned to a session. For example, if the model type is support vector machine, a first value for the penalty parameter C and a second value for the degree parameter is assigned as a pair to a session with different values for the pair assigned to different sessions. Iteration manager 314 defined the pair of values for each hyperparameter configuration included in configuration list 322." Examiner's note: Koch teaches that each model is associated with a particular set of hyperparameter values, and the request to generate a model includes the hyperparameter set selected by the iterative tuning search method. Therefore, the request to select the hyperparameter set includes determining the first set of hyperparameter values to train the specific selected model type within the first tuning search space of the multiple search spaces; that is, the first tuning search step is included in the request to select the first set of hyperparameters to train on the specified selected model. Accordingly, the first set of hyperparameter values is selected during the first tuning search (the specified search space portion specified in the request) of the iterative tuning search methods, which corresponds to the initial set of hyperparameters that define a starting point within a single specified search space portion of the multiple search space portions within the hyperparameter search space.);
train, using machine learning, a generation model to determine whether a search space portion is likely to provide a set of hyperparameters that improves a success metric by which success of the hyperparameter tuning is evaluated (Koch, [Par. 0079], "For illustration, an “autotune” statement used with the FOREST procedure included in SAS® Visual Data Mining and Machine Learning 8.1 may be used to evaluate different hyperparameter configurations and to select the best configuration of hyperparameter values for the forest model type. A tuneForest action selects different hyperparameter configurations to run a forestTrain action and a forestScore action multiple times to train and validate the forest model as it searches for a model that has reduced validation error." Examiner's note: the machine learning model is trained to select the best hyperparameter values, which corresponds to determining whether a search space portion (hyperparameter selection) is likely to provide a set of hyperparameters that improves a success metric);
sequentially select at least a subset of the multiple search space portions (Koch, [Par. 0167], "In an operation 624, the results are provided to iteration manager 314. Based on the baseline results and hyperparameters, iteration manager 314 determines a first set of hyperparameter configurations to evaluate in a first iteration. Again, each hyperparameter configuration includes a specific value for each hyperparameter based on the selected model type. For example, iteration manager 314 executes a first tuning search method of the tuning search method specified in operation 522. Alternatively, multiple tuning search methods may be executed concurrently such that iteration manager 314 executes each tuning search method to determine a set of hyperparameters that are combined to define the first set of hyperparameter configurations." Examiner's note: each set of hyperparameters is iteratively selected and generated by the tuning search method.),
wherein for each search space portion that is selected, the processor is caused to: generate at least one set of hyperparameters from the search space portion (Koch, [Par. 0167], "In an operation 624, the results are provided to iteration manager 314. Based on the baseline results and hyperparameters, iteration manager 314 determines a first set of hyperparameter configurations to evaluate in a first iteration. Again, each hyperparameter configuration includes a specific value for each hyperparameter based on the selected model type. For example, iteration manager 314 executes a first tuning search method of the tuning search method specified in operation 522. Alternatively, multiple tuning search methods may be executed concurrently such that iteration manager 314 executes each tuning search method to determine a set of hyperparameters that are combined to define the first set of hyperparameter configurations." Examiner's note: the first set of hyperparameters is generated by the model tuning application, which is executed by the processor (Koch, FIG. 2 and [Par. 0049], "model tuning application 222 is implemented in software (comprised of computer-readable and/or computer-executable instructions) stored in computer-readable medium 208 and accessible by processor 210 for execution of the instructions that embody the operations of model tuning application 222").);
wherein, for each search space portion of the subset that is sequentially selected (Koch, [Par. 0098, 0234], "[0098] In an operation 518, a ninth indicator may be received that defines a cache tolerance value and a scaling factor value for each hyperparameter. The cache tolerance value is used to determine when a subsequent hyperparameter configuration is “close enough” to a previously executed configuration to not repeat execution with the subsequent hyperparameter configuration." and "[0234] Referring to FIG. 11, the results of applying the GA search method in subsequent iterations after the first iteration applied LHS are shown. Iteration 0 corresponds to default value 1000 that shows the objective function value using the default hyperparameter configuration, and iteration 1 corresponds to first iteration value 1002 that shows the best objective function value computed using the set of hyperparameter configurations defined using LHS. Subsequent symbols show the best objective value for subsequent iterations that used the GA search method to define the set of hyperparameter configurations." Examiner's note: each search portion is sequentially selected.);
the generation of at least one set of hyperparameters from the search space portion comprises generation of a batch of sets of hyperparameters comprising a predetermined quantity of sets of hyperparameters (Koch, [Par. 0139-0143], "If the Bayesian search method is selected, a population size, an initial LHS size for the Kriging model, a maximum number of points in the Kriging model, a number of trial points when optimizing the Kriging model using LHS sampling at each iteration, and a maximum number of iterations may be specified where the population size defines the number of hyperparameter configurations to evaluate each iteration…[0143] For example, the following statements request creation of a gradient boosting tree model type with the listed target variable and input variables included in the input dataset “mycaslib.dmagecr” with the results stored in “mycaslib.mymodel” where the tuning evaluation parameters include a population size of 5, a maximum of three iterations, the ASE objective function, the default search method, with the “ntrees” hyperparameter selected between 10 and 50 with an initial value of 10, with the “vars_to_try” hyperparameter selected as 4, 8, 12, 16, or 20 with an initial value of 4, and with the remaining hyperparameters for the gradient boosting tree model type using the default values." Examiner's note: each set of hyperparameters for each iteration is associated with an assigned population size and iteration number for each batch; these correspond to the predetermined quantity of sets of hyperparameters);
perform the hyperparameter tuning with the at least one set of hyperparameters as an input to determine whether the at least one set of hyperparameters improved the success metric (Koch, [Par. 0155], "specified for the GA tuning search method or the Bayesian search method because one hyperparameter configuration is carried forward each iteration. The best point is carried forward so that if the next iteration does not find an improvement, the returned set of evaluations still includes the current best for consideration in generating the next iteration of hyperparameter configurations. If the GSS tuning search method is selected, twice the number of hyperparameters is added to the value of n. For the LHS, Grid, or Random tuning search methods, n is determined as one less than a sample size." Examiner's note: the set of hyperparameters is carried forward to each iteration to determine whether the process provides any improvement (improved the success metric).);
based at least on the determination of whether the at least one set of hyperparameters improved the success metric, apply the generation model to determine whether the search space portion is likely to provide another set of hyperparameters that improves the success metric (Koch, [Par. 0181], "[I]n operation 652, the results for each hyperparameter configuration included in configuration list 322 is provided to iteration manager 314. Based on the results and the current tuning search method(s), iteration manager 314 determines a next set of hyperparameter configurations to evaluate in a next iteration. The best model hyperparameter configurations from the previous iteration are used to generate the next population of hyperparameter configurations to evaluate with the selected mode type. If multiple search methods are running concurrently, the results from all of the hyperparameter configurations include in configuration list 322 as part of the current iteration are used to determine the next population irrespective of whether or not that search method requested evaluation of a specific hyperparameter configuration. This process is repeated for remaining iterations based on the search method(s) chosen. In this manner, a search method gains information based on one or more hyperparameter configurations generated by another search method." Examiner's note: the selected model type is iteratively generated based on the current set of best hyperparameters to find the next best hyperparameters.);
However, Koch does not teach "and rule out the search space portion from providing further sets of hyperparameters in response to a determination that the search space portion is unlikely to provide another set of hyperparameters that improves the success metric;"
"and terminate the performance of the hyperparameter tuning when all search space portions of the multiple search space portions are ruled out from providing further sets of hyperparameters."
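As an illustrative aid only (an Examiner's sketch with hypothetical names, not drawn from Koch, Wubbels, or the instant application), the two limitations above, ruling out a search space portion deemed unlikely to improve the success metric and terminating once every portion is ruled out, may be sketched as:

```python
# Hypothetical sketch: portions are ruled out when the generation model
# deems further improvement unlikely; tuning terminates when all portions
# of the search space are ruled out.

def tune_until_exhausted(portions, likely_improves, try_portion):
    """Loop over search space portions until all are ruled out.

    likely_improves(p) -> bool  (generation model's judgment for portion p)
    try_portion(p)     -> bool  (run tuning on one batch; metric improved?)
    Returns the number of tuning runs performed before termination.
    """
    active = set(portions)
    runs = 0
    while active:                         # terminate once all are ruled out
        for p in list(active):
            if not likely_improves(p):
                active.discard(p)         # rule out this portion
                continue
            try_portion(p)
            runs += 1
    return runs

# Stub "generation model": each portion is judged promising until its
# hypothetical improvement budget is exhausted.
budget = {"A": 2, "B": 1, "C": 0}

def likely_improves(p):
    return budget[p] > 0

def try_portion(p):
    budget[p] -= 1
    return True

runs = tune_until_exhausted(["A", "B", "C"], likely_improves, try_portion)
```

In this stub, portion "C" is ruled out immediately, "A" and "B" are ruled out after their budgets run dry, and the loop then terminates.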
On the other hand, Wubbels teaches and rule out the search space portion from providing further sets of hyperparameters in response to a determination that the search space portion is unlikely to provide another set of hyperparameters that improves the success metric (Wubbels, [Col. 7, lines 21-45], "In addition, during training, the training module 410 can also select the appropriate hyperparameters for the machine learning model 420. In machine learning and for purposes of discussion here, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters (e.g., weights in a model) are derived via training. The hyperparameters can indicate the number of layers 500, 510, 520, 530 contained in the machine learning model 420, and a number of neurons 540, 550 contained in each layer 500, 510, 520, 530. In other words, the number of layers can represent one hyperparameter, and the number of neurons per layer can represent another hyperparameter independent of the first hyperparameter. To select the hyperparameters, the training module 410 can create multiple models with various hyperparameters. Each model can have varying number of layers 500, 510, 520, 530 and varying number of neurons 540, 550 contained in each layer 500, 510, 520, 530. Consequently, the multiple models can vary in accuracy and/or latency. The training module 410 can train the multiple models on the same inputs, and measure the performance of the multiple models at the end of the training." And [Col. 11, lines 23-48], "The appropriate optimization technique can include: identifying the optimal checkpoint from which a model should be created, tuning hyperparameters used in training, evaluating performance gains produced by model transformation methodologies like ensembling and/or co-distilling. When one or more of the above optimization techniques have been used, the process of generating the machine learning model can become faster because less processor power, and memory is necessary in generating a deployable machine learning model. Using either hyperparameter tuning or co-distillation (or both) can reduce the overall size of a generated model. Due to the smaller size the time of inference is reduced. These techniques can thus decrease the latency of diagnoses when a model is deployed. In a similar vein, using either ensembling or optimal-checkpoint selection (or both) can improve the accuracy of the generated model. Optimal checkpoint selection ensures a single model is achieving the highest possible accuracy. Ensembling gives insight into how the accuracy of multiple models combined improves with the number of models used in an ensemble. Optimal checkpoint selection can also reduce training time if used to distinguish a 'stopping point' for model training: rather than training for a fixed number of steps, a model can stop training as soon as its accuracy stops improving. Combining these techniques (for example, co-distilling using an ensembled model as a teacher) allows for the generation of a model that is both highly accurate and fast." Examiner's note: the hyperparameters are tuned during training; when the accuracy of training stops improving, the model stops training (stops tuning/finding hyperparameters). Therefore, the process of selecting/finding the hyperparameter set to train is stopped or ruled out when the training does not provide a better result.);
and terminate the performance of the hyperparameter tuning when all search space portions of the multiple search space portions are ruled out from providing further sets of hyperparameters (Wubbels, [Col. 7, lines 21-45], “In addition, during training, the training module 410 can also select the appropriate hyperparameters for the machine learning model 420. In machine learning and for purposes of discussion here, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters (e.g., weights in a model) are derived via training. The hyperparameters can indicate the number of layers 500, 510, 520, 530 contained in the machine learning model 420, and a number of neurons 540, 550 contained in each layer 500, 510, 520, 530. In other words, the number of layers can represent one hyperparameter, and the number of neurons per layer can represent another hyperparameter independent of the first hyperparameter. To select the hyperparameters, the training module 410 can create multiple models with various hyperparameters. Each model can have varying number of layers 500, 510, 520, 530 and varying number of neurons 540, 550 contained in each layer 500, 510, 520, 530. Consequently, the multiple models can vary in accuracy and/or latency. The training module 410 can train the multiple models on the same inputs, and measure the performance of the multiple models at the end of the training.” and [Col. 11, lines 23-48], “The appropriate optimization technique can include: identifying the optimal checkpoint from which a model should be created, tuning hyperparameters used in training, evaluating performance gains produced by model transformation methodologies like ensembling and/or co-distilling.
When one or more of the above optimization techniques have been used, the process of generating the machine learning model can become faster because less processor power, and memory is necessary in generating a deployable machine learning model. Using either hyperparameter tuning or co-distillation (or both) can reduce the overall size of a generated model. Due to the smaller size the time of inference is reduced. These techniques can thus decrease the latency of diagnoses when a model is deployed. In a similar vein, using either ensembling or optimal-checkpoint selection (or both) can improve the accuracy of the generated model. Optimal checkpoint selection ensures a single model is achieving the highest possible accuracy. Ensembling gives insight into how the accuracy of multiple models combined improves with the number of models used in an ensemble. Optimal checkpoint selection can also reduce training time if used to distinguish a ‘stopping point’ for model training: rather than training for a fixed number of steps, a model can stop training as soon as its accuracy stops improving. Combining these techniques (for example, co-distilling using an ensembled model as a teacher) allows for the generation of a model that is both highly accurate and fast.” Examiner’s note: the hyperparameters are tuned during training; when the training accuracy stops improving, the model stops training (i.e., the tuning/search for hyperparameters stops); therefore, the hyperparameter tuning stops.).
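For illustration only, the rule-out-and-terminate behavior discussed in the rejection above can be sketched in Python. All names and the `improves` callable (standing in for the claimed generation model's judgment) are hypothetical assumptions for illustration and are not drawn from Koch or Wubbels:

```python
def tune(portions, improves, budget_per_portion=3):
    """Illustrative sketch: sequentially select search space portions,
    rule a portion out once it no longer yields a hyperparameter set
    that improves the success metric, and terminate tuning when every
    portion has been ruled out.

    portions: dict mapping a portion name to candidate hyperparameter sets.
    improves: hypothetical stand-in for the generation model; returns True
              when a candidate improves on the current best.
    """
    ruled_out = set()
    best = None
    while len(ruled_out) < len(portions):
        for name, candidates in portions.items():
            if name in ruled_out:
                continue
            improved = False
            for hp in candidates[:budget_per_portion]:
                if improves(hp, best):
                    best = hp
                    improved = True
            if not improved:
                # portion unlikely to provide another improving set
                ruled_out.add(name)
    # all portions ruled out: hyperparameter tuning terminates
    return best
```

For example, with two portions and a simple "larger value is better" metric, the loop terminates only after both portions stop producing improvements.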
Koch and Wubbels are analogous art because they share the same field of endeavor: performing a hyperparameter tuning process based on training a machine learning model.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the receive, from a requesting device, a request to perform hyperparameter tuning of hyperparameters of an artificial intelligence (AI) model; divide a hyperparameter search space into multiple search space portions, wherein the request includes an indication of an initial set of hyperparameters that define a starting point within a single search space portion of the multiple search space portions within the hyperparameter search space; train, using machine learning, a generation model to determine whether a search space portion is likely to provide a set of hyperparameters that improves a success metric by which success of the hyperparameter tuning is evaluated; and sequentially select at least a subset of the multiple search space portions, wherein for each search space portion that is selected, the processor is caused to: generate at least one set of hyperparameters from the search space portion, wherein, for each search space portion of the subset that is sequentially selected, the generation of at least one set of hyperparameters from the search space portion comprises generation of a batch of sets of hyperparameters comprising a predetermined quantity of sets of hyperparameters, as taught by Koch, to include ruling out the search space portion from providing further sets of hyperparameters in response to a determination that the search space portion is unlikely to provide another set of hyperparameters that improves the success metric, and terminating the performance of the hyperparameter tuning when all search space portions of the multiple search space portions are ruled out from providing further sets of hyperparameters, as taught by Wubbels.
The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance of the machine learning model (Wubbels, [Col. 10, lines 60-67], “Co-distilling is related to assembling. Co-distilling is a technique to improve the performance of the machine learning model 900 by training the machine learning model 900 on the inference of a more computationally expensive machine learning model 910, such as an ensembled machine learning model. Co-distilling is an attempt to achieve the same high model performance of the more computationally expensive machine learning model 910, but without requiring the intensive compute resources. The more computationally expensive model 910 can be thought of as a teacher model.”).
Regarding claim 6, Koch teaches the medium of claim 1, wherein the medium further stores instructions that cause the processor to begin the sequential selection of at least the subset of the multiple search space portions with the single search space portion that includes the starting point (Koch, [Par.0155], “n may be determined as either one less than a population size specified for the GA tuning search method or the Bayesian search method because one hyperparameter configuration is carried forward each iteration. The best point is carried forward so that if the next iteration does not find an improvement, the returned set of evaluations still includes the current best for consideration in generating the next iteration of hyperparameter configurations. If the GSS tuning search method is selected, twice the number of hyperparameters is added to the value of n.” Examiner’s note: the best first set of selected hyperparameters is carried forward to the next search; therefore, the next iteration of the multiple searches includes the first selected set of hyperparameters (the starting point).).
and wherein the batch of sets of hyperparameters includes points that all exist within a same portion (Koch, [Par.0167-0169], “In an operation 624, the results are provided to iteration manager 314. Based on the baseline results and hyperparameters, iteration manager 314 determines a first set of hyperparameter configurations to evaluate in a first iteration. Again, each hyperparameter configuration includes a specific value for each hyperparameter based on the selected model type. For example, iteration manager 314 executes a first tuning search method of the tuning search method specified in operation 522. Alternatively, multiple tuning search methods may be executed concurrently such that iteration manager 314 executes each tuning search method to determine a set of hyperparameters that are combined to define the first set of hyperparameter configurations. For illustration, the LHS, the Random, and/or the Grid search methods may be used in a first iteration to define the first set of hyperparameter configurations that sample the search space. The initial configuration list 322 is also called a “population”.” Examiner’s note: each hyperparameter configuration of the first set includes a specific value for each hyperparameter in the first iteration, and those sampled values all lie within the same portion of the search space.).
Regarding claim 7, Koch teaches the medium of claim 1, wherein, for each search space portion of the subset that is sequentially selected (Koch, [Par.0167], “In an operation 624, the results are provided to iteration manager 314. Based on the baseline results and hyperparameters, iteration manager 314 determines a first set of hyperparameter configurations to evaluate in a first iteration. Again, each hyperparameter configuration includes a specific value for each hyperparameter based on the selected model type. For example, iteration manager 314 executes a first tuning search method of the tuning search method specified in operation 522. Alternatively, multiple tuning search methods may be executed concurrently such that iteration manager 314 executes each tuning search method to determine a set of hyperparameters that are combined to define the first set of hyperparameter configurations.” Examiner’s note: each set of hyperparameters is iteratively selected and generated by the tuning search method.):
the performance of hyperparameter tuning with the at least one set of hyperparameters as an input comprises the performance of the hyperparameter tuning with each set of hyperparameters of the batch of sets of hyperparameters (Koch, [Par.0181], “In operation 652, the results for each hyperparameter configuration included in configuration list 322 is provided to iteration manager 314. Based on the results and the current tuning search method(s), iteration manager 314 determines a next set of hyperparameter configurations to evaluate in a next iteration. The best model hyperparameter configurations from the previous iteration are used to generate the next population of hyperparameter configurations to evaluate with the selected mode type. If multiple search methods are running concurrently, the results from all of the hyperparameter configurations include in configuration list 322 as part of the current iteration are used to determine the next population irrespective of whether or not that search method requested evaluation of a specific hyperparameter configuration. This process is repeated for remaining iterations based on the search method(s) chosen.” Examiner’s note: each set of hyperparameters is selected in each batch/iteration, and the best selected set of hyperparameters is carried forward as the input to the next batch.);
and the application of the generation model to determine whether the search space portion is likely to provide another set of hyperparameters that improves the success metric comprises an evaluation of each set of hyperparameters of the batch of sets of hyperparameters (Koch, [Par.0155], “n may be determined as either one less than a population size specified for the GA tuning search method or the Bayesian search method because one hyperparameter configuration is carried forward each iteration. The best point is carried forward so that if the next iteration does not find an improvement, the returned set of evaluations still includes the current best for consideration in generating the next iteration of hyperparameter configurations. If the GSS tuning search method is selected, twice the number of hyperparameters is added to the value of n.” Examiner’s note: the best set of hyperparameters is selected in each iteration/batch and carried forward; therefore, if no improvement is found in the next batch, the current best hyperparameter values are still kept.).
and wherein the batch of sets of hyperparameters include points that are located within multiple portions (Koch, [Par.0155], “The best point is carried forward so that if the next iteration does not find an improvement, the returned set of evaluations still includes the current best for consideration in generating the next iteration of hyperparameter configurations. If the GSS tuning search method is selected, twice the number of hyperparameters is added to the value of n. For the LHS, Grid, or Random tuning search methods, n is determined as one less than a sample size. n may then be limited by a configuration of selection manager device 104. When selection manager device 104 is configured in single-machine mode and n is greater than four and not specified by the second indicator, n is limited to four. When selection manager device 104 is configured in single-machine mode and n is specified by the second indicator, n may be limited to 32 or a number of threads of selection manager device 104. When selection manager device 104 is configured in distributed mode, and n is not specified by the second indicator, n≤W/P may be used. When selection manager device 104 is configured in distributed mode and n is specified by the second indicator, n≤2W/P may be applied.” Examiner’s note: the best point is carried from the current iteration to the next iteration; therefore, the best point is located within multiple portions.).
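For illustration only, the carry-forward behavior Koch describes in [Par.0155] can be sketched in Python. The `generate` callable is a hypothetical stand-in for whichever tuning search method produces new configurations; it is not from the reference:

```python
def next_population(evaluations, generate, size):
    """Illustrative sketch: the best hyperparameter configuration of the
    current iteration is always placed into the next population, so even
    if the next iteration finds no improvement, the returned evaluations
    still contain the current best."""
    best = max(evaluations, key=lambda e: e["score"])
    population = [best["config"]]          # carry the best point forward
    while len(population) < size:
        population.append(generate())      # fill out the rest of the batch
    return population
```

Under this sketch, the best point from one iteration appears in the next iteration's batch as well, which is the sense in which a point can be "located within multiple portions" of the search.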
Regarding claim 8, it is rejected for the same reasons as claim 1, since these claims recite the same limitations.
Regarding claim 13, it is rejected for the same reasons as claim 6, since these claims recite the same limitations.
Regarding claim 14, it is rejected for the same reasons as claim 7, since these claims recite the same limitations.
Regarding claim 15, it is rejected for the same reasons as claim 1, since these claims recite the same limitations.
Regarding claim 20, it is rejected for the same reasons as claim 6, since these claims recite the same limitations.
Claims 2, 3, 9, 10, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Koch et al. (Pub. No. US 2018/0240041; hereinafter, Koch) in view of Wubbels et al. (U.S. Patent No. 10,599,984; hereinafter, Wubbels), further in view of Ganu et al. (U.S. Patent No. 10,380,236; hereinafter, Ganu), and further in view of Convertino et al. (Pub. No. US 2020/0097847; hereinafter, Convertino).
Regarding claim 2, Koch teaches the medium of claim 1, wherein: the performance of the hyperparameter tuning comprises use of processing and storage resources to instantiate an instance of the AI model with a set of hyperparameters from among the at least one set of hyperparameters (Koch, [Par.0005], “For each session of the plurality of sessions, a hyperparameter configuration is assigned to the session of the plurality of sessions, training of a model of the model type by the session computing devices allocated to the session is requested; scoring of the trained model by the session computing devices allocated to the session is requested to compute an objective function value, the computed objective function value is received when the requested scoring is complete, and the received objective function value and the assigned hyperparameter configuration are stored. The model is trained using the assigned hyperparameter configuration and a training dataset that is a first portion of the input dataset. The trained model is scored using the assigned hyperparameter configuration and a validation dataset that is a second portion of the input dataset. A best hyperparameter configuration is identified based on an extreme value of the stored objective function values. The identified best hyperparameter configuration is output.”),
to train the instance with training data (Koch, [Par.0005], “For each session of the plurality of sessions, a hyperparameter configuration is assigned to the session of the plurality of sessions, training of a model of the model type by the session computing devices allocated to the session is requested; scoring of the trained model by the session computing devices allocated to the session is requested to compute an objective function value, the computed objective function value is received when the requested scoring is complete, and the received objective function value and the assigned hyperparameter configuration are stored. The model is trained using the assigned hyperparameter configuration and a training dataset that is a first portion of the input dataset.”),
and the medium further stores instructions that cause the processor to: train, using machine learning (Koch, [Par.0149], “In an operation 534, the selected hyperparameters may be used to train the selected model type for a second dataset 1824 (shown referring to FIG. 18). In addition or in the alternative, the selected hyperparameters may be used to score second dataset 1824 with selected model data 320.”),
a prediction model during a training mode to determine whether continuing the performance of hyperparameter tuning will cause an improvement in the success metric (Koch, [Par.0155], “n may be determined as either one less than a population size specified for the GA tuning search method or the Bayesian search method because one hyperparameter configuration is carried forward each iteration. The best point is carried forward so that if the next iteration does not find an improvement, the returned set of evaluations still includes the current best for consideration in generating the next iteration of hyperparameter configurations.” Examiner’s note: the machine learning model is iteratively trained to determine whether the next iteration provides any improvement.);
and after the training of the prediction model during the training mode, perform operations comprising: based at least on the evaluation of whether the set of hyperparameters improved the success metric (Koch, [Par.0155], “n may be determined as either one less than a population size specified for the GA tuning search method or the Bayesian search method because one hyperparameter configuration is carried forward each iteration. The best point is carried forward so that if the next iteration does not find an improvement, the returned set of evaluations still includes the current best for consideration in generating the next iteration of hyperparameter configurations.”),
apply the prediction model during a prediction mode to determine whether continuing the performance of hyperparameter tuning will cause an improvement in the success metric (Koch, [Par.0181], “In operation 652, the results for each hyperparameter configuration included in configuration list 322 is provided to iteration manager 314. Based on the results and the current tuning search method(s), iteration manager 314 determines a next set of hyperparameter configurations to evaluate in a next iteration. The best model hyperparameter configurations from the previous iteration are used to generate the next population of hyperparameter configurations to evaluate with the selected mode type. If multiple search methods are running concurrently, the results from all of the hyperparameter configurations include in configuration list 322 as part of the current iteration are used to determine the next population irrespective of whether or not that search method requested evaluation of a specific hyperparameter configuration. This process is repeated for remaining iterations based on the search method(s) chosen. In this manner, a search method gains information based on one or more hyperparameter configurations generated by another search method.” Examiner’s note: the best hyperparameters are selected and carried forward to the next iteration to determine whether continuing the hyperparameter tuning improves the success metric.);
However, Koch does not teach and to test the instance with testing data to test the set of hyperparameters to determine whether the set of hyperparameters improves the success metric, and terminate the performance of hyperparameter tuning in response to: an accuracy of the prediction model in predicting improvement in the success metric is below a predetermined low accuracy threshold, and none of the sets of hyperparameters of the at least one set of hyperparameters that has been tested has yet improved the success metric to meet the criteria threshold; or the accuracy of the prediction model is above a predetermined high accuracy threshold and a determination that continuing the performance of hyperparameter tuning will not cause an improvement in the success metric.
On the other hand, Ganu teaches terminate the performance of hyperparameter tuning in response to the accuracy of the prediction model being above a predetermined high accuracy threshold (Ganu, [Col. 12, lines 53-67 and Col. 13, lines 1-5], “Again, the training error of the model is monitored across each iteration. The process progressively reduces the complexity of the model using the dropout parameter, and continues until the model is no longer able to improve against the synthetic data set. Because the synthetic data set contain randomized truth labels, little reasoning may be learned from this data. Thus, any improvement that is seen against the synthetic data may be assumed to be generated from memorization. Accordingly, if the training error indicates that the model is performing better than a naïve model that simply randomly predicts the output label based on label proportions, the process lowers the complexity of the model using the dropout parameter, and performs another tuning iteration. When the training error indicates that the model is performing no better than the naïve model, the process may stop, as the training error indicates that the model is at a complexity level where it is no longer able to memorize the training data.”),
and a determination that continuing the performance of hyperparameter tuning will not cause an improvement in the success metric (Ganu, [Col. 12, lines 53-67 and Col. 13, lines 1-5], “Again, the training error of the model is monitored across each iteration. The process progressively reduces the complexity of the model using the dropout parameter, and continues until the model is no longer able to improve against the synthetic data set. Because the synthetic data set contain randomized truth labels, little reasoning may be learned from this data. Thus, any improvement that is seen against the synthetic data may be assumed to be generated from memorization. Accordingly, if the training error indicates that the model is performing better than a naïve model that simply randomly predicts the output label based on label proportions, the process lowers the complexity of the model using the dropout parameter, and performs another tuning iteration. When the training error indicates that the model is performing no better than the naïve model, the process may stop, as the training error indicates that the model is at a complexity level where it is no longer able to memorize the training data.” Examiner’s note: the process is stopped because continued training will not provide any improvement.).
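For illustration only, the stopping rule Ganu describes (stop once the model performs no better than a naive predictor, i.e., continued tuning is not expected to improve the success metric) can be sketched in Python. The function names and the error-list framing are illustrative assumptions, not the reference's implementation:

```python
def should_stop(model_error, naive_error):
    """Illustrative sketch: tuning stops once the model's training error
    is no better than (i.e., at least as large as) the naive model's."""
    return model_error >= naive_error

def iterations_until_stop(errors_per_iteration, naive_error):
    """Count tuning iterations performed before the stopping rule fires."""
    for i, err in enumerate(errors_per_iteration, start=1):
        if should_stop(err, naive_error):
            return i  # stop: performing no better than the naive model
    return len(errors_per_iteration)
```

With errors of 0.3, then 0.4, then 0.5 against a naive-model error of 0.5, the rule fires on the third iteration, mirroring the "process may stop" condition in the quoted passage.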
Koch, Wubbels, and Ganu are analogous art because they share the same field of endeavor: performing a hyperparameter/parameter tuning process based on training a machine learning model.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the performance of the hyperparameter tuning comprises use of processing and storage resources to instantiate an instance of the AI model with a set of hyperparameters from among the at least one set of hyperparameters to train the instance with training data, and the medium further stores instructions that cause the processor to: train, using machine learning, a prediction model during a training mode to determine whether continuing the performance of hyperparameter tuning will cause an improvement in the success metric, and after the training of the prediction model during the training mode, perform operations comprising: based at least on the evaluation of whether the set of hyperparameters improved the success metric, apply the prediction model during a prediction mode to determine whether continuing the performance of hyperparameter tuning will cause an improvement in the success metric, as taught by Koch, to include terminating the performance of hyperparameter tuning in response to the accuracy of the prediction model being above a predetermined high accuracy threshold and a determination that continuing the performance of hyperparameter tuning will not cause an improvement in the success metric, as taught by Ganu. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the training accuracy (Ganu, [Col. 12, lines 15-25], “In some situations, a deep learning model of high complexity may improve its accuracy on a training data set through two mechanisms. First, the model may improve by learning the task at hand through higher level features, which is what is generally desired. Second, the model may improve by simply memorizing the training data, which does not result in any real "learning." Deep neural networks used in practice can memorize training datasets especially when the number of model parameters is of the same order as the number of data points.”).
However, Koch, Wubbels, and Ganu do not teach and to test the instance with testing data to test the set of hyperparameters to determine whether the set of hyperparameters improves the success metric.
On the other hand, Convertino teaches and to test the instance with testing data to test the set of hyperparameters to determine whether the set of hyperparameters improves the success metric (Convertino, [Par.0042], “The type of ML model selected will depend on characteristics of the data as well as the defined goals of the ML model. Example ML model types include neural networks, clustering models, decision trees, random forest classifiers, etc. Once an ML model type is selected, the ML model is built (step 108), for example, by setting values for one or more hyperparameters, and then validated (step 110), for example, by training and testing the model using at least some of the available data. If not satisfied with the results of the ML validation process, the developer of the ML model can iteratively return to any of the previous steps, for example, to modify the available data, chose a different ML type, or set different hyperparameters values. Finally, once the ML model developer is satisfied with the results generated by the ML model, the ML model can be deployed and used (step 112), for example, by sharing with other data analysts and domain experts or by embedding the ML model into a business process.” Examiner’s note: the testing process is performed on the available data (testing data).).
Koch, Wubbels, Ganu, and Convertino are analogous art because they share the same field of endeavor: performing a hyperparameter/parameter tuning process based on training a machine learning model.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teaching of Koch and Ganu of the performance of the hyperparameter tuning comprises use of processing and storage resources to instantiate an instance of the AI model with a set of hyperparameters from among the at least one set of hyperparameters to train the instance with training data, and the medium further stores instructions that cause the processor to: train, using machine learning, a prediction model during a training mode to determine whether continuing the performance of hyperparameter tuning will cause an improvement in the success metric, and after the training of the prediction model during the training mode, perform operations comprising: based at least on the evaluation of whether the set of hyperparameters improved the success metric, apply the prediction model during a prediction mode to determine whether continuing the performance of hyperparameter tuning will cause an improvement in the success metric, and the terminating of the performance of hyperparameter tuning in response to the accuracy of the prediction model being above a predetermined high accuracy threshold and a determination that continuing the performance of hyperparameter tuning will not cause an improvement in the success metric, as set forth above, to include testing the instance with testing data to test the set of hyperparameters to determine whether the set of hyperparameters improves the success metric, as taught by Convertino. The modification would have been obvious because one of ordinary skill in the art would have been motivated to select the best set of hyperparameters to improve the performance (Convertino, [Par.0039], “Hyperparameters, such as the number of layers or the dropout rate, can dramatically affect the performance of ML models. Hyperparameter tuning, to improve performance, is therefore critical to the successful implementation of ML models. To configure an ML model to work well in practice, hyperparameters should be tuned when training the model.”).
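For illustration only, the train-then-test step Convertino describes (build a model with candidate hyperparameters, then validate by training and testing on held-out data) can be sketched in Python. `train_fn` and `test_fn` are hypothetical stand-ins for the actual training and scoring routines; none of these names come from the reference:

```python
def evaluate_hyperparameters(hp_sets, train_fn, test_fn, baseline):
    """Illustrative sketch: each candidate set of hyperparameters is used
    to train a model instance, the instance is tested on held-out data,
    and the set is kept only if it improves the success metric over the
    current baseline."""
    best_hp, best_score = None, baseline
    for hp in hp_sets:
        model = train_fn(hp)    # instantiate and train with this set
        score = test_fn(model)  # test with held-out testing data
        if score > best_score:  # did this set improve the success metric?
            best_hp, best_score = hp, score
    return best_hp, best_score
```

A candidate that fails to beat the baseline is discarded, which corresponds to the developer iteratively returning to set different hyperparameter values in the quoted passage.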
Regarding claim 3, Koch teaches the medium of claim 2, further storing instructions that cause the processor, for at least one search space portion of the subset that is sequentially selected, to (Koch, [Par.0167], “In an operation 624, the results are provided to iteration manager 314. Based on the baseline results and hyperparameters, iteration manager 314 determines a first set of hyperparameter configurations to evaluate in a first iteration. Again, each hyperparameter configuration includes a specific value for each hyperparameter based on the selected model type. For example, iteration manager 314 executes a first tuning search method of the tuning search method specified in operation 522. Alternatively, multiple tuning search methods may be executed concurrently such that iteration manager 314 executes each tuning search method to determine a set of hyperparameters that are combined to define the first set of hyperparameter configurations.” Examiner’s note: each set of hyperparameters is iteratively selected and generated by the tuning search method.),
apply the prediction model during the prediction mode to generate a prediction of whether the use of the processing and storage resources (Koch, [Par.0005], “For each session of the plurality of sessions, a hyperparameter configuration is assigned to the session of the plurality of sessions, training of a model of the model type by the session computing devices allocated to the session is requested; scoring of the trained model by the session computing devices allocated to the session is requested to compute an objective function value, the computed objective function value is received when the requested scoring is complete, and the received objective function value and the assigned hyperparameter configuration are stored. The model is trained using the assigned hyperparameter configuration and a training dataset that is a first portion of the input dataset. The trained model is scored using the assigned hyperparameter configuration and a validation dataset that is a second portion of the input dataset. A best hyperparameter configuration is identified based on an extreme value of the stored objective function values. The identified best hyperparameter configuration is output.”)
to perform the hyperparameter tuning with the at least one set of hyperparameters generated from the at least one search space portion as an input will improve the success metric (Koch, [Fig.17], [Par.0240], “Referring to FIG. 17, a model improvement (error reduction or accuracy increase where higher is better) for the suite of ten common machine learning test problems illustrated in FIG. 16 is shown. A sixth curve 1700 shows results for each dataset using the decision tree model type. A seventh curve 1702 shows results for each dataset using the support vector machine model type. An eighth curve 1704 shows results for each dataset using the neural network model type. A ninth curve 1706 shows results for each dataset using the forest model type. A tenth curve 1708 shows results for each dataset using the gradient boosting model type. Overall, the benchmark results, when averaged across all datasets, was 8.53% average improvement for the neural network model type, was 8.45% average improvement for the support vector machine model type, was 6.25% average improvement for the decision tree model type, was 2.09% average improvement for the forest model type, and was 8.45% average improvement for the gradient boosting tree model type using hyperparameter selection system 100.” Examiner’s note, the result is improved through the iterative training based on the different models.);
and in response to a prediction that the success metric will be improved, perform operations comprising: use the processing and storage resources to perform the hyperparameter tuning with the at least one set of hyperparameters generated from the at least one search space portion as input to generate an output (Koch, [Fig.17], [Par.0240], “Referring to FIG. 17, a model improvement (error reduction or accuracy increase where higher is better) for the suite of ten common machine learning test problems illustrated in FIG. 16 is shown. A sixth curve 1700 shows results for each dataset using the decision tree model type. A seventh curve 1702 shows results for each dataset using the support vector machine model type. An eighth curve 1704 shows results for each dataset using the neural network model type. A ninth curve 1706 shows results for each dataset using the forest model type. A tenth curve 1708 shows results for each dataset using the gradient boosting model type. Overall, the benchmark results, when averaged across all datasets, was 8.53% average improvement for the neural network model type, was 8.45% average improvement for the support vector machine model type, was 6.25% average improvement for the decision tree model type, was 2.09% average improvement for the forest model type, and was 8.45% average improvement for the gradient boosting tree model type using hyperparameter selection system 100.”);
evaluate the output to determine whether the success metric is improved (Koch, [Par.0005], “The trained model is scored using the assigned hyperparameter configuration and a validation dataset that is a second portion of the input dataset. A best hyperparameter configuration is identified based on an extreme value of the stored objective function values. The identified best hyperparameter configuration is output.”, Examiner’s note, the best hyperparameter is evaluated based on the output);
and further train, by machine learning, the generation model using the at least one set of hyperparameters and the evaluation of the output (Koch, [Fig.17], [Par.0240], “Referring to FIG. 17, a model improvement (error reduction or accuracy increase where higher is better) for the suite of ten common machine learning test problems illustrated in FIG. 16 is shown. A sixth curve 1700 shows results for each dataset using the decision tree model type. A seventh curve 1702 shows results for each dataset using the support vector machine model type. An eighth curve 1704 shows results for each dataset using the neural network model type. A ninth curve 1706 shows results for each dataset using the forest model type. A tenth curve 1708 shows results for each dataset using the gradient boosting model type. Overall, the benchmark results, when averaged across all datasets, was 8.53% average improvement for the neural network model type, was 8.45% average improvement for the support vector machine model type, was 6.25% average improvement for the decision tree model type, was 2.09% average improvement for the forest model type, and was 8.45% average improvement for the gradient boosting tree model type using hyperparameter selection system 100.”).
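For context, the iterative selection scheme cited from Koch (an iteration manager generating one hyperparameter configuration per iteration, scoring each trained model, and keeping the extreme value of the stored objective function values) can be sketched roughly as follows. The search-space contents, the objective function, and all names here are illustrative assumptions, not Koch’s actual implementation:

```python
from itertools import product

# Hypothetical search-space portion: hyperparameter names mapped to candidate values.
SEARCH_SPACE = {
    "learning_rate": [0.001, 0.01, 0.1],
    "num_layers": [1, 2, 3],
}

def objective(config):
    # Stand-in for training a model with this configuration and scoring it on a
    # validation dataset; lower is better, like a validation error.
    return abs(config["learning_rate"] - 0.01) + abs(config["num_layers"] - 2)

def tune():
    """Evaluate each hyperparameter configuration in turn and keep the best one,
    i.e. the extreme value of the stored objective function values."""
    best_config, best_score = None, float("inf")
    for lr, layers in product(SEARCH_SPACE["learning_rate"], SEARCH_SPACE["num_layers"]):
        config = {"learning_rate": lr, "num_layers": layers}
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

In this sketch the exhaustive loop plays the role of the tuning search method; Koch’s iteration manager may instead run several search methods concurrently and combine their candidate sets.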
Regarding claim 9, it is rejected for the same reasons as claim 2, since these claims recite the same limitations.
Regarding claim 10, it is rejected for the same reasons as claim 3, since these claims recite the same limitations.
Regarding claim 16, it is rejected for the same reasons as claim 2, since these claims recite the same limitations.
Regarding claim 17, it is rejected for the same reasons as claim 3, since these claims recite the same limitations.
Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Koch et al. (Pub. No. US 20180240041 – hereinafter, Koch) in view of Wubbels et al. (Patent No. US 10599984 – hereinafter, Wubbels) and further in view of Naumov et al. (Patent No. US 11468313 – hereinafter, Naumov).
Regarding claim 4, Koch teaches the medium of claim 3, further storing instructions that cause the processor, in response to the prediction that the success metric will be improved, to (Koch, [Fig.17], [Par.0240], “Referring to FIG. 17, a model improvement (error reduction or accuracy increase where higher is better) for the suite of ten common machine learning test problems illustrated in FIG. 16 is shown. A sixth curve 1700 shows results for each dataset using the decision tree model type. A seventh curve 1702 shows results for each dataset using the support vector machine model type. An eighth curve 1704 shows results for each dataset using the neural network model type. A ninth curve 1706 shows results for each dataset using the forest model type. A tenth curve 1708 shows results for each dataset using the gradient boosting model type. Overall, the benchmark results, when averaged across all datasets, was 8.53% average improvement for the neural network model type, was 8.45% average improvement for the support vector machine model type, was 6.25% average improvement for the decision tree model type, was 2.09% average improvement for the forest model type, and was 8.45% average improvement for the gradient boosting tree model type using hyperparameter selection system 100.” Examiner’s note, the results show improvement of each model type):
determine the accuracy of the prediction model based at least on the evaluation of the output (Koch, [Fig.17], [Par.0240], “Referring to FIG. 17, a model improvement (error reduction or accuracy increase where higher is better) for the suite of ten common machine learning test problems illustrated in FIG. 16 is shown. A sixth curve 1700 shows results for each dataset using the decision tree model type. A seventh curve 1702 shows results for each dataset using the support vector machine model type. An eighth curve 1704 shows results for each dataset using the neural network model type. A ninth curve 1706 shows results for each dataset using the forest model type. A tenth curve 1708 shows results for each dataset using the gradient boosting model type. Overall, the benchmark results, when averaged across all datasets, was 8.53% average improvement for the neural network model type, was 8.45% average improvement for the support vector machine model type, was 6.25% average improvement for the decision tree model type, was 2.09% average improvement for the forest model type, and was 8.45% average improvement for the gradient boosting tree model type using hyperparameter selection system 100.” Examiner’s note, the output of each model type shows the improvement rate);
However, Koch does not teach “and further train the prediction model, in a return to the training mode from the prediction mode, and using machine learning, based on whether the accuracy of the prediction model is below a prediction training accuracy threshold.”
On the other hand, Naumov teaches and further train the prediction model, in a return to the training mode from the prediction mode, and using machine learning, based on whether the accuracy of the prediction model is below a prediction training accuracy threshold (Naumov, [Col. 13, lines 43-58], “In some embodiments, one or more of modules 102 (e.g., training module 106) may adjust the amplitude factor based on a measured and/or anticipated predictive accuracy of a trained artificial neural network. For example, training module 106 may train (e.g., begin a training process) artificial neural network 140 while using a periodic regularization function having an amplitude factor having a first amplitude factor value (e.g., 1). This may result in at least a portion of trained artificial neural network 212 having a first anticipated predictive accuracy (e.g., 30%). Training module 106 may then, based on this first predictive accuracy being below a predetermined threshold (e.g., 75%) train (e.g., re-train, continue to train, etc.) artificial neural network 140 by adjusting the amplitude factor included in the periodic regularization function to a second amplitude factor value (e.g., 0.001).” ).
Koch, Wubbels, and Naumov are analogous art because they share the same field of endeavor of continuing to train a machine learning model as part of a hyperparameter tuning process.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the processor, in response to the prediction that the success metric will be improved, to determine the accuracy of the prediction model based at least on the evaluation of the output, as taught by Koch, to include further training the prediction model, in a return to the training mode from the prediction mode, using machine learning, based on whether the accuracy of the prediction model is below a prediction training accuracy threshold, as taught by Naumov. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the prediction accuracy (Naumov, [Col. 13, lines 43-58], “In some embodiments, one or more of modules 102 (e.g., training module 106) may adjust the amplitude factor based on a measured and/or anticipated predictive accuracy of a trained artificial neural network. For example, training module 106 may train (e.g., begin a training process) artificial neural network 140 while using a periodic regularization function having an amplitude factor having a first amplitude factor value (e.g., 1). This may result in at least a portion of trained artificial neural network 212 having a first anticipated predictive accuracy (e.g., 30%). Training module 106 may then, based on this first predictive accuracy being below a predetermined threshold (e.g., 75%) train (e.g., re-train, continue to train, etc.) artificial neural network 140 by adjusting the amplitude factor included in the periodic regularization function to a second amplitude factor value (e.g., 0.001).”).
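The threshold-gated retraining scheme cited from Naumov (train, measure accuracy, and if accuracy falls below a preset threshold, return to training with an adjusted regularization amplitude) can be sketched minimally as follows. The training step, the accuracy model, and all names are illustrative assumptions standing in for an actual neural-network training process:

```python
def train(state, amplitude):
    # Stand-in training pass: accuracy improves by a fixed step per pass
    # (illustrative only; a real pass would fit model weights to data).
    state["accuracy"] = min(0.95, state["accuracy"] + 0.3)
    state["amplitude"] = amplitude
    return state

def train_with_threshold(threshold=0.75):
    """Initial training with amplitude 1; if the measured accuracy is below the
    threshold, continue training with a reduced amplitude (e.g., 0.001)."""
    state = {"accuracy": 0.0, "amplitude": 1.0}
    state = train(state, amplitude=1.0)
    while state["accuracy"] < threshold:
        state = train(state, amplitude=0.001)
    return state
```

The point of the sketch is the control flow: retraining is triggered by an accuracy comparison against a predetermined threshold, matching the cited passage’s 30% measured accuracy versus 75% threshold example.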
Regarding claim 11, it is rejected for the same reasons as claim 4, since these claims recite the same limitations.
Regarding claim 18, it is rejected for the same reasons as claim 4, since these claims recite the same limitations.
Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Koch et al. (Pub. No. US 20180240041 – hereinafter, Koch) in view of Wubbels et al. (Patent No. US 10599984 – hereinafter, Wubbels) and further in view of Hu et al. (Pub. No. US 20190188530 – hereinafter, Hu).
Regarding claim 5, Koch as modified in view of Wubbels teaches the medium of claim 3, further storing instructions that cause the processor, in response to a prediction that the success metric will not be improved (Wubbels, [Col. 11, lines 55-68; Col. 12, lines 1-15], “The appropriate optimization technique can include: identifying the optimal checkpoint from which a model should be created, tuning hyperparameters used in training, evaluating performance gains produced by model transformation methodologies like ensembling and/or co-distilling…Using either hyperparameter tuning or co-distillation (or both) can reduce the overall size of a generated model. Due to the smaller size the time of inference is reduced. These techniques can thus decrease the latency of diagnoses when a model is deployed. In a similar vein, using either ensembling or optimal-checkpoint selection (or both) can improve the accuracy of the generated model. Optimal checkpoint selection ensures a single model is achieving the highest possible accuracy. Ensembling gives insight into how the accuracy of multiple models combined improves with the number of models used in an ensemble. Optimal checkpoint selection can also reduce training time if used to distinguish a ‘stopping point’ for model training: rather than training for a fixed number of steps, a model can stop training as soon as its accuracy stops improving. Combining these techniques (for example, co-distilling using an ensembled model as a teacher) allows for the generation of a model that is both highly accurate and fast.” Examiner’s note, hyperparameters are tuned during training, and the training or hyperparameter tuning is stopped when the training accuracy stops improving.),
the generation model using the at least one set of hyperparameters and the prediction that the success metric will not be improved (Wubbels, [Col.7, lines 54-67], “In addition, during training, the training module 410 can also select the appropriate hyperparameters for the machine learning model 420. In machine learning and for purposes of discussion here, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters (e.g., weights in a model) are derived via training. The hyperparameters can indicate the number of layers 500, 510, 520, 530 contained in the machine learning model 420, and a number of neurons 540, 550 contained in each layer 500, 510, 520, 530. In other words, the number of layers can represent one hyperparameter, and the number of neurons per layer can represent another hyperparameter independent of the first hyperparameter.” and (Wubbels, [Col. 11, lines 55-68; Col. 12, lines 1-15], “The appropriate optimization technique can include: identifying the optimal checkpoint from which a model should be created, tuning hyperparameters used in training, evaluating performance gains produced by model transformation methodologies like ensembling and/or co-distilling…Using either hyperparameter tuning or co-distillation (or both) can reduce the overall size of a generated model. Due to the smaller size the time of inference is reduced. These techniques can thus decrease the latency of diagnoses when a model is deployed. In a similar vein, using either ensembling or optimal-checkpoint selection (or both) can improve the accuracy of the generated model. Optimal checkpoint selection ensures a single model is achieving the highest possible accuracy. Ensembling gives insight into how the accuracy of multiple models combined improves with the number of models used in an ensemble. Optimal checkpoint selection can also reduce training time if used to distinguish a ‘stopping point’ for model training: rather than training for a fixed number of steps, a model can stop training as soon as its accuracy stops improving. Combining these techniques (for example, co-distilling using an ensembled model as a teacher) allows for the generation of a model that is both highly accurate and fast.”)
Koch and Wubbels are analogous art because they share the same field of endeavor of performing a hyperparameter tuning process based on the training of a machine learning model.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the hyperparameter tuning system, as taught by Koch, to include, in response to a prediction that the success metric will not be improved, the generation model using the at least one set of hyperparameters and the prediction that the success metric will not be improved, as taught by Wubbels. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the performance of the machine learning model (Wubbels, [Col.10, lines 60-67], “Co-distilling is related to assembling. Co-distilling is a technique to improve the performance of the machine learning model 900 by training the machine learning model 900 on the inference of a more computationally expensive machine learning model 910, such as an ensembled machine learning model. Co-distilling is an attempt to achieve the same high model performance of the more computationally expensive machine learning model 910, but without requiring the intensive compute resources. The more computationally expensive model 910 can be thought of as a teacher model.”).
However, neither Koch nor Wubbels teaches “and based on whether the accuracy of the prediction model has been found to be above a generation training accuracy threshold, to further train, by machine learning.”
On the other hand, Hu teaches and based on whether the accuracy of the prediction model has been found to be above a generation training accuracy threshold, to further train, by machine learning (Hu, [Par.0082], “In the present embodiment, the electronic device may determine the characteristic vector of the image including the image area and the image including the image area as training samples to continue training the convolutional neural network, in response to determining that the recognition accuracy being greater than the preset threshold (for example, 80%).”).
Koch, Wubbels, and Hu are analogous art because they share the same field of endeavor of performing a hyperparameter tuning process based on the training of a machine learning model.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teaching of Koch and Wubbels of the hyperparameter tuning system, in which, in response to a prediction that the success metric will not be improved, the generation model uses the at least one set of hyperparameters and the prediction that the success metric will not be improved, as set forth above, to include further training, by machine learning, based on whether the accuracy of the prediction model has been found to be above a generation training accuracy threshold, as taught by Hu. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the accuracy (Hu, [Par.0082], “It may be understood that using the image set and the characteristic vector set corresponding to the recognition accuracy greater than the preset threshold as the training sample set and continuing to train the convolutional neural network may help improve the recognition accuracy of the image area.”).
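The two cited mechanisms for claim 5 can be sketched together: Wubbels’ optimal-checkpoint idea of stopping training as soon as accuracy stops improving, and Hu’s gate that continues training only when measured accuracy exceeds a preset threshold. All function names, the accuracy stream, and the thresholds below are illustrative assumptions, not the references’ actual implementations:

```python
def train_until_plateau(accuracies, min_delta=0.01):
    """Consume a per-step validation-accuracy stream and stop as soon as
    accuracy stops improving by at least min_delta (Wubbels-style stopping
    point), returning the best accuracy and the number of steps taken."""
    best, steps = float("-inf"), 0
    for acc in accuracies:
        if acc - best < min_delta:
            break  # accuracy stopped improving: stop training here
        best, steps = acc, steps + 1
    return best, steps

def should_continue_training(accuracy, threshold=0.80):
    # Hu-style gate: continue training only when the measured recognition
    # accuracy is greater than the preset threshold (e.g., 80%).
    return accuracy > threshold
```

In this sketch, the plateau check replaces training for a fixed number of steps, while the threshold gate decides whether results are fed back as further training input, which is the control structure the rejection relies on.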
Regarding claim 12, it is rejected for the same reasons as claim 5, since these claims recite the same limitations.
Regarding claim 19, it is rejected for the same reasons as claim 5, since these claims recite the same limitations.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon, which is considered pertinent to applicant’s disclosure, is provided below.
Basu et al. (Pub. No. US 20200226496 – hereinafter, Basu) teaches a hyperparameter tuning system.
Sarkar et al. (Pub. No. US 20190095785 – hereinafter, Sarkar) teaches selecting and tuning parameters to continue training a machine learning model.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EM N TRIEU whose telephone number is (571)272-5747. The examiner can normally be reached on Mon-Fri from 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached on (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/E.T./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128