Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office action is in response to the claims filed on 11/03/2023.
Claims 1-20 are presented for examination.
Double Patenting
The non-statutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-7 and 15-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-7 and 15-27 of U.S. Patent No. 11,809,968. Additionally, claims 8-14 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-7 and 15-27 of U.S. Patent No. 11,809,968 in view of Basu et al. (Pub. No. US 20200226496 – hereinafter, Basu) and further in view of Convertino et al. (Pub. No. US 20200097847 – hereinafter, Convertino).
This is a non-provisional nonstatutory double patenting rejection.
Instant Application No. 18/501,846
Patent No. 11,809,968
Claim 1:
receive a set of hyperparameters for an artificial intelligence (AI) model, the hyperparameters configured to be tuned according to a hyperparameter tuning technique based on a success metric; train a prediction model using a machine learning process, the prediction model configured to estimate whether further application of the hyperparameter tuning technique will cause an improvement in the success metric; test the hyperparameters using the hyperparameter tuning technique until a stopping point; apply the prediction model to determine if further testing of the hyperparameters after the stopping point is predicted to improve the success metric; and terminate the hyperparameter tuning technique when: at least one of an accuracy of the prediction model in predicting improvement in the success metric is above a predetermined accuracy threshold, and the prediction model predicts that further application of the hyperparameter tuning technique will not result in an improvement to the success metric; or the accuracy of the prediction model in predicting improvement in the success metric is below the predetermined accuracy threshold, and an accuracy of hyperparameter optimization is determined to be below a predetermined tuning accuracy threshold
Claim 1:
determine a plurality of batches of hyperparameters for an artificial intelligence (AI) model, the plurality of batches of hyperparameters configured to be tuned according to a hyperparameter tuning technique based on a success metric ...: train the instance of the AI model using training data; test the instance of the AI model using testing data to generate AI model testing results; train a prediction model using a machine learning process and the AI model testing results, the prediction model configured to estimate whether further application of the hyperparameter tuning technique will cause an improvement in the success metric; test the hyperparameters using the hyperparameter tuning technique until a stopping point; apply the prediction model to determine if further testing of the hyperparameters after the stopping point is predicted to improve the success metric; and terminate the hyperparameter tuning technique when: (i) an accuracy of the prediction model in predicting improvement in the success metric is above a predetermined accuracy threshold, and the prediction model predicts that further application of the hyperparameter tuning technique will not result in an improvement to the success metric; or (ii) the accuracy of the prediction model in predicting improvement in the success metric is below the predetermined accuracy threshold, and an accuracy of hyperparameter optimization is determined to be below a predetermined tuning accuracy threshold.
Claim 2:
continuing the hyperparameter tuning technique when the accuracy of the prediction model in predicting improvement in the success metric is below the predetermined accuracy threshold and the accuracy of hyperparameter optimization is determined to be above the predetermined tuning accuracy threshold.
Claim 2:
continuing the hyperparameter tuning technique when the accuracy of the prediction model in predicting improvement in the success metric is below the predetermined accuracy threshold and the accuracy of hyperparameter optimization is determined to be above the predetermined tuning accuracy threshold.
Claim 3:
receiving a next batch of hyperparameters to be tested by the hyperparameter tuning technique; applying the prediction model to predict whether the next batch of hyperparameters is likely to result in improvement of the success metric; and causing the hyperparameter optimization to skip the next batch of hyperparameters if the prediction model does not predict improvement in the success metric.
Claim 3:
receiving a next batch of hyperparameters to be tested by the hyperparameter tuning technique; applying the prediction model to predict whether the next batch of hyperparameters is likely to result in improvement of the success metric; and causing the hyperparameter optimization to skip the next batch of hyperparameters if the prediction model does not predict improvement in the success metric.
Claim 4:
wherein at least one of the predetermined accuracy threshold or the predetermined tuning accuracy threshold are configured to bias against terminating the hyperparameter tuning technique.
Claim 4:
wherein at least one of the predetermined accuracy threshold or the predetermined tuning accuracy threshold are configured to bias against terminating the hyperparameter tuning technique.
Claim 5:
wherein the success metric comprises one or more of: an accuracy of the AI model, a speed of convergence of the hyperparameter tuning technique, a loss function, a speed of assessment of the AI model, or a speed of training of the AI model.
Claim 5:
wherein the success metric comprises one or more of: an accuracy of the AI model, a speed of convergence of the hyperparameter tuning technique, a loss function, a speed of assessment of the AI model, or a speed of training of the AI model.
Claim 6:
wherein the prediction model is configured to be applied in parallel across a plurality of computing devices.
Claim 6:
wherein the prediction model is configured to be applied in parallel across a plurality of computing devices.
Claim 7:
further storing instructions for applying the prediction model to select a next batch of hyperparameters based on a likelihood that the next batch of hyperparameters will improve the success metric, and testing the next batch of hyperparameters with the hyperparameter tuning technique.
Claim 7:
The medium of claim 1, further storing instructions for applying the prediction model to select a next batch of hyperparameters based on a likelihood that the next batch of hyperparameters will improve the success metric, and testing the next batch of hyperparameters with the hyperparameter tuning technique.
Regarding claim 8, Patent No. 11,809,968 in view of Basu teaches a computer-implemented method, comprising: receiving one or more settings for a machine learning process, the settings configured to be set to a value before the machine learning process begins according to a setting establishment procedure based on one or more measurements of performance (Basu, [Par.0064-0067]); the settings configured not to be adjusted after initiation of the machine learning process (Basu, [Par.0002]); configuring a prediction model to learn an effect of a change to the one or more settings on the one or more measurements of performance (Basu, [Par.0032-0033, 0054-0055]); and adjusting the settings in a first round of the setting establishment procedure (Basu, Fig. 3D and [Par.0089-0091]).
However, Patent No. 11,809,968 and Basu do not teach applying the prediction model to predict whether a second round of the setting establishment procedure is likely to result in the measurements of performance becoming closer to one or more targets; performing the second round of the setting establishment procedure when the prediction model predicts that the measurements of performance are likely to become closer to the targets or when the prediction model is unable to make a prediction; and refraining from performing the second round when the prediction model predicts that the measurements of performance are unlikely to become closer to the targets.
On the other hand, Convertino teaches applying the prediction model to predict whether a second round of the setting establishment procedure is likely to result in the measurements of performance becoming closer to one or more targets (Convertino, [Par.0045-0050]); performing the second round of the setting establishment procedure when the prediction model predicts that the measurements of performance are likely to become closer to the targets or when the prediction model is unable to make a prediction (Convertino, [Par.0042]); and refraining from performing the second round when the prediction model predicts that the measurements of performance are unlikely to become closer to the targets (Convertino, [Par.0042]).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Patent No. 11,809,968 and Basu to include applying the prediction model to predict whether a second round of the setting establishment procedure is likely to result in the measurements of performance becoming closer to one or more targets; performing the second round of the setting establishment procedure when the prediction model predicts that the measurements of performance are likely to become closer to the targets or when the prediction model is unable to make a prediction; and refraining from performing the second round when the prediction model predicts that the measurements of performance are unlikely to become closer to the targets, as taught by Convertino, for the purpose of improving performance (Convertino, [Par.0039]).
Regarding claim 9, Patent No. 11,809,968 in view of Basu and Convertino teaches the method of claim 8, further comprising performing the second round of the setting procedure when the first round of the setting procedure caused one or more of the measurements of performance to become closer to the one or more targets (Convertino, [Par.0045-0050]).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Patent No. 11,809,968 and Basu to include performing the second round of the setting procedure when the first round of the setting procedure caused one or more of the measurements of performance to become closer to the one or more targets, as taught by Convertino, for the purpose of improving performance (Convertino, [Par.0039]).
Regarding claim 10, Patent No. 11,809,968 in view of Basu and Convertino teaches the method of claim 8, further comprising: receiving a new group of values for the settings to be evaluated by the setting establishment procedure (Convertino, [Par.0046]);
using the prediction model to estimate whether the new group of values is likely to cause the measurements of performance to become closer to the one or more targets (Convertino, [Par.0045-0050]);
and refraining from applying the setting establishment procedure to the new group of values if the prediction model indicates that it is unlikely that the measurements of performance will become closer to the one or more targets based on the new group of values (Convertino, [Par.0042]).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Patent No. 11,809,968 and Basu to include refraining from applying the setting establishment procedure to a new group of values when the prediction model indicates that the measurements of performance are unlikely to become closer to the one or more targets, as taught by Convertino, for the purpose of improving performance (Convertino, [Par.0039]).
Regarding claim 11, Patent No. 11,809,968 in view of Basu and Convertino teaches the method of claim 8, wherein the prediction model is configured to favor performing the second round of the setting establishment procedure (Convertino, [Par.0042]).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Patent No. 11,809,968 and Basu to include the prediction model being configured to favor performing the second round of the setting establishment procedure, as taught by Convertino, for the purpose of improving performance (Convertino, [Par.0039]).
Claim 12:
wherein the measurements of performance comprise one or more of: an accuracy of a prediction model produced by the machine learning process, a speed of convergence of the setting establishment procedure, a loss function, a speed of assessment of the prediction model produced by the machine learning process, or a speed of training of the machine learning process.
Claim 5:
The medium of claim 1, wherein the success metric comprises one or more of: an accuracy of the AI model, a speed of convergence of the hyperparameter tuning technique, a loss function, a speed of assessment of the AI model, or a speed of training of the AI model.
Claim 13:
wherein the prediction model is configured to be applied in parallel across a plurality of computing devices.
Claim 6:
The medium of claim 1, wherein the prediction model is configured to be applied in parallel across a plurality of computing devices.
Claim 14:
applying the prediction model to select a new set of settings based on a likelihood that the new set of settings will move the measurements of performance closer to the targets, and using the setting establishment procedure with the new set of settings.
Claim 27:
The computer-implemented method of claim 21, further comprising applying the prediction model to select a next batch of hyperparameters based on a likelihood that the next batch of hyperparameters will improve the success metric, and testing the next batch of hyperparameters with the hyperparameter tuning technique.
Claim 15 is rejected for the same reasons as claim 1, since these claims recite the same limitations.
Claim 16 is rejected for the same reasons as claim 2, since these claims recite the same limitations.
Claim 17 is rejected for the same reasons as claim 3, since these claims recite the same limitations.
Claim 18 is rejected for the same reasons as claim 4, since these claims recite the same limitations.
Claim 19 is rejected for the same reasons as claim 5, since these claims recite the same limitations.
Claim 20 is rejected for the same reasons as claim 6, since these claims recite the same limitations.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 4, 5, 6, 7, 15, 18, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Basu et al. (Pub. No. US 20200226496 – hereinafter, Basu) in view of Ganu et al. (Patent No. US 10,380,236 – hereinafter, Ganu).
Regarding claim 1, Basu teaches a non-transitory computer-readable medium storing instructions configured to cause one or more processors to: receive a set of hyperparameters for an artificial intelligence (AI) model, the hyperparameters configured to be tuned according to a hyperparameter tuning technique based on a success metric (Basu, [Par.0019, 0078-0079], “FIG. 5 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-readable medium”; [Par.0078, lines 1-3], “The hyperparameter tuning server 110 may then receive a selection of one or more hyperparameters 226 to optimize (Operation 308).”; [Par.0079], “the hyperparameters to optimize may be specific to the selected machine-learning model. Accordingly, depending on which machine-learning model is selected, the hyperparameter tuning server 110 may optimize different sets of hyperparameters. By way of example, and without limitation, examples of hyperparameters include a learning rate, a minimal loss reduction, maximal depth, minimum sum of instance weight for a child, subsample ratio, subsample ratio of columns for a tree, subsample ratio of columns for each split, and one or more regularization terms.”).
train a prediction model using a machine learning process, the prediction model configured to estimate whether further application of the hyperparameter tuning technique will cause an improvement in the success metric (Basu, [Par.0054, lines 1-3], “To execute a selected machine learning model, the networked architecture 102 includes a master server 112 and a set of execution servers 114.” For further explanation, see [Par.0055, lines 1-9], “Within this framework, each of the execution servers of the set of execution servers 114 may perform a particular function evaluation (e.g., the performance metric being determined). In one embodiment, each function evaluation takes a set of hyperparameter values as input, evaluates on one or more of the execution servers, and reduces to a performance metric value on the master server 112. This performance metric value is the observed function evaluation for the given set of hyperparameters.”);
test the hyperparameters using the hyperparameter tuning technique until a stopping point (Basu, [Par.0055, lines 9-14], “Maximizing the hyperparameter values according to their corresponding performance metric values may be performed by the master server 112. In alternative embodiments, the maximizing of the hyperparameter values may be performed by the set of execution servers 114.” And [Par.0071, lines 1-3], “On a first iteration, the initial resulting quality metric values are temporarily stored as a baseline set of hyperparameter values and quality metric values.” Examiner’s note: the baseline set of hyperparameter values is considered as the stopping point.);
apply the prediction model to determine if further testing of the hyperparameters after the stopping point is predicted to improve the success metric (Basu, [Par.0071, lines 3-8], “The master server 112 then instructs the one or more execution servers 114 to execute the kernel function and/or quality metric function over the entirety of the training data selected from the database 116 using the baseline set of hyperparameter values and corresponding quality metric values.” For further explanation, see [Par.0072, lines 1-4], “The search optimization application 218 may be configured to identify the best (e.g., optimized) hyperparameter vectors or regions within the domain X for further exploration.”);
and terminate the hyperparameter tuning technique when: the prediction model predicts that further application of the hyperparameter tuning technique will not result in an improvement to the success metric (Basu, [Par.0086, lines 11-23], “The iterative loop ends when the master server 112 determines that the values of the hyperparameter vector x, are converging on corresponding values. In one embodiment, the convergence may be determined by comparing a difference value determined from a current hyperparameter value with its prior value, and a convergence threshold. In this manner, the comparison indicates whether a current hyperparameter value has changed from its prior value. If the change (e.g., the delta between the current hyperparameter value and the prior hyperparameter value) is above the convergence threshold, then it is likely that the hyperparameter value is still approaching a limit value.”).
However, Basu does not teach terminating the hyperparameter tuning technique when: at least one of an accuracy of the prediction model in predicting improvement in the success metric is above a predetermined accuracy threshold, and the prediction model predicts that further application of the hyperparameter tuning technique will not result in an improvement to the success metric; or the accuracy of the prediction model in predicting improvement in the success metric is below the predetermined accuracy threshold, and an accuracy of hyperparameter optimization is determined to be below a predetermined tuning accuracy threshold.
On the other hand, Ganu teaches terminating the hyperparameter tuning technique when at least one of an accuracy of the prediction model in predicting improvement in the success metric is above a predetermined accuracy threshold and the prediction model predicts that further application of the hyperparameter tuning technique will not result in an improvement to the success metric (Ganu, [Col. 13, lines 9-27], “An embodiment of the tuning process is depicted in FIG. 5. The process includes two stages. In the first stage 502, at operation 515, the text annotation model 510 is trained using real training data 520, to generate a model error 525. The model error 525 is then evaluated using a model complexity adjuster 530, which may determine, based on the model error, whether the model's complexity may be reduced. In some embodiments, if the model's error 525 remains sufficiently low (e.g., below a threshold), the model's complexity may be reduced at operation 534. The reduction in complexity may be performed using a dropout hyperparameter that controls the number of hidden units that are active in the model 510. In some embodiments, the dropout parameter may indicate a probability that a given hidden unit in the model is randomly zero'ed out, or “dropped out” from the model. The process of the first stage 502 then repeats to iteratively reduce the complexity of the model 510, until the model error 525 is no longer acceptable (e.g., is above a specified threshold). At that point, the process may stop 532.” And [Col. 55, lines 55-67], “Next, in a second phase of tuning, the model configured with the tentative dropout value is iteratively trained on the synthetic data set with random truth labels. Again, the training error of the model is monitored across each iteration. The process progressively reduces the complexity of the model using the dropout parameter, and continues until the model is no longer able to improve against the synthetic data set.
Because the synthetic data set contain randomized truth labels, little reasoning may be learned from this data. Thus, any improvement that is seen against the synthetic data may be assumed to be generated from memorization. Accordingly, if the training error indicates that the model is performing better than a naïve model that simply randomly predicts the output label based on label proportions, the process lowers the complexity of the model using the dropout parameter, and performs another tuning iteration. When the training error indicates that the model is performing no better than the naïve model, the process may stop, as the training error indicates that the model is at a complexity level where it is no longer able to memorize the training data.” Examiner’s note: the claim does not define the hyperparameter tuning technique; therefore, the process that progressively reduces the complexity of the model using the dropout parameter is considered the hyperparameter tuning technique. For example, when the training error of the model no longer yields a better result, the tuning is stopped; and when the error of the training iteration is above the threshold, the process is stopped, which corresponds to the accuracy in predicting improvement in the success metric being above a predetermined accuracy threshold.).
Basu and Ganu are analogous art because they have the same field of endeavor: performing a hyperparameter/parameter tuning process based on the training of a machine-learning model.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified terminating the hyperparameter tuning technique when the prediction model predicts that further application of the hyperparameter tuning technique will not result in an improvement to the success metric, as taught by Basu, to include terminating the hyperparameter tuning technique when at least one of an accuracy of the prediction model in predicting improvement in the success metric is above a predetermined accuracy threshold and the prediction model predicts that further application of the hyperparameter tuning technique will not result in an improvement to the success metric, as taught by Ganu. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the training accuracy (Ganu, [Col. 12, lines 15-25], “In some situations, a deep learning model of high complexity may improve its accuracy on a training data set through two mechanisms. First, the model may improve by learning the task at hand through higher level features, which is what is generally desired. Second, the model may improve by simply memorizing the training data, which does not result in any real “learning.” Deep neural networks used in practice can memorize training datasets especially when the number of model parameters is of the same order as the number of data points.”).
Regarding claim 4, Basu teaches the medium of claim 1, wherein at least one of the predetermined accuracy threshold or the predetermined tuning accuracy threshold are configured to bias against terminating the hyperparameter tuning technique (Basu, [Claim 3, lines 7-11], “The assignment of the baseline set of hyperparameter values as a current set of hyperparameter values comprises discounting at least one hyperparameter value of the baseline set of hyperparameter values by the determined baseline bias value.”).
Regarding claim 5, Basu teaches the medium of claim 1, wherein the success metric comprises one or more of: an accuracy of the AI model, a speed of convergence of the hyperparameter tuning technique, a loss function, a speed of assessment of the AI model, or a speed of training of the AI model (Basu, [Par.0067-0068], “In addition, the tuning parameter(s) 228 may include parameters relating to whether the master server 112 is to determine that one or more of the hyperparameter values are converging on a particular value. For example, the tuning parameters may include a predetermined distance or convergence threshold and a predetermined convergence percentage. The predetermined distance or convergence threshold indicates a threshold value for determining whether a particular hyperparameter value is converging on a particular value. As discussed below, the master server 112 may determine whether a hyperparameter value is converging on a particular value by computing a difference between a current hyperparameter value and its prior hyperparameter value. This difference value may then be compared with the convergence threshold to determine whether the hyperparameter value is converging on a particular value. [0068] The predetermined convergence percentage indicates the percentage of hyperparameter values that are to satisfy the convergence threshold for the master server 112 to affirmatively determine that the hyperparameter values have converged on corresponding values.”).
Regarding claim 6, Basu teaches the medium of claim 1, wherein the prediction model is configured to be applied in parallel across a plurality of computing devices (Basu, [Claim 7], “evaluation of the machine-learning model using the sampled training data and the initial plurality of hyperparameter values to obtain a plurality of corresponding performance metric values is performed in parallel within a distributed computing architecture.”).
Regarding claim 7, Basu teaches the medium of claim 1, further storing instructions for applying the prediction model to select a next batch of hyperparameters based on a likelihood that the next batch of hyperparameters will improve the success metric (Basu, [Fig. 3D, Par.0089-0091], “Referring to FIG. 3D, at operation 326, the master server 112 adjusts the hyperparameter values, the performance metric values, and/or quality metric values by the determined bias value. This adjusted set of values is then merged with the prior set of hyperparameter values and performance metric values and/or quality metric values (Operation 328). In one embodiment, Operations 326-328 correspond to step eleven of Algorithm 2. … [0091] Next, at Operation 332, the master server 112 determines whether the current set of hyperparameter values are converging on a particular value. As discussed above, the master server 112 may determine this convergence by comparing one or more difference values with corresponding converging threshold values. In one embodiment, the master server 112 determines that the hyperparameter values are converging when a predetermined percentage of the hyperparameters (e.g., 80%) are associated with a difference value less than or equal to a corresponding convergence threshold value. The predetermined percentage of the hyperparameters may be stored as one or more of the tuning parameters 228. Where the master server 112 determines that the hyperparameter values are not converging (e.g., the “NO” branch of Operation 332), the method 302 returns to Operation 320 as shown in FIG. 3C. Alternatively, where the master server 112 determines that the hyperparameter values are converging (e.g., the “YES” branch of Operation 332), the method 302 proceeds to Operation 334, where the master server 112 returns the hyperparameter vector x* that includes optimized (e.g., maximized) hyperparameter values for corresponding hyperparameters to the hyperparameter tuning server 110.
In one embodiment, the optimized hyperparameter values are stored as the hyperparameter values 230. Operation 334 may correspond to step fourteen of Algorithm 2.”),
and testing the next batch of hyperparameters with the hyperparameter tuning technique (Basu, [Par.0089-0094], “Referring to FIG. 3D, at operation 326, the master server 112 adjusts the hyperparameter values, the performance metric values, and/or quality metric values by the determined bias value. This adjusted set of values is then merged with the prior set of hyperparameter values and performance metric values and/or quality metric values (Operation 328). In one embodiment, Operations 326-328 correspond to step eleven of Algorithm 2. … [0094] The global maximum is around 1.085, which is obtained at (5,5) but the function f further has two local maximums. The testing on the trimodal shekel function included three methods: (1) Algorithm 1; (2) Algorithm 2 with ten prior random evaluations of f(x.sub.1, x.sub.2); and (3) Algorithm 2 with 20 prior random evaluations of f(x.sub.1, x.sub.2). The testing was repeated 25 times for each method and averaged over the accumulative largest evaluations.”).
Regarding claim 15, Basu teaches an apparatus comprising: a non-transitory computer-readable medium storing a set of hyperparameters for an artificial intelligence (AI) model, the hyperparameters configured to be adjusted according to a hyperparameter selection technique based on one or more parameters (Basu, [Par.0019], “FIG. 5 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-readable medium,” [Par.0078, lines 1-3], “The hyperparameter tuning server 110 may then receive a selection of one or more hyperparameters 226 to optimize (Operation 308).” [Par.0079], “the hyperparameters to optimize may be specific to the selected machine-learning model. Accordingly, depending on which machine-learning model is selected, the hyperparameter tuning server 110 may optimize different sets of hyperparameters. By way of example, and without limitation, examples of hyperparameters include a learning rate, a minimal loss reduction, maximal depth, minimum sum of instance weight for a child, subsample ratio, subsample ratio of columns for a tree, subsample ratio of columns for each split, and one or more regularization terms.”);
and a processor configured to: train a prediction model using a machine learning process, the prediction model configured to estimate whether further application of the hyperparameter selection technique will cause an improvement in at least one of the hyperparameters (Basu, [Par.0054, lines 1-3], “To execute a selected machine learning model, the networked architecture 102 includes a master server 112 and a set of execution servers 114.” For further explanation, see [Par.0055, lines 1-9], “Within this framework, each of the execution servers of the set of execution servers 114 may perform a particular function evaluation (e.g., the performance metric being determined). In one embodiment, each function evaluation takes a set of hyperparameter values as input, evaluates on one or more of the execution servers, and reduces to a performance metric value on the master server 112. This performance metric value is the observed function evaluation for the given set of hyperparameters.”);
select the hyperparameters using the hyperparameter selection technique (Basu, [Par.0012], “The computing device then selects maximized hyperparameter values from the initial set of hyperparameter values based on the corresponding quality metric value or performance metric value. Thereafter, the computing device, then evaluates the larger corpus of training data using the maximized hyperparameter values. This evaluation results in another corresponding set of performance metric values. The maximized hyperparameter values and their corresponding set of performance metric values are then merged with the prior set of hyperparameter values.”);
apply the prediction model to determine if further adjustment of the hyperparameters is likely to improve the success metric (Basu, [Par.0071, lines 3-8], “The master server 112 then instructs the one or more execution servers 114 to execute the kernel function and/or quality metric function over the entirety of the training data selected from the database 116 using the baseline set of hyperparameter values and corresponding quality metric values.” For further explanation, see [Par.0072, lines 1-4], “The search optimization application 218 may be configured to identify the best (e.g., optimized) hyperparameter vectors or regions within the domain X for further exploration.”);
and terminate the hyperparameter selection technique when the prediction model predicts that further application of the hyperparameter selection technique will not result in an improvement to the hyperparameter (Basu, [Par.0086, lines 11-23], “The iterative loop ends when the master server 112 determines that the values of the hyperparameter vector x.sub.t are converging on corresponding values. In one embodiment, the convergence may be determined by comparing a difference value determined from a current hyperparameter value with its prior value, and a convergence threshold. In this manner, the comparison indicates whether a current hyperparameter value has changed from its prior value. If the change (e.g., the delta between the current hyperparameter value and the prior hyperparameter value) is above the convergence threshold, then it is likely that the hyperparameter value is still approaching a limit value.”);
However, Basu does not teach terminating the hyperparameter selection technique when: an accuracy of the prediction model in predicting improvement in at least one of the hyperparameters is above a predetermined accuracy threshold, and the prediction model predicts that further application of the hyperparameter selection technique will not result in an improvement to the hyperparameter; or the accuracy of the prediction model in predicting improvement in the hyperparameter is below the predetermined accuracy threshold, and an accuracy of hyperparameter adjustment is determined to be below a predetermined adjustment accuracy threshold.
On the other hand, Ganu teaches terminating the hyperparameter selection technique when: an accuracy of the prediction model in predicting improvement in at least one of the hyperparameters is above a predetermined accuracy threshold, and the prediction model predicts that further application of the hyperparameter selection technique will not result in an improvement to the hyperparameter (Ganu, [Col. 13, lines 9-27], “An embodiment of the tuning process is depicted in FIG. 5. The process includes two stages. In the first stage 502, at operation 515, the text annotation model 510 is trained using real training data 520, to generate a model error 525. The model error 525 is then evaluated using a model complexity adjuster 530, which may determine, based on the model error, whether the model's complexity may be reduced. In some embodiments, if the model's error 525 remains sufficiently low (e.g., below a threshold), the model's complexity may be reduced at operation 534. The reduction in complexity may be performed using a dropout hyperparameter that controls the number of hidden units that are active in the model 510. In some embodiments, the dropout parameter may indicate a probability that a given hidden unit in the model is randomly zero'ed out, or “dropped out” from the model. The process of the first stage 502 then repeats to iteratively reduce the complexity of the model 510, until the model error 525 is no longer acceptable (e.g., is above a specified threshold). At that point, the process may stop 532.” And [Col. 55, lines 55-67], “Next, in a second phase of tuning, the model configured with the tentative dropout value is iteratively trained on the synthetic data set with random truth labels. Again, the training error of the model is monitored across each iteration. The process progressively reduces the complexity of the model using the dropout parameter, and continues until the model is no longer able to improve against the synthetic data set.
Because the synthetic data set contain randomized truth labels, little reasoning may be learned from this data. Thus, any improvement that is seen against the synthetic data may be assumed to be generated from memorization. Accordingly, if the training error indicates that the model is performing better than a naïve model that simply randomly predicts the output label based on label proportions, the process lowers the complexity of the model using the dropout parameter, and performs another tuning iteration. When the training error indicates that the model is performing no better than the naïve model, the process may stop, as the training error indicates that the model is at a complexity level where it is no longer able to memorize the training data.” Examiner's note: when the training error of the model shows no further improvement, the tuning is stopped, and when the error of the iterative training is above the threshold, the process is stopped; this corresponds to the improvement in the success metric being above a predetermined accuracy threshold.),
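The stopping rule Ganu describes can be sketched as follows. This is an illustrative reading of the quoted passage only, not Ganu's actual code; the function names, step size, and error model below are assumptions.

```python
# Illustrative sketch of Ganu's second tuning phase (Col. 55): keep raising
# the dropout rate (i.e., lowering model complexity) while training on
# random-label synthetic data still beats a naive random predictor, since
# beating the naive model on random labels indicates memorization.
# train_error and naive_error are assumed stand-ins for the monitored errors.
def tune_dropout(train_error, naive_error, dropout=0.1, step=0.1,
                 max_dropout=0.9):
    """Return the dropout rate at which the model stops beating the naive model."""
    while dropout < max_dropout and train_error(dropout) < naive_error:
        dropout += step  # model still memorizes: reduce complexity further
    return dropout
```

In this sketch the loop terminates exactly when the training error is no longer better than the naive model's error, mirroring the stop condition quoted above.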
Basu and Ganu are analogous arts because they have the same field of endeavor of performing a hyperparameter/parameter tuning process based on the training of a machine learning model.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the terminating of the hyperparameter selection technique when the prediction model predicts that further application of the hyperparameter selection technique will not result in an improvement to the hyperparameter, as taught by Basu, to include terminating the hyperparameter selection technique when an accuracy of the prediction model in predicting improvement in at least one of the hyperparameters is above a predetermined accuracy threshold and the prediction model predicts that further application of the hyperparameter selection technique will not result in an improvement to the hyperparameter, as taught by Ganu. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the training accuracy (Ganu, [Col. 12, lines 15-25], “In some situations, a deep learning model of high complexity may improve its accuracy on a training data set through two mechanisms. First, the model may improve by learning the task at hand through higher level features, which is what is generally desired. Second, the model may improve by simply memorizing the training data, which does not result in any real “learning.” Deep neural networks used in practice can memorize training datasets especially when the number of model parameters is of the same order as the number of data points.”).
Claim 18 is rejected for the same reasons as claim 4, since these claims recite the same limitations.
Claim 19 is rejected for the same reasons as claim 5, since these claims recite the same limitations.
Claim 20 is rejected for the same reasons as claim 6, since these claims recite the same limitations.
Claims 2 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Basu et al. (Pub. No. US20200226496 – hereinafter, Basu) in view of Ganu et al. (Patent No. US10380236 – hereinafter, Ganu), further in view of Moore et al. (Pub. No. US20200057958 – hereinafter, Moore).
Regarding claim 2, Basu teaches the medium of claim 1, further storing instructions for continuing the hyperparameter tuning technique when the accuracy of the prediction model in predicting improvement in the success metric is below the predetermined accuracy threshold (Basu, [Fig. 3D, 3C, Par.0091, lines 6-16], “In one embodiment, the master server 112 determines that the hyperparameter values are converging when a predetermined percentage of the hyperparameters (e.g., 80%) are associated with a difference value less than or equal to a corresponding convergence threshold value. The predetermined percentage of the hyperparameters may be stored as one or more of the tuning parameters 228. Where the master server 112 determines that the hyperparameter values are not converging (e.g., the “NO” branch of Operation 332), the method 302 returns to Operation 320 as shown in FIG. 3C.” where the convergence threshold corresponds to the accuracy threshold.).
However, neither Basu nor Ganu teaches that the accuracy of hyperparameter optimization is determined to be above the predetermined tuning accuracy threshold.
On the other hand, Moore teaches that the accuracy of hyperparameter optimization is determined to be above the predetermined tuning accuracy threshold (Moore, [Par.0029, lines 1-7], “In stage 270, the hyperparameter values for each previously used hyperparameter determined to be in common with the hyperparameters of the version of the selected machine learning model may be searched based on a threshold value. For example, if a previously used hyperparameter value is 10, stage 270 may select a threshold range of 5, so that values between 5 and 15 will be tested for suitability.”).
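Moore's threshold-range search can be illustrated with a short sketch. The function name, step size, and integer grid below are assumptions; only the example numbers (a prior value of 10 and a threshold of 5 yielding candidates 5 through 15) come from the quoted paragraph.

```python
# Illustrative sketch of the search described in Moore [0029]: candidate
# values for a previously used hyperparameter are taken from a threshold
# range around its prior value, and each candidate would then be tested for
# suitability. The integer grid and step size are assumptions.
def candidate_values(previous_value, threshold=5, step=1):
    """Enumerate candidate values within +/- threshold of the prior value."""
    lo, hi = previous_value - threshold, previous_value + threshold
    return list(range(lo, hi + 1, step))
```

With `candidate_values(10)`, the sketch reproduces the reference's example range of 5 through 15.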
Basu, Ganu, and Moore are analogous arts because they have the same field of endeavor of optimizing hyperparameter values for training data.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the continuing of the hyperparameter tuning technique when the accuracy of the prediction model in predicting improvement in the success metric is below the predetermined accuracy threshold, as taught by Basu, to include the accuracy of hyperparameter optimization being determined to be above the predetermined tuning accuracy threshold, as taught by Moore. The modification would have been obvious because one of ordinary skill in the art would be motivated to tune a hyperparameter to achieve a useable search result (Moore, [Par.0002], “Hyperparameters also may be searched algorithmically using a brute force approach. A search algorithm may execute to find the optimal hyperparameters within the set of all possible combinations, but this approach may require an exponentially larger amount of computing time as the number of hyperparameters increases. Compounding the problem, the search algorithm may require its own hyperparameters, and significant time may be spent tuning those hyperparameters to achieve a useable search result.”).
Claim 16 is rejected for the same reasons as claim 2, since these claims recite the same limitations.
Claims 3 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Basu et al. (Pub. No. US20200226496 – hereinafter, Basu) in view of Ganu et al. (Patent No. US10380236 – hereinafter, Ganu), and further in view of Ura et al. (Pub. No. US20190122078 – hereinafter, Ura).
Regarding claim 3, Basu teaches the medium of claim 1, further storing instructions for: receiving a next batch of hyperparameters to be tested by the hyperparameter tuning technique; applying the prediction model to predict whether the next batch of hyperparameters is likely to result in improvement of the success metric (Basu, [Fig. 3D, Par.0086], “Beginning with Operation 320, the method 302 enters an iterative loop that searches for a maximized set of hyperparameter values. As discussed with reference to FIG. 3D, each successive iteration results in a merging of a maximized set of hyperparameter values with a prior set of hyperparameter values. This feature is illustrated in FIG. 1, where the one or more execution servers 114 output various hyperparameter values and their corresponding performance metric values, which are then merged with the prior set of hyperparameter values to result in a new (or next) set of hyperparameter vectors 118. The iterative loop ends when the master server 112 determines that the values of the hyperparameter vector x.sub.t are converging on corresponding values. In one embodiment, the convergence may be determined by comparing a difference value determined from a current hyperparameter value with its prior value, and a convergence threshold. In this manner, the comparison indicates whether a current hyperparameter value has changed from its prior value. If the change (e.g., the delta between the current hyperparameter value and the prior hyperparameter value) is above the convergence threshold, then it is likely that the hyperparameter value is still approaching a limit value. In one embodiment, this convergence threshold may be defined as one or more of the tuning parameter(s) 228. Furthermore, each of the hyperparameters of the vector x.sub.t may each be associated with a convergence threshold value.”);
However, Basu does not teach causing the hyperparameter optimization to skip the next batch of hyperparameters if the prediction model does not predict improvement in the success metric.
On the other hand, Ura teaches causing the hyperparameter optimization to skip the next batch of hyperparameters if the prediction model does not predict improvement in the success metric (Ura, [Par.0039, lines 6-9], “On the other hand, if the total resources 15 are equal to or less than the threshold 16, the processing unit 12 allows the learning process 14d to be performed while withholding the execution of the learning process 14c.” The learning process 14c is withheld if the total resources 15 are equal to or less than the threshold 16.).
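The gating Ura describes, allowing one learning process while withholding another based on a resource threshold, might be sketched as a simple budgeted scheduler. The structure below is an illustrative assumption, not Ura's implementation; the process names ("14d", "14c") and costs are hypothetical.

```python
# Illustrative sketch in the spirit of Ura [0039]: a learning process runs
# only if the running resource total would stay at or below the threshold;
# otherwise its execution is withheld (skipped). All names and costs here
# are hypothetical stand-ins.
def schedule(processes, resource_threshold):
    """Partition (name, cost) processes into executed and withheld lists."""
    total, executed, withheld = 0, [], []
    for name, cost in processes:
        if total + cost <= resource_threshold:
            total += cost
            executed.append(name)
        else:
            withheld.append(name)  # budget would be exceeded: withhold
    return executed, withheld
```

For example, with a threshold of 4, a process costing 3 would run while a second process costing 5 would be withheld, analogous to process 14d being allowed while 14c is withheld.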
Basu, Ganu, and Ura are analogous arts because they have the same field of endeavor of optimizing hyperparameter values for training data.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the receiving of a next batch of hyperparameters to be tested by the hyperparameter tuning technique and the applying of the prediction model to predict whether the next batch of hyperparameters is likely to result in improvement of the success metric, as taught by Basu, to include causing the hyperparameter optimization to skip the next batch of hyperparameters if the prediction model does not predict improvement in the success metric, as taught by Ura. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the success metric (Ura, [Par.0004, lines 1-6], “As for machine learning, it is preferable that the built model achieve high accuracy, that is, have a high ability to predict the outcomes of unknown instances (sometimes called prediction performance) with accuracy. The larger the size of training data (sample size) used in learning, the better the prediction performance.”).
Claim 17 is rejected for the same reasons as claim 3, since these claims recite the same limitations.
Claims 8-14 are rejected under 35 U.S.C. 103 as being unpatentable over Basu et al. (Pub. No. US20200226496 – hereinafter, Basu) in view of Convertino et al. (Pub. No. US20200097847 – hereinafter, Convertino).
Regarding claim 8, Basu teaches a computer-implemented method, comprising: receiving one or more settings for a machine learning process, the settings configured to be set to a value before the machine learning process begins according to a setting establishment procedure based on one or more measurements of performance (Basu, [Par.0064-0067], “The model selection application 214 is configured to provide one or more graphical user interfaces that allow the user of the one or more client devices 104-108 to select a machine-learning model from one or more machine-learning model(s) 232. Examples of trainable machine-learning model(s) 232 include, but are not limited to, Nearest Neighbor, Naïve Bayes, Decision Trees, Linear Regression, Support Vector Machines (SVM), and neural networks. [0065] Furthermore, each of the trainable machine-learning model(s) 232 may be associated with one or more hyperparameters that define the corresponding machine-learning model. Examples of hyperparameters include, but are not limited to, a learning rate, a minimal loss reduction, maximal depth, minimum sum of instance weight for a child, subsample ratio, subsample ratio of columns for a tree, subsample ratio of columns for each split, and one or more regularization terms. In selecting a machine-learning model via the model selection application 214, the hyperparameter tuning server 110 may automatically select one or more hyperparameters 226 associated with the selected machine-learning model 232 to optimize. Additionally, and/or alternatively, a user may specify which of the hyperparameters 226 associated with a selected machine-learning model 232 to optimize. … [0067] In addition, the tuning parameter(s) 228 may include parameters relating to whether the master server 112 is to determine that one or more of the hyperparameter values are converging on a particular value.
For example, the tuning parameters may include a predetermined distance or convergence threshold and a predetermined convergence percentage. The predetermined distance or convergence threshold indicates a threshold value for determining whether a particular hyperparameter value is converging on a particular value.”);
the settings configured not to be adjusted after initiation of the machine learning process (Basu, [Par.0002], “Hyperparameter tuning is an important and challenging problem in the field of machine-learning since the model accuracy for a machine-learning model can vary drastically given different hyperparameters. In general, a hyperparameter is a parameter of a machine-learning model whose value is set before the learning process begins, and whose value cannot be estimated from the training data on which the machine-learning model is to operate.”);
configure a prediction model to learn an effect of a change to the one or more settings on the one or more measurements of performance (Basu, [Par.0032-0033, 0054-0055], “To execute a selected machine-learning model, the networked architecture 102 includes a master server 112 and a set of execution servers 114. In one embodiment, the master server 112 and/or the set of execution servers 114 are executed in a distributed fashion using the Apache Spark computation framework. As known in the art of distributed computing, Apache Spark is an open-source distributed general-purpose cluster-computing framework and is available from the Apache Software Foundation. Under this framework, and in one embodiment, the computation nodes for executing the machine-learning model are organized into two categories—a “driver” (e.g., the master server 112) and “executors” (e.g., the set of execution servers 114). In some instances, the master server 112 is implemented as a singleton node that orchestrates the computation, and the set of execution servers 114 apply transformations in parallel at the level of partitions of individual training examples. The individual results of each of the execution servers are then communicated (e.g., via the network 124) to the master server 112. … [0055] Within this framework, each of the execution servers of the set of execution servers 114 may perform a particular function evaluation (e.g., the performance metric being determined). In one embodiment, each function evaluation takes a set of hyperparameter values as input, evaluates on one or more of the execution servers, and reduces to a performance metric value on the master server 112. This performance metric value is the observed function evaluation for the given set of hyperparameters. Maximizing the hyperparameter values according to their corresponding performance metric values may be performed by the master server 112.
In alternative embodiments, the maximizing of the hyperparameter values may be performed by the set of execution servers 114. As discussed below with reference to FIGS. 3A-3D, the master server 112 and/or set of execution servers 114 may implement one or more of the operations described in Algorithm 2, above.”);
adjust the settings in a first round of the setting establishment procedure (Basu, [Fig. 3D, Par.0089-0091], “Referring to FIG. 3D, at operation 326, the master server 112 adjusts the hyperparameter values, the performance metric values, and/or quality metric values by the determined bias value. This adjusted set of values is then merged with the prior set of hyperparameter values and performance metric values and/or quality metric values (Operation 328). In one embodiment, Operations 326-328 correspond to step eleven of Algorithm 2. [0090] At Operation 330, the master server 112 identifies the merged set of hyperparameter values and corresponding performance metric values as the next set of values to use in performing the next iteration of Operations 320-330. In this regard, Operation 330 corresponds to step twelve of Algorithm 2.”);
However, Basu does not teach applying the prediction model to predict whether a second round of the setting establishment procedure is likely to result in the measurements of performance becoming closer to one or more targets; and performing the second round of the setting establishment procedure when the prediction model predicts that the measurements of performance are likely to become closer to the targets or when the prediction model is unable to make a prediction, and refraining from performing the second round when the prediction model predicts that the measurements of performance are unlikely to become closer to the targets.
On the other hand, Convertino teaches applying the prediction model to predict whether a second round of the setting establishment procedure is likely to result in the measurements of performance becoming closer to one or more targets (Convertino, [Par.0045-0050], “…Step 206 includes validating hypotheses formed at step 204 using details of the experiments. For example, while testing a general hypothesis, a user may drill down into specific experiments to determine, for example, 1) what the details of a given experiment say about the hypotheses, and 2) whether the predictions generated using the current state of a given ML model can be trusted by looking at the results. The combination of steps 204 and 206 represent an in-depth iterative investigation. Steps 204 and 206 may be iteratively performed with the support of detailed reports on hyperparameters and performance metrics from individual experiments. [0049] Step 208 includes deciding if additional tuning of one or more hyperparameters is needed. Once a user has gathered evidence through hypotheses and validation from a current batch of experiments, the user then decides, for example, 1) whether the ML model in its current state meets expectations, 2) whether additional training will improve the performance of the ML model, and 3) whether additional tuning of the hyperparameters is worth the effort, given limited available resources to run additional experiments. This step may be performed with the support of summative and detailed reports from steps 204 and 206.”);
and performing the second round of the setting establishment procedure when the prediction model predicts that the measurements of performance are likely to become closer to the targets or when the prediction model is unable to make a prediction (Convertino, [Par.0042], “FIG. 1 shows a flow diagram of steps in an example ML model development and deployment workflow, according to some embodiments. The example ML model development and deployment workflow may begin with acquiring, cleaning, transforming, enriching, or otherwise preparing available data to be used with the ML model (step 102). Next, certain relevant features may be engineered, for example, based on defined goals of the ML model (e.g., predicting a customer churn based on customer data) (step 104). Next, a type of ML model to be used is selected (step 106). The type of ML model selected will depend on characteristics of the data as well as the defined goals of the ML model. Example ML model types include neural networks, clustering models, decision trees, random forest classifiers, etc. Once an ML model type is selected, the ML model is built (step 108), for example, by setting values for one or more hyperparameters, and then validated (step 110), for example, by training and testing the model using at least some of the available data. If not satisfied with the results of the ML validation process, the developer of the ML model can iteratively return to any of the previous steps, for example, to modify the available data, chose a different ML type, or set different hyperparameters values.
Finally, once the ML model developer is satisfied with the results generated by the ML model, the ML model can be deployed and used (step 112), for example, by sharing with other data analysts and domain experts or by embedding the ML model into a business process.” Examiner's note: the model is iteratively retrained when the learning result is not satisfactory; therefore, the second round (another round) of training is performed when the performance is not yet close to the target (a satisfactory result).),
and refraining from performing the second round when the prediction model predicts that the measurements of performance are unlikely to become closer to the targets (Convertino, [Par.0042], “FIG. 1 shows a flow diagram of steps in an example ML model development and deployment workflow, according to some embodiments. The example ML model development and deployment workflow may begin with acquiring, cleaning, transforming, enriching, or otherwise preparing available data to be used with the ML model (step 102). Next, certain relevant features may be engineered, for example, based on defined goals of the ML model (e.g., predicting a customer churn based on customer data) (step 104). Next, a type of ML model to be used is selected (step 106). The type of ML model selected will depend on characteristics of the data as well as the defined goals of the ML model. Example ML model types include neural networks, clustering models, decision trees, random forest classifiers, etc. Once an ML model type is selected, the ML model is built (step 108), for example, by setting values for one or more hyperparameters, and then validated (step 110), for example, by training and testing the model using at least some of the available data. If not satisfied with the results of the ML validation process, the developer of the ML model can iteratively return to any of the previous steps, for example, to modify the available data, chose a different ML type, or set different hyperparameters values. Finally, once the ML model developer is satisfied with the results generated by the ML model, the ML model can be deployed and used (step 112), for example, by sharing with other data analysts and domain experts or by embedding the ML model into a business process.” Examiner's note: the model is iteratively retrained when the learning result is not satisfactory; therefore, the second round (another round) of training is performed when the performance is not yet close to the target (a satisfactory result).).
Basu and Convertino are analogous art because they have the same field of endeavor of optimizing hyperparameter values for training data.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the settings configured to be set to a value before the machine learning process begins according to a setting establishment procedure based on one or more measurements of performance, the settings configured not to be adjusted after initiation of the machine learning process, configuring a prediction model to learn an effect of a change to the one or more settings on the one or more measurements of performance, and adjusting the settings in a first round of the setting establishment procedure, as taught by Basu, to include applying the prediction model to predict whether a second round of the setting establishment procedure is likely to result in the measurements of performance becoming closer to one or more targets; performing the second round of the setting establishment procedure when the prediction model predicts that the measurements of performance are likely to become closer to the targets or when the prediction model is unable to make a prediction; and refraining from performing the second round when the prediction model predicts that the measurements of performance are unlikely to become closer to the targets, as taught by Convertino. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance (Convertino, [Par.0039], “Hyperparameters, such as the number of layers or the dropout rate, can dramatically affect the performance of ML models. Hyperparameter tuning, to improve performance, is therefore critical to the successful implementation of ML models. To configure an ML model to work well in practice, hyperparameters should be tuned when training the model. Evaluating the effect of a given hyperparameter setting is expensive since it usually requires the model to be trained and tested, which can be time consuming and computationally expensive.”).
Regarding claim 9, Basu in view of Convertino teaches the method of claim 8, further comprising performing the second round of the setting procedure when the first round of the setting procedure caused one or more of the measurements of performance to become closer to the one or more targets (Convertino, [Par.0045-0050], “…Step 206 includes validating hypotheses formed at step 204 using details of the experiments. For example, while testing a general hypothesis, a user may drill down into specific experiments to determine, for example, 1) what the details of a given experiment say about the hypotheses, and 2) whether the predictions generated using the current state of a given ML model can be trusted by looking at the results. The combination of steps 204 and 206 represent an in-depth iterative investigation. Steps 204 and 206 may be iteratively performed with the support of detailed reports on hyperparameters and performance metrics from individual experiments. [0049] Step 208 includes deciding if additional tuning of one or more hyperparameters is needed. Once a user has gathered evidence through hypotheses and validation from a current batch of experiments, the user then decides, for example, 1) whether the ML model in its current state meets expectations, 2) whether additional training will improve the performance of the ML model, and 3) whether additional tuning of the hyperparameters is worth the effort, given limited available resources to run additional experiments. This step may be performed with the support of summative and detailed reports from steps 204 and 206.”).
Basu and Convertino are analogous art because they have the same field of endeavor of optimizing hyperparameter values for training data.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the hyperparameter tuning, as taught by Basu, to include performing the second round of the setting procedure when the first round of the setting procedure caused one or more of the measurements of performance to become closer to the one or more targets, as taught by Convertino. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance (Convertino, [Par.0039], “Hyperparameters, such as the number of layers or the dropout rate, can dramatically affect the performance of ML models. Hyperparameter tuning, to improve performance, is therefore critical to the successful implementation of ML models. To configure an ML model to work well in practice, hyperparameters should be tuned when training the model. Evaluating the effect of a given hyperparameter setting is expensive since it usually requires the model to be trained and tested, which can be time consuming and computationally expensive.”).
Regarding claim 10, Basu as modified in view of Convertino teaches the method of claim 8, further comprising: receiving a new group of values for the settings to be evaluated by the setting establishment procedure (Convertino, [Par.0046], “Step 202 includes setting hyperparameter values for an ML model. This initial step may rely on the knowledge and expertise of the user ( e.g., a data science or ML professional) to set initial hyperparameter values based, for example, on their understanding of the data, the problem to be solved using the data, and the model type being utilized. Due to the iterative nature of the process, this step of setting hyperparameter values may be repeated after subsequent steps in the hyperparameter tuning process.”);
using the prediction model to estimate whether the next group of values is likely to cause the measurements of performance to become closer to the one or more targets (Convertino, [Par.0045-0050], “…Step 206 includes validating hypotheses formed at step 204 using details of the experiments. For example, while testing a general hypothesis, a user may drill down into specific experiments to determine, for example, 1) what the details of a given experiment say about the hypotheses, and 2) whether the predictions generated using the current state of a given ML model can be trusted by looking at the results. The combination of steps 204 and 206 represent an in-depth iterative investigation. Steps 204 and 206 may be iteratively performed with the support of detailed reports on hyperparameters and performance metrics from individual experiments. [0049] Step 208 includes deciding if additional tuning of one or more hyperparameters is needed. Once a user has gathered evidence through hypotheses and validation from a current batch of experiments, the user then decides, for example, 1) whether the ML model in its current state meets expectations, 2) whether additional training will improve the performance of the ML model, and 3) whether additional tuning of the hyperparameters is worth the effort, given limited available resources to run additional experiments. This step may be performed with the support of summative and detailed reports from steps 204 and 206.”);
and refraining from applying the setting establishment procedure to the new group of values if the prediction model indicates that it is unlikely that the measurements of performance will become closer to the one or more targets based on the new group of values (Convertino, [par.0042], “FIG. 1 shows a flow diagram of steps in an example ML model development and deployment workflow, according to some embodiments. The example ML model development and deployment workflow may begin with acquiring, cleaning, transforming, enriching, or otherwise preparing available data to be used with the ML model (step 102). Next, certain relevant features may be engineered, for example, based on defined goals of the ML model (e.g., predicting a customer churn based on customer data) (step 104). Next, a type of ML model to be used is selected (step 106). The type of ML model selected will depend on characteristics of the data as well as the defined goals of the ML model. Example ML model types include neural networks, clustering models, decision trees, random forest classifiers, etc. Once an ML model type is selected, the ML model is built (step 108), for example, by setting values for one or more hyperparameters, and then validated (step 110), for example, by training and testing the model using at least some of the available data. If not satisfied with the results of the ML validation process, the developer of the ML model can iteratively return to any of the previous steps, for example, to modify the available data, chose a different ML type, or set different hyperparameters values. Finally, once the ML model developer is satisfied with the results generated by the ML model, the ML model can be deployed and used (step 112), for example, by sharing with other data analysts and domain experts or by embedding the ML model into a business process”).
Basu and Convertino are analogous art because they have the same field of endeavor of optimizing hyperparameter values for training data.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the hyperparameter tuning, as taught by Basu, to include receiving a new group of values for the settings to be evaluated by the setting establishment procedure, using the prediction model to estimate whether the new group of values is likely to cause the measurements of performance to become closer to the one or more targets, and refraining from applying the setting establishment procedure to the new group of values if the prediction model indicates otherwise, as taught by Convertino. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance (Convertino, [Par.0039], “Hyperparameters, such as the number of layers or the dropout rate, can dramatically affect the performance of ML models. Hyperparameter tuning, to improve performance, is therefore critical to the successful implementation of ML models. To configure an ML model to work well in practice, hyperparameters should be tuned when training the model. Evaluating the effect of a given hyperparameter setting is expensive since it usually requires the model to be trained and tested, which can be time consuming and computationally expensive.”).
Regarding claim 11, Basu as modified in view of Convertino teaches the method of claim 8, wherein the prediction model is configured to favor performing the second round of the setting establishment procedure (Convertino, [par.0042], “FIG. 1 shows a flow diagram of steps in an example ML model development and deployment workflow, according to some embodiments. The example ML model development and deployment workflow may begin with acquiring, cleaning, transforming, enriching, or otherwise preparing available data to be used with the ML model (step 102). Next, certain relevant features may be engineered, for example, based on defined goals of the ML model (e.g., predicting a customer churn based on customer data) (step 104). Next, a type of ML model to be used is selected (step 106). The type of ML model selected will depend on characteristics of the data as well as the defined goals of the ML model. Example ML model types include neural networks, clustering models, decision trees, random forest classifiers, etc. Once an ML model type is selected, the ML model is built (step 108), for example, by setting values for one or more hyperparameters, and then validated (step 110), for example, by training and testing the model using at least some of the available data. If not satisfied with the results of the ML validation process, the developer of the ML model can iteratively return to any of the previous steps, for example, to modify the available data, chose a different ML type, or set different hyperparameters values. Finally, once the ML model developer is satisfied with the results generated by the ML model, the ML model can be deployed and used (step 112), for example, by sharing with other data analysts and domain experts or by embedding the ML model into a business process”).
Basu and Convertino are analogous art because they have the same field of endeavor of optimizing hyperparameter values for training data.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the hyperparameter tuning, as taught by Basu, to include the prediction model being configured to favor performing the second round of the setting establishment procedure, as taught by Convertino. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance (Convertino, [Par.0039], “Hyperparameters, such as the number of layers or the dropout rate, can dramatically affect the performance of ML models. Hyperparameter tuning, to improve performance, is therefore critical to the successful implementation of ML models. To configure an ML model to work well in practice, hyperparameters should be tuned when training the model. Evaluating the effect of a given hyperparameter setting is expensive since it usually requires the model to be trained and tested, which can be time consuming and computationally expensive.”).
Regarding claim 12, it is rejected for the same reasons as claim 5, since these claims recite the same limitations.
Regarding claim 13, it is rejected for the same reasons as claim 6, since these claims recite the same limitations.
Regarding claim 14, Basu teaches the method of claim 8, further comprising applying the prediction model to select a new set of settings based on a likelihood that the new set of settings will move the measurements of performance closer to the targets, and using the setting establishment procedure with the new set of settings (Basu, [Par.0012, 0089]: [0012], “The computing device then selects maximized hyperparameter values from the initial set of hyperparameter values based on the corresponding quality metric value or performance metric value. Thereafter, the computing device, then evaluates the larger corpus of training data using the maximized hyperparameter values. This evaluation results in another corresponding set of performance metric values. The maximized hyperparameter values and their corresponding set of performance metric values are then merged with the prior set of hyperparameter values. This merged set is then assigned as the next of hyperparameter values that the computing device is to evaluate using the larger corpus of training data. As discussed below with reference to FIG. 2 and FIGS. 3A-D, these operations are performed iteratively until the computing device determines that the hyperparameter values are converging to a particular value. When the computing device determines that the hyperparameter values are converging to the particular value (e.g., a predetermined percentage of the hyperparameter values are within a predetermined distance to the particular value), the computing device returns the hyperparameter values for each of the hyperparameters.” and [Par.0089], “Referring to FIG. 3D, at operation 326, the master server 112 adjusts the hyperparameter values, the performance metric values, and/or quality metric values by the determined bias value. This adjusted set of values is then merged with the prior set of hyperparameter values and performance metric values and/or quality metric values (Operation 328).
In one embodiment, Operations 326-328 correspond to step eleven of Algorithm 2.”).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EM N TRIEU whose telephone number is (571)272-5747. The examiner can normally be reached on Mon-Fri from 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached on (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/E.T./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128