DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to amendments filed November 21, 2025. The status of the claims is as follows: Claims 1, 7, 10, and 11 are amended; Claims 4-5, 8-9, and 12-20 are cancelled; and Claims 21-33 have been added. Claims 1-3, 6-7, 10-11, and 21-33 are currently pending.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-3, 6-7, 10-11, and 21-33 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1,
(Step 1): Claim 1 recites A computer-implemented method for determining an optimized list, thus a process, one of the four statutory categories of patentable subject matter.
(Step 2A Prong 1): However, Claim 1 further recites determining an optimized list of sets of hyperparameter values for application to an additional machine learning task which constitutes the evaluation of hyperparameter value lists to determine an optimized list of hyperparameter values for a machine learning task, thus corresponding to a mental process which can be done mentally or by pen and paper;
obtaining … data describing a plurality of different machine learning tasks which constitutes the evaluation of machine learning tasks based on random sampling to determine data descriptive of the plurality of machine learning tasks, thus corresponding to a mental process which can be done mentally or by pen and paper;
obtaining … a plurality of candidate sets of hyperparameter values which constitutes the evaluation of hyperparameter values to determine a plurality of candidate sets, thus corresponding to a mental process which can be done mentally or by pen and paper;
determining … an ordered list of sets of hyperparameters selected from the plurality of candidate sets of hyperparameter values which constitutes the evaluation of the candidate sets of hyperparameters to determine an ordered list of the sets, thus corresponding to a mental process which can be done mentally or by pen and paper.
Thus, Claim 1 recites an abstract idea.
(Step 2A Prong 2): The claim does not recite any additional elements which integrate the abstract idea into a practical application because the additional elements consist of:
by one or more computing devices or by the one or more computing devices, which are instances of implementing an abstract idea on generic computer components (MPEP 2106.05(f))
wherein the ordered list of sets of hyperparameters minimizes an aggregate loss over the plurality of different machine learning tasks which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
storing … the ordered list of sets of hyperparameters, which is insignificant extra-solution activity of data outputting (MPEP 2106.05(g))
for use in training an additional machine learning model to perform an additional machine learning task, which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
wherein each of the plurality of candidate sets of hyperparameter values comprises an identification of one of a number of potential optimization algorithms which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
and thus, the claim is directed to the abstract idea of evaluating data associated with different machine learning tasks as well as evaluating determined hyperparameter values to determine an optimized, ordered list of sets of hyperparameters.
(Step 2B) The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(f), “apply it on a computer”) cannot provide an inventive concept, elements b), d), and e) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself, and element c) is further well-understood, routine, and conventional activity of “storing and retrieving information in memory” per MPEP 2106.05(d), which cannot provide significantly more than the abstract idea itself. Thus, Claim 1 is subject-matter ineligible.
Claim 2, dependent upon Claim 1, recites additional steps of the abstract idea (Claim 2: evaluating … a respective loss for each of the plurality of candidate sets of hyperparameter values for each of the plurality of different machine learning tasks, which constitutes the evaluation of candidate sets for each machine learning task to determine a respective loss, thus corresponding to a mental process which can be done mentally or by pen and paper; identifying … a candidate set of hyperparameter values that provides, in combination with all previously selected sets of hyperparameter values, a minimum alternative loss over the plurality of different machine learning tasks, which constitutes the evaluation of sets of hyperparameter values to determine a candidate set that provides minimum alternative loss over a plurality of different machine learning tasks, thus corresponding to a mental process which can be done mentally or by pen and paper; adding … the identified candidate set of hyperparameter values to the ordered list of sets of hyperparameter … removing … the identified candidate set of hyperparameter values from the plurality of candidate sets of hyperparameter values, which constitutes the evaluation of the plurality of candidate sets and ordered list of sets to reconfigure the lists with the identified candidate set properly added and removed, thus corresponding to a mental process which can be done mentally or by pen and paper). The claim does not recite any additional elements which integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself because the additional elements consist of:
by the one or more computing devices which are instances of implementing an abstract idea on generic computer components (MPEP 2106.05(f))
for a plurality of selection iterations which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(f), “apply it on a computer”) cannot provide an inventive concept and element b) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claim 2 is subject-matter ineligible.
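For illustration only (this sketch is not part of the claim mapping), the selection procedure recited in Claim 2 can be read as a greedy loop: evaluate a loss for every candidate set on every task, identify the candidate that, combined with previously selected sets, minimizes the aggregate loss, add it to the ordered list, and remove it from the candidate pool. All names and the loss function below are hypothetical:

```python
def greedy_select(candidates, tasks, loss, num_iterations):
    """Greedily build an ordered list of hyperparameter sets that
    minimizes the aggregate loss over all tasks, where each task
    contributes the minimum loss achieved by any selected set."""
    ordered = []
    remaining = list(candidates)
    for _ in range(num_iterations):
        if not remaining:
            break

        def aggregate_with(c):
            # Aggregate loss of the ordered list with candidate c
            # appended: each task takes its best (lowest) loss over
            # all selected sets plus the candidate.
            return sum(
                min(loss(s, t) for s in ordered + [c])
                for t in tasks
            )

        best = min(remaining, key=aggregate_with)
        ordered.append(best)    # add identified candidate to the list
        remaining.remove(best)  # remove it from the candidate pool
    return ordered
```

With scalar stand-ins for hyperparameter sets and tasks and an absolute-difference loss, the loop selects complementary candidates that together cover the tasks.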
Claim 3, dependent upon Claim 1, recites additional steps of the abstract idea (Claim 3: adding … a best candidate set of hyperparameter values to the ordered list of sets of hyperparameters which constitutes the evaluation of hyperparameter values to determine a best candidate set to add to the ordered list of sets of hyperparameters, thus corresponding to a mental process which can be done mentally or by pen and paper; identifying … a candidate set of hyperparameter values of the plurality of candidate sets of hyperparameter values that produces the minimum alternative loss, which constitutes the evaluation of the plurality of candidate sets of hyperparameter values to determine a candidate set producing the minimum alternative loss, thus corresponding to a mental process which can be done mentally or by pen and paper). The claim does not recite any additional elements which integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself because the additional elements consist of:
for a first selection iteration of the plurality of selection iterations which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
by the one or more computing devices which are instances of implementing an abstract idea on generic computer components (MPEP 2106.05(f))
wherein the best candidate set of hyperparameters comprises the lowest overall respective loss for each of the plurality of different machine learning tasks among the plurality of candidate sets of hyperparameter values which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
for the remaining plurality of selection iterations which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
the minimum alternative loss comprising a performance difference for tasks for which the candidate set of hyperparameter values produces a lower respective loss than a current lowest respective loss produced for the task by one or more sets of hyperparameters of the ordered list of sets of hyperparameters which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because elements a), c), d), and e) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself and element b) (via MPEP 2106.05(f), “apply it on a computer”) cannot provide an inventive concept. Thus, Claim 3 is subject-matter ineligible.
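For illustration only (outside the claim mapping), one plausible reading of the "minimum alternative loss" described in Claim 3 is a performance difference counted only on tasks where a candidate produces a lower loss than the current best loss achieved by any set already in the ordered list; minimizing that alternative loss then corresponds to maximizing this marginal improvement. The function name and formula are assumptions, not recitations from the claims:

```python
def marginal_improvement(candidate, ordered, tasks, loss):
    """Total improvement the candidate offers over the ordered list,
    summed only over tasks where the candidate beats the current
    lowest loss produced by any already-selected set."""
    improvement = 0.0
    for t in tasks:
        current_best = min(loss(s, t) for s in ordered)
        cand_loss = loss(candidate, t)
        if cand_loss < current_best:
            improvement += current_best - cand_loss
    return improvement
```

A candidate identical to one already selected yields zero improvement, so each iteration favors candidates that cover tasks the list handles poorly.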
Claim 6, dependent upon Claim 1, recites additional steps of the abstract idea (Claim 6: generating … one or more machine learning tasks of the plurality of different machine learning tasks based on a random sampling of one or more neural network properties which constitutes the evaluation of the random sampling of neural network properties to determine generated machine learning tasks, thus corresponding to a mental process which can be done mentally or by pen and paper). The claim does not recite any additional elements which integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself because the additional elements consist of:
by the one or more computing devices which are instances of implementing an abstract idea on generic computer components (MPEP 2106.05(f))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(f), “apply it on a computer”) cannot provide an inventive concept. Thus, Claim 6 is subject-matter ineligible.
Claim 7, dependent upon Claim 6, recites the additional element:
a) wherein the one or more neural network properties comprise at least one of: neural network architectures; activation functions; or model datasets which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claim 7 is subject-matter ineligible.
Claim 10, dependent upon Claim 1, recites the additional element:
a) wherein the respective loss for each of the plurality of candidate sets of hyperparameter values for each of the plurality of different machine learning tasks is normalized for aggregation into the aggregate loss which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claim 10 is subject-matter ineligible.
Regarding Claim 11,
(Step 1): Claim 11 recites A computer-implemented method for training a machine-learned model, thus a process, one of the four statutory categories of patentable subject matter.
(Step 2A Prong 1): However, Claim 11 further recites obtaining … an optimized list of sets of hyperparameters, which constitutes the evaluation of sets of hyperparameters to determine an optimized list of hyperparameter values for a machine learning task, thus corresponding to a mental process which can be done mentally or by pen and paper;
training … the model on the training data and according to at least one set of hyperparameters from the optimized list of sets of hyperparameters, which falls in the mathematical concept grouping of abstract ideas;
Thus, Claim 11 recites an abstract idea.
(Step 2A Prong 2): The claim does not recite any additional elements which integrate the abstract idea into a practical application because the additional elements consist of:
by one or more computing devices or by the one or more computing devices, which are instances of implementing an abstract idea on generic computer components (MPEP 2106.05(f))
to train an additional model to perform an additional machine learning task, which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
wherein the optimized list of sets of hyperparameters minimizes an aggregate loss over a plurality of different tasks corresponding to a plurality of different neural network architectures, which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
accessing … training data, which is insignificant extra-solution activity of data inputting (MPEP 2106.05(g))
and thus, the claim is directed to the abstract idea of evaluating sets of hyperparameters to optimize a configuration for a machine learning model in training.
(Step 2B) The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(f), “apply it on a computer”) cannot provide an inventive concept, elements b) and c) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself, and element d) is further well-understood, routine, and conventional activity of “transmitting or receiving data over a network” per MPEP 2106.05(d), which cannot provide significantly more than the abstract idea itself. Thus, Claim 11 is subject-matter ineligible.
Claim 21, dependent upon Claim 11, recites additional steps of the mathematical concept (Claim 21: training … a plurality of variants of the model separately according to a plurality of sets of hyperparameters from the optimized list of sets of hyperparameters; evaluating … a respective performance of each variant of the model; selecting … a first variant of the model based on the respective performances of the variants of the model, which falls in the mathematical concept grouping of abstract ideas). The claim does not recite any additional elements which integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself because the additional elements consist of:
by the one or more computing devices which are instances of implementing an abstract idea on generic computer components (MPEP 2106.05(f))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(f), “apply it on a computer”) cannot provide an inventive concept. Thus, Claim 21 is subject-matter ineligible.
Claim 22, dependent upon Claim 21, recites the additional element:
a) wherein the task performed by the additional machine-learned model is different than the tasks of the plurality of different machine learning tasks which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claim 22 is subject-matter ineligible.
Claim 23, dependent upon Claim 21, recites the additional element:
a) wherein the task performed by the additional machine-learned model is at least one of the tasks of the plurality of different machine learning tasks which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claim 23 is subject-matter ineligible.
Claims 24, 25, 26, and 27 recite a computing system comprising one or more processors and one or more non-transitory media storing instructions to perform the methods of Claims 11, 21, 22, and 23, respectively, except for the following additional element, which is present in Claims 11 and 21 but not explicitly present in Claims 24 and 25:
by one or more computing devices or by the one or more computing devices, which are instances of implementing an abstract idea on generic computer components (MPEP 2106.05(f))
As performance of an abstract idea on generic computing components cannot integrate an abstract idea into a practical application nor provide significantly more than the abstract idea itself (see MPEP 2106.05(f)), Claims 24, 25, 26, and 27 are rejected for reasons set forth in the rejection of Claims 11, 21, 22, 23 respectively.
Claim 28, dependent upon Claim 1, recites the additional element:
a) wherein the task performed by the additional machine-learned model is at least one of the tasks of the plurality of different machine learning tasks which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claim 28 is subject-matter ineligible.
Claim 29, dependent upon Claim 10, recites the additional element:
a) wherein the normalization for a respective task maps the loss to a range bounded by a lowest loss achieved for the respective task by any of the plurality of candidate sets of hyperparameters which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claim 29 is subject-matter ineligible.
Claim 30, dependent upon Claim 10, recites the additional element:
a) wherein the normalization for a respective task maps the loss to a range bounded by a lowest loss achieved for the respective task by any of the plurality of candidate sets of hyperparameters and a loss at initialization of the respective task which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claim 30 is subject-matter ineligible.
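For illustration only (outside the claim mapping), one plausible reading of the normalization described in Claims 29-31 is a linear mapping of each task loss into a range bounded below by the lowest loss achieved for the task by any candidate set and above by the loss at initialization. The function name and formula are assumptions, not recitations from the claims:

```python
def normalize_loss(loss, best_loss, init_loss):
    """Linearly map a task loss into [0, 1]: 0 corresponds to the
    lowest loss achieved by any candidate set on the task, 1 to the
    loss at initialization of the task."""
    if init_loss == best_loss:
        # Degenerate range: no candidate improved on initialization.
        return 0.0
    return (loss - best_loss) / (init_loss - best_loss)
```

Normalizing per task in this way puts losses from tasks of very different scales on a common footing before they are aggregated into the aggregate loss of Claim 10.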
Claim 31, dependent upon Claim 10, recites the additional element:
a) wherein the normalization comprises a linear representation of the loss which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claim 31 is subject-matter ineligible.
Claim 32, dependent upon Claim 1, recites the additional element:
a) wherein the plurality of different machine learning tasks comprises a first task for training a first model having a first neural network architecture and a second task for training a second model having a second neural network architecture which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claim 32 is subject-matter ineligible.
Claim 33, dependent upon Claim 1, recites the additional element:
a) wherein the number of potential optimization algorithms comprises a first optimization algorithm parameterized by a first number of hyperparameters and a second optimization algorithm parameterized by a second number of hyperparameters which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claim 33 is subject-matter ineligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Wistuba et al. (“Sequential Model-Free Hyperparameter Tuning” [2015], hereinafter “Wistuba”) in view of Xu et al. (CN109816116A, hereinafter “Xu”).
Regarding Claim 1,
Wistuba discloses A computer-implemented method for determining an optimized list of sets of hyperparameter values for application to an additional machine learning task (Wistuba [Abstract]; “We adapt the sequential model-based optimization by replacing its surrogate model and acquisition function with one policy that is optimized for the task of hyperparameter tuning”,
Wistuba [Section IV Paragraph 1]; “With A-SMFO we propose a fast and parallelizable hyperparameter tuning strategy that can be applied easily. This will be extended to NN-SMFO which uses information about the similarity between past seen data sets”)
the method comprising: obtaining, by one or more computing devices, data describing a plurality of different machine learning tasks (Wistuba [Section IV Subsection A]; “contains all hyperparameter configurations that were best on the data sets
[image: media_image1.png]
”,
Wistuba [Section VI Subsection A]; “we used 25 classification data sets randomly chosen from the UCI repository to create two meta-data sets” wherein datasets for classification read on a plurality of different machine learning tasks associated with classification)
obtaining, by the one or more computing devices, a plurality of candidate sets of hyperparameter values (Wistuba [Section IV Algorithm 1];
[image: media_image2.png]
wherein λT reads on a plurality of candidate sets of hyperparameter values being obtained)
determining, by the one or more computing devices, an ordered list of sets of hyperparameters selected from the plurality of candidate sets of hyperparameter values, wherein the ordered list of sets of hyperparameters minimizes an aggregate loss over the plurality of different machine learning tasks (Wistuba [Section IV Subsection A]; “Algorithm 1 will find the best hyperparameter configurations for the meta-training set by sequentially selecting the hyperparameter configurations that minimize Equation 5 given Λ”,
Wistuba [Section IV Equation 5];
[image: media_image3.png]
Wistuba [Section IV Algorithm 1];
[image: media_image4.png]
wherein the sequential selection for optimized configuration of the meta-training set reads on an ordered list of sets of hyperparameters Λ that minimizes the loss function given by Equation 5)
and storing, by the one or more computing devices, the ordered list of sets of hyperparameters for use in training an additional machine learning model to perform an additional machine learning task (Wistuba [Section IV Subsection A]; “Algorithm 1 will find the best hyperparameter configurations for the meta-training set by sequentially selecting the hyperparameter configurations that minimize Equation 5 given Λ”,
Wistuba [Section IV Equation 5];
[image: media_image3.png]
Wistuba [Section IV Algorithm 1];
[image: media_image4.png]
wherein the sequential selection for the outputted sequence of hyperparameter configurations to evaluate reads on a stored ordered list of sets of hyperparameters ΛT that minimizes the loss function given by Equation 5)
Wistuba fails to explicitly disclose but Xu discloses wherein each of the plurality of candidate sets of hyperparameter values comprises an identification of one of a number of potential optimization algorithms (Xu [Page 7 Paragraph 12]; “The technical solution provided by the embodiment of the present application implements automatic tuning of hyperparameters and hyperparameter algorithm switching in different machine learning models, and can interact with the server in the hyperparameter tuning process in the machine learning model. The dynamic adjustment of the parameter search range of the hyperparameter to be optimized is further improved, and the efficiency of the hyperparameter optimization in the machine learning model is further improved.
2B is a flow chart showing a method for optimizing parameters in a machine learning model provided by some embodiments of the present application, which may be performed by the server 13 shown in FIG. 1. As shown in FIG. 2B, the method for optimizing parameters in the machine learning model includes the following steps:
Step 202: Receive a task configuration file sent by the terminal device, where the task configuration file includes a hyper parameter to be optimized, a first algorithm type, and a parameter search range. When the hyperparameter optimization task is subsequently run, the server may obtain multiple reference values of the hyperparameter according to the one or more historical values and the parameter search range; according to the reference value and the first A hyperparameter optimization algorithm calculates a candidate value of the hyperparameter.
Step 204: Run a hyperparameter optimization task to calculate a candidate value of the hyperparameter according to the parameter search range and the first hyperparameter optimization algorithm corresponding to the first algorithm type” wherein the multiple hyper parameters and their calculated candidate values read on a plurality of candidate sets of hyperparameter values; wherein the calculated candidate values include identification of first and second potential algorithm types)
It would have been obvious to modify Wistuba’s method of determining optimized hyperparameters of a machine learning model to use Xu’s candidate hyperparameter identifiers for potential optimization algorithms in hyperparameter optimization. Specifically, the candidate sets of hyperparameter values obtained in Wistuba’s method, from which an ordered list of sets of hyperparameters is determined for use in training an additional machine learning model, would be replaced with Xu’s candidate sets of hyperparameter values, which additionally comprise identifications of potential optimization algorithms. One would have been motivated to utilize candidate sets of hyperparameter values comprising identifiers of potential optimization algorithms because: “the server determines the hyper-parameter optimization task according to the task identifier, and continues to run the hyper-parameter optimization task according to the updated task configuration file” (Xu [Page 7 Paragraph 11]); the algorithm identifier thus allows the model to identify and subsequently select which algorithm should be used for the remaining optimization.
Regarding Claim 2,
Wistuba/Xu teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wistuba/Xu further discloses wherein determining the ordered list of sets of hyperparameters comprises: for a plurality of selection iterations: evaluating, by the one or more computing devices, a respective loss for each of the plurality of candidate sets of hyperparameter values for each of the plurality of different machine learning tasks (Wistuba [Section III Equation 4];
[image: media_image5.png]
wherein the average normalized error of the plurality of candidate sets is interpreted as a loss function calculated for each of the plurality of candidate sets λT)
identifying, by the one or more computing devices, a candidate set of hyperparameter values that provides, in combination with all previously selected sets of hyperparameter values, a minimum alternative loss over the plurality of different machine learning tasks (Wistuba [Section IV Algorithm 1]; [image: media_image6.png] wherein the identified candidate set λ includes all of the minimized alternative losses from Equation 5)
adding, by the one or more computing devices, the identified candidate set of hyperparameter values to the ordered list of sets of hyperparameters (Wistuba [Section IV Subsection A]; “Algorithm 1 will find the best hyperparameter configurations for the meta-training set by sequentially selecting the hyperparameter configurations that minimize Equation 5 given Λ” wherein the sequential selection of hyperparameter configurations to add to the meta-training set reads on adding the identified candidate set to the ordered list of sets)
removing, by the one or more computing devices, the identified candidate set of hyperparameter values from the plurality of candidate sets of hyperparameter values (Wistuba [Section III Equation 4]; [image: media_image7.png] wherein calculation of the loss function involving the candidate sets λ in the plurality of candidate sets of hyperparameter values λT reads on removal of the identified candidate sets λ₁, λ₂, λ₃, … (candidate sets which have already been iterated on during [image: media_image8.png]) from the plurality of candidate sets of hyperparameter values)
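For illustration only, the iterative selection procedure addressed in Claim 2 — evaluate a loss for each candidate set on each task, identify the candidate that minimizes the alternative loss in combination with the previously selected sets, add it to the ordered list, and remove it from the candidate pool — can be sketched as follows. The function name and loss values are hypothetical and are not drawn from Wistuba.

```python
# Illustrative greedy selection loop of the kind described above:
# each iteration commits the single candidate set that, together with
# the sets already chosen, most reduces the aggregate loss over tasks.

def greedy_hyperparameter_list(losses, num_iterations):
    """losses[candidate][task] -> loss of that candidate set on that task."""
    candidates = list(losses)
    ordered_list = []
    for _ in range(min(num_iterations, len(candidates))):
        def alternative_loss(c):
            # Aggregate over tasks the best loss achievable by the
            # already-selected sets together with candidate c.
            selected = ordered_list + [c]
            return sum(min(losses[s][t] for s in selected) for t in losses[c])
        best = min(candidates, key=alternative_loss)
        ordered_list.append(best)   # add identified set to the ordered list
        candidates.remove(best)     # remove it from the candidate pool
    return ordered_list

# Example: three hypothetical candidate sets evaluated on two tasks.
losses = {
    "A": {"t1": 0.25, "t2": 0.10},
    "B": {"t1": 0.05, "t2": 0.40},
    "C": {"t1": 0.20, "t2": 0.20},
}
print(greedy_hyperparameter_list(losses, 2))   # ['A', 'B']
```

A greedy loop of this form mirrors the sequential behavior described for Wistuba's Algorithm 1: the second pick ("B") is not the second-best set in isolation, but the set that best complements the first pick across tasks.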
Regarding Claim 3,
Wistuba/Xu teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wistuba/Xu further discloses wherein identifying, by the one or more computing devices, the candidate set of hyperparameter values that provides, in combination with all previously selected sets of hyperparameter values, a minimum alternative loss over the plurality of different machine learning tasks comprises: for a first selection iteration of the plurality of selection iterations: adding, by the one or more computing devices, a best candidate set of hyperparameter values to the ordered list of sets of hyperparameters (Wistuba [Section IV Algorithm 1]; [image: media_image6.png] wherein for iterations T, the best candidate set of hyperparameter values λ is added to the ordered list of sets of hyperparameters ΛT)
wherein the best candidate set of hyperparameters comprises the lowest overall respective loss for each of the plurality of different machine learning tasks among the plurality of candidate sets of hyperparameter values (Wistuba [Section IV Algorithm 1]; [image: media_image6.png] wherein the best candidate set of hyperparameter values λ comprises the minimized respective loss for the plurality of candidate sets λ∈Λ)
identifying, by the one or more computing devices, a candidate set of hyperparameter values of the plurality of candidate sets of hyperparameter values that produces the minimum alternative loss, the minimum alternative loss comprising a performance difference for tasks for which the candidate set of hyperparameter values produces a lower respective loss than a current lowest respective loss produced for the task by one or more sets of hyperparameters of the ordered list of sets of hyperparameters (Wistuba [Section IV Algorithm 1]; [image: media_image6.png] Wistuba [Section III Equations 3, 4]; [image: media_image9.png] wherein the best candidate set of hyperparameter values λ for the remaining plurality of iterations produces a minimized performance ranking, wherein the minimized performance ranking is computed through the CANE error metric, read on as a minimum alternative loss that compares the performance of tasks for a given λ across the plurality of candidate sets λ∈Λ)
Regarding Claim 33,
Wistuba/Xu teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Xu already discloses wherein the number of potential optimization algorithms comprises a first optimization algorithm parameterized by a first number of hyperparameters and a second optimization algorithm parameterized by a second number of hyperparameters (Xu [Page 7 Paragraph 12]; “The technical solution provided by the embodiment of the present application implements automatic tuning of hyperparameters and hyperparameter algorithm switching in different machine learning models, and can interact with the server in the hyperparameter tuning process in the machine learning model. The dynamic adjustment of the parameter search range of the hyperparameter to be optimized is further improved, and the efficiency of the hyperparameter optimization in the machine learning model is further improved.
FIG. 2B is a flow chart showing a method for optimizing parameters in a machine learning model provided by some embodiments of the present application, which may be performed by the server 13 shown in FIG. 1. As shown in FIG. 2B, the method for optimizing parameters in the machine learning model includes the following steps:
Step 202: Receive a task configuration file sent by the terminal device, where the task configuration file includes a hyper parameter to be optimized, a first algorithm type, and a parameter search range. When the hyperparameter optimization task is subsequently run, the server may obtain multiple reference values of the hyperparameter according to the one or more historical values and the parameter search range; according to the reference value and the first A hyperparameter optimization algorithm calculates a candidate value of the hyperparameter.
Step 204: Run a hyperparameter optimization task to calculate a candidate value of the hyperparameter according to the parameter search range and the first hyperparameter optimization algorithm corresponding to the first algorithm type” wherein the first and second potential algorithm types, associated with determined candidate values and being specifically hyperparameter optimization algorithms, thus read on the number of potential optimization algorithms parameterized by respective numbers of hyperparameters)
Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Wistuba et al. (“Sequential Model-Free Hyperparameter Tuning” [2015], hereinafter “Wistuba”) in view of Xu et al. (CN109816116A, hereinafter “Xu”) in view of Bosc (“Learning to Learn Neural Networks” [2016], hereinafter “Bosc”).
Regarding Claim 6,
Wistuba/Xu teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wistuba/Xu fails to disclose but Bosc discloses generating, by the one or more computing devices, one or more machine learning tasks of the plurality of different machine learning tasks based on a random sampling of one or more neural network properties (Bosc [Section 4 Paragraph 1]; [image: media_image10.png] Bosc [Section 4 Paragraph 5]; [image: media_image11.png] wherein the classification tasks encompassing randomly sampled dataset generation and activation functions read on generation of machine learning tasks of the plurality of different machine learning tasks based on a random sampling of one or more neural network properties)
It would have been obvious to use Bosc’s method of generating machine learning tasks for a neural network in Wistuba/Xu’s method of determining optimized hyperparameters of a machine learning model. One would have been motivated to determine optimized hyperparameters for a neural network that executes machine learning tasks because: “the RNN learns to improve its predictions with appropriate updates of its internal state” (Bosc [Section 1 Paragraph 2]).
Regarding Claim 7,
The combination of Wistuba/Xu/Bosc teaches the method of Claim 6 (and thus the rejection of Claim 6 is incorporated). The combination already discloses wherein the one or more neural network properties comprise at least one of: neural network architectures; activation functions; or model datasets (Bosc [Section 4 Paragraph 1]; [image: media_image10.png] Bosc [Section 4 Paragraph 5]; [image: media_image11.png] wherein the neural network upon which the task is generated comprises activation functions, model datasets, and neural network architectures)
Claims 10, 29-31 are rejected under 35 U.S.C. 103 as being unpatentable over Wistuba et al. (“Sequential Model-Free Hyperparameter Tuning” [2015], hereinafter “Wistuba”) in view of Xu et al. (CN109816116A, hereinafter “Xu”) in view of Oliveira (“Binary Classification on French Hospital Data: benchmark of 7 Machine Learning Algorithms” [2018], hereinafter “Oliveira”).
Regarding Claim 10,
Wistuba/Xu teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wistuba/Xu fails to explicitly disclose but Oliveira further discloses wherein the respective loss for each of the plurality of candidate sets of hyperparameter values for each of the plurality of different machine learning tasks is normalized for aggregation into the aggregate loss (Oliveira [Section IV Definition 2]; [image: media_image12.png]
Oliveira [Section V Subsection A Paragraph 2]; “Choosing hyperparameter values is non-trivial because of two factors: (i) optimal hyperparameter setting depends on the data set;” wherein the plurality of hyperparameters are associated with a plurality of different machine learning tasks
Oliveira [Section V Subsection B]; [image: media_image13.png] wherein the score reads on a loss function measuring performance across a list of hyperparameters; the normalized score being maximized across iterations is thus interpreted as a normalized loss for aggregation into the aggregate loss)
It would have been obvious to use Oliveira’s method of hyperparameter tuning through a loss function normalized for binary classification in Wistuba/Xu’s method of determining hyperparameters optimized through a non-normalized loss function of a machine learning model. One would have been motivated to perform Oliveira’s normalization in Wistuba/Xu’s method because normalization of different sets of data to a common scale or range allows one to “evaluate the prediction capability of f [Binary Classifier] on D [dataset]” (Oliveira [Section IV Definition 2]).
Regarding Claim 29,
Wistuba/Xu/Oliveira teaches the method of Claim 10 (and thus the rejection of Claim 10 is incorporated). Wistuba already discloses wherein the normalization for a respective task maps the loss to a range bounded by a lowest loss achieved for the respective task by any of the plurality of candidate sets of hyperparameters (Wistuba [Section III Column 2 Paragraph 2]; [image: media_image14.png] wherein the normalization of the error as the new CANE metric being lower bounded by 0 thus reads on normalization mapping the loss to a range bounded by the lowest loss achieved for the respective task (0 if the first try on every candidate data set yields consistently best hyperparameter configurations))
Regarding Claim 30,
Wistuba/Xu/Oliveira teaches the method of Claim 10 (and thus the rejection of Claim 10 is incorporated). Wistuba already discloses wherein the normalization for a respective task maps the loss to a range bounded by a lowest loss achieved for the respective task by any of the plurality of candidate sets of hyperparameters and a loss at initialization of the respective task (Wistuba [Section III Column 2 Paragraph 2]; [image: media_image14.png] wherein the normalization of the error as the new CANE metric being lower bounded by 0 and upper bounded by 1 thus reads on normalization mapping the loss to a range bounded by the lowest loss achieved for the respective task (0 if the first try on every candidate data set yields consistently best hyperparameter configurations) and a loss at initialization of the respective task (loss scaled according to its initialized relation with the respective other task data sets))
Regarding Claim 31,
Wistuba/Xu/Oliveira teaches the method of Claim 10 (and thus the rejection of Claim 10 is incorporated). Wistuba already discloses wherein the normalization comprises a linear representation of the loss (Wistuba [Section III Column 2 Paragraph 2]; [image: media_image15.png] wherein the summation of the average normalized errors thus reads on a linear representation of the loss)
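For illustration only, the normalization bounds addressed in Claims 29-31 — a loss mapped linearly into a range bounded below by the lowest loss achieved for the task and above by the loss at the task's initialization — can be sketched as follows. This is an illustrative min-max scaling under those stated bounds, not Wistuba's exact CANE computation; the function names are hypothetical.

```python
# Hypothetical linear (min-max) normalization of a per-task loss:
# 0 at the lowest loss achieved for the task, 1 at the loss at the
# task's initialization, so losses become comparable across tasks.

def normalized_loss(loss, best_loss, init_loss):
    """Linearly map a task loss into [0, 1]."""
    if init_loss == best_loss:        # degenerate task: no spread to scale
        return 0.0
    return (loss - best_loss) / (init_loss - best_loss)

def aggregate_loss(task_losses, best, init):
    """Average the normalized per-task losses into a single aggregate."""
    return sum(
        normalized_loss(task_losses[t], best[t], init[t]) for t in task_losses
    ) / len(task_losses)

print(normalized_loss(3.0, 1.0, 5.0))                     # 0.5
print(aggregate_loss({"a": 3.0, "b": 4.0},
                     {"a": 1.0, "b": 4.0},
                     {"a": 5.0, "b": 6.0}))               # 0.25
```

Because each term of the aggregate is a linear function of the corresponding task loss, the aggregate itself is a linear representation of the (normalized) losses, consistent with the limitation of Claim 31.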
Claims 11, 21-27, are rejected under 35 U.S.C. 103 as being unpatentable over Wistuba et al. (“Sequential Model-Free Hyperparameter Tuning” [2015], hereinafter “Wistuba”) in view of Siems et al. (DE102019214500A1, hereinafter “Siems”).
Regarding Claim 11,
Wistuba discloses obtaining, by one or more computing devices, an optimized list of sets of hyperparameters to train an additional model to perform an additional machine learning task; (Wistuba [Section IV Algorithm 1]; [image: media_image4.png] Wistuba [Section IV Subsection A]; “Algorithm 1 will find the best hyperparameter configurations for the meta-training set by sequentially selecting the hyperparameter configurations that minimize Equation 5 given Λ” wherein the optimal ordered Λ computed through Algorithm 1 reads on obtaining an optimized sorted list of sets of hyperparameters for model training)
wherein the optimized list of sets of hyperparameters minimizes an aggregate loss over a plurality of different tasks corresponding to a plurality of different neural network architectures; (Wistuba [Section IV Subsection A]; “Algorithm 1 will find the best hyperparameter configurations for the meta-training set by sequentially selecting the hyperparameter configurations that minimize Equation 5 given Λ”,
Wistuba [Section IV Equation 5]; [image: media_image3.png] Wistuba [Section IV Algorithm 1]; [image: media_image4.png] wherein the sequential selection for optimized configuration of the meta-training set reads on an ordered list of sets of hyperparameters Λ that, as seen in line 3 of Algorithm 1, minimizes the loss function given by Equation 5)
accessing, by the one or more computing devices, training data and training, by the one or more computing devices, the model on the training data and according to at least one set of hyperparameters from the optimized list of sets of hyperparameters (Wistuba [Section VI Figure 2];
[image: media_image16.png]
Wistuba [Section VI Subsection B]; “B. Tuning Strategies We are comparing our proposed methods with five different competitor strategies. One is random search (Random) [1], the only fully parallelizable strategy besides A-SMFO. Then we compare to different variations of the SMBO framework. I-GP is using a Gaussian process with squared-exponential kernel as a surrogate model and does not consider any meta-knowledge [10]. Surrogate Collaborative Tuning (SCoT) [6] and Gaussian process with multi kernel learning (MKL-GP) [7] are using surrogate models that consider meta-knowledge. Furthermore, we compare to SMAC++, a variation of SMAC [9] that also considers meta-knowledge during the optimization process. To empirically evaluate the influence of the distance function between data sets, we propose Rank Correlation-based Gaussian Process (RC-GP). This is a variation of MKL-GP but instead of using the Euclidean distance of meta-features to estimate the similarity between data sets, we use the distance function defined in Equation 6. Optimal is an artificial tuning strategy that always evaluates the best hyperparameter configuration and is added to some plots for orientation” wherein tuning of the AdaBoost and SVM models using the meta-datasets (optimized lists of sets of hyperparameters) reads on accessing training data and using it for training)
Wistuba fails to disclose but Siems discloses a plurality of tasks corresponding to a plurality of different neural network architectures (Siems [Page 3 Paragraph 7]; “It is further proposed that all, in particular gradient-based, architecture optimizers have the same hyperparameters during the first optimization in order to achieve a more meaningful comparison. The machine learning systems are preferably optimized on the same hardware.
It is further proposed that after the architecture optimizer has been selected, its hyperparameters are adapted as a function of the subset of the training data. The second optimization of the selected machine learning system can be carried out with the selected architecture optimizer, which is parameterized with the adapted hyperparameters.
A hyperparameter of the architecture optimizer can be understood as a parameter which influences a convergence behavior of the determined architecture of the architecture optimizer. For example, a hyperparameter can be a learning rate of the architecture optimizer or a batch size of the training data.
It is very time-consuming and often not expedient to set the architecture optimizer's hyperparameters manually. The advantage of this procedure is that the hyperparameters are set automatically, whereby a better convergence behavior of the architecture optimizer is achieved. This means that the machine learning system can be learned more quickly and more accurately during the second optimization using fewer computer resources.
It is also proposed that the optimization of the hyperparameters of the architecture optimizer is carried out during the first optimization after a predefinable number of optimization steps carried out by the architecture optimizer. The optimization of the hyper parameters is preferably carried out when the architecture optimizer has carried out an optimization step of the architecture and / or the parameters of the machine learning system.” wherein the first and second hyperparameter optimization tasks corresponding to a first pre-optimized architecture and a second post-optimized architecture (optimized by the architecture optimizer) thus reads on a plurality of tasks corresponding to different architectures of which optimized lists of hyperparameters are determined)
It would have been obvious to modify Wistuba’s plurality of tasks, over which the plurality of optimized hyperparameters were derived, to correspond to Siems’ plurality of pre- and post-optimization architectures. One would have been motivated to do so because “The optimization of the hyper parameters is preferably carried out when the architecture optimizer has carried out an optimization step of the architecture and / or the parameters of the machine learning system. This has the advantage that the hyperparameters are constantly adapted to the progress of the creation of the machine learning system. This has an advantageous effect on the convergence behavior” (Siems [Page 3 Paragraph 11]).
Regarding Claim 21,
Wistuba/Siems teaches the method of Claim 11 (and thus the rejection of Claim 11 is incorporated). Wistuba/Siems further discloses wherein training comprises: training, by the one or more computing devices, a plurality of variants of the model separately according to a plurality of sets of hyperparameters from the optimized list of sets of hyperparameters (Wistuba [Section VI Figure 2];
[image: media_image16.png]
wherein different variants of the model associated with differing tuning strategies are trained with the meta-datasets comprising optimized lists of sets of hyperparameters)
evaluating, by the one or more computing devices, a respective performance of each variant of the model; (Wistuba [Section VI Figure 2];
[image: media_image16.png]
wherein each variant is evaluated by its average rank over a certain number of trials)
selecting, by the one or more computing devices, a first variant of the model based on the respective performances of the variants of the model (Wistuba [Section VI Subsection C]; “To confirm the first hypothesis, consider Figure 2. It shows the development of the average rank among different hyperparameter tuning strategies. On the AdaBoost meta-data set all tuning strategies but Random and SMAC++ are close together. Nevertheless, NN-SMFO is one of the best. Especially in the beginning, a larger gap to the other approaches is recognizable. Starting at iteration 40, MKL-GP seems to become very good but this is misleading as the reader will see in Figure 3. The improvement here is less than one may assume, improving the classification error on average at the third decimal place. The good performance of NN-SMFO becomes substantial on the SVM meta-data set. Here it takes about 50 trials until any other tuning strategy gets even close to the performance of NN-SMFO.” wherein the designation of the NN-SMFO tuning strategy variant model as one of the best up to 50 trials indicates selection of a first variant of the model based on its average rank performance)
Regarding Claim 22,
Wistuba/Siems teaches the method of Claim 21 (and thus the rejection of Claim 21 is incorporated). Wistuba/Siems further discloses wherein the task performed by the additional machine-learned model is different than the tasks of the plurality of different machine learning tasks (Wistuba [Section VI Subsection A]; “Meta-features are a vital part in most of the competitor strategies. Therefore, we added the meta-features to our metadata that were also used by [6], [7]. First, we are extracting the number of training instances n, the number of classes c and the number of predictors p. The final meta-features are c, log (p) and log (n/p) scaled to [0, 1].” wherein the machine learning task of feature extraction that is performed by the additional machine-learned model is different from the tasks of the plurality of different machine learning tasks)
Regarding Claim 23,
Wistuba/Siems teaches the method of Claim 21 (and thus the rejection of Claim 21 is incorporated). Wistuba/Siems further discloses wherein the task performed by the additional machine-learned model is at least one of the tasks of the plurality of different machine learning tasks (Wistuba [Section VI Figure 2];
[image: media_image16.png]
Wistuba [Section VI Subsection A]; “To compare different hyperparameter tuning strategies, we used 25 classification data sets randomly chosen from the UCI repository to create two meta-data sets. We merged existing splits, shuffled all instances and created new splits where 80% was used as train and 20% as test, respectively. One meta-data set was created as proposed by [6] using AdaBoost with decision products as weak learners [19]. This results into two hyperparameters, the number of iterations I and the number of product terms M. The target measure is the classification error. Validation errors are precomputed on a grid with values I ∈ {2, 5, 10, 20, 50, 100, 200, 500, 10³, 2·10³, 5·10³, 10⁴} and M ∈ {2, 3, 4, 5, 7, 10, 15, 20, 30} resulting into 108 meta-instances per data set. The second meta-data set was created using a support vector machine [20]. In this case the hyperparameters are the chosen kernel (linear, polynomial or Gaussian), the trade-off between margin and training error C and kernel specific hyperparameters such as the degree of the polynomial kernel d and the width γ of the Gaussian kernel. Again, the validation errors are precomputed on a grid with values C ∈ {2⁻⁵, ..., 2⁶}, d ∈ {2, ..., 10} and γ ∈ {10⁻⁴, 10⁻³, 10⁻², 0.05, 0.1, 0.5, 1, 2, 5, 10, 20, 50, 10², 10³} resulting into 288 meta-instances per data set” wherein the additional machine-learned model performs at least one of the plurality of associated known machine learning operations, such as generating batches of validation error data, inputting batches of data to receive a classification output, and determining parameter updates to improve model performance over trials)
Claims 24, 25, 26, and 27 recite a computing system comprising one or more processors and one or more non-transitory media storing instructions to perform the method of Claims 11, 21, 22, 23, thus
Claims 24, 25, 26, and 27 are rejected for reasons set forth in the rejection of Claims 11, 21, 22, 23 respectively.
Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Wistuba et al. (“Sequential Model-Free Hyperparameter Tuning” [2015], hereinafter “Wistuba”) in view of Xu et al. (CN109816116A, hereinafter “Xu”) in view of Teig et al. (US12112254B1, hereinafter “Teig”)
Regarding Claim 28,
Wistuba/Xu teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wistuba/Xu fails to disclose but Teig discloses sampling, logarithmically, one or more values for the plurality of candidate sets of hyperparameters (Teig [Column 21 Line 36];
[image: media_image17.png]
wherein one of the possible loss functions used for determination of hyperparameters is sampled in part through logarithmic means
Teig [Column 27 Paragraph 1];
[image: media_image18.png]
wherein the vector of hyperparameters determined through use of loss functions including aforementioned logarithmic loss function thus reads on sampling, logarithmically, a plurality of values for the plurality of candidate hyperparameter sets
Teig [Column 33 Line 7]; “For the loss function, having a discrete set of possible loss functions (e.g., a logarithmic function and a quadratic function) would not be continuously differentiable, as there is no continuous function. However, variables can be defined such that a complete loss function is defined as a first variable (A) multiplied by the logarithmic function summed with a second variable (B) multiplied by the quadratic function. This defines an infinite set of possible loss functions based on the values for variables A and B, and each of these variables can be differentiated with respect to the predictiveness (using the validation set in, e.g., the manner described above). For systems with many possible loss functions, different variables can be defined for each possible loss function, and similar techniques used.
In addition, by iteratively validating the trained network and modifying the loss function based on the validation set (part of which is then incorporated into the training set), not only can the validation system identify an optimized singular loss function, but some embodiments identify an optimal sequence of loss functions that results in the most predictive network. Using the above example, it might be optimal to have a logarithmic loss function for the initial training run, but later in the set use a quadratic loss function (or a combination of both).
Furthermore, while the example above (a linear combination of specific potential loss functions) is simple, some embodiments use a more generalized set of basis functions that allow the loss function optimization algorithm to construct any sufficiently smooth (i.e., differentiable) function. For instance, different embodiments could use a set of basis functions (e.g., Fourier or wavelet basis functions) to construct an optimized loss function (including a loss function that evolves over time).”)
It would have been obvious to determine Wistuba/Xu’s plurality of candidate sets of hyperparameter values specifically through Teig’s logarithmic sampling. One would have been motivated to do so because a complete loss function comprising log-loss “defines an infinite set of possible loss functions based on the values for variables A and B, and each of these variables can be differentiated with respect to the predictiveness (using the validation set in, e.g., the manner described above).” (Teig [Column 33 Line 14]), thus allowing logarithmic sampling-based loss functions to be fully differentiable and flexible across an infinite set of possible loss functions.
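For illustration only, logarithmic sampling of candidate hyperparameter values, as addressed in Claim 28, is commonly realized as log-uniform sampling; a minimal sketch follows. The value range, sample count, and function name are hypothetical and are not drawn from Teig.

```python
import math
import random

def sample_log_uniform(low, high, n, seed=0):
    """Draw n values log-uniformly from [low, high] (requires low > 0),
    so that each order of magnitude is sampled with equal probability."""
    rng = random.Random(seed)                 # seeded for reproducibility
    log_low, log_high = math.log(low), math.log(high)
    return [math.exp(rng.uniform(log_low, log_high)) for _ in range(n)]

# e.g. candidate learning-rate values spanning four orders of magnitude:
candidates = sample_log_uniform(1e-5, 1e-1, 5)
print(all(1e-5 <= c <= 1e-1 for c in candidates))
```

Sampling in log space rather than linearly is the usual choice when a hyperparameter's plausible values span several orders of magnitude, since a linear draw would concentrate nearly all samples near the upper end of the range.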
Claim 32 is rejected under 35 U.S.C. 103 as being unpatentable over Wistuba et al. (“Sequential Model-Free Hyperparameter Tuning” [2015], hereinafter “Wistuba”) in view of Xu et al. (CN109816116A, hereinafter “Xu”) in view of Siems et al. (DE102019214500A1, hereinafter “Siems”).
Regarding Claim 32,
Wistuba/Xu teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wistuba/Xu fails to disclose but Siems discloses wherein the plurality of different machine learning tasks comprises a first task for training a first model having a first neural network architecture and a second task for training a second model having a second neural network architecture (Siems [Page 3 Paragraph 7]; “It is further proposed that all, in particular gradient-based, architecture optimizers have the same hyperparameters during the first optimization in order to achieve a more meaningful comparison. The machine learning systems are preferably optimized on the same hardware.
It is further proposed that after the architecture optimizer has been selected, its hyperparameters are adapted as a function of the subset of the training data. The second optimization of the selected machine learning system can be carried out with the selected architecture optimizer, which is parameterized with the adapted hyperparameters.
A hyperparameter of the architecture optimizer can be understood as a parameter which influences a convergence behavior of the determined architecture of the architecture optimizer. For example, a hyperparameter can be a learning rate of the architecture optimizer or a batch size of the training data.
It is very time-consuming and often not expedient to set the architecture optimizer's hyperparameters manually. The advantage of this procedure is that the hyperparameters are set automatically, whereby a better convergence behavior of the architecture optimizer is achieved. This means that the machine learning system can be learned more quickly and more accurately during the second optimization using fewer computer resources.
It is also proposed that the optimization of the hyperparameters of the architecture optimizer is carried out during the first optimization after a predefinable number of optimization steps carried out by the architecture optimizer. The optimization of the hyper parameters is preferably carried out when the architecture optimizer has carried out an optimization step of the architecture and / or the parameters of the machine learning system.” wherein the first and second hyperparameter optimization tasks corresponding to a first pre-optimized architecture and a second post-optimized architecture (optimized by the architecture optimizer) thus reads on a plurality of tasks corresponding to different architectures of which optimized lists of hyperparameters are determined)
It would have been obvious to modify Wistuba/Xu’s plurality of tasks, over which the plurality of optimized hyperparameters were derived, to correspond to Siems’ plurality of pre- and post-optimization architectures. One would have been motivated to do so because “The optimization of the hyper parameters is preferably carried out when the architecture optimizer has carried out an optimization step of the architecture and / or the parameters of the machine learning system. This has the advantage that the hyperparameters are constantly adapted to the progress of the creation of the machine learning system. This has an advantageous effect on the convergence behavior” (Siems [Page 3 Paragraph 11]).
Response to Arguments
The Examiner acknowledges the Applicant’s amendments to Claims 1, 7, 10, and 11.
Applicant’s arguments filed November 21st, 2025, traversing the rejection of claims 1-3, 6-7, 10-11 under 35 U.S.C. § 101 have been fully considered, but are not fully persuasive.
Applicant alleges on Page 9 of the Remarks that Examiner’s rejection of the limitation “wherein the ordered list of sets of hyperparameters minimizes an aggregate loss over the plurality of different machine learning tasks” does not contain any analysis of how all the elements of the claim interact, instead only briefly analyzing each of the alleged additional elements in isolation, and that Examiner’s characterization of the specific features defining characteristics of an optimization as merely indicating a “field of use” is an interpretation at a high level of granularity.
Examiner disagrees. Examiner maintains that the element of “wherein the ordered list of sets of hyperparameters minimizes an aggregate loss over the plurality of different machine learning tasks” is a “field of use” additional element even when considered in combination with the remaining elements of the claim. Examiner has already determined that the act of determining … an ordered list of sets of hyperparameters selected from the plurality of candidate sets of hyperparameter values merely constitutes the evaluation of the candidate sets of hyperparameters to determine an ordered list of the sets, thus corresponding to a mental process which can be performed mentally or with pen and paper. Applicant’s recited element of “wherein the ordered list of sets of hyperparameters minimizes an aggregate loss over the plurality of different machine learning tasks” merely specifies that the ordered list must be determined for use in minimizing an aggregate loss over a plurality of machine learning tasks. In short, “determining … an ordered list of sets of hyperparameters” for the use of “minimizing an aggregate loss”. Thus, such an element is merely the particular technological environment or field of use in which the abstract idea of “determining … an ordered list of sets of hyperparameters” is to be accomplished.
Applicant alleges on Pages 9-11 of the Remarks that the claimed invention provides technical solutions to a technical problem.
Examiner respectfully disagrees. Although Applicant cites the specification and the improvements in technology described therein, Examiner notes that such improvements are not clearly recited in the present claim language. At present, Applicant does not recite the improved techniques described in the specification in the claim language of Claim 1; instead, Claim 1 recites the mental process of evaluating data associated with different machine learning tasks as well as evaluating determined hyperparameter values to determine an optimized, ordered list of sets of hyperparameters. The remaining elements of Claim 1 are merely additional elements that fail to integrate the aforementioned abstract idea into a practical application, and thus Claim 1 does not provide a technical solution to a technical problem.
The rejection of Claim 1 under 35 U.S.C. § 101 has been maintained. Similarly, the rejection of Claim 11 under 35 U.S.C. § 101 has been maintained.
The rejection of Claims 2-3, 6-7 and 10 under 35 U.S.C. § 101, which depend directly or indirectly from Claim 1, has been maintained.
Applicant’s arguments regarding the rejections of Claims 1-19 under 35 U.S.C. § 102(a)(1) and 35 U.S.C. § 103 have been considered, but are not fully persuasive.
Applicant alleges on Page 12 that Claim 1 has been amended to include the language of previously-pending Claim 8 to recite “wherein each of the plurality of candidate sets of hyperparameter values comprises an identification of one of a number of potential optimization algorithms”. Examiner has combined previously-cited secondary reference “Xu” with primary reference “Wistuba” to disclose the newly amended language. As such, the rejection of Claim 1 under 35 U.S.C. § 102(a)(1) has been changed to a rejection of Claim 1 under 35 U.S.C. § 103 over Wistuba/Xu.
Applicant alleges on Page 13 of the Remarks that Claim 11 has been amended such that Wistuba does not disclose “wherein the optimized list of sets of hyperparameters minimizes an aggregate loss over a plurality of different tasks corresponding to a plurality of different neural network architectures”. New secondary reference “Siems” has been combined with primary reference “Wistuba” to disclose a plurality of different tasks corresponding to a plurality of different neural network architectures. As such, the rejection of Claim 11 under 35 U.S.C. § 102(a)(1) has been changed to a rejection of Claim 11 under 35 U.S.C. § 103 over Wistuba/Siems.
Applicant alleges on Pages 13-14 of the Remarks that Examiner fails to establish prima facie obviousness of Claim 1 for two reasons. First, Examiner fails to teach every element of Claim 1. Specifically, nothing in Xu appears to provide any suggestion of obtaining “candidate sets of hyperparameter values” “wherein each of the plurality of candidate sets of hyperparameter values comprises an identification of one of a number of potential optimization algorithms”.
Examiner respectfully disagrees. Xu discloses a hyperparameter optimization task comprising, in part, a first hyperparameter optimization algorithm of a first algorithm type. Hyperparameter optimization inherently reads on obtaining candidate hyperparameter value sets, since the hyperparameter optimization process of Xu essentially systematically adjusts hyperparameter values to determine those that produce the best output, i.e., candidate sets of hyperparameter values. Therefore, Xu’s disclosure of a series of hyperparameter optimizations of “first” and “second” algorithms and corresponding algorithm types reads on the plurality of candidate sets of hyperparameter values comprising, in part, an identification of one of a number of potential optimization algorithms. Additionally, Examiner notes that primary reference Wistuba discloses obtaining an ordered list of sets of hyperparameters from the plurality of candidate sets of hyperparameter values, and that Xu serves only to disclose identifiers for the potential optimization algorithms in the candidate sets of hyperparameter values that Wistuba uses to obtain the ordered list of sets of hyperparameters. Thus, Xu need only disclose “candidate sets of hyperparameter values … wherein each of the plurality of candidate sets of hyperparameter values comprises an identification of one of a number of potential optimization algorithms”, since the Wistuba reference already discloses performing such obtaining of candidate sets of hyperparameter values for constructing an ordered list of sets of hyperparameters.
Applicant alleges that the Office does not articulate the proposed modification of the applied references necessary to arrive at the claimed subject matter.
Examiner has rewritten the proposed modification of the applied references to more clearly emphasize how Wistuba would be modified in order to “use Xu’s candidate hyperparameter identifiers”. Examiner states that, in Wistuba’s method of obtaining candidate sets of hyperparameter values and determining a subsequent ordered list of sets of hyperparameters from the obtained candidate sets for use in training an additional machine learning model, the obtained candidate sets of hyperparameter values would be modified to instead be Xu’s obtained candidate sets of hyperparameter values, which additionally comprise identifications of potential optimization algorithms. One would have been motivated to replace Wistuba’s candidate sets of hyperparameter values with Xu’s candidate sets of hyperparameter values comprising identifiers of potential optimization algorithms because “the server determines the hyper-parameter optimization task according to the task identifier, and continues to run the hyper-parameter optimization task according to the updated task configuration file” (Xu [Page 7 Paragraph 11]); thus, the algorithm hyperparameter identifiers allow the model to identify and subsequently select which algorithm should be used for the remaining optimization. As such, the Office now more clearly articulates the “proposed modification of the applied reference(s) necessary to arrive at the claimed subject matter” as required by MPEP § 2142(C).
The rejection of Claim 1 under 35 U.S.C. § 103 has been maintained. Similarly, the rejection of Claim 11 under 35 U.S.C. § 103 has been maintained.
The rejection of Claims 2-3, 6-7 and 10 under 35 U.S.C. § 103, which depend directly or indirectly from Claim 1, has been maintained.
Conclusion
Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN J KIM whose telephone number is (571)272-0523. The examiner can normally be reached 8-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Ell, can be reached at (571) 270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JONATHAN J KIM/Examiner, Art Unit 2141
/MATTHEW ELL/Supervisory Patent Examiner, Art Unit 2141