DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the claims filed on 01/23/2026. Claims 1, 11, 12, 16, 22-25, 33, 36, 58, 65, 68, 70, 72-75, 78, and 86-88 are pending for examination.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/23/2026 has been entered.
Response to Arguments
Applicant’s arguments with respect to claims rejected under 35 USC 102 and 103 (Remarks, 9-16) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 23, 36, and 78 are rejected under 35 U.S.C. 103 as being unpatentable over Abeliuk et al. (US 20200202241 A1), hereafter referred to as Abeliuk, in view of Yao et al. (Yao, Y., Vehtari, A., Simpson, D., & Gelman, A. (2018). Using stacking to average Bayesian predictive distributions (with discussion).), hereafter referred to as Yao, and in further view of Khiari et al. (US 20190303795 A1), hereafter referred to as Khiari.
Regarding claim 1, Abeliuk teaches:
a system for training a probabilistic predictive model for recommending experiment designs for synthetic biology comprising: (Abeliuk, paragraph 67, “Returning to Fig. 4, at step 402 an objective function is determined based at least in part on the plurality of desired phenotypic attributes. The process of recommending genotypes for experimentation involves maximization of the objective function, as well as maximization of an acquisition function and additional steps, as discussed in greater detail further below”, Abeliuk expressly teaches a predictive-model-based optimization workflow for recommending genotypes for experimentation in synthetic biology based on desired phenotypic attributes, thereby teaching a system for training and using a probabilistic predictive model for recommending experiment designs for synthetic biology.)
non-transitory memory configured to store executable instructions (Abeliuk, paragraph 150, “Specialized computing environment 1100 can be made up of one or more computing devices that include a memory 1101 that is a non-transitory computer readable medium and can be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two.”)
and a hardware processor in communication with the non-transitory memory, the hardware processor programmed by the executable instructions to: (Abeliuk, paragraph 153, “All of the software stored within memory 1101 can be stored as a computer-readable instructions, that when executed by one or more processors 1102, cause the processors to perform the functionality described with respect to Figs. 1-10.”);
receive synthetic biology experimental data (Abeliuk, paragraph 26, “At step 102 genotype information in a plurality of experimental data points corresponding to the set of constraints is encoded as a plurality of experiential genotype vectors, the plurality of experimental data points comprising the genotype information and phenotype information corresponding to the genotype information”, biological experimental data (genotype information in a plurality of experimental data points) is received.);
generate training data from the synthetic biology experimental data, wherein the training data comprise a plurality of training inputs and corresponding reference outputs, (Abeliuk, paragraph 26 “At step 102 genotype information in a plurality of experimental data points corresponding to the set of constraints is encoded as a plurality of experiential genotype vectors, the plurality of experimental data points comprising the genotype information and phenotype information corresponding to the genotype information”,
Abeliuk, paragraph 28, “Each experimental data point can include phenotypic measurements and corresponding genotype data. For example, the experimental data point can include corresponding to a particular genetic sequence, gene, and/or gene fragment and can also include phenotypic measurements that correspond to that particular genetic sequence, gene, and/or gene fragment. The phenotypic measurements can be measurements that were experimentally determined in previous experiments. The experimental data points can be configured to link genotype data with phenotypic measurements in a memory of the database, such as through a relational database, directed graph, or other techniques.”, Abeliuk teaches that the experimental data points include genotype information together with corresponding phenotype information/measurements, and that the genotype information is encoded as experiential genotype vectors. Thus, Abeliuk teaches generating training data from the synthetic biology experimental data, where the genotype information forms the plurality of training inputs and the corresponding phenotype measurements form the corresponding reference outputs.)
wherein each of the plurality of training inputs comprises training values of input variables, (Abeliuk, paragraph 33, “At step 202 the genotypes associated with the identified plurality of experimental data points are encoded as a plurality of experiential genotype vectors”, here, genotype information is treated as the “input variables” for training. Encoding these genotypes as vectors shows that each training input is composed of specific values (the “training values”).)
and wherein each of the plurality of reference outputs comprises a reference value of at least one response variable associated with a predetermined response variable objective (Abeliuk, paragraph 28, “Each experimental data point can include phenotypic measurements and corresponding genotype data. For example, the experimental data point can include corresponding to a particular genetic sequence, gene, and/or gene fragment and can also include phenotypic measurements that correspond to that particular genetic sequence, gene, and/or gene fragment. The phenotypic measurements can be measurements that were experimentally determined in previous experiments.”,
Abeliuk, paragraph 67, “Returning to FIG. 4, at step 402 an objective function is determined based at least in part on the plurality of desired phenotypic attributes.”, Abeliuk teaches that the experimentally determined phenotypic measurements are the claimed reference outputs and reference values of the response variable, and further teaches that those phenotypic measurements are evaluated relative to an objective function determined from desired phenotypic attributes, thereby teaching that the response variable is associated with a predetermined response variable objective.);
determine a surrogate function with an input experiment design as an input, the surrogate function comprising an expected value of the at least one response variable determined using the input experiment design, (Abeliuk, paragraph 51, “When ranking candidates, SMBO methods usually use a scalar score that combines the predicted value with an estimation of its uncertainty for each sample.”, Abeliuk teaches that candidate ranking in SMBO uses a scalar score combining a predicted value and uncertainty
Abeliuk, paragraph 67, “Returning to FIG. 4, at step 402 an objective function is determined based at least in part on the plurality of desired phenotypic attributes. The process of recommending genotypes for experimentation involves maximization of the objective function, as well as maximization of an acquisition function”,
Abeliuk, claim 19, “identify the plurality of experimental data points in a database of experimental data points based at least in part on one or more of: at least one available genotype in the plurality of available genotypes and at least one desired phenotypic attribute in the plurality of desired phenotypic attributes; and encode genotypes associated with the identified plurality of experimental data points as the plurality of experiential genotype vectors.”, Abeliuk teaches applying the phenotype prediction model, objective function, or acquisition function to a plurality of new genotype vectors to generate corresponding prediction or acquisition scores. Thus, Abeliuk teaches a surrogate function that takes a candidate experiment design/new genotype vector as input, and the predicted-value portion of that score is the expected value of the response variable for that input experiment design.)
a variance of the value of the at least one response variable determined using the input experiment design, (Abeliuk, paragraph 54, “With this method, the RF prediction is used as an estimation of the statistical mean of the surrogate model’s predictions, for which a gaussian distribution is assumed. The calculation of the variance of the prediction considers RF’s estimators deviation and the leaf training variance for each tree.”, this explicitly teaches how variance is calculated for the predicted response variable.)
and an exploitation-exploration trade-off parameter; and (Abeliuk, paragraph 51, “Also, the consideration of an uncertainty term in the acquisition function promotes the exploration and the diversity of recommendations, helping to avoid local optima. One of the most commonly used is the Expected Improvement (EI).”, the weighting of the uncertainty term in acquisition functions such as Expected Improvement balances predicted values (exploitation) against uncertainty (exploration), teaching the claimed exploitation-exploration trade-off parameter)
determine, using the surrogate function, a plurality of recommended experiment designs, each comprising recommended values of the input variables, for a next cycle of a synthetic biology experiment for obtaining a predetermined response variable objective associated with the at least one response variable. (Abeliuk, paragraph 148, “Fig. 10 illustrates the global SMBO graph, which shows the most important elements of the procedure performed to find the suggested data points (suggested compounds based upon the predictive model) according to an exemplary embodiment. The set of suggested points are then evaluated and used to define the next experiments that, once recorded, are used to repeat the procedure and suggest a new set of observations.”, here the surrogate function (the acquisition function as described earlier) is applied to propose a set of suggested data points (suggested experimental compounds) to design a next experiment.)
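For illustration only, the surrogate-function limitation mapped above can be sketched as an upper-confidence-bound style score of the kind Abeliuk's SMBO discussion contemplates: an expected value (exploitation) plus a trade-off parameter times the predictive standard deviation (exploration). The function name, kappa value, and candidate designs below are hypothetical, not taken from Abeliuk:

```python
import math

def acquisition(mean, variance, kappa):
    """Scalar score combining a predicted value with its uncertainty:
    mean (exploitation) plus kappa times the standard deviation (exploration)."""
    return mean + kappa * math.sqrt(variance)

# Hypothetical candidate experiment designs with surrogate predictions.
candidates = [
    {"design": "A", "mean": 0.80, "variance": 0.01},
    {"design": "B", "mean": 0.70, "variance": 0.20},
]

# Rank candidates by the acquisition score, highest first.
ranked = sorted(candidates,
                key=lambda c: acquisition(c["mean"], c["variance"], kappa=2.0),
                reverse=True)
```

With kappa = 2.0, the more uncertain design B outranks the higher-mean design A, illustrating how the trade-off parameter promotes exploration over pure exploitation.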
Yao, in the same field of Bayesian ensemble implementation, teaches the following limitations which the above prior art fails to teach:
train, using the training data, a plurality of level-0 learners of a probabilistic predictive model for recommending experiment designs for synthetic biology, wherein an input of each of the plurality of level-0 learners comprises input values of the input variables, and wherein an output of each of the plurality of level-0 learners comprises a predicted value of at least one response variable (Yao, page 3, paragraph 3, “In supervised learning, where the data are ((xi,yi),i=1,...,n) and each model Mk has a parametric form
p(y|θ_k, M_k)
, stacking is done in two steps (Ting and Witten, 1999). In the first, baseline-level, step, each model is fitted separately and the leave-one-out (LOO) predictor
ŷ_{k,i}^(−i) = E(y_i | θ̂_{k,(−i)}, M_k)
is obtained for each model k and each data point i. In the second, meta-level, step, a weight for each model is obtained by minimizing the mean squared error, treating the leave-one-out predictors from the previous stage as covariates
ŵ = argmin_w Σ_{i=1}^n (y_i − Σ_{k=1}^K w_k ŷ_{k,i}^(−i))²
”, Yao expressly teaches a multi-model stacked-learning framework in supervised learning where the data are (x_i, y_i), each model M_k has predictive form
p(y|θ_k, M_k)
, and, in the first stage, each model is fitted separately. That directly teaches a plurality of level-0 learners rather than a single model. Because each model receives x_i as its input and produces a model-specific y_k, Yao teaches that the input of each level-0 learner comprises input values of the input variables and the output of each level-0 learner comprises a predicted value of the response variable.);
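As a purely illustrative sketch of Yao's baseline-level step (the two learner classes below are hypothetical stand-ins, not models from Yao or Abeliuk), each level-0 learner is fitted separately on the same (x_i, y_i) training data and queried for its own prediction:

```python
# Toy training data: inputs x and corresponding reference outputs y.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.1, 1.9, 3.2]

class MeanLearner:
    """Trivial level-0 learner: always predicts the training mean."""
    def fit(self, xs, ys):
        self.mean = sum(ys) / len(ys)
        return self
    def predict(self, x):
        return self.mean

class LinearLearner:
    """Second level-0 learner of a different type: least-squares line."""
    def fit(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.b = sxy / sxx
        self.a = my - self.b * mx
        return self
    def predict(self, x):
        return self.a + self.b * x

# Fit each level-0 learner separately, then collect per-point predictions.
level0 = [MeanLearner().fit(xs, ys), LinearLearner().fit(xs, ys)]
preds = [[m.predict(x) for m in level0] for x in xs]
```

Each row of `preds` holds one prediction per level-0 learner for a training input; in stacking, these become the covariates for the meta-level step.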
train, using (i) predicted values of the at least one response variable determined using the plurality of level-0 learners for the training inputs of the plurality of training inputs, and (ii) the reference outputs of the plurality of reference outputs corresponding to the training inputs of the plurality of training inputs, a level-1 learner (Yao, page 3, paragraph 3, “In the second, meta-level, step, a weight for each model is obtained by minimizing the mean squared error, treating the leave-one-out predictors from the previous stage as covariates
ŵ = argmin_w Σ_{i=1}^n (y_i − Σ_{k=1}^K w_k ŷ_{k,i}^(−i))²
”,
Yao, page 7, paragraph 3, “For simplicity, we remove all covariates x in the notation. Suppose we have a set of probabilistic models M=(M1,...,MK); then the goal in stacking is to find an optimal super-model in the convex linear combination with the form
Σ_{k=1}^K w_k p(·|M_k), with Σ_{k=1}^K w_k = 1, w_k ≥ 0”, Yao expressly teaches the claimed second-stage training of a higher-level learner. In Yao, the first-stage models generate predictions for each training point, and the second, meta-level, step determines model weights using the observed yi values while treating the first-stage predictions as covariates. Thus, Yao uses predicted values from the plurality of level-0 learners for the training inputs and also uses the corresponding true observed outputs yi, which are the claimed reference outputs corresponding to the training inputs, to train a distinct higher-level combiner. Yao further characterizes the result of the meta-level training as an “optimal super-model,” which teaches the claimed level-1 learner.)
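The meta-level step described in the passage above can be illustrated with a toy sketch (two level-0 learners and a hypothetical grid search over convex weights, not Yao's actual optimizer): the weight is chosen to minimize mean squared error, treating the level-0 predictions as covariates.

```python
def stack_weight(preds, ys):
    """Grid search over convex weights (w, 1 - w) for two level-0 learners,
    minimizing mean squared error of the weighted prediction against ys."""
    best_w, best_err = 0.0, float("inf")
    for i in range(101):
        w = i / 100
        err = sum((y - (w * p0 + (1 - w) * p1)) ** 2
                  for (p0, p1), y in zip(preds, ys)) / len(ys)
        if err < best_err:
            best_w, best_err = w, err
    return best_w

# Level-0 predictions (model 1, model 2) and the observed reference outputs.
preds = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
ys = [1.0, 2.0, 3.0]
w = stack_weight(preds, ys)  # all weight goes to the accurate model 1
```

Because model 1 reproduces the observed outputs exactly and model 2 does not, the minimizing weight lands at w = 1.0, mirroring how the meta-level step favors the more accurate base learner.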
a level-1 learner of the probabilistic predictive model for recommending experiment designs for synthetic biology comprising a probabilistic ensemble of the plurality of level-0 learners, wherein an output of the level-1 learner comprises a predicted probabilistic distribution of the at least one response variable, wherein the level-1 learner comprises a Bayesian ensemble of the plurality of level-0 learners; (Yao, page 4, paragraph 4, “In this paper, we extend stacking from minimizing the squared error to maximizing scoring rules, hence make stacking applicable to combining a set of Bayesian posterior predictive distribution.”
Yao, page 7, paragraph 3, “For simplicity, we remove all covariates x in the notation. Suppose we have a set of probabilistic models M=(M1,...,MK); then the goal in stacking is to find an optimal super-model in the convex linear combination with the form
Σ_{k=1}^K w_k p(·|M_k), with Σ_{k=1}^K w_k = 1, w_k ≥ 0 … Eventually, the combined estimation of the predictive density is
p̂(ỹ|y) = Σ_{k=1}^K ŵ_k p(ỹ|y, M_k). When using logarithmic score (corresponding to Kullback-Leibler divergence), we call this stacking of predictive distributions:
max_w (1/n) Σ_{i=1}^n log Σ_{k=1}^K w_k p(y_i | y_{−i}, M_k), subject to Σ_{k=1}^K w_k = 1, w_k ≥ 0
”, Yao is directed specifically to combining Bayesian predictive distributions from a set of probabilistic models into an optimal super-model. The cited passages teach that the level-1 learner is a probabilistic ensemble because it combines predictive densities/distributions from the plurality of level-0 learners, that the output of the level-1 learner is a predicted probabilistic distribution of the response variable in the form of the combined predictive density p̂(ỹ|y), and that the ensemble is Bayesian because the combined component outputs are expressly Bayesian predictive distributions / Bayesian posterior predictive distributions).
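The log-score stacking objective quoted above can be illustrated with a toy computation; the density values below are hypothetical, and Yao optimizes over the full weight simplex rather than this two-model grid:

```python
import math

# Hypothetical leave-one-out predictive densities p(y_i | y_-i, M_k)
# for two models, evaluated at three held-out observations.
loo_dens = [(0.40, 0.05), (0.35, 0.10), (0.50, 0.02)]  # (model 1, model 2)

def log_score(w):
    """Summed log score of the w : (1 - w) mixture of the two densities."""
    return sum(math.log(w * d1 + (1 - w) * d2) for d1, d2 in loo_dens)

# Maximize the log score over a grid of convex weights.
best_w = max((i / 100 for i in range(101)), key=log_score)
```

Since model 1 assigns higher density to every held-out point in this toy data, the log score is maximized by giving it all the weight; with more balanced models the optimum would fall strictly inside the simplex, yielding a genuine mixture of predictive distributions.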
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the surrogate-model framework of Abeliuk with the stacked Bayesian predictive-distribution technique of Yao. Abeliuk teaches using a predictive surrogate model within a sequential model-based optimization workflow for recommending genotypes for experimentation in synthetic biology, and further teaches that the surrogate may be implemented using machine learning, ensemble-based algorithms, and Bayesian approaches that provide both a prediction value and an estimation of uncertainty. Yao teaches a known two-stage stacking technique in which multiple separately trained models produce first-stage predictions and a higher-level super-model combines those predictions using observed outputs to generate a predictive distribution. A person of ordinary skill in the art would have been motivated to apply Yao’s stacked Bayesian predictive-distribution technique to Abeliuk’s surrogate model in order to improve predictive performance and uncertainty estimation in the synthetic-biology optimization setting, particularly where no single model class is uniformly optimal, while preserving Abeliuk’s existing objective-function, acquisition-function, and next-experiment recommendation framework.
Khiari, in the same field of machine learning ensemble implementation, teaches the following limitations which the above prior art fails to teach:
wherein the plurality of level-0 learners comprise different types of machine learning models; (Khiari, paragraph 3, “Ensemble learning approaches combine hypotheses from different algorithms, and are able to achieve better results than single models that explore only one hypothesis at a time… Alternatively, models can be generated with different induction algorithms, in which case the ensemble is heterogeneous.”, Khiari expressly teaches that ensemble models may be formed from models generated using different induction algorithms, and states that, in that case, the ensemble is heterogeneous. A person of ordinary skill in the art would understand that models produced by different induction algorithms are different types of machine-learning models. Because Khiari’s disclosure is expressly directed to a set of base learners in an ensemble-learning framework, its teaching maps directly to the claimed requirement that the plurality of level-0 learners comprise different types of machine learning models.)
It would have been further obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the plurality of base learners in the Abeliuk/Yao combination using the heterogeneous ensemble approach taught by Khiari. While Yao teaches the two-stage stacked architecture and Bayesian combination of predictive distributions, Khiari expressly teaches that ensemble learning approaches combine hypotheses from different algorithms and that models can be generated with different induction algorithms, in which case the ensemble is heterogeneous. A person of ordinary skill in the art would have been motivated to use Khiari’s heterogeneous base-learner approach in the Abeliuk-Yao stacked surrogate model so that the level-0 learners would comprise different types of machine-learning models, thereby leveraging the known benefit that different model classes capture different hypotheses and can achieve better results than a single model type. Doing so would have predictably yielded a stacked surrogate model for Abeliuk’s synthetic-biology optimization framework in which the base learners are of different model types, the meta-level learner combines their outputs as taught by Yao, and the overall system retains Abeliuk’s surrogate-based recommendation of next experiment designs.
Regarding claim 23, Abeliuk, Yao, and Khiari teach the system of claim 1,
wherein to train the level-1 learner, the hardware processor is programmed by the executable instructions to: determine, using the plurality of level-0 learners, the predicted values of the at least one response variable for training inputs of the plurality of training inputs (Abeliuk, paragraph 57, “Random Forests are models of the type ensemble that are based on grouping multiple models of low complexity. They carry this name because they use multiple Decision Trees whose prediction is averaged to obtain the final estimation of the model.”, each tree (a level-0 learner) produces its own predicted value of the response variable. The final model (level-1 learner) uses those predicted values from the trees as inputs to form an aggregate prediction.).
Regarding claim 36, Abeliuk, Yao, and Khiari teach the system of claim 1,
wherein to train the plurality of level-0 learners, the hardware processor is programmed by the executable instructions to: generate a first subset of the training data (Abeliuk, paragraph 28, “At step 102 genotype information in a plurality of experimental data points corresponding to the set of constraints is encoded as a plurality of experiential genotype vectors, the plurality of experimental data points comprising the genotype information and phenotype information corresponding to the genotype information”, each data point (training input) is tied to a corresponding measured phenotype (reference output).)
and train, using the first subset of the training data, the plurality of level-0 learners (Abeliuk, paragraph 57, “Random Forests are models of the type ensemble that are based on grouping multiple models of low complexity. They carry this name because they use multiple Decision Trees whose prediction is averaged to obtain the final estimation of the model.”, shows that the surrogate (here, a Random Forest) consists of multiple base learners, each decision tree acting as an individual learner (i.e. a “level-0 learner”). Each tree takes input values (the input variables) and outputs a predicted value for at least one response variable, matching the claimed plurality of level-0 learners.
Abeliuk, paragraph 45, “At step 103 a phenotype prediction model is trained based at least in part on the plurality of experiential genotype vectors, the corresponding phenotype information, and the one or more constraints.”, this describes the training of the phenotype prediction model.).
Regarding claim 78, Abeliuk teaches the limitation:
A method for recommending experiment designs for synthetic biology comprising: (Abeliuk, paragraph 67, “Returning to Fig. 4, at step 402 an objective function is determined based at least in part on the plurality of desired phenotypic attributes. The process of recommending genotypes for experimentation involves maximization of the objective function, as well as maximization of an acquisition function and additional steps, as discussed in greater detail further below”, this overview of Abeliuk's process describes using a predictive model and maximization of an objective function to recommend experiment designs for synthetic biology (genotypes for experimentation))
under control of a hardware processor: (Abeliuk, paragraph 153, “All of the software stored within memory 1101 can be stored as a computer-readable instructions, that when executed by one or more processors 1102, cause the processors to perform the functionality described with respect to Figs. 1-10.”)
wherein an input of each of the plurality of level-0 learners comprises input values of the input variables, (Abeliuk, paragraph 33, “At step 202 the genotypes associated with the identified plurality of experimental data points are encoded as a plurality of experiential genotype vectors”, here, genotype information is treated as the “input variables” for training.)
wherein an output of each of the plurality of level-0 learners comprises a predicted value of at least one response variable, (Abeliuk, paragraph 28, “Each experimental data point can include phenotypic measurements and corresponding genotype data. For example, the experimental data point can include corresponding to a particular genetic sequence, gene, and/or gene fragment and can also include phenotypic measurements that correspond to that particular genetic sequence, gene, and/or gene fragment. The phenotypic measurements can be measurements that were experimentally determined in previous experiments.”,
Abeliuk, paragraph 67, “Returning to FIG. 4, at step 402 an objective function is determined based at least in part on the plurality of desired phenotypic attributes.”, Abeliuk teaches that the experimentally determined phenotypic measurements are the claimed reference outputs and reference values of the response variable, and further teaches that those phenotypic measurements are evaluated relative to an objective function determined from desired phenotypic attributes, thereby teaching that the response variable is associated with a predetermined response variable objective.);
determining a surrogate function comprising an expected value of the level-1 learner, (Abeliuk, paragraph 54, “With this method, the RF prediction is used as an estimation of the statistical mean of the surrogate model’s predictions, for which a gaussian distribution is assumed. The calculation of the variance of the prediction considers RF’s estimators deviation and the leaf training variance for each tree. Both estimations are combined using the Law of total variance.”, the Random Forest model itself comprises the surrogate function used to approximate the individual model’s predictions.
Abeliuk, paragraph 37, “The SMBO (Sequential Model Based Optimization) strategy for biological systems can be applied using different ways to represent data. The most direct way is to just use labels. With the “label representation” the gene variants and promoter sequences are represented with nominal variables, so the model is expected to learn from data how these labels are related to each other and how they affect the outcome.”, the RF model described in this system is of a SMBO strategy, which is trained using supervised learning as shown through its use of labeling. The expected value (or label) obtained from averaging individual tree outputs (the level-0 learners) forms a basis for optimization of the model’s parameters.)
a variance of the level-1 learner, (Abeliuk, paragraph 54, “With this method, the RF prediction is used as an estimation of the statistical mean of the surrogate model’s predictions, for which a gaussian distribution is assumed. The calculation of the variance of the prediction considers RF’s estimators deviation and the leaf training variance for each tree.”, this explicitly teaches how variance is calculated for the predicted response variable.)
and an exploitation-exploration trade-off parameter (Abeliuk, paragraph 51, “Also, the consideration of an uncertainty term in the acquisition function promotes the exploration and the diversity of recommendations, helping to avoid local optima. One of the most commonly used is the Expected Improvement (EI).”, the weighting of the uncertainty term in acquisition functions such as Expected Improvement balances predicted values (exploitation) against uncertainty (exploration), teaching the claimed exploitation-exploration trade-off parameter);
and determining, using the surrogate function, a plurality of recommended experiment designs, each comprising recommended values of the input variables, for a next cycle of the synthetic biology experiment for achieving a predetermined response variable objective associated with the at least one response variable (Abeliuk, paragraph 148, “Fig. 10 illustrates the global SMBO graph, which shows the most important elements of the procedure performed to find the suggested data points (suggested compounds based upon the predictive model) according to an exemplary embodiment. The set of suggested points are then evaluated and used to define the next experiments that, once recorded, are used to repeat the procedure and suggest a new set of observations.”, here the surrogate function (the acquisition function as described earlier) is applied to propose a set of suggested data points (suggested experimental compounds) to design a next experiment.).
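The recommend-evaluate-repeat loop quoted from Abeliuk's Fig. 10 can be sketched as follows; everything here is a hypothetical toy (the response surface, the nearest-neighbour surrogate, and the candidate grid are illustrative stand-ins for Abeliuk's trained phenotype prediction model and genotype candidates):

```python
def run_experiment(x):
    """Hypothetical response surface, unknown to the optimizer (peak at x = 0.7)."""
    return -(x - 0.7) ** 2

def surrogate(x, observed):
    """Toy surrogate: nearest observed response as the expected value,
    distance to the nearest observation as the uncertainty proxy."""
    nearest = min(observed, key=lambda xy: abs(xy[0] - x))
    return nearest[1], abs(nearest[0] - x)

# Initial experimental data points and a grid of candidate designs.
observed = [(0.0, run_experiment(0.0)), (1.0, run_experiment(1.0))]
candidates = [i / 20 for i in range(21)]

for _ in range(5):  # five design-build-test-learn cycles
    def score(x, kappa=1.0):
        mean, uncertainty = surrogate(x, observed)
        return mean + kappa * uncertainty  # exploitation + exploration
    best = max(candidates, key=score)              # recommended next design
    observed.append((best, run_experiment(best)))  # run the next experiment
```

Each cycle recommends the candidate maximizing the acquisition score, records the new experimental result, and repeats with the augmented data, mirroring the suggested-points loop in the quoted passage.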
Yao, in the same field of Bayesian ensemble implementation, teaches the following limitations which the above prior art fails to teach:
receiving a probabilistic predictive model for recommending experiment designs for synthetic biology comprising a plurality of level-0 learners (Yao, page 3, paragraph 3, “In supervised learning, where the data are ((xi,yi),i=1,...,n) and each model Mk has a parametric form
p(y|θ_k, M_k)
, stacking is done in two steps (Ting and Witten, 1999). In the first, baseline-level, step, each model is fitted separately and the leave-one-out (LOO) predictor
ŷ_{k,i}^(−i) = E(y_i | θ̂_{k,(−i)}, M_k)
is obtained for each model k and each data point i. In the second, meta-level, step, a weight for each model is obtained by minimizing the mean squared error, treating the leave-one-out predictors from the previous stage as covariates
ŵ = argmin_w Σ_{i=1}^n (y_i − Σ_{k=1}^K w_k ŷ_{k,i}^(−i))²
”, Yao expressly teaches a multi-model stacked-learning framework in supervised learning where the data are (x_i, y_i), each model M_k has predictive form
p(y|θ_k, M_k)
, and, in the first stage, each model is fitted separately. That directly teaches a plurality of level-0 learners rather than a single model. Because each model receives x_i as its input and produces a model-specific y_k, Yao teaches that the input of each level-0 learner comprises input values of the input variables and the output of each level-0 learner comprises a predicted value of the response variable.);
and a level-1 learner (Yao, page 3, paragraph 3, “In the second, meta-level, step, a weight for each model is obtained by minimizing the mean squared error, treating the leave-one-out predictors from the previous stage as covariates
ŵ = argmin_w Σ_{i=1}^n (y_i − Σ_{k=1}^K w_k ŷ_{k,i}^(−i))²
”,
Yao, page 7, paragraph 3, “For simplicity, we remove all covariates x in the notation. Suppose we have a set of probabilistic models M=(M1,...,MK); then the goal in stacking is to find an optimal super-model in the convex linear combination with the form
PNG
media_image4.png
17
236
media_image4.png
Greyscale
0”, Yao expressly teaches the claimed second-stage training of a higher-level learner. In Yao, the first-stage models generate predictions for each training point, and the second, meta-level, step determines model weights using the observed yi values while treating the first-stage predictions as covariates. Thus, Yao uses predicted values from the plurality of level-0 learners for the training inputs and also uses the corresponding true observed outputs yi, which are the claimed reference outputs corresponding to the training inputs, to train a distinct higher-level combiner. Yao further characterizes the result of the meta-level training as an “optimal super-model,” which teaches the claimed level-1 learner.)
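For clarity of the record, the two-step stacking procedure Yao describes (fit each level-0 model separately, form leave-one-out predictors, then choose meta-level weights by minimizing squared error with those predictors as covariates) can be sketched as follows. This is an illustrative sketch only; the data and the two base-model classes are hypothetical stand-ins and are not drawn from Yao or from the claims.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
x = np.linspace(0.0, 1.0, n)
y = 2.0 * x + 0.5 + rng.normal(0.0, 0.1, n)

# Two hypothetical level-0 model classes, each fitted separately:
# M1: least-squares line; M2: constant mean.
def fit_predict_linear(x_tr, y_tr, x_new):
    A = np.vstack([x_tr, np.ones_like(x_tr)]).T
    a, b = np.linalg.lstsq(A, y_tr, rcond=None)[0]
    return a * x_new + b

def fit_predict_mean(x_tr, y_tr, x_new):
    return float(y_tr.mean())

models = [fit_predict_linear, fit_predict_mean]

# Step 1 (baseline level): leave-one-out predictor for each
# model k and each data point i.
loo = np.empty((n, len(models)))
for i in range(n):
    keep = np.arange(n) != i
    for k, model in enumerate(models):
        loo[i, k] = model(x[keep], y[keep], x[i])

# Step 2 (meta level): weights minimizing the squared error,
# treating the LOO predictors as covariates.
w = np.linalg.lstsq(loo, y, rcond=None)[0]
combined = loo @ w
```

In this sketch the meta-level weights favor whichever base model predicts the held-out points better, and the combined predictor cannot have worse in-sample squared error over the LOO covariates than any single base model taken alone.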
and the level-1 learner comprises a Bayesian ensemble of the plurality of level-0 learners; (Yao, page 4, paragraph 4, “In this paper, we extend stacking from minimizing the squared error to maximizing scoring rules, hence make stacking applicable to combining a set of Bayesian posterior predictive distribution.”
Yao, page 7, paragraph 3, “For simplicity, we remove all covariates x in the notation. Suppose we have a set of probabilistic models M=(M1,...,MK); then the goal in stacking is to find an optimal super-model in the convex linear combination with the form Σ_k w_k p(·|M_k), Σ_k w_k = 1, w_k ≥ 0… Eventually, the combined estimation of the predictive density is p̂(ỹ|y) = Σ_k ŵ_k p(ỹ|y, M_k). When using logarithmic score (corresponding to Kullback-Leibler divergence), we call this stacking of predictive distributions: max_w (1/n) Σ_i log Σ_k w_k p(y_i|y_−i, M_k)”, Yao is directed specifically to combining Bayesian predictive distributions from a set of probabilistic models into an optimal super-model. The cited passages teach that the level-1 learner is a probabilistic ensemble because it combines predictive densities/distributions from the plurality of level-0 learners, that the output of the level-1 learner is a predicted probabilistic distribution of the response variable in the form of the combined predictive density p̂(ỹ|y), and that the ensemble is Bayesian because the combined component outputs are expressly Bayesian predictive distributions / Bayesian posterior predictive distributions).
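The stacking-of-predictive-distributions objective cited from Yao (choosing simplex weights to maximize the average log of the combined predictive density) can be illustrated with the following sketch. The two Gaussian “posterior predictive” densities, the observations, and all numeric values below are hypothetical stand-ins for illustration, not Yao's models or data.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(1.0, 1.0, 50)  # hypothetical held-out observations

def normal_pdf(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Stand-ins for two models' LOO predictive densities p(y_i | y_-i, M_k).
p1 = normal_pdf(y, 1.0, 1.0)  # roughly well-specified model
p2 = normal_pdf(y, 3.0, 1.0)  # badly mislocated model

# Maximize (1/n) * sum_i log sum_k w_k p_k over the 1-simplex;
# a grid search suffices with only two models.
grid = np.linspace(0.0, 1.0, 1001)
scores = [np.mean(np.log(w * p1 + (1.0 - w) * p2 + 1e-300)) for w in grid]
w_hat = grid[int(np.argmax(scores))]

# Combined predictive density p̂(ỹ|y) evaluated at a new point ỹ.
def combined_density(y_new):
    return w_hat * normal_pdf(y_new, 1.0, 1.0) + (1.0 - w_hat) * normal_pdf(y_new, 3.0, 1.0)
```

As expected under a logarithmic score, nearly all of the weight lands on the model whose predictive density matches the observations, and the combined density remains a proper convex mixture of the component predictive distributions.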
wherein the plurality of level-0 learners and the level-1 learner are trained using training data obtained from one or more cycles of a synthetic biology experiment comprising a plurality of training inputs and corresponding reference outputs, wherein each of the plurality of training inputs comprises training values of input variables, and wherein each of the plurality of reference outputs comprises a reference value of at least one response variable associated with a predetermined response variable objective (Yao, page 3, paragraph 3, “In supervised learning, where the data are ((xi,yi),i=1,...,n) and each model Mk has a parametric form ŷ_k = f_k(x|θ_k), stacking is done in two steps (Ting and Witten, 1999). In the first, baseline-level, step, each model is fitted separately and the leave-one-out (LOO) predictor f̂_k^(−i)(x_i) is obtained for each model k and each data point i. In the second, meta-level, step, a weight for each model is obtained by minimizing the mean squared error, treating the leave-one-out predictors from the previous stage as covariates ŵ = argmin_w Σ_i (y_i − Σ_k w_k f̂_k^(−i)(x_i))²”, Yao expressly teaches a multi-model stacked-learning framework in supervised learning where the data are (x_i, y_i), each model M_k has parametric form ŷ_k = f_k(x|θ_k), and, in the first stage, each model is fitted separately. That directly teaches a plurality of level-0 learners rather than a single model. Because each model receives x_i as its input and produces a model-specific prediction ŷ_k, Yao teaches that the input of each level-0 learner comprises input values of the input variables and the output of each level-0 learner comprises a predicted value of the response variable.
Yao, page 7, paragraph 3, “For simplicity, we remove all covariates x in the notation. Suppose we have a set of probabilistic models M=(M1,...,MK); then the goal in stacking is to find an optimal super-model in the convex linear combination with the form Σ_k w_k p(·|M_k), Σ_k w_k = 1, w_k ≥ 0”, Yao expressly teaches the claimed second-stage training of a higher-level learner. In Yao, the first-stage models generate predictions for each training point, and the second, meta-level, step determines model weights using the observed y_i values while treating the first-stage predictions as covariates. Thus, Yao uses predicted values from the plurality of level-0 learners for the training inputs and also uses the corresponding true observed outputs y_i, which are the claimed reference outputs corresponding to the training inputs, to train a distinct higher-level combiner. Yao further characterizes the result of the meta-level training as an “optimal super-model,” which teaches the claimed level-1 learner.);
wherein the level-1 learner comprises a probabilistic ensemble of the plurality of level-0 learners, (Yao, page 3, paragraph 3, “In the second, meta-level, step, a weight for each model is obtained by minimizing the mean squared error, treating the leave-one-out predictors from the previous stage as covariates ŵ = argmin_w Σ_i (y_i − Σ_k w_k f̂_k^(−i)(x_i))²”,
Yao, page 7, paragraph 3, “For simplicity, we remove all covariates x in the notation. Suppose we have a set of probabilistic models M=(M1,...,MK); then the goal in stacking is to find an optimal super-model in the convex linear combination with the form Σ_k w_k p(·|M_k), Σ_k w_k = 1, w_k ≥ 0”, Yao expressly teaches the claimed second-stage training of a higher-level learner. In Yao, the first-stage models generate predictions for each training point, and the second, meta-level, step determines model weights using the observed y_i values while treating the first-stage predictions as covariates. Thus, Yao uses predicted values from the plurality of level-0 learners for the training inputs and also uses the corresponding true observed outputs y_i, which are the claimed reference outputs corresponding to the training inputs, to train a distinct higher-level combiner. Yao further characterizes the result of the meta-level training as an “optimal super-model,” which teaches the claimed level-1 learner.)
wherein an output of the level-1 learner comprises a predicted probabilistic distribution of the at least one response variable (Yao, page 4, paragraph 4, “In this paper, we extend stacking from minimizing the squared error to maximizing scoring rules, hence make stacking applicable to combining a set of Bayesian posterior predictive distribution.”
Yao, page 7, paragraph 3, “For simplicity, we remove all covariates x in the notation. Suppose we have a set of probabilistic models M=(M1,...,MK); then the goal in stacking is to find an optimal super-model in the convex linear combination with the form Σ_k w_k p(·|M_k), Σ_k w_k = 1, w_k ≥ 0… Eventually, the combined estimation of the predictive density is p̂(ỹ|y) = Σ_k ŵ_k p(ỹ|y, M_k). When using logarithmic score (corresponding to Kullback-Leibler divergence), we call this stacking of predictive distributions: max_w (1/n) Σ_i log Σ_k w_k p(y_i|y_−i, M_k)”, Yao is directed specifically to combining Bayesian predictive distributions from a set of probabilistic models into an optimal super-model. The cited passages teach that the level-1 learner is a probabilistic ensemble because it combines predictive densities/distributions from the plurality of level-0 learners, that the output of the level-1 learner is a predicted probabilistic distribution of the response variable in the form of the combined predictive density p̂(ỹ|y), and that the ensemble is Bayesian because the combined component outputs are expressly Bayesian predictive distributions / Bayesian posterior predictive distributions).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the surrogate-model framework of Abeliuk with the stacked Bayesian predictive-distribution technique of Yao. Abeliuk teaches using a predictive surrogate model within a sequential model-based optimization workflow for recommending genotypes for experimentation in synthetic biology, and further teaches that the surrogate may be implemented using machine learning, ensemble-based algorithms, and Bayesian approaches that provide both a prediction value and an estimation of uncertainty. Yao teaches a known two-stage stacking technique in which multiple separately trained models produce first-stage predictions and a higher-level super-model combines those predictions using observed outputs to generate a predictive distribution. A person of ordinary skill in the art would have been motivated to apply Yao’s stacked Bayesian predictive-distribution technique to Abeliuk’s surrogate model in order to improve predictive performance and uncertainty estimation in the synthetic-biology optimization setting, particularly where no single model class is uniformly optimal, while preserving Abeliuk’s existing objective-function, acquisition-function, and next-experiment recommendation framework.
Khiari, in the same field of machine learning ensemble implementation, teaches the following limitations which the above prior art fails to teach:
wherein the plurality of level-0 learners comprise different types of machine learning models (Khiari, paragraph 3, “Ensemble learning approaches combine hypotheses from different algorithms, and are able to achieve better results than single models that explore only one hypothesis at a time… Alternatively, models can be generated with different induction algorithms, in which case the ensemble is heterogeneous.”, Khiari expressly teaches that ensemble models may be formed from models generated using different induction algorithms, and states that, in that case, the ensemble is heterogeneous. A person of ordinary skill in the art would understand that models produced by different induction algorithms are different types of machine-learning models. Because Khiari’s disclosure is expressly directed to a set of base learners in an ensemble-learning framework, its teaching maps directly to the claimed requirement that the plurality of level-0 learners comprise different types of machine learning models.)
It would have been further obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the plurality of base learners in the Abeliuk/Yao combination using the heterogeneous ensemble approach taught by Khiari. While Yao teaches the two-stage stacked architecture and Bayesian combination of predictive distributions, Khiari expressly teaches that ensemble learning approaches combine hypotheses from different algorithms and that models can be generated with different induction algorithms, in which case the ensemble is heterogeneous. A person of ordinary skill in the art would have been motivated to use Khiari’s heterogeneous base-learner approach in the Abeliuk-Yao stacked surrogate model so that the level-0 learners would comprise different types of machine-learning models, thereby leveraging the known benefit that different model classes capture different hypotheses and can achieve better results than a single model type. Doing so would have predictably yielded a stacked surrogate model for Abeliuk’s synthetic-biology optimization framework in which the base learners are of different model types, the meta-level learner combines their outputs as taught by Yao, and the overall system retains Abeliuk’s surrogate-based recommendation of next experiment designs.
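Khiari's heterogeneous-ensemble teaching (base models produced by different induction algorithms) can be illustrated with the following sketch, in which two different induction algorithms, a global least-squares fit and a local nearest-neighbour average, serve as the level-0 learners. The data and both learner implementations are hypothetical illustrations, not Khiari's experiments.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 1.0, 60))
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.1, 60)

# Induction algorithm 1: global least-squares line.
def linear_learner(x_tr, y_tr, x_new):
    A = np.vstack([x_tr, np.ones_like(x_tr)]).T
    a, b = np.linalg.lstsq(A, y_tr, rcond=None)[0]
    return a * x_new + b

# Induction algorithm 2: k-nearest-neighbour average, a different
# model type, which is what makes the ensemble heterogeneous.
def knn_learner(x_tr, y_tr, x_new, k=5):
    idx = np.argsort(np.abs(x_tr - x_new))[:k]
    return float(y_tr[idx].mean())

# The two heterogeneous level-0 learners explore different hypotheses
# and produce different predictions at the same query point.
preds = [linear_learner(x, y, 0.25), knn_learner(x, y, 0.25)]
```

The local learner tracks the sinusoid near the query point while the global linear fit cannot, which is the sense in which models from different induction algorithms capture different hypotheses.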
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Abeliuk in view of Yao, Khiari, and Zampieri et al. (Zampieri, G., Vijayakumar, S., Yaneske, E., & Angione, C. (2019). Machine and deep learning meet genome-scale metabolic modeling. PLoS computational biology, 15(7), e1007084. doi.org/10.1371/journal.pcbi.1007084), hereafter referred to as Zampieri.
Regarding claim 22, Abeliuk, Yao, and Khiari teach the system of claim 1. Zampieri, in the same field of machine learning parameter optimization, teaches the following:
wherein the predetermined response variable objective comprises a maximization objective, a minimization objective, or a specification objective, and/or wherein the predetermined response variable objective comprises maximizing the at least one response variable, minimizing the at least one response variable, or adjusting the at least one response variable to a predetermined value of the at least one response variable (Zampieri, page 6, figure 2, “Constraints are applied to the model to identify a given metabolic goal, represented as the objective function c, and linear or quadratic optimization is used to maximize or minimize this objective”, constraints are applied to the training inputs with an objective function used to maximize or minimize the objective).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Zampieri (i.e., using a maximization or minimization objective to optimize parameters). A motivation for the combination is to accurately model flux reactions, allowing for optimization of metabolic reactions. (Zampieri, page 6, paragraph 2, “Modeling fluxes can be crucial for gaining a better understanding of both metabolic activity and wider biological phenomena [10]. At a reaction and pathway level, flux balance analysis (FBA) is currently the most widely used tool to estimate the flow of metabolites in metabolic networks [46]. FBA allows determination of the flux configuration that yields maximal or minimal rate through one or more target reactions.”).
Claims 11, 12, 16, 65, and 86 are rejected under 35 U.S.C. 103 as being unpatentable over Abeliuk in view of Yao, Khiari and Valentini (Valentini G. (2014). Hierarchical ensemble methods for protein function prediction. ISRN bioinformatics, 2014, 901419. doi.org/10.1155/2014/901419), hereinafter Valentini.
Regarding claim 11, Abeliuk, Yao, and Khiari teach the system of claim 1. Valentini, in the same field of machine learning for optimization of biological processes, teaches the following:
The system of claim 1, wherein the synthetic biology experimental data is sparse (Valentini, page 12, col.1, paragraph 1, line 2, “However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels”, values in the data set can be sparse).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Valentini (i.e., ensemble machine learning of synthetic biology data). A motivation for the combination is to have efficient classification/regression of synthetic biology experimental values (Valentini, page 4, section 3, col. 1, “Empirical studies showed that in both classification and regression problems ensembles improve on single learning machines, and, moreover, large experimental studies compared the effectiveness of different ensemble methods on benchmark data sets”).
Regarding claim 12, Abeliuk, Yao, and Khiari teach the system of claim 1. Valentini teaches:
The system of claim 1, wherein a number of the plurality of training inputs in the synthetic biology experiment data is a number of experimental conditions, a number of strains, a number of replicates of a strain of the strains, or a combination thereof (Valentini, page 5, section 3.1, paragraph 1, “A gene/gene product 𝑔 can be represented through a vector x ∈ R𝑑 having 𝑑 different features (e.g., gene expression levels across 𝑑 different conditions, sequence similarities with other genes/proteins, or presence or absence of a given domain in the corresponding protein or genetic or physical interaction with other proteins)”, basic notation for the methods disclosed in Valentini, the plurality of training inputs is represented by the vector x, the number of inputs is equal to a number of experimental conditions of a gene/gene product).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Valentini. A motivation for the combination is similarly disclosed above in claim 11.
Regarding claim 16, Abeliuk, Yao, and Khiari teach the system of claim 1. Valentini teaches:
The system of claim 1, wherein one, or each, of the plurality of input variables and/or the at least one response variable comprises a promoter sequence, an induction time, an induction strength, a ribosome binding sequence, a copy number of a gene, a transcription level of a gene, an epigenetics state of a gene, a level of a protein, a post translation modification state of a protein, a level of a molecule, an identity of a molecule, a level of a microbe, a state of a microbe, a state of a microbiome, a titer, a rate, a yield, or a combination thereof, optionally wherein the molecule comprises an inorganic molecule, an organic molecule, a protein, a polypeptide, a carbohydrate, a sugar, a fatty acid, a lipid, an alcohol, a fuel, a metabolite, a drug, an anticancer drug, a biofuel, a flavoring molecule, a fertilizer molecule, or a combination thereof (Valentini, page 3, section 2.2, col. 1, “These methods usually represent each dataset through an undirected graph 𝐺 = (𝑉, 𝐸) where nodes v ∈ 𝑉 correspond to gene/gene products”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Valentini. A motivation for the combination is similarly disclosed above in claim 11.
Regarding claim 65, Abeliuk, Yao, and Khiari teach the system of claim 1. Valentini further teaches:
wherein a number of the plurality of recommended experiment designs is a number of experimental conditions or a number of strains for the next cycle of the synthetic biology experiment (Valentini, page 5, section 3.1, paragraph 1, “A gene/gene product 𝑔 can be represented through a vector x ∈ R𝑑 having 𝑑 different features (e.g., gene expression levels across 𝑑 different conditions, sequence similarities with other genes/proteins, or presence or absence of a given domain in the corresponding protein or genetic or physical interaction with other proteins)”, basic notation for the methods disclosed in Valentini, the plurality of training inputs is represented by the vector x, the number of inputs is equal to a number of experimental conditions of a gene/gene product).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Valentini. A motivation for the combination is similarly disclosed above in claim 11.
Claim 86 is substantially similar to claim 12 and as such a similar analysis applies.
Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Abeliuk in view of Yao, Khiari and Cagnini et al. (H. E. L. Cagnini, M. Porto Basgalupp and R. C. Barros, "Increasing Boosting Effectiveness with Estimation of Distribution Algorithms," 2018 IEEE Congress on Evolutionary Computation (CEC), Rio de Janeiro, Brazil, 2018, pp. 1-8, doi: 10.1109/CEC.2018.8477959), hereinafter referred to as Cagnini.
Regarding claim 25, Abeliuk, Yao, and Khiari teach the system of claim 1. Cagnini, in the same field of ensemble learning implementation, further teaches:
The system of claim 1, wherein parameters of the ensemble of the plurality of level-0 learners comprises (i) a plurality of ensemble weights (Cagnini, page 1, col. 2, paragraph 2, “in the integration phase, predictions from each base learner are aggregated in order to provide final consensus for unseen instances. It is possible to use simple solutions such as standard majority voting (each model is equally important when defining the final outcome), or to adjust the voting weights”, the ensemble of base learners can be weighted so that each model is equally important (equal weight) or have varying degrees of importance (varying weight));
and (ii) an error variable distribution of the ensemble or a standard deviation of the error variable distribution of the ensemble (Cagnini, page 3, col. 2, paragraph 2, “The GM comprises Gaussian distributions that are updated by computing the mean cell value among the elite. The standard deviation is stored into another variable, which is initially set to σ (hyper-parameter), and is iteratively decreased by a factor of τ (another hyper-parameter) at every generation, as recommended by [24].”, A probabilistic graphical model (GM) is used to compute the standard deviation for hyper-parameter σ, σ being the standard deviation of the Gaussian (normal) distributions within the GM. The “error variable distribution”, as disclosed in paragraph 15 of the specification, is normally distributed; thus the Gaussian distributions in the GM represent the error variable distributions).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Cagnini (i.e., having a plurality of weights and standard deviation of error variable distribution). A motivation for the combination is to have efficient ensemble learning of synthetic biology experimental values (Cagnini, page 1, abstract, “By using its former voting weights as starting point in a global search carried by an Estimation of Distribution Algorithm, we are capable of improving AdaBoost up to ≈ 11% regarding predictive accuracy in a thorough experimental analysis with multiple public datasets.”).
Claim 33 is rejected under 35 U.S.C. 103 as being unpatentable over Abeliuk in view of Yao, Khiari and Barnes et al. (Barnes, C. P., Silk, D., Sheng, X., & Stumpf, M. P. (2011). Bayesian design of synthetic biological systems. Proceedings of the National Academy of Sciences of the United States of America, 108(37), 15190–15195. https://doi.org/10.1073/pnas.1017972108), hereinafter referred to as Barnes.
Regarding claim 33, Abeliuk, Yao, and Khiari teach the system of claim 1 but fail to teach the limitations of claim 33. Barnes, in the same field of ensemble learning of synthetic biology data, teaches:
The system of claim 1, wherein to train the level-1 learner, the hardware processor is programmed by the executable instructions to: determine a posterior distribution of the ensemble parameters given the training data or the second subset of the training data (Barnes, page 3, section 2, paragraph 3, “a posterior distribution over possible design parameter values that can be analyzed for parameter sensitivity and robustness and provide credible limits on design parameters”),
wherein to determine the posterior distribution of the ensemble parameters given the training data or the second subset of the training data, the hardware processor is programmed by the executable instructions to: determine (i) a probability distribution of the training data or the second subset of the training data given the ensemble parameters or a likelihood function of the ensemble parameters given the training data of the second subset of the training data, and (ii) a prior distribution of the ensemble parameters (Barnes, page 3, section 2, paragraph 2, “In the Bayesian approach to statistical inference the posterior distribution is the quantity of interest and this is given by the normalized product of the likelihood (function) and the prior (probability)”, the posterior distribution is determined from the product of the likelihood function and the prior distribution),
and wherein to determine the posterior distribution of the ensemble parameters given the training data or the second subset of the training data, the hardware processor is programmed by the executable instructions to: sample a space of the ensemble parameters with a frequency proportional to a desired posterior distribution (Barnes, page 6, paragraph 3, “Our approach does sample parameter space predominantly in regions where the desired behavior is more likely, rather than entirely at random as was done in the previous study; on balance this suggests that the posterior probability for delivering robust oscillations is approximately the same for models 2 and 3. More insight can be gained into this discrepancy by specifying a particular frequency and amplitude of the oscillator as the desired output behavior. Figures 3C and D show the model posterior probability after requiring an amplitude of 0.1 and a frequency of 1.0 Hz on species A and C respectively.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Barnes (i.e., generate and sample from a posterior distribution). A motivation for the combination is to exploit the advantages of Bayesian learning (Barnes, page 3, section 2, paragraph 3, “a posterior distribution over possible design parameter values that can be analyzed for parameter sensitivity and robustness and provide credible limits on design parameters”, this is one of the advantages listed).
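The Bayesian procedure cited from Barnes (a posterior proportional to the product of the likelihood and the prior, with the parameter space sampled at a frequency proportional to the desired posterior) can be sketched with a minimal Metropolis sampler. The one-dimensional parameter, the Gaussian likelihood, and the broad prior below are hypothetical choices for illustration only, not Barnes's models.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(0.5, 1.0, 30)  # hypothetical training observations

def log_prior(theta):
    # Broad N(0, 10^2) prior on the parameter.
    return -0.5 * theta ** 2 / 100.0

def log_likelihood(theta):
    # Gaussian likelihood with known unit variance.
    return -0.5 * np.sum((data - theta) ** 2)

def log_posterior(theta):
    # Posterior is proportional to likelihood times prior.
    return log_likelihood(theta) + log_prior(theta)

# Metropolis sampling: accepted states visit regions of the
# parameter space with frequency proportional to the posterior.
samples, theta = [], 0.0
for _ in range(6000):
    proposal = theta + rng.normal(0.0, 0.5)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)
posterior_draws = np.array(samples[1000:])  # discard burn-in
```

The retained draws concentrate around the data mean with spread close to the analytic posterior standard deviation, which is the sense in which the sampler visits parameter space in proportion to the desired posterior distribution.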
Claims 58, 68, and 72 are rejected under 35 U.S.C. 103 as being unpatentable over Abeliuk in view of Yao, Khiari and Choffin et al. (B. Choffin and N. Ueda, "SCALING BAYESIAN OPTIMIZATION UP TO HIGHER DIMENSIONS: A REVIEW AND COMPARISON OF RECENT ALGORITHMS," 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), Aalborg, Denmark, 2018, pp. 1-6, doi: 10.1109/MLSP.2018.8517011), hereinafter referred to as Choffin.
Regarding claim 58, Abeliuk, Yao, and Khiari teach the system of claim 1. Abeliuk further teaches:
wherein to determine the plurality of recommended experiment designs, the hardware processor is programmed by the executable instructions to: determine a plurality of possible recommended experiment designs each comprising possible recommended values of the input variables (Abeliuk, paragraph 68, “At step 403 the objective function is iteratively adjusted by repeatedly selecting one or more experiential genotype vectors in the plurality of experiential genotype vectors that maximize an acquisition function of the phenotype prediction model and updating the objective function based at least in part on one or more experimentally-determined phenotypic attributes corresponding to the one or more experiential genotype vectors.”, teaches determining a plurality of candidate experiment designs, i.e., available genotype vectors, each comprising possible recommended values of the input variables.)
Choffin further teaches the following, which Abeliuk, Yao, and Khiari fail to teach:
with surrogate function values, determined using the surrogate function, with a predetermined characteristic (Choffin, page 2, col 2., paragraph 2, “The predictive distribution of f is then used to define u, the acquisition function. The role of u is to balance exploration (sampling where the uncertainty about f is high) and exploitation (sampling where the expected value of f is high) and to indicate interesting parts of the search space. It is optimized instead of f in order to choose the most promising point xt+1. Thus, we sample f at: argmaxx∈Du(x|D1:t). Nowadays, mainly three different acquisition functions are used: Probability of Improvement (PI), Expected Improvement (EI), and Lower Confidence Bound (LCB).”, Choffin teaches that each candidate point/design is evaluated by a surrogate-derived acquisition function value, and that the predetermined characteristic is being promising under that surrogate/acquisition value. Thus, Choffin teaches the candidate designs being associated with surrogate-function values.);
and select the plurality of recommended experiment designs from the plurality of possible recommended experiment designs using an input variable difference factor based on the surrogate function values of the plurality of possible recommended experiment designs (Choffin, page 2, col. 2, paragraph 2, “It is optimized instead of f in order to choose the most promising point xt+1.”, in Bayesian Optimization, the acquisition function defined by the surrogate function is used to determine recommended values (most promising point) from a set of possible recommended values (interesting parts of the search space)).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Choffin (i.e., determine a surrogate). A motivation for the combination is to exploit the advantages of Bayesian learning (Choffin, page 1, section 2, col. 2, paragraph 1, “It is meant to be much cheaper to query and easier to optimize than f.”, one of the advantages of Bayesian optimization of a function).
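Choffin's acquisition-function teaching can be illustrated with the Expected Improvement (EI) criterion evaluated on surrogate predictions for a handful of candidate designs. The surrogate means, standard deviations, and the incumbent best value below are hypothetical numbers chosen only to show how EI balances exploitation against exploration.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def expected_improvement(mu, sigma, best):
    # EI for maximization: E[max(f - best, 0)] under N(mu, sigma^2).
    ei = []
    for m, s in zip(mu, sigma):
        if s == 0.0:
            ei.append(max(m - best, 0.0))
            continue
        z = (m - best) / s
        ei.append((m - best) * norm_cdf(z) + s * norm_pdf(z))
    return ei

# Hypothetical surrogate predictions for five candidate designs.
mu = [0.20, 0.50, 0.90, 0.40, 0.85]
sigma = [0.05, 0.30, 0.02, 0.10, 0.25]
best = 0.80  # best observed response so far

ei = expected_improvement(mu, sigma, best)
next_design = max(range(len(ei)), key=ei.__getitem__)
```

Here the candidate with the highest surrogate mean (index 2) is not selected: the higher predictive uncertainty at index 4 yields a larger expected improvement, which is the exploration/exploitation trade-off Choffin describes for acquisition functions such as PI, EI, and LCB.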
Regarding claim 68, Abeliuk, Yao, and Khiari teach the system of claim 1. Choffin teaches the following, which Abeliuk, Yao, and Khiari fail to teach:
The system of claim 1, wherein to determine the plurality of possible recommended experiment designs, the hardware processor is programmed by the executable instructions to: sample a space of the input variables with a frequency proportional to the surrogate function, or an exponential function of the surrogate function, and a prior distribution of the input variables (Choffin, page 2, col. 1, algorithm 1, “Sample ninit points at random from D…sample yt = f(xt) + εt from the objective”, the Bayesian optimization algorithm shown samples from the space of the input variables (D) with a frequency proportional to the surrogate function (Gaussian processes) and then samples from the prior distribution of the input variables f(x)).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Choffin. A motivation for the combination is similarly disclosed above in claim 58.
Regarding claim 72, Abeliuk, Yao, and Khiari teach the system of claim 1. Choffin further teaches:
wherein the hardware processor is programmed by the executable instructions to: receive an upper bound and/or a lower bound for one, or each, of the plurality of input variables, wherein each of the possible recommended values of the input variables is within the upper bound and/or the lower bound of the corresponding input variable (Choffin, page 2, col. 2, paragraph 2, “mainly three different acquisition functions are used: Probability of Improvement (PI), Expected Improvement (EI), and Lower Confidence Bound (LCB).”,
Choffin, page 2, col. 2, paragraph 2, “The predictive distribution of f is then used to define u, the acquisition function. The role of u is to balance exploration (sampling where the uncertainty about f is high) and exploitation (sampling where the expected value of f is high) and to indicate interesting parts of the search space. It is optimized instead of f in order to choose the most promising point xt+1. Thus, we sample f at: argmax_{x∈D} u(x | D_{1:t}).”, Choffin teaches optimization over a compact domain D for vector input x, and further teaches sampling initial points from D and selecting each subsequent candidate point as x_t = argmax_{x∈D} u(x | D_{1:t−1}). Thus, Choffin teaches that the candidate input values are constrained to remain within the received/search domain bounds, i.e., within upper and/or lower bounds for the input variables.).
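The constraint mapped above, namely that each candidate is chosen by maximizing the acquisition function only over the domain D, may be illustrated as follows (an illustrative sketch; the bounds, candidate set, and stand-in posterior mean and standard deviation are hypothetical, not taken from Choffin):

```python
import numpy as np

rng = np.random.default_rng(1)

# Received box constraints on a 2-D input (hypothetical values).
lower = np.array([0.0, -1.0])
upper = np.array([1.0,  2.0])

# Candidate set drawn only from the search domain D = [lower, upper].
cands = rng.uniform(lower, upper, size=(500, 2))

# Stand-ins for a surrogate's posterior mean and standard deviation.
mu = -np.sum((cands - 0.5) ** 2, axis=1)
sigma = np.full(len(cands), 0.1)

# Confidence-bound acquisition u(x) = mu(x) + kappa * sigma(x);
# the next design is x_{t+1} = argmax over x in D of u(x | D_{1:t}).
kappa = 2.0
x_next = cands[np.argmax(mu + kappa * sigma)]
# Because the argmax ranges only over D, x_next respects the bounds.
```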
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Choffin. A motivation for the combination is similarly disclosed above in claim 58.
Claim 70 is rejected under 35 U.S.C. 103 as being unpatentable over Abeliuk in view of Yao, Khiari, and Perrone et al. (Valerio Perrone et al., “Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning,” NeurIPS 2019), hereinafter referred to as Perrone.
Regarding claim 70, Abeliuk, Yao, and Khiari teach the system of claim 1. Perrone further teaches the following, which Abeliuk, Yao, and Khiari fail to teach:
wherein the hardware processor is programmed by the executable instructions to: determine an upper bound and/or a lower bound for one, or each, of the plurality of input variables based on training values of the corresponding input variable, wherein each of the possible recommended values of the input variables is within the upper bound and/or the lower bound of the corresponding input variable (Perrone, page 1, section 1, “In this work, we automatically design the BO search space, which is a critical input to any BO procedure applied to HPO, based on historical data.”,
Perrone, page 3, section 4.2, “The first instantiation of (3) is a search space defined by a bounding box (or hyperrectangle), which is parameterized by the lower and upper bounds l and u. More formally, X̂(θ) = {x ∈ R^p | l ≤ x ≤ u} and θ = (l, u), with k = 2p. A tight bounding box containing all {x_t}_{t=1}^T can be obtained as the solution to the following constrained minimization problem:
[Equation from Perrone reproduced as image: media_image7.png]
”, Perrone teaches determining lower and upper bounds for input variables from prior training/evaluation values and constraining subsequent candidate values to lie within those bounds. Thus, Perrone teaches determining an upper bound and/or a lower bound for one, or each, of the plurality of input variables based on training values of the corresponding input variable, wherein each possible recommended value is within the corresponding bound.).
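The bounding-box construction quoted from Perrone, section 4.2, i.e., componentwise lower and upper bounds tightly containing the training points, with candidates constrained to the box, may be sketched as follows (illustrative only; all numeric values are hypothetical):

```python
import numpy as np

# Training values of two input variables (hypothetical data).
X_train = np.array([[0.2, 5.0],
                    [0.8, 7.5],
                    [0.5, 6.1]])

# Tightest bounding box containing all training points:
# l = componentwise minimum, u = componentwise maximum.
l = X_train.min(axis=0)
u = X_train.max(axis=0)

# A recommended candidate is then constrained (here, clipped) to the box.
candidate = np.array([1.5, 4.0])
recommended = np.clip(candidate, l, u)
```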
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Abeliuk, Yao, and Khiari with Perrone, with a motivation to improve the efficiency of Bayesian optimization by learning input bounds from prior/training data so that subsequent candidate experiment designs are searched within a reduced, data-driven search space. Perrone expressly teaches “design compact search spaces from historical data” and further teaches that this approach “considerably boosts BO by reducing the size of the search space.” (Perrone, sections 1 and 4.2). A person of ordinary skill in the art would have recognized that applying Perrone’s data-driven search-space design to the Abeliuk/Yao/Khiari framework would have predictably improved the experiment-recommendation process by constraining candidate input values to historically supported bounds, thereby reducing unproductive regions of the search space and accelerating identification of promising experiment designs.
Claim 73 is rejected under 35 U.S.C. 103 as being unpatentable over Abeliuk in view of Yao, Khiari and Shahriari et al. (B. Shahriari, K. Swersky, Z. Wang, R. P. Adams and N. de Freitas, "Taking the Human Out of the Loop: A Review of Bayesian Optimization," in Proceedings of the IEEE, vol. 104, no. 1, pp. 148-175, Jan. 2016, doi: 10.1109/JPROC.2015.2494218), hereinafter referred to as Shahriari.
Regarding claim 73, Abeliuk, Yao, and Khiari teach the limitations of claim 1. Shahriari teaches the following limitation:
The system of claim 1, wherein the hardware processor is programmed by the executable instructions to: determine, using the posterior distribution of the ensemble parameters given the training data or the second subset of the training data, a probability distribution of the at least one response variable for one, or each, of the plurality of recommended experiment designs (Shahriari, page 161, section B, col. 2, paragraph 1, “In the GP (Gaussian process) case, since the posterior at any arbitrary point x is a Gaussian, any quantile of the distribution of f(x) is computed”, in Gaussian processes, the posterior distribution of the ensemble parameters is used to compute a probability distribution of f(x) (the objective function)).
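The cited property, that in the GP case the posterior at any point x is Gaussian so that any quantile of the distribution of f(x) can be computed, may be illustrated with a minimal Gaussian-process posterior computation (an illustrative sketch with hypothetical training data; the squared-exponential kernel and its length scale are assumptions, not taken from Shahriari):

```python
import numpy as np

def rbf(A, B, ls=0.5):
    # Squared-exponential (RBF) kernel; the length scale is an assumption.
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

# Hypothetical training data: inputs X and observed response values y.
X = np.array([0.0, 0.4, 0.8])
y = np.sin(2 * np.pi * X)
noise = 1e-6

# GP posterior at test inputs: p(f(x) | data) is Gaussian at every x,
# so any quantile of the predictive distribution can be computed.
Xs = np.array([0.2, 0.6])
K = rbf(X, X) + noise * np.eye(len(X))
Kinv = np.linalg.inv(K)
Ks = rbf(Xs, X)
mean = Ks @ Kinv @ y
cov = rbf(Xs, Xs) - Ks @ Kinv @ Ks.T
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))

# e.g., the 97.5% quantile of the predictive distribution at each Xs.
upper95 = mean + 1.96 * std
```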
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Shahriari (i.e., computing a probability distribution of the objective function from the posterior distribution of ensemble parameters). A motivation for the combination is to exploit the advantages of Bayesian learning for synthetic biology experimental data (Shahriari, page 161, section B, col. 2, paragraph 2, “There are theoretically motivated guidelines for setting and scheduling the hyperparameter βn to achieve optimal regret [146] and, as with T in the improvement policies, tuning this parameter within these guidelines can offer a performance boost.”).
Claims 74 and 75 are rejected under 35 U.S.C. 103 as being unpatentable over Abeliuk in view of Yao, Khiari and Bittl et al. (Bittl, J. A., & He, Y. (2017). Bayesian analysis: a practical approach to interpret clinical trials and create clinical practice guidelines. Circulation: Cardiovascular Quality and Outcomes, 10(8), e003563.), hereinafter referred to as Bittl.
Regarding claim 74, Abeliuk, Yao, and Khiari teach the limitations of claim 1. Bittl, in the same field of Bayesian analysis, teaches the following limitations, which Abeliuk, Yao, and Khiari fail to teach:
The system of claim 1, wherein the hardware processor is programmed by the executable instructions to: determine, using the posterior distribution of the ensemble parameters given the training data or the second subset of the training data, a probability of one, or each, of the plurality of recommended experiment designs achieving the predetermined response variable objective associated with the at least one response variable, optionally wherein the probability of one, or each, of the plurality of recommended experiment designs achieving the predetermined response variable objective associated with the at least one response variable comprises the probability of one, or each, of the plurality of recommended experiment designs being a predetermined percentage closer to achieving the objective relative to the training data (Bittl, page 3, col. 1, paragraph 3, “As outlined in the Table and detailed in Appendix C in the Data Supplement, Bayesian methods combine information from different sources and generate a posterior inference that is a compromise between the prior and the data. As shown in Figure 1, the posterior inference contains a maximum (mode)… Bayesian analysis generates direct probability statements about the treatment hypothesis, which is arguably more interesting than the null. In this instance, the Bayesian approach identifies with 95% probability that mortality is 29% to 52% lower after CABG than it is after PCI. More precisely, the Bayesian approach identifies with 99.9%, 99.9%, and 96.8% probabilities that mortality rates are at least 10%, 20%, or 30% lower after CABG than they are after PCI.”, Bittl teaches using a posterior distribution to determine a probability that a candidate option achieves an objective, including the probability that the outcome is a predetermined percentage closer to the objective.
Bittl teaches determining, for one or each recommended experiment design, the probability of achieving the predetermined response variable objective, including the probability of being a predetermined percentage closer to the objective relative to the training data.).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Abeliuk, Yao, and Khiari with Bittl, with a motivation to use the posterior distribution to quantify the probability that a candidate recommendation achieves not only a target objective, but also a predefined percentage improvement relative to prior data. Bittl teaches that Bayesian analysis can “update what is already known” (Bittl, abstract) using observed data, and more specifically teaches calculating “probabilities that mortality rates are at least 10%, 20%, or 30% lower,” (Bittl, page 3, col. 1, paragraph 3) i.e., explicit posterior probabilities for predetermined percentage-improvement thresholds. A person of ordinary skill in the art would have found it obvious to apply that same Bayesian probability-threshold approach to the Abeliuk/Yao/Khiari ensemble-based experiment-design system so that recommended experiment designs could be evaluated by the probability of achieving the response objective, including the probability of being a predetermined percentage closer to that objective relative to the training data.
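The kind of direct posterior probability statement Bittl describes, e.g., the probability that an outcome is at least 10% or 20% better than a baseline, may be sketched as follows (illustrative only; the posterior is approximated by hypothetical Monte Carlo samples rather than the ensemble posterior of the claimed system):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior samples of the predicted response for one
# recommended design (stand-ins for draws from the posterior over the
# ensemble parameters given the training data).
posterior_samples = rng.normal(loc=12.0, scale=1.0, size=100_000)

# Best response value observed in the training data (hypothetical).
train_best = 10.0

# Direct probability statements in the style Bittl describes:
p_better = np.mean(posterior_samples > train_best)        # achieves objective
p_10pct = np.mean(posterior_samples > 1.10 * train_best)  # >= 10% better
p_20pct = np.mean(posterior_samples > 1.20 * train_best)  # >= 20% better
```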
Regarding claim 75, Abeliuk, Yao, and Khiari teach the limitations of claim 1. Bittl teaches the following limitation:
The system of claim 1, wherein the hardware processor is programmed by the executable instructions to: determine, using the posterior distribution of the ensemble parameters given the training data or the second subset of the training data, a probability of at least one of the plurality of recommended experiment designs achieving the predetermined response variable objective associated with the at least one response variable, optionally wherein the probability of the at least one of the plurality of recommended experiment designs achieving the predetermined response variable objective associated with the at least one response variable comprises the probability of the at least one of the plurality of recommended experiment designs achieving the predetermined response variable objective being a predetermined percentage closer to achieving the objective relative to the training data (Bittl, page 3, col. 1, paragraph 3, “As outlined in the Table and detailed in Appendix C in the Data Supplement, Bayesian methods combine information from different sources and generate a posterior inference that is a compromise between the prior and the data. As shown in Figure 1, the posterior inference contains a maximum (mode)… Bayesian analysis generates direct probability statements about the treatment hypothesis, which is arguably more interesting than the null. In this instance, the Bayesian approach identifies with 95% probability that mortality is 29% to 52% lower after CABG than it is after PCI. More precisely, the Bayesian approach identifies with 99.9%, 99.9%, and 96.8% probabilities that mortality rates are at least 10%, 20%, or 30% lower after CABG than they are after PCI.”, Bittl teaches determining from a posterior distribution the probability that a candidate option satisfies a predefined percentage-improvement threshold.
Bittl teaches determining the probability that at least one of the plurality of recommended experiment designs achieves the predetermined response variable objective, including the probability that at least one design is a predetermined percentage closer to the objective relative to the training data.).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the functionalities disclosed by Abeliuk, Yao, and Khiari with the techniques disclosed by Bittl. A motivation for the combination is similarly disclosed above in claim 74.
Claims 87-88 are rejected under 35 U.S.C. 103 as being unpatentable over Abeliuk in view of Yao, Khiari, and Hu et al. (Shahhosseini, M., Hu, G., & Pham, H. (2019). Optimizing Ensemble Weights and Hyperparameters of Machine Learning Models for Regression Problems. arXiv preprint arXiv:1908.05287.), hereinafter referred to as Hu.
Regarding claim 87, Abeliuk, Yao, and Khiari teach the system of claim 1. Abeliuk, Yao, and Khiari do not teach the following, which Hu teaches:
The system of claim 1, wherein the plurality of level-0 learners comprise a non-probabilistic machine learning model. (Hu, page 11, last paragraph, “Table.2 presents the details of ML models and their hyperparameters settings (All other hyperparameters are set to their default values).”, Hu teaches that the first-level learners in the ensemble are selected from ordinary machine-learning model families. A person of ordinary skill in the art before the effective filing date of the invention would have understood these identified models, such as random forest, KNN, gradient boosting regressor, neural network, regression tree, bagging, and SVM, to be non-probabilistic machine learning models.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the heterogeneous plurality of level-0 learners in the Abeliuk-Yao-Khiari system using the specific regression-model families taught by Hu. As set forth, Abeliuk teaches the synthetic-biology surrogate-model optimization framework, Yao teaches a stacked regression/Bayesian predictive-distribution architecture in which multiple first-stage models are combined at a higher level, and Khiari teaches that ensemble learning can use heterogeneous base models and stacking via a meta-regressor trained on base-model outputs. Hu teaches, in the context of regression ensembles, selecting multiple diverse base models for ensemble creation and expressly identifies candidate base models including SVM, KNN, Gaussian Process Regressor, Random Forest, Gradient Boosting Regressor, and Neural network. A person of ordinary skill in the art would have been motivated to use at least two of Hu’s expressly identified regression model families as the level-0 learners in the Abeliuk-Yao-Khiari surrogate model in order to obtain diverse and accurate base learners for predicting continuous response variables, thereby predictably improving ensemble performance while retaining Abeliuk’s synthetic-biology recommendation workflow.
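The stacking arrangement mapped above, i.e., non-probabilistic level-0 regressors whose predictions are combined by a linear meta-regressor, as in Hu's "stacked regression" benchmark, may be sketched as follows (illustrative only; the data, the choice of k-NN and least-squares base learners, and all parameters are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical regression data with a known linear trend.
X = rng.uniform(0, 1, size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, 200)
X_tr, y_tr, X_val, y_val = X[:100], y[:100], X[100:], y[100:]

def knn_predict(Xq, Xref, yref, k=5):
    # Non-probabilistic level-0 learner 1: k-nearest-neighbor regressor.
    d = np.linalg.norm(Xq[:, None, :] - Xref[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return yref[idx].mean(axis=1)

def ols_predict(Xq, Xref, yref):
    # Non-probabilistic level-0 learner 2: ordinary least squares.
    A = np.c_[Xref, np.ones(len(Xref))]
    w, *_ = np.linalg.lstsq(A, yref, rcond=None)
    return np.c_[Xq, np.ones(len(Xq))] @ w

# Level-1 features: the base models' predictions on held-out data.
Z = np.c_[knn_predict(X_val, X_tr, y_tr), ols_predict(X_val, X_tr, y_tr)]

# Meta-regressor: a linear regression fit on the base predictions
# (Hu's "stacked regression" benchmark).
A = np.c_[Z, np.ones(len(Z))]
w_meta, *_ = np.linalg.lstsq(A, y_val, rcond=None)
stacked_pred = A @ w_meta
```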
[Table 2 of Hu reproduced as image: media_image8.png]
Regarding claim 88, Abeliuk, Yao, and Khiari teach the limitations of claim 1. Abeliuk, Yao, and Khiari do not teach the following, which Hu teaches:
The system of claim 1, wherein the plurality of level-0 learners comprise two or more machine learning models selected from a group consisting of: a random forest, a neural network, a support vector regressor, a kernel ridge regressor, a K-NN regressor, a Gaussian process regressor, a gradient boosting regressor, and a tree-based pipeline optimization tool (TPOT). (Hu, page 12, paragraph 1, “First is the ensembles constructed with averaging the input base models which we call classical ensembles, and the second one is stacked ensembles with linear regression which we will call stacked regression. The latter benchmark has been widely used as one of the most effective methods to create ensembles and is created with fitting a linear regression model on the predictions made by different base learners.”, Hu expressly teaches a regression ensemble using multiple selected base models, where the base models are chosen from a pool that includes at least random forest, neural network, support vector regressor, K-NN regressor, Gaussian process regressor, and gradient boosting regressor. Because Hu is directed to regression problems, a person of ordinary skill in the art would understand its listed “SVM” base model in this context as a support vector regressor. Thus, Hu teaches that the plurality of level-0 learners comprise two or more machine learning models selected from the recited group.).
The rationale to combine Abeliuk, Yao, and Khiari with Hu is similar to that applied for claim 87 above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Gorissen, D., Dhaene, T., & De Turck, F. (2009). Evolutionary model type selection for global surrogate modeling. Journal of Machine Learning Research, 10(9).
Sun, Y., Wang, H., Xue, B., Jin, Y., Yen, G. G., & Zhang, M. (2019). Surrogate-assisted evolutionary deep learning using an end-to-end random forest-based performance predictor. IEEE Transactions on Evolutionary Computation, 24(2), 350-364.
Chung, M., Binois, M., Gramacy, R. B., Bardsley, J. M., Moquin, D. J., Smith, A. P., & Smith, A. M. (2019). Parameter and uncertainty estimation for dynamical systems using surrogate stochastic processes. SIAM Journal on Scientific Computing, 41(4), A2212-A2238.
Swaminathan, A., Poole, W., Pandey, A., Hsiao, V., & Murray, R. M. (2017). Fast and flexible simulation and parameter estimation for synthetic biology using bioscrape. BioRxiv, 121152.
Wang, H., Jin, Y., Sun, C., & Doherty, J. (2018). Offline data-driven evolutionary optimization using selective surrogate ensembles. IEEE Transactions on Evolutionary Computation, 23(2), 203-216.
Lacoste, A., Larochelle, H., Laviolette, F., & Marchand, M. (2014). Sequential model-based ensemble optimization. arXiv preprint arXiv:1402.0796.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HYUNGJUN B YI whose telephone number is (703)756-4799. The examiner can normally be reached M-F 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/H.B.Y./
Examiner, Art Unit 2124
/USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146