DETAILED ACTION
1. Claims 1-3, 5, 8-9, 11-15, and 17-23 have been presented for examination.
Claims 4, 6-7, 10, and 16 have been cancelled.
Notice of Pre-AIA or AIA Status
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
3. The information disclosure statements (IDS) submitted on 6/11/25 and 9/16/25 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the Examiner has considered the IDS as to the merits.
Response to Arguments
4. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2/5/26 has been entered.
i) Following Applicant's arguments and amendments, the 35 U.S.C. § 101 rejections are WITHDRAWN.
ii) Following Applicant's arguments and amendments, the prior art rejections are MAINTAINED. Applicants once again argue that Masood "is not oriented to solving multi-objective optimization problem to handle multiple measurements." However, Masood notes in at least equation 23 "a joint multivariate normal distribution," and in equation 30 on page 836 that "Sobol's sensitivity method is based on the decomposition of V into contributions from the effects of single and/or combined effects of pairs of parameters." As noted in the previous office action, see also Page 839, Section 6.1, Paragraph 2, which recites multiple target measurement values including "CFD analysis is performed on the baseline model to evaluate the difference of relative-tangential velocity at the leading and trailing edge, ΔvCFD" as well as the power obtained for the runner "Prunner", the power available in water "Pwater", and the overall efficiency of the turbine ηCFD. As to Applicant's arguments regarding the "whole" turbine unit, Applicants are once again encouraged to consider Figure 3, which shows the multi-objective aspect of the prior art in the multiple elements being measured, including the segmenting of each distinct curve, as well as Table 1, which shows the multiple parameter values. Further, in response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., the plurality of equations and sections of the specification of the instant application) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Applicants also appear to argue the goal of Masood and not what is being taught; see Applicant's arguments at the top of page 19. Therefore the prior art rejection is MAINTAINED.
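For context, the following is standard Sobol notation (the Examiner's illustration, not Masood's verbatim equation). The variance decomposition underlying the quoted sensitivity method is

$$V=\sum_{i} V_{i}+\sum_{i<j} V_{ij}+\dots+V_{1,2,\dots,d},\qquad S_{i}=\frac{V_{i}}{V},$$

where $V$ is the total variance of the model output, $V_{i}$ the contribution of parameter $i$ alone, $V_{ij}$ the combined effect of the pair $(i,j)$, and $S_{i}$ the first-order Sobol index; the decomposition inherently treats multiple parameters and their joint effects together.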
iii) Applicants further argue that Masood does not teach a target function in conjunction with the argued optimization problem. However, as noted in the previous office action and reiterated here, Page 845, right column, first paragraph recites "During the training, the auto-encoder minimises this loss function so the output of the decoder is as close to the original input data. The reconstruction error is composed of the sum of MSE and two different regularisation terms; sparsity regularisation and L2 regularisation, which generically speaking helps to avoid over-fitting and local optima by enforcing lower weights to the network during training." As noted in the previous office action, this section reads on the loss function as claimed. Applicants also once again appear to argue the intent of their claims in contrast to the prior art rather than what is being taught by the prior art. Therefore the prior art rejection is MAINTAINED.
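For illustration only (the notation below is the Examiner's, not Masood's verbatim disclosure), a reconstruction error of the kind quoted above, i.e., MSE plus sparsity and L2 regularisation terms, is conventionally written as

$$E=\underbrace{\frac{1}{N}\sum_{n=1}^{N}\lVert x_{n}-\hat{x}_{n}\rVert^{2}}_{\text{MSE}}+\beta\,\Omega_{\text{sparsity}}+\lambda\sum_{l}\lVert W^{(l)}\rVert_{F}^{2},$$

where $x_{n}$ is an input design, $\hat{x}_{n}$ its reconstruction by the decoder, $\Omega_{\text{sparsity}}$ a sparsity penalty on the hidden activations, the last term the L2 weight regularisation, and $\beta,\lambda$ regularisation coefficients.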
iv) Applicants further argue that Masood does not teach that the target function is a logistic curve in conjunction with the argued optimization problem. However, as noted in the previous office action, Masood teaches in Page 845, right column, Paragraphs 1-2, "The outcome of the auto-encoder can be similar to that obtained from PCA if it is constructed with a linear activation function. In the present work, both the encoder and decoder part of the used network is composed of four layers, including three internal layers, one input layer of the encoder and one output layer of the decoder. The bottleneck layer consists of four neurons, which, similar to the output of PCA, results in a lower-dimensional subspace with four dimensions. A logistic sigmoid activation function is used at the neurons. The training is performed with scaled conjugate gradient descent algorithm [69]. The NMSE between the original and the reconstructed designs with four latent variables obtained from PCA and auto-encoders is 5.2670% and 5.8801%, respectively. It can be seen that there is no significant difference between PCA and auto-encoder results in term of NMSE." The Examiner noted that a logistic sigmoid is another name for the logistic function, which produces the claimed logistic curve. Applicants once again argue the intended use of this calculation rather than what is being calculated. In this case the calculation is performed in the context of a PCA, which reads on the claimed target function. Therefore the prior art rejection is MAINTAINED.
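For clarity of the record, the logistic sigmoid referenced above is the logistic function

$$\sigma(x)=\frac{1}{1+e^{-x}},$$

whose graph is the S-shaped logistic curve saturating at 0 and 1.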
v) Applicants further argue that Shmulevich is for prediction problems and not optimization problems and does not teach the use of an R-squared value. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). As noted in the previous office action and reiterated here, Shmulevich in [0021] teaches "A method of inference, based on the coefficient of determination typically produces a number of good candidate predictors for each node (e.g., target gene or target protein). Since the COD itself is estimated from the data, there is little reason to rely on one good predictor function. Thus, the general approach is to probabilistically synthesize good predictor functions such that each predictor function's contribution is proportional to its determinative potential. For small sample sizes, the complexity of each predictor function can be limited. As new data make themselves available, the model class naturally allows one to narrow down as needed, effectively reducing the uncertainty for predicting each node (e.g., target gene)." It is noted that an R-squared calculation is read on by the cited "coefficient of determination." Further, the Examiner notes that this rejection was presented under obviousness since the calculation would effectively reduce the uncertainty for predicting each node and produce good candidate predictors for each node as per Shmulevich [0021]. Therefore the prior art rejection is MAINTAINED.
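It is additionally observed that the coefficient of determination and the R-squared value are the same statistic:

$$R^{2}=1-\frac{\sum_{i}\left(y_{i}-\hat{y}_{i}\right)^{2}}{\sum_{i}\left(y_{i}-\bar{y}\right)^{2}},$$

where $y_{i}$ are observed values, $\hat{y}_{i}$ the corresponding predictions, and $\bar{y}$ the mean of the observations.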
vi) Applicants further argue that Masood does not teach training data insufficiency. As noted in the previous office action and reiterated here, Masood in at least Page 835, left column, paragraph 1, teaches "However, it does not guarantee that these samples will be appropriate for reliable and generalised model training. Therefore, in the proposed approach, sampling is initially started with N0 samples and the size of the samples is gradually increased by adding N00 more samples in the initial dataset, i.e., N0 = N0 + N00 with N00 ≪ N0. In each training iteration, N00 new designs are sampled and their performance is evaluated after projecting back to the original design space. The training is performed with N0 = N0 + N00 samples and this process is repeated until no notable improvement is observed in the training." The Examiner noted that this reads on the potential unreliability of the approach and a way to circumvent or resolve such unreliability, specifically starting with too few samples and gradually increasing them. Therefore the prior art rejection is MAINTAINED.
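For illustration only, the quoted adaptive sampling strategy can be sketched as follows (this is the Examiner's sketch, not Masood's code; sample_designs, evaluate_performance, and train_model are hypothetical placeholders):

def adaptive_training(sample_designs, evaluate_performance, train_model,
                      n0=100, n_add=20, tol=1e-3, max_iters=20):
    # Start with N0 sampled designs and their evaluated performance.
    designs = sample_designs(n0)
    data = [(d, evaluate_performance(d)) for d in designs]
    model, prev_error = None, float("inf")
    for _ in range(max_iters):
        model, error = train_model(data)   # e.g., returns model and CV MSE
        if prev_error - error < tol:       # no notable improvement: stop
            break
        prev_error = error
        # Add N00 more designs (N00 much smaller than N0) and re-train.
        data += [(d, evaluate_performance(d)) for d in sample_designs(n_add)]
    return model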
vii) Applicants further argue that Masood does not teach conducting a global search and then conducting a local search as visually depicted in the specification at Figures 9A-9B and 10. First, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Second, as noted in the previous office action, the result of the searches as recited in the claim is the identification of sub-regions, which was shown in at least Page 834, Left Column, Section 3.2.2: "We utilise k-fold cross-validation [60] to validate the accuracy of the GPR prediction model and to avoid over-fitting. The dataset D is partitioned into kc folds/groups. From these folds, one fold is used for validation or testing and the remaining kc − 1 folds are used as the training dataset to train GPR." Specifically, the validation or testing fold or partition represents the optimal sub-region. Subsequent to this step, the claimed "and then conducting at each said sub-region, using said one or more hardware processors, a local search for evaluating a candidate surrogate function value at one or more sampling points representing respective experimental designs within said sub-region" is shown in at least Page 834, Left Column, Section 3.2.2: "After training, the data in the kc-th fold is inputted into the trained GPR to predict their performance. Designs in this fold are not used during the training and are used to test the capability of the model for unseen data. The cross-validation MSE between the actual and the predicted performance values for designs in the kc-th fold is obtained from the trained model. This process is repeated kc times and kc GPR models are trained. The model giving the least cross-validation MSE is selected as the final model." Specifically, this citation at least represents the claimed evaluation. Therefore the prior art rejection is MAINTAINED.
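For illustration only, the quoted k-fold procedure, training kc GPR models and keeping the one with the least cross-validation MSE, can be sketched in scikit-learn as follows (the Examiner's sketch; Masood's own implementation is not disclosed):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

def select_gpr_by_cv(X, y, kc=5):
    # Train kc GPR models; each fold serves once as the held-out test fold.
    best_model, best_mse = None, np.inf
    for train_idx, test_idx in KFold(n_splits=kc, shuffle=True).split(X):
        model = GaussianProcessRegressor().fit(X[train_idx], y[train_idx])
        mse = mean_squared_error(y[test_idx], model.predict(X[test_idx]))
        if mse < best_mse:                 # keep least cross-validation MSE
            best_model, best_mse = model, mse
    return best_model, best_mse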
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
5. Claims 1-3, 5, 8-9, 11-15, and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being clearly anticipated by Masood, Zahid, Shahroz Khan, and Li Qian. "Machine learning-based surrogate model for accelerating simulation-driven optimisation of hydropower Kaplan turbine." Renewable Energy 173 (2021): 827-848, hereafter Masood.
Regarding Claim 1: The reference discloses A method for multi-objective optimization in design of a physical device, the method comprising:
receiving, at one or more hardware processors, an input data comprising multiple unique historical experimental designs associated with a building of a physical device and corresponding historical performance measurements obtained using the built physical device; (Page 828, Left Column, Paragraph 3, “In this loop, parametric design, along with a shape modification method, is coupled with the optimiser, which explores a given design space. At each iteration, design instances searched by the optimiser are fed to shape modification methods. These deform the baseline design to create new geometries evaluated using physics simulations.” This teaches the claimed historical experimental designs and the corresponding historical performance measurements.)
receiving, at said one or more hardware processors, a specification of at least two target measurement values corresponding to a target objective to be achieved by said built physical device; (Pages 836-837, Section 4, "The efficiency of the whole turbine unit depends on the performance of different components like a runner, distributors, oil head, governors, etc. In the present work, to validate the proposed SBO pipeline, we focused on the design and optimisation of the turbine's runner blade profile, which is constructed based on the design theory proposed in Refs. [4,16] and summarized in this section." This reads on the at least two target measurement values, specifically the performance of different components like a runner, distributors, oil head, governors, etc., and, in the context of this prior art, also the various values of the runner blade profile as seen in at least Table 1, whereby several sections of the runner blade, P1-P5, are measured along with various corresponding values at each section seen at the left of the table. Another section also reads on this limitation: Page 839, Section 6.1, Paragraph 2, which recites multiple target measurement values including "CFD analysis is performed on the baseline model to evaluate the difference of relative-tangential velocity at the leading and trailing edge, ΔvCFD" as well as the power obtained for the runner "Prunner", the power available in water "Pwater", and the overall efficiency of the turbine ηCFD.)
generating, by the one or more hardware processors, a target function for multi-objective optimization, the target function comprising multiple prediction models trained to recommend designs that would achieve the specified at least two target measurement values of the corresponding target objective; (Page 830, Section 2.3, “Machine learning-based surrogate models have been developed to predict the highly-nonlinear behaviour of complex physics phenomena. In this subsection, we discussed some prior applications of machine learning in engineering to predict performance thus overcomes the computational burden caused by physics simulations.”)
obtaining, using one or more hardware processors, an optimal solution to the target function for multi-objective optimization that optimizes the target function and using the optimal solution of said optimized target function to recommend a new design used to build said physical device (Page 840, Section 6.4, Paragraph 1, which recites a "final GPR prediction model"), said recommended new design for simultaneously achieving said at least two target measurement values of the corresponding target objective by said physical device; (Page 840, Section 6.4.1 ("Comparison between initial and optimal design"), Paragraph 1, "The flow behaviour across the runner profile of the initial and optimised turbine can be observed in Fig. 13. In the initial design of the runner, little swirls can be observed in water after leaving the runner, which indicates that water still has the potential left for power generation. On the other hand, in optimised design, water has a more streamline flow after it leaves the runner. This means the optimised design is capable of extracting most of the energy from the flow." This section and Figure 13 read on the new design and simultaneously achieving said at least two target measurement values by said physical device.) wherein the target function for multi-objective optimization includes a model uncertainty quantification term for calculating the prediction uncertainty of the multiple prediction models, (Page 845, right column, first paragraph, "During the training, the auto-encoder minimises this loss function so the output of the decoder is as close to the original input data. The reconstruction error is composed of the sum of MSE and two different regularisation terms; sparsity regularisation and L2 regularisation, which generically speaking helps to avoid over-fitting and local optima by enforcing lower weights to the network during training." This section recites a plurality of regularization approaches used in the prior art function.) the target function modeling operations embodied as one or more functions each operating on one or more experimental design input variables producing a respective multiple target measurement value output, (Pages 836-837, Section 4, "The efficiency of the whole turbine unit depends on the performance of different components like a runner, distributors, oil head, governors, etc. In the present work, to validate the proposed SBO pipeline, we focused on the design and optimisation of the turbine's runner blade profile, which is constructed based on the design theory proposed in Refs. [4,16] and summarized in this section." This reads on the at least two target measurement values, specifically the performance of different components like a runner, distributors, oil head, governors, etc., and, in the context of this prior art, also the various values of the runner blade profile as seen in at least Table 1, whereby several sections of the runner blade, P1-P5, are measured along with various corresponding values at each section seen at the left of the table. Another section also reads on this limitation: Page 839, Section 6.1, Paragraph 2, which recites multiple target measurement values including "CFD analysis is performed on the baseline model to evaluate the difference of relative-tangential velocity at the leading and trailing edge, ΔvCFD" as well as the power obtained for the runner "Prunner", the power available in water "Pwater", and the overall efficiency of the turbine ηCFD.)
and said target function is a logistic curve; (Page 845, right column, Paragraphs 1-2, "The outcome of the auto-encoder can be similar to that obtained from PCA if it is constructed with a linear activation function. In the present work, both the encoder and decoder part of the used network is composed of four layers, including three internal layers, one input layer of the encoder and one output layer of the decoder. The bottleneck layer consists of four neurons, which, similar to the output of PCA, results in a lower-dimensional subspace with four dimensions. A logistic sigmoid activation function is used at the neurons. The training is performed with scaled conjugate gradient descent algorithm [69]. The NMSE between the original and the reconstructed designs with four latent variables obtained from PCA and auto-encoders is 5.2670% and 5.8801%, respectively. It can be seen that there is no significant difference between PCA and auto-encoder results in term of NMSE." A logistic sigmoid is another name for the logistic function which produces the claimed logistic curve.)
and configuring, using the one or more hardware processors, one or more of: structures, materials, or process conditions used for building said physical device according to said recommended new design. (Page 840, Section 6.4.1 ("Comparison between initial and optimal design"), Paragraph 1, "The flow behaviour across the runner profile of the initial and optimised turbine can be observed in Fig. 13. In the initial design of the runner, little swirls can be observed in water after leaving the runner, which indicates that water still has the potential left for power generation. On the other hand, in optimised design, water has a more streamline flow after it leaves the runner. This means the optimised design is capable of extracting most of the energy from the flow." This section and Figure 13 read on the new design which utilizes a differing structure.)
Regarding Claim 2: The reference discloses The method according to Claim 1, wherein said multiple historical designs input data comprises data selected from: choices of materials for said physical device, one or more geometries of aspects of said physical device, a process condition used in the making of said physical device. (Page 840, Section 6.4.1 ("Comparison between initial and optimal design"), Paragraph 1, "The flow behaviour across the runner profile of the initial and optimised turbine can be observed in Fig. 13. In the initial design of the runner, little swirls can be observed in water after leaving the runner, which indicates that water still has the potential left for power generation. On the other hand, in optimised design, water has a more streamline flow after it leaves the runner. This means the optimised design is capable of extracting most of the energy from the flow." This section and Figure 13 read on the new design which utilizes a differing structure and geometry.)
Regarding Claim 3: The reference discloses The method according to Claim 1, further comprising: building, using one or more hardware processors, based on said multiple unique historical experimental designs input data, the machine learned prediction model trained for predicting a new experimental design given multiple target measurement values. (Page 829, Left Column, bottom bullet #2, “Coupling of unsupervised feature extraction technique with a supervised machine-learning method to construct an efficient and reliable reduced-ordered machine learning-based surrogate model to predict the relative-tangential velocity of the turbine.” Page 840, Section 6.4, Paragraph 1 which recites a “final GPR prediction model”)
Regarding Claim 5: The reference discloses The method according to Claim 1, said optimized target function generates a scalar output value used for evaluating said multi-objective optimization. (Page 833, right column, first paragraph, "Above is similar to Multiple Linear Regression (MLR) and assumes that an observation consists of an independent signal term g(t) and a noise term ε. However, GPR assumes that g(t) is a random variable and follows a particular distribution, which reflects our uncertainty regarding the function [2]. The uncertainty in g(t) can be observed based on its output at different t samples." This section recites the aspect of model uncertainty quantification. Page 845, right column, first paragraph, "During the training, the auto-encoder minimises this loss function so the output of the decoder is as close to the original input data. The reconstruction error is composed of the sum of MSE and two different regularisation terms; sparsity regularisation and L2 regularisation, which generically speaking helps to avoid over-fitting and local optima by enforcing lower weights to the network during training." This section recites a plurality of regularization approaches used in the prior art function as well as taking into account and minimizing a loss function.)
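For clarity of the record, the GPR observation model quoted above is conventionally written (in the Examiner's notation) as

$$y=g(t)+\varepsilon,\qquad g(t)\sim\mathcal{GP}\big(m(t),k(t,t')\big),\qquad \varepsilon\sim\mathcal{N}(0,\sigma_{n}^{2}),$$

so that the posterior predictive variance at a new input quantifies the model's prediction uncertainty, which is the sense in which the cited section recites model uncertainty quantification.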
Regarding Claim 8: The reference discloses The method according to Claim 1, further comprising: determining said uncertainty quantification term by performing one of: uncertainty quantification for a decision tree analysis and a multivariate adaptive regression splines analysis, or a principle component analysis PCA. (Table 6 recites a decision tree and shows B-spline curves. Page 845, right column, Paragraphs 1-2, "The outcome of the auto-encoder can be similar to that obtained from PCA if it is constructed with a linear activation function. In the present work, both the encoder and decoder part of the used network is composed of four layers, including three internal layers, one input layer of the encoder and one output layer of the decoder. The bottleneck layer consists of four neurons, which, similar to the output of PCA, results in a lower-dimensional subspace with four dimensions. A logistic sigmoid activation function is used at the neurons. The training is performed with scaled conjugate gradient descent algorithm [69]. The NMSE between the original and the reconstructed designs with four latent variables obtained from PCA and auto-encoders is 5.2670% and 5.8801%, respectively. It can be seen that there is no significant difference between PCA and auto-encoder results in term of NMSE." This section shows the use of PCA.)
Regarding Claim 9: The reference discloses A method for multi-objective optimization in design of a physical device, the method comprising:
receiving, at one or more hardware processors, an input data comprising multiple unique historical experimental designs and corresponding historical performance measurements associated with a building of a physical device, said input data being insufficient for reliably training a single prediction function (Page 835, left column, paragraph 1, "However, it does not guarantee that these samples will be appropriate for reliable and generalised model training. Therefore, in the proposed approach, sampling is initially started with N0 samples and the size of the samples is gradually increased by adding N00 more samples in the initial dataset, i.e., N0 = N0 + N00 with N00 ≪ N0. In each training iteration, N00 new designs are sampled and their performance is evaluated after projecting back to the original design space. The training is performed with N0 = N0 + N00 samples and this process is repeated until no notable improvement is observed in the training." The prior art notes the potential unreliability of the approach and a way to circumvent or resolve such unreliability.) to predict a new design of said physical device that simultaneously achieves at least two target measurement values; (Page 828, Left Column, Paragraph 3, "In this loop, parametric design, along with a shape modification method, is coupled with the optimiser, which explores a given design space. At each iteration, design instances searched by the optimiser are fed to shape modification methods. These deform the baseline design to create new geometries evaluated using physics simulations." This teaches the claimed historical experimental designs and the corresponding historical performance measurements.)
providing a search space of experimental designs; successively partition said search space into a plurality of sub-regions, one or more sub-regions of said plurality comprising a new design candidate for potentially optimizing said surrogate function; (Page 840, Section 6.3; the entire section lays out this approach, and more specifically the second paragraph recites "During the training of each of the surrogate, hyper-parameters are optimised using Bayesian optimisation. After cross-validation, the model with the least cross-validation MSE is selected. In each iteration, N00 = 20 more designs are sampled and included in the dataset after evaluating their Δv. The sampling process is stopped after 6th iterations, which resulted in the training of the final model with 220 designs." Page 828, Left Column, Paragraph 3, "In this loop, parametric design, along with a shape modification method, is coupled with the optimiser, which explores a given design space. At each iteration, design instances searched by the optimiser are fed to shape modification methods. These deform the baseline design to create new geometries evaluated using physics simulations." This teaches the claimed historical experimental designs and the corresponding historical performance measurements.)
iteratively obtaining, using the one or more hardware processors, a sequence of surrogate prediction functions, each surrogate prediction function of said sequence designed to learn a relationship between the input historical experimental design data used to build said physical device and the at least two target measurement values; (Page 840, Section 6.3; the entire section lays out this approach, and more specifically the first paragraph notes "After constructing the 4D subspace, it is sampled with high-fidelity sampling [61] to create a diverse dataset for training GPR to build a surrogate model. As mentioned before, to reduce the computational cost for creating a training dataset, an adoptive training strategy is utilised in which sampling is performed multiple times to generate a dataset consisting of an appropriate number of samples to construct a reliable prediction model. Initially, training is started with N0 = 100 sampled designs and iterations are performed until a surrogate model with the desired accuracy is achieved.")
wherein, at each iteration, said obtaining a surrogate prediction function comprises: identifying a set of potentially optimal sub-regions of said plurality of sub-regions; (Page 840, Section 6.3; the entire section lays out this approach, and more specifically the first paragraph notes "After constructing the 4D subspace, it is sampled with high-fidelity sampling [61] to create a diverse dataset for training GPR to build a surrogate model. As mentioned before, to reduce the computational cost for creating a training dataset, an adoptive training strategy is utilised in which sampling is performed multiple times to generate a dataset consisting of an appropriate number of samples to construct a reliable prediction model. Initially, training is started with N0 = 100 sampled designs and iterations are performed until a surrogate model with the desired accuracy is achieved.") evaluating, using the one or more processors, a current surrogate prediction function at one or more experimental design data points of a potentially optimal sub-region of said plurality of sub-regions; obtaining, using the one or more processors, an optimal target solution to the current surrogate prediction function based on said evaluating; using said optimal target solution of said current surrogate prediction function to acquire, using the one or more processors, a new experimental design data for successively improving an accuracy of the surrogate prediction function; and repeating said surrogate prediction function evaluating, optimizing and acquiring of new experimental design data to obtain the single surrogate prediction function for optimally predicting said new design; (Page 840, Section 6.3; the entire section lays out this approach, and more specifically the second paragraph recites "During the training of each of the surrogate, hyper-parameters are optimised using Bayesian optimisation. After cross-validation, the model with the least cross-validation MSE is selected. In each iteration, N00 = 20 more designs are sampled and included in the dataset after evaluating their Δv. The sampling process is stopped after 6th iterations, which resulted in the training of the final model with 220 designs.")
running, at the one or more hardware processors, one of the successive surrogate prediction functions to optimally predict a new design for simultaneously achieving said at least two target measurement values by said physical device; (Page 840, Section 6.4, Paragraph 1, which recites a "final GPR prediction model." Section 6.4.1 ("Comparison between initial and optimal design"), Paragraph 1, "The flow behaviour across the runner profile of the initial and optimised turbine can be observed in Fig. 13. In the initial design of the runner, little swirls can be observed in water after leaving the runner, which indicates that water still has the potential left for power generation. On the other hand, in optimised design, water has a more streamline flow after it leaves the runner. This means the optimised design is capable of extracting most of the energy from the flow." This section and Figure 13 read on the new design and simultaneously achieving said at least two target measurement values by said physical device.)
and configuring, using the one or more hardware processors, one or more of: structures, materials, or process conditions used for building said physical device according to said predicted new design. (Page 840, Section 6.4.1 ("Comparison between initial and optimal design"), Paragraph 1, "The flow behaviour across the runner profile of the initial and optimised turbine can be observed in Fig. 13. In the initial design of the runner, little swirls can be observed in water after leaving the runner, which indicates that water still has the potential left for power generation. On the other hand, in optimised design, water has a more streamline flow after it leaves the runner. This means the optimised design is capable of extracting most of the energy from the flow." This section and Figure 13 read on the new design which utilizes a differing structure.)
Regarding Claim 11: The reference discloses The method according to Claim 9, wherein said evaluating a current surrogate prediction function comprises:
defining, by said one or more hardware processors, a search space of experimental designs; (Page 831, right column, Paragraphs 2-3, "Afterwards, y is inputted to the optimiser to guide the exploration process towards the promising regions of the design space. As explained earlier, if X is high-dimensional and performance evaluation of the design is time expensive then the whole optimization process can suffer from high computational cost. Therefore, the proposed approach cures this curse of dimensionality with feature extraction techniques to create a lower-dimensional subspace and uses this subspace to construct a surrogate model to bypass the need for designs' performance evaluation with simulation tools." This section recites the aspect of a search space as subspaces in the design.)
successively partitioning, using said one or more hardware processors, said search space into a plurality of sub-regions, one or more sub-regions of said plurality comprising a new experimental design candidate for potentially optimizing said surrogate function, wherein said sub-region comprises a hyper-rectangle. (Page 831, right column, Paragraphs 2-3, "Afterwards, y is inputted to the optimiser to guide the exploration process towards the promising regions of the design space. As explained earlier, if X is high-dimensional and performance evaluation of the design is time expensive then the whole optimization process can suffer from high computational cost. Therefore, the proposed approach cures this curse of dimensionality with feature extraction techniques to create a lower-dimensional subspace and uses this subspace to construct a surrogate model to bypass the need for designs' performance evaluation with simulation tools." This section recites the aspect of a search space as subspaces in the design as well as their use in the optimization. The last paragraph in this same section recites "For the complex problem, evaluation of C requires solving high order integrals, which, if the dimensionality of the design space is sufficiently small, can be solved with techniques like tensor product Gauss-Legendre quadrature. However, for problems like studied in the present work, the estimation of C is evaluated using pseudorandom sampling techniques such as Monte Carlo or Latin hypercube sampling." The recited hypercube is a type of the claimed hyper-rectangle.)
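For illustration only, the Latin hypercube sampling referenced in the quoted passage can be sketched as follows (the Examiner's sketch; the 4-D dimensionality and the bounds are hypothetical, and SciPy's qmc module is used):

import numpy as np
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=4)        # sample the 4-D unit hypercube
unit_samples = sampler.random(n=100)     # 100 points in [0, 1]^4
lower = np.zeros(4)                      # hypothetical lower bounds
upper = np.array([1.0, 2.0, 5.0, 10.0])  # hypothetical upper bounds
designs = qmc.scale(unit_samples, lower, upper)  # rescaled to a hyper-rectangle

A hypercube is the special case of a hyper-rectangle with equal side lengths; scaling by unequal bounds, as above, yields a general hyper-rectangle.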
Regarding Claim 12: The reference discloses The method according to Claim 11, wherein said successively partitioning said search space is an iterative process comprising, at each iteration:
first conducting, using said one or more hardware processors, a global search for identifying one or more optimal sub-regions that meet a partitioning criteria; (Page 834, Left Column, Section 3.2.2, "We utilise k-fold cross-validation [60] to validate the accuracy of the GPR prediction model and to avoid over-fitting. The dataset D is partitioned into kc folds/groups. From these folds, one fold is used for validation or testing and the remaining kc − 1 folds are used as the training dataset to train GPR." The validation or testing fold or partition represents the optimal sub-region.)
and then conducting at each said sub-region, using said one or more hardware processors, a local search for evaluating a candidate surrogate function value at one or more sampling points representing respective experimental designs within said sub-region; (Page 834, Left Column, Section 3.2.2, "After training, the data in the kc-th fold is inputted into the trained GPR to predict their performance. Designs in this fold are not used during the training and are used to test the capability of the model for unseen data. The cross-validation MSE between the actual and the predicted performance values for designs in the kc-th fold is obtained from the trained model. This process is repeated kc times and kc GPR models are trained. The model giving the least cross-validation MSE is selected as the final model.")
and determining, based on said evaluating said candidate surrogate function, the optimal target solution at said iteration for a sub-region. (Page 834, Left Column, Section 3.2.2, "After training, the data in the kc-th fold is inputted into the trained GPR to predict their performance. Designs in this fold are not used during the training and are used to test the capability of the model for unseen data. The cross-validation MSE between the actual and the predicted performance values for designs in the kc-th fold is obtained from the trained model. This process is repeated kc times and kc GPR models are trained. The model giving the least cross-validation MSE is selected as the final model." With respect to the claimed surrogate function see also Page 840, Section 6.3, whose first paragraph notes "After constructing the 4D subspace, it is sampled with high-fidelity sampling [61] to create a diverse dataset for training GPR to build a surrogate model. As mentioned before, to reduce the computational cost for creating a training dataset, an adoptive training strategy is utilised in which sampling is performed multiple times to generate a dataset consisting of an appropriate number of samples to construct a reliable prediction model. Initially, training is started with N0 = 100 sampled designs and iterations are performed until a surrogate model with the desired accuracy is achieved.")
Regarding Claim 13: The reference discloses The method according to Claim 12, wherein said conducting a local search at a sub-region comprises: using a Gaussian process for building a local prediction model over the sub-region, said prediction model comprising a candidate surrogate function approximating said optimal target solution using Bayesian optimization. (Page 833, Left Column, Section 3.2.1. “Gaussian process regression. GPR is a non-parametric Bayesian approach, which has been used in different design applications [47] and proven to be an efficient tool for mapping the nonlinear and globally coupled relationship between inputs and outputs sampled from a theoretically infinite-dimensional normal distribution.”)
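For illustration only, a local search of the kind claimed, building a GPR model over a sub-region and proposing the next sampling point by Bayesian optimisation, can be sketched as follows (the Examiner's sketch using scikit-learn and SciPy, not Masood's code; the expected-improvement criterion shown is one conventional choice):

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def propose_next_design(X_local, y_local, candidates):
    # Fit a local GPR surrogate on designs observed within the sub-region.
    gpr = GaussianProcessRegressor(normalize_y=True).fit(X_local, y_local)
    mu, sigma = gpr.predict(candidates, return_std=True)
    best = y_local.min()                   # best value so far (minimisation)
    z = (best - mu) / np.maximum(sigma, 1e-12)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    return candidates[int(np.argmax(ei))]  # next sampling point in sub-region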
Regarding Claim 14: The reference discloses The method according to Claim 12, wherein said conducting a local search at a sub-region further comprises: choosing, within a sub-region, a sampling point representing an experimental design by minimizing an prediction function, said prediction function defined as one of: an expected improvement with respect to a best function value; or an upper confidence bound. (Page 842, Right Column, Paragraph 2, "Unlike other approaches, SVR not just finds a function best fitting on the training sample points. Instead, during training, it finds an ε-insensitive region around the function referred to as ε-tube. This tube reformulates the optimisation problem to find the tube that best approximates the nonlinear function while balancing the complexity of the model and error. Similar to GPR, SVR can be used for both linear and non-linear problems and uses a kernel function to map a nonlinear behaviour between t and Δv." This section reads on the claimed sampling points as well as the function in the context of a "best function," particularly in view of the broadest reasonable interpretation of the term.)
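For completeness of the record, the two acquisition functions recited in the claim are conventionally defined (in the Examiner's notation), for a surrogate with posterior mean $\mu(x)$, posterior standard deviation $\sigma(x)$, and best observed value $f^{*}$ under a minimisation convention, as

$$\mathrm{EI}(x)=\big(f^{*}-\mu(x)\big)\,\Phi(z)+\sigma(x)\,\phi(z),\qquad z=\frac{f^{*}-\mu(x)}{\sigma(x)},$$

where $\Phi$ and $\phi$ are the standard normal CDF and PDF, and the confidence-bound criterion is $\mu(x)\pm\kappa\,\sigma(x)$ with exploration weight $\kappa>0$ (the upper confidence bound for maximisation, the lower for minimisation).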
Regarding Claim 15: The reference discloses A system for multi-objective optimization in design of a physical device, the system comprising:
a hardware processor and a non-transitory computer-readable memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to: receive an input data comprising multiple unique historical experimental designs and corresponding historical performance measurements associated with a building of a physical device, said input data being insufficient for reliably training a single prediction function (Page 835, left column, paragraph 1, "However, it does not guarantee that these samples will be appropriate for reliable and generalised model training. Therefore, in the proposed approach, sampling is initially started with N0 samples and the size of the samples is gradually increased by adding N00 more samples in the initial dataset, i.e., N0 = N0 + N00 with N00 ≪ N0. In each training iteration, N00 new designs are sampled and their performance is evaluated after projecting back to the original design space. The training is performed with N0 = N0 + N00 samples and this process is repeated until no notable improvement is observed in the training." The prior art notes the potential unreliability of the approach and a way to circumvent or resolve such unreliability.) to predict a new design of said physical device that simultaneously achieves at least two target measurement values; (Page 828, Left Column, Paragraph 3, "In this loop, parametric design, along with a shape modification method, is coupled with the optimiser, which explores a given design space. At each iteration, design instances searched by the optimiser are fed to shape modification methods. These deform the baseline design to create new geometries evaluated using physics simulations." This teaches the claimed historical experimental designs and the corresponding historical performance measurements.)
iteratively obtain a sequence of surrogate prediction functions, each surrogate prediction function of said sequence designed to learn a relationship between the input historical experimental design data used to build said physical device and the at least two target measurement values; (Page 840, Section 6.3; the entire section lays out this approach, and more specifically the first paragraph notes "After constructing the 4D subspace, it is sampled with high-fidelity sampling [61] to create a diverse dataset for training GPR to build a surrogate model. As mentioned before, to reduce the computational cost for creating a training dataset, an adoptive training strategy is utilised in which sampling is performed multiple times to generate a dataset consisting of an appropriate number of samples to construct a reliable prediction model. Initially, training is started with N0 = 100 sampled designs and iterations are performed until a surrogate model with the desired accuracy is achieved.")
wherein, at each iteration, said obtaining a surrogate prediction function comprises: identifying a set of potentially optimal sub-regions of said plurality of sub-regions; (Page 840, Section 6.3; the entire section lays out this approach, and more specifically the first paragraph notes "After constructing the 4D subspace, it is sampled with high-fidelity sampling [61] to create a diverse dataset for training GPR to build a surrogate model. As mentioned before, to reduce the computational cost for creating a training dataset, an adoptive training strategy is utilised in which sampling is performed multiple times to generate a dataset consisting of an appropriate number of samples to construct a reliable prediction model. Initially, training is started with N0 = 100 sampled designs and iterations are performed until a surrogate model with the desired accuracy is achieved.") evaluating, using the one or more processors, a current surrogate prediction function at one or more experimental design data points of a potentially optimal sub-region of said plurality of sub-regions; obtaining, using the one or more processors, an optimal target solution to the current surrogate prediction function based on said evaluating; using said optimal target solution of said current surrogate prediction function to acquire, using the one or more processors, a new experimental design data for successively improving an accuracy of the surrogate prediction function; and repeating said surrogate prediction function evaluating, optimizing and acquiring of new experimental design data to obtain the single surrogate prediction function for optimally predicting said new design; (Page 840, Section 6.3; the entire section lays out this approach, and more specifically the second paragraph recites "During the training of each of the surrogate, hyper-parameters are optimised using Bayesian optimisation. After cross-validation, the model with the least cross-validation MSE is selected. In each iteration, N00 = 20 more designs are sampled and included in the dataset after evaluating their Δv. The sampling process is stopped after 6th iterations, which resulted in the training of the final model with 220 designs.")
providing a search space of experimental designs; successively partition said search space into a plurality of sub-regions, one or more sub-regions of said plurality comprising a new design candidate for potentially optimizing said surrogate function; (Page 840, Section 6.3; the entire section lays out this approach, and more specifically the second paragraph recites "During the training of each of the surrogate, hyper-parameters are optimised using Bayesian optimisation. After cross-validation, the model with the least cross-validation MSE is selected. In each iteration, N00 = 20 more designs are sampled and included in the dataset after evaluating their Δv. The sampling process is stopped after 6th iterations, which resulted in the training of the final model with 220 designs." Page 828, Left Column, Paragraph 3, "In this loop, parametric design, along with a shape modification method, is coupled with the optimiser, which explores a given design space. At each iteration, design instances searched by the optimiser are fed to shape modification methods. These deform the baseline design to create new geometries evaluated using physics simulations." This teaches the claimed historical experimental designs and the corresponding historical performance measurements.)
run one of the successive surrogate prediction functions to optimally predict a new design for simultaneously achieving said at least two target measurement values by said physical device; (Page 840, Section 6.4, Paragraph 1, which recites a "final GPR prediction model." Section 6.4.1 ("Comparison between initial and optimal design"), Paragraph 1, "The flow behaviour across the runner profile of the initial and optimised turbine can be observed in Fig. 13. In the initial design of the runner, little swirls can be observed in water after leaving the runner, which indicates that water still has the potential left for power generation. On the other hand, in optimised design, water has a more streamline flow after it leaves the runner. This means the optimised design is capable of extracting most of the energy from the flow." This section and Figure 13 read on the new design and simultaneously achieving said at least two target measurement values by said physical device.)
and configure one or more of: structures, materials, or process conditions used for building said physical device according to said predicted new design. (Page 840, Section 6.4.1 ("Comparison between initial and optimal design"), Paragraph 1, "The flow behaviour across the runner profile of the initial and optimised turbine can be observed in Fig. 13. In the initial design of the runner, little swirls can be observed in water after leaving the runner, which indicates that water still has the potential left for power generation. On the other hand, in optimised design, water has a more streamline flow after it leaves the runner. This means the optimised design is capable of extracting most of the energy from the flow." This section and Figure 13 read on the new design which utilizes a differing structure.)
Regarding Claim 17: The reference discloses The system according to Claim 15, wherein to evaluate a current surrogate prediction function, said instructions, when executed by the processor, further cause the processor to:
define a search space of experimental designs; (Page 831, right column, Paragraphs 2-3, "Afterwards, y is inputted to the optimiser to guide the exploration process towards the promising regions of the design space. As explained earlier, if X is high-dimensional and performance evaluation of the design is time expensive then the whole optimization process can suffer from high computational cost. Therefore, the proposed approach cures this curse of dimensionality with feature extraction techniques to create a lower-dimensional subspace and uses this subspace to construct a surrogate model to bypass the need for designs' performance evaluation with simulation tools." This section recites the aspect of a search space as subspaces in the design.)
successively partition said search space into a plurality of sub-regions, one or more sub-regions of said plurality comprising a new design candidate for potentially optimizing said surrogate function, wherein said sub-region comprises a hyper-rectangle. (Page 831, right column, Paragraphs 2-3, "Afterwards, y is inputted to the optimiser to guide the exploration process towards the promising regions of the design space. As explained earlier, if X is high-dimensional and performance evaluation of the design is time expensive then the whole optimization process can suffer from high computational cost. Therefore, the proposed approach cures this curse of dimensionality with feature extraction techniques to create a lower-dimensional subspace and uses this subspace to construct a surrogate model to bypass the need for designs' performance evaluation with simulation tools." This section recites the aspect of a search space as subspaces in the design as well as their use in the optimization. The last paragraph in this same section recites "For the complex problem, evaluation of C requires solving high order integrals, which, if the dimensionality of the design space is sufficiently small, can be solved with techniques like tensor product Gauss-Legendre quadrature. However, for problems like studied in the present work, the estimation of C is evaluated using pseudorandom sampling techniques such as Monte Carlo or Latin hypercube sampling." The recited hypercube is a type of the claimed hyper-rectangle.)
Regarding Claim 18: The reference discloses The system according to Claim 15, wherein to successively partition said search space, said instructions, when executed by the processor, further cause the processor to perform an iterative process comprising, at each iteration:
first conducting a global search for identifying one or more optimal sub-regions that meet a partitioning criteria; (Page 834, Left Column, Section 3.2.2, "We utilise k-fold cross-validation [60] to validate the accuracy of the GPR prediction model and to avoid over-fitting. The dataset D is partitioned into kc folds/groups. From these folds, one fold is used for validation or testing and the remaining kc − 1 folds are used as the training dataset to train GPR." The validation or testing fold or partition represents the optimal sub-region.)
and then conduct, at each said sub-region, a local search for evaluating a candidate surrogate function value at one or more sampling points representing respective experimental designs within said sub-region; (Page 834, Left Column, Section 3.2.2, "After training, the data in the kc-th fold is inputted into the trained GPR to predict their performance. Designs in this fold are not used during the training and are used to test the capability of the model for unseen data. The cross-validation MSE between the actual and the predicted performance values for designs in the kc-th fold is obtained from the trained model. This process is repeated kc times and kc GPR models are trained. The model giving the least cross-validation MSE is selected as the final model.")
and determine, based on said evaluating said candidate surrogate function, the optimal target solution at said iteration for a sub-region. (Page 834, Left Column, Section 3.2.2, "After training, the data in the kc-th fold is inputted into the trained GPR to predict their performance. Designs in this fold are not used during the training and are used to test the capability of the model for unseen data. The cross-validation MSE between the actual and the predicted performance values for designs in the kc-th fold is obtained from the trained model. This process is repeated kc times and kc GPR models are trained. The model giving the least cross-validation MSE is selected as the final model." With respect to the claimed surrogate function see also Page 840, Section 6.3, whose first paragraph notes "After constructing the 4D subspace, it is sampled with high-fidelity sampling [61] to create a diverse dataset for training GPR to build a surrogate model. As mentioned before, to reduce the computational cost for creating a training dataset, an adoptive training strategy is utilised in which sampling is performed multiple times to generate a dataset consisting of an appropriate number of samples to construct a reliable prediction model. Initially, training is started with N0 = 100 sampled designs and iterations are performed until a surrogate model with the desired accuracy is achieved.")
Regarding Claim 19: The reference discloses The system according to Claim 15, wherein to conduct a local search at a sub-region, said instructions, when executed by the processor, further cause the processor to: use a Gaussian process for building a local prediction model over the sub-region, said prediction model comprising a candidate surrogate function approximating said optimal target solution using a Bayesian optimization. (Page 833, Left Column, Section 3.2.1. “Gaussian process regression. GPR is a non-parametric Bayesian approach, which has been used in different design applications [47] and proven to be an efficient tool for mapping the nonlinear and globally coupled relationship between inputs and outputs sampled from a theoretically infinite-dimensional normal distribution.”)
Regarding Claim 20: The reference discloses The system according to Claim 15, wherein to conduct a local search at a sub-region, said instructions, when executed by the processor, further cause the processor to: choose, within a sub-region, a sampling point representing an experimental design by minimizing an prediction function, said prediction function defined as one of: an expected improvement with respect to a best function value;
or an upper confidence bound. (Page 842, Right Column, Paragraph 2, "Unlike other approaches, SVR not just finds a function best fitting on the training sample points. Instead, during training, it finds an ε-insensitive region around the function referred to as ε-tube. This tube reformulates the optimisation problem to find the tube that best approximates the nonlinear function while balancing the complexity of the model and error. Similar to GPR, SVR can be used for both linear and non-linear problems and uses a kernel function to map a nonlinear behaviour between t and Δv." This section reads on the claimed sampling points as well as the function in the context of a "best function," particularly in view of the broadest reasonable interpretation of the term.)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
6. Claim(s) 21 is rejected under 35 U.S.C. 103 as being unpatentable over Masood in view of Rangel-Patiño, Francisco Elias, et al. "System margining surrogate-based optimization in post-silicon validation." IEEE Transactions on Microwave Theory and Techniques 65.9 (2017): 3109-3115, hereafter Rangel.
Regarding Claim 21: Masood does not explicitly recite The method of claim 1, wherein the physical device is a microprocessor or computer memory device.
However Rangel recites The method of claim 1, wherein the physical device is a microprocessor or computer memory device.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a microprocessor as a physical device as in Rangel in the modeling of Masood as recited in claim 1.
Specifically, Masood recites, left column, page 829, “GAS advances enhanced resource allocation on refining the significant design features and parameters of design from the early design process stage, thus expediting the entire product development for maximal performance improvement [13]. Studying such an effect can be critical for complex engineering problems…” Masood further recites, per the Abstract, the use of surrogate modeling. Rangel likewise recites a surrogate-based modeling approach, but in the context of post-silicon validation, which reads on the recited microprocessor and computer memory device and constitutes, at the most basic level, a “complex engineering problem” as noted by Masood. A person having ordinary skill in the art would therefore have been motivated to use the methodology of Masood to solve the complex engineering problem recited in Rangel.
7. Claim(s) 22 is rejected under 35 U.S.C. 103 as being unpatentable over Masood in view of Shmulevich et al. U.S. Patent Publication No. 20030225718, hereafter Shmulevich.
Regarding Claim 22: Masood does not explicitly recite The method of claim 1, wherein the model uncertainty quantification term for calculating the prediction uncertainty of prediction models is based upon an R-squared value calculated for a sub-region.
However Shmulevich discloses The method of claim 1, wherein the model uncertainty quantification term for calculating the prediction uncertainty of prediction models is based upon an R-squared value calculated for a sub-region. (“[0021] A method of inference, based on the coefficient of determination typically produces a number of good candidate predictors for each node (e.g., target gene or target protein). Since the COD itself is estimated from the data, there is little reason to rely on one good predictor function. Thus, the general approach is to probabilistically synthesize good predictor functions such that each predictor function's contribution is proportional to its determinative potential. For small sample sizes, the complexity of each predictor function can be limited. As new data make themselves available, the model class naturally allows one to narrow down as needed, effectively reducing the uncertainty for predicting each node (e.g., target gene).”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the R-squared calculation of Shmulevich for the calculation in Masood in order to effectively reduce the uncertainty for predicting each node and produce good candidate predictors for each node. (Shmulevich [0021])
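By way of illustration of the combination articulated above, an R-squared value (the coefficient of determination of Shmulevich) computed over the designs falling in one sub-region can serve directly as an uncertainty term. The following hypothetical sketch, with assumed function names and made-up values, shows one minimal way to do so:

```python
# Hypothetical sketch: per-sub-region R^2 as a prediction-uncertainty term.
import numpy as np
from sklearn.metrics import r2_score

def subregion_uncertainty(y_true, y_pred):
    # Lower R^2 -> less determinative local model -> larger uncertainty term.
    r2 = r2_score(y_true, y_pred)
    return 1.0 - max(r2, 0.0)

y_true = np.array([1.0, 1.2, 0.8, 1.1])   # observations within one sub-region
y_pred = np.array([0.9, 1.1, 0.9, 1.0])   # local prediction-model outputs
u = subregion_uncertainty(y_true, y_pred)
```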
8. Claim(s) 23 is rejected under 35 U.S.C. 103 as being unpatentable over Masood in view of Basu et al. U.S. Patent Publication No. 20180116600, hereafter Basu.
Regarding Claim 23: Masood does not explicitly recite The method of claim 1, wherein the model uncertainty quantification term for calculating the prediction uncertainty of prediction models is based upon a penalty term calculated from a principal component analysis (PCA).
However Basu recites The method of claim 1, wherein the model uncertainty quantification term for calculating the prediction uncertainty of prediction models is based upon a penalty term calculated from a principal component analysis (PCA). (“[0058] Another approach would be to use principal component analysis (PCA) or another dimensionality reduction approach on the set of learned regression parameters w.sub.0 . . . w.sub.N as a low-dimensional characterization of the space of valid regression parameters, and then use the calibration data to fit coefficients for the characterization. The characterization would fit into the penalty term Z, as for instance with PCA the underlying Gaussian model of the parameter space would make for an effective prior or penalty (i.e., using negative log likelihood of this model). With only a small number of measurements, either only a very small number of modes could be used, or there would need to be heavy reliance on the penalty term to prevent overfitting. Furthermore, while in principle a second regression model could be trained mapping x.sub.is to θ.sub.i, there is no guarantee the second regression model would be effective, since the modes would have been fit independently of the static measurements.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the penalty term and PCA analysis of Basu in the calculation of Masood in order to prevent overfitting and produce an effective prior or penalty in the calculation. (Basu. [0058])
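For illustration of the combination articulated above, the following hypothetical sketch follows the gist of Basu's paragraph [0058]: PCA is fit to a set of learned parameter vectors, and the negative log likelihood (up to a constant) of a new vector under the implied Gaussian model is used as the penalty term Z. All names and data here are assumptions:

```python
# Hypothetical sketch: PCA on learned regression parameters; the negative
# log likelihood of the implied Gaussian model acts as the penalty term Z.
import numpy as np
from sklearn.decomposition import PCA

W = np.random.default_rng(1).normal(size=(50, 8))   # learned parameters w_0..w_N
pca = PCA(n_components=3).fit(W)

def pca_penalty(w):
    # Squared PCA scores weighted by inverse explained variance, i.e. the
    # negative log likelihood under the PCA Gaussian, up to a constant.
    scores = pca.transform(w.reshape(1, -1)).ravel()
    return 0.5 * np.sum(scores**2 / pca.explained_variance_)

Z = pca_penalty(W[0])
```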
Conclusion
9. All Claims are rejected.
10. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
i) de Jesús Leal-Romo, Felipe, José Ernesto Rayas-Sánchez, and José Luis Chávez-Hurtado. "Surrogate-based analysis and design optimization of power delivery networks." IEEE Transactions on Electromagnetic Compatibility 62.6 (2020): 2528-2537.
ii) Yu, Jianbo. "Machine health prognostics using the Bayesian-inference-based probabilistic indication and high-order particle filtering framework." Journal of Sound and Vibration 358 (2015): 97-110.
iii) Qian, Zhiguang, et al. "Building surrogate models based on detailed and approximate simulations." Journal of Mechanical Design 128.4 (2006): 668-677.
11. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Saif A. Alhija whose telephone number is (571) 272-8635. The examiner can normally be reached on M-F, 10:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Renee Chavez, can be reached at (571) 270-1104. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300. Informal or draft communications, which should be labeled PROPOSED or DRAFT, can additionally be sent to the Examiner’s fax number, (571) 273-8635.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
SAA
/SAIF A ALHIJA/Primary Examiner, Art Unit 2186