DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is a Non-Final Action on the Merits. Claims 1-20 are currently pending and are addressed below.
Preliminary Amendment
The preliminary amendment filed on September 13th, 2024 has been considered and entered. Accordingly, claims 1-15 have been amended. Claims 16-20 have been newly added.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on October 8th, 2024, October 21st, 2024, and February 28th, 2025 have been considered.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more.
Specifically, claims 1-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to a judicial exception to patentability (i.e., a law of nature, a natural phenomenon, or an abstract idea) and does not include an inventive concept that is something “significantly more” than the judicial exception under the January 2019 Revised Patent Subject Matter Eligibility Guidance (2019 PEG) analysis which follows.
Under the 2019 PEG step 1 analysis, it must first be determined whether the claims are directed to one of the four statutory categories of invention (i.e., process, machine, manufacture, or composition of matter). Applying step 1 of the analysis for patentable subject matter to the claims, it is determined that the claims are directed to the statutory category of a process. Therefore, we proceed to step 2A, Prong 1.
Revised Guidance Step 2A – Prong 1
Under the 2019 PEG step 2A, Prong 1 analysis, it must be determined whether the claims recite an abstract idea that falls within one or more designated categories of patent ineligible subject matter (i.e., organizing human activity, mathematical concepts, and mental processes) that amount to a judicial exception to patentability.
Here, with respect to independent claims 1 and 15, the claims recite the abstract idea of determining improved hyperparameters for autonomous vehicle motion planning, specifically the steps to “generate at least one trial set of hyperparameters using a guidance objective configured to evaluate a quality of trial sets of hyperparameters in dependence on the model; determine a trial outcome of the motion planner in dependence on the trial set of hyperparameters and predetermined journey data; determine a new utility score of the trial set of hyperparameters, wherein the new utility score is determined in dependence on a comparison of the trial outcome with truth outcome data associated with the predetermined journey data; generate a new data pair comprising the trial set of hyperparameters and the new utility score”. These limitations fall within one of the three enumerated 2019 PEG categories of patent ineligible subject matter, specifically, a mental process, since each of the above steps could alternatively be performed in the human mind or with the aid of pen and paper. This conclusion follows from CyberSource Corp. v. Retail Decisions, Inc., in which the Federal Circuit held that section 101 did not embrace a process defined simply as using a computer to perform a series of mental steps that people, aware of each step, can and regularly do perform in their heads. 654 F.3d 1366, 1373 (Fed. Cir. 2011); see also In re Grams, 888 F.2d 835, 840–41 (Fed. Cir. 1989); In re Meyer, 688 F.2d 789, 794–95 (CCPA 1982); Elec. Power Grp., LLC v. Alstom S.A., 830 F.3d 1350, 1354 (Fed. Cir. 2016) (“we have treated analyzing information by steps people go through in their minds, or by mathematical algorithms, without more, as essentially mental processes within the abstract-idea category”).
Additionally, mental processes remain unpatentable even when automated to reduce the burden on the user of what once could have been done with pen and paper. See CyberSource, 654 F.3d at 1375 (“That purely mental processes can be unpatentable, even when performed by a computer, was precisely the holding of the Supreme Court in Gottschalk v. Benson.”). These limitations, as drafted, recite a simple process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. For example, the claim limitations encompass mentally determining improved hyperparameters for autonomous vehicle motion planning based on the information provided by the car’s sensors while traveling, or alternatively, mentally determining improved hyperparameters for autonomous vehicle motion planning based on observations by a human.
For example, a human could mentally and with the aid of pen and paper determine improved hyperparameters for autonomous vehicle motion planning.
Revised Guidance Step 2A – Prong 2
Under the 2019 PEG step 2A, Prong 2 analysis, the claims do not include limitations that integrate the identified abstract idea into a practical application, since the additional elements of a processor and memory are merely generic components used as a tool (“apply it”) to implement the abstract idea. (See, e.g., MPEP § 2106.05(f)). See Alice, 573 U.S. at 223 (“[T]he mere recitation of a generic computer cannot transform a patent-ineligible abstract idea into a patent-eligible invention.”).
In addition, the limitation “receive data comprising at least one data pair, each data pair comprising a set of hyperparameters and a utility score defining a corresponding utility of a motion planner outcome resulting from the set of hyperparameters; provide a model, based on the at least one data pair, wherein the model defines a relationship between the set of hyperparameters and the corresponding utility score” constitutes insignificant pre-solution activity that merely gathers data and, therefore, does not integrate the exception into a practical application. See In re Bilski, 545 F.3d 943, 963 (Fed. Cir. 2008) (en banc), aff’d on other grounds, 561 U.S. 593 (2010) (characterizing data gathering steps as insignificant extra-solution activity); see also CyberSource, 654 F.3d at 1371–72 (noting that even if some physical steps are required to obtain information from a database (e.g., entering a query via a keyboard, clicking a mouse), such data-gathering steps cannot alone confer patentability); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363 (Fed. Cir. 2015) (presenting offers and gathering statistics amounted to mere data gathering). Accord Guidance, 84 Fed. Reg. at 55 (citing MPEP § 2106.05(g)).
In addition, merely “[u]sing a computer to accelerate an ineligible mental process does not make that process patent-eligible.” Bancorp Servs., L.L.C. v. Sun Life Assur. Co. of Canada (U.S.), 687 F.3d 1266, 1279 (Fed. Cir. 2012); see also CLS Bank Int’l v. Alice Corp. Pty. Ltd., 717 F.3d 1269, 1286 (Fed. Cir. 2013) (en banc) (“simply appending generic computer functionality to lend speed or efficiency to the performance of an otherwise abstract concept does not meaningfully limit claim scope for purposes of patent eligibility.”), aff’d, 573 U.S. 208 (2014). Accordingly, the additional element of a processor does not transform the abstract idea into a practical application of the abstract idea.
Revised Guidance Step 2B
Under the 2019 PEG step 2B analysis, the additional elements are evaluated to determine whether they amount to something “significantly more” (i.e., an inventive concept) than the recited abstract idea. Here, the additional elements, such as a processor and a memory, do not amount to an inventive concept since, as stated above in the step 2A, Prong 2 analysis, the claims are simply using the additional elements as a tool to carry out the abstract idea (i.e., “apply it”) on a computer or computing device and/or via software programming. (See, e.g., MPEP § 2106.05(f)). The additional elements are specified at a high level of generality to simply implement the abstract idea and are not themselves being technologically improved. (See, e.g., MPEP § 2106.05 I.A.). See Alice, 573 U.S. at 223 (“[T]he mere recitation of a generic computer cannot transform a patent-ineligible abstract idea into a patent-eligible invention.”). Thus, these elements, taken individually or together, do not amount to “significantly more” than the abstract idea itself.
The additional elements of the dependent claims 2-14 and 16-20 merely refine and further limit the abstract idea of the independent claims and do not add any feature that is an “inventive concept” which cures the deficiencies of their respective parent claim under the 2019 PEG analysis. None of the dependent claims considered individually, including their respective limitations, include an “inventive concept” of some additional element or combination of elements sufficient to ensure that the claims in practice amount to something “significantly more” than patent-ineligible subject matter to which the claims are directed.
The elements of the instant claimed invention, when taken in combination, do not offer substantially more than the sum of the functions of the elements when each is taken alone. The claims as a whole do not amount to significantly more than the abstract idea itself because the claims do not effect an improvement to another technology or technical field; the claims do not amount to an improvement to the functioning of an electronic device itself which implements the abstract idea (e.g., the general purpose computer and/or the computer system which implements the process are not made more efficient or technologically improved); the claims do not perform a transformation or reduction of a particular article to a different state or thing (i.e., the claims do not use the abstract idea in the claimed process to bring about a physical change. See, e.g., Diamond v. Diehr, 450 U.S. 175 (1981), where a physical change, and thus patentability, was imparted by the claimed process; contrast, Parker v. Flook, 437 U.S. 584 (1978), where a physical change, and thus patentability, was not imparted by the claimed process); and the claims do not move beyond generally linking the use of the abstract idea to a particular technological environment (e.g., “for determining improved hyperparameters for use in an autonomous vehicle motion planner . . . processors,” claim 1).
Accordingly, claims 1-20 are rejected under 35 U.S.C. § 101 as being directed to an abstract idea without significantly more, and are therefore ineligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 5-6, 9-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Guzman (Heteroscedastic Bayesian Optimisation for Stochastic Model Predictive Control) (“Guzman”) (Attached) in view of Jiang (Learning-Based Vehicle Dynamics Residual Correction Model for Autonomous Driving Simulation) (“Jiang”) (Attached).
With respect to claim 1, Guzman teaches an apparatus for determining improved hyperparameters for use in an autonomous vehicle motion planner, the apparatus comprising: one or more processors; and a memory storing data in non-transient form defining a program code executable by the one or more processors (See at least Guzman Page 6 “To assess the effects of real heteroscedastic noise in a physical system, we performed experiments on tuning an MPPI controller for a physical robot.”), wherein the one or more processors execute the program code to cause the apparatus to:
receive data comprising at least one data pair, each data pair comprising a set of hyperparameters and a utility score defining a corresponding utility of a motion planner outcome resulting from the set of hyperparameters (See at least Guzman Pages 4-5 “We optimise the controller with BO, a global optimisation method, by maximising the episodic or cumulative reward g dependent on controller hyper-parameters x to solve x∗ = argmax g(x). Considering g is stochastic, we maximise the expected cumulative reward ˆgt = E[g(xt)]. The controller hyper-parameters are optimised following Algorithm 2. At each BO iteration, we fit the GP model M with observations collected up to the current iteration t … an optimal action a∗ i is returned by the MPC controller configured with x and sent to the system actuators. This returns a reward ri that is accumulated in gj(xt), where j is the current repetition. Finally, the optimal controller hyper-parameters x* correspond to those with maximum expected cumulative after nBO optimisation iterations”);
provide a model, based on the at least one data pair, wherein the model defines a relationship between the set of hyperparameters and the corresponding utility score (See at least Guzman Page 4 “The controller hyper-parameters are optimised following Algorithm 2. At each BO iteration, we fit the GP model M with observations collected up to the current iteration t.”);
generate at least one trial set of hyperparameters using a guidance objective configured to evaluate a quality of trial sets of hyperparameters in dependence on the model (See at least Guzman Page 4 “Next, we select controller hyper-parameters xt by maximising the acquisition function h with a global optimisation method.”);
determine a trial outcome of the motion planner in dependence on the trial set of hyperparameters (See at least Guzman Page 4 “We then compute the expected cumulative reward ˆgt empirically by averaging the cumulative rewards obtained after nr episodes of ne time steps each. At each time-step i, an optimal action a∗ i is returned by the MPC controller configured with x and sent to the system actuators.”);
determine a new utility score of the trial set of hyperparameters; generate a new data pair comprising the trial set of hyperparameters and the new utility score (See at least Guzman Pages 4-5 “This returns a reward ri that is accumulated in gj(xt), where j is the current repetition. Finally, the optimal controller hyper-parameters x* correspond to those with maximum expected cumulative after nBO optimisation iterations”).
Guzman fails to explicitly disclose that the trial outcome of the motion planner is determined in dependence on the trial set of hyperparameters and predetermined journey data, and that the new utility score is determined in dependence on a comparison of the trial outcome with truth outcome data associated with the predetermined journey data.
Jiang teaches that a trial outcome of the motion planner is determined in dependence on the trial set of hyperparameters and predetermined journey data (See at least Jiang Page 785 “Evaluation and Tuning: Evaluation dataset is different from training and validation. Four types of scenarios (left-turn, right-turn, U-turn and zig-zag) are designed and collected from open-loop driving data. Each lasts around 20 seconds. The model performance is evaluated by the similarity between residual corrected trajectories and the ground truth trajectories under the same control commands, i.e., m-ATE. The pipeline utilizes these evaluation metrics to construct a loss function for tuning. We select Bayesian-based optimization to tune eight hyperparameters, such as kernel size, loss weights. Based on past evaluations, Bayesian optimization constructs a posterior distribution of functions that best describes the objective function, hence, can efficiently find the optimal set of hyperparameters. All these tuned models are tracked and versioned so that appropriate ones can be deployed on the simulation platform to provide a vehicle dynamics model that truthfully reflects real-world dynamic characteristics.”) and that the new utility score is determined in dependence on a comparison of the trial outcome with truth outcome data associated with the predetermined journey data (See at least Jiang Page 786 “The overall model performance is defined as its trajectory accuracy improvements compared to its plugged-in dynamics base model in all evaluation scenarios. The overall accuracy improvement compared to LB is denoted as IMPLB, and IMPRB when compared to RB … where Js is the total number of scenarios and δj traj,. is the trajectory residual of model predicted trajectory compared to the ground truth trajectory in scenario j. We choose the mean trajectory error, i.e., m-ATE, to represent the trajectory residual, calculated as … where pm,i is a model predicted two dimensional trajectory point at t = i, while pgt,i is the ground truth trajectory point at the same time t = i. The distance between this pair of points is calculated by function dist(.).”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus of Guzman to include that the trial outcome of the motion planner is determined in dependence on the trial set of hyperparameters and predetermined journey data, and that the new utility score is determined in dependence on a comparison of the trial outcome with truth outcome data associated with the predetermined journey data, as taught by Jiang as disclosed above, in order to ensure accurate updated hyperparameters (Jiang Page 782 “In this paper, we present a learning-based dynamics residual correction mechanism, which corrects the prediction residual of a dynamics base model to increase the overall model prediction accuracy.”).
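For illustration of the combined teachings as applied to claim 1, the following minimal sketch is provided (hypothetical function names and a toy one-parameter “planner”; this is not code from Guzman or Jiang). It shows a Bayesian-optimisation style loop in which a Gaussian process surrogate is fit to (hyperparameter, utility score) data pairs, an upper confidence bound guidance objective proposes a trial set of hyperparameters, a trial outcome is produced on fixed journey data, and a new utility score is computed by comparing the trial outcome with truth outcome data in the manner of a mean trajectory error:

import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, length=0.2):
    d = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / length ** 2)

def gp_posterior(X, y, Xq, noise=1e-3):
    # Gaussian process surrogate fit to the (hyperparameters, utility score) data pairs.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xq, X)
    Kss = rbf_kernel(Xq, Xq)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y
    var = np.diag(Kss - Ks @ Kinv @ Ks.T)
    return mu, np.maximum(var, 1e-9)

def trial_outcome(hyperparams, journey):
    # Hypothetical "motion planner": the trajectory depends on the hyperparameters and journey data.
    gain = hyperparams[0]
    return journey * gain

def utility(trajectory, truth):
    m_ate = np.mean(np.abs(trajectory - truth))  # mean trajectory error vs. the truth outcome data
    return -m_ate                                # higher utility = closer to the ground truth

journey = np.linspace(0.0, 1.0, 50)   # predetermined journey data
truth = journey * 0.7                 # truth outcome data associated with that journey

# Received data: at least one (hyperparameters, utility score) data pair.
X = rng.uniform(0.0, 1.0, size=(3, 1))
y = np.array([utility(trial_outcome(x, journey), truth) for x in X])

for _ in range(10):
    candidates = rng.uniform(0.0, 1.0, size=(200, 1))      # candidate trial sets of hyperparameters
    mu, var = gp_posterior(X, y, candidates)                # model based on the data pairs
    ucb = mu + 2.0 * np.sqrt(var)                           # guidance objective (UCB acquisition)
    x_new = candidates[np.argmax(ucb)]                      # generated trial set of hyperparameters
    score = utility(trial_outcome(x_new, journey), truth)   # new utility score from the trial outcome
    X = np.vstack([X, x_new])                               # new data pair added to the received data
    y = np.append(y, score)

print("best hyperparameters:", X[np.argmax(y)], "utility:", y.max())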
With respect to claim 2, and similarly claim 16, Guzman in view of Jiang teach that the model is a probabilistic surrogate function, and wherein the guidance objective is configured to generate the at least one trial set of hyperparameters by: fitting the probabilistic surrogate function to one or more of the at least one data pair; and searching a domain space of hyperparameter inputs in dependence on sampling the probabilistic surrogate function (See at least Guzman Page 3 “C. Gaussian Processes A Gaussian process [20] represents a probability distribution over a space of functions. A GP prior over a function g : X → R is completely specified by a mean m : X → R and a positive-definite covariance function, k : X × X → R. Under the GP prior, the values of g at a finite collection of points {xi}n i=1 ⊂ X follow a multivariate normal distribution g(X) ∼ N(m,K), where g(X) = [g(x1),...,g(xn)]T, m:=m(X), and K is the n-by-n covariance matrix given by [K]i,j = k(xi,xj). 1) Inference: Now suppose we observe y ∈ Rn, where each yi = g(xi) + νi represents a function evaluation corrupted by jointly Gaussian noise ν ∼ N(0,Σν). The joint distribution of the observations and the function value at a point x ∈ X is then given by … where k(x) := [k(x,x1),...,k(x,xn)]T, Conditioning g(x) on the observations yields a Gaussian predictive distribution g(x)|y ∼ N(µ(x),σ2(x)), where … allowing us to infer function values at unobserved locations. Noise model: In general, observation noise ν is assumed to be homoscedastic, which means its distribution is not dependent on the inputs x. However, many applications present noise with a heteroscedastic behaviour, i.e. the noise distribution varies across the domain X. Under the Gaussian assumption, observation noise is simply another (zero-mean) Gaussian process with covariance function kν : X × X → R, so that [Σν]i,j := kν(xi,xj). In the homoscedastic case, the noise covariance function is simply kν(x,x) := σ2 ν, where σν ∈ R is constant, and kν(x,x) = 0 for x= x, yielding the classic Σν = σ2 νI. More generally, however, kν can be an arbitrary positive-definite covariance function. D. Bayesian Optimisation Consider the problem of searching for the global optimum of a function g : X → R over a given compact search space S ⊂ X such as determining x∗ ∈ argmaxx∈S g(x). Assume that g is possibly non-convex and only partially observable via noisy estimates yt = g(xt) + νt with νt ∼ N(0,σ2 νt ). In addition, we can only observe the function up to N times. Bayesian optimisation [6] assumes that g is a random variable itself and models it as a stochastic process, which is usually a GP, indexed by X. To select points at which to observe g, BO uses an acquisition function h(x) as a guide that incorporates prior information provided by the GP model and the observations. Each query point xt ∈ S is then selected by maximising h. After collecting an observation yt, BO updates the GP model with the pair (xt,yt) and starts the next iteration with an improved belief about f. The BO loop repeats until we reach the given budget of N evaluations of the objective function. See Algorithm 1 for a summary. The acquisition function h determines which values to sample next. A common and simple acquisition function is the upper confidence bound (UCB) [26] … where κ ∈ R+ is a balance factor. UCB allows balancing exploration and exploitation by valuing points where there is high uncertainty (exploration) or where the GP predictive mean is high (exploitation). Keeping the balance factor κ biased towards exploration avoids local minima”).
With respect to claim 3, and similarly claim 17, Guzman in view of Jiang teach that the probabilistic surrogate function is a Gaussian process model (See at least Guzman Page 3 “C. Gaussian Processes A Gaussian process [20] represents a probability distribution over a space of functions. A GP prior over a function g : X → R is completely specified by a mean m : X → R and a positive-definite covariance function, k : X × X → R. Under the GP prior, the values of g at a finite collection of points {xi}n i=1 ⊂ X follow a multivariate normal distribution g(X) ∼ N(m,K), where g(X) = [g(x1),...,g(xn)]T, m:=m(X), and K is the n-by-n covariance matrix given by [K]i,j = k(xi,xj). 1) Inference: Now suppose we observe y ∈ Rn, where each yi = g(xi) + νi represents a function evaluation corrupted by jointly Gaussian noise ν ∼ N(0,Σν). The joint distribution of the observations and the function value at a point x ∈ X is then given by … where k(x) := [k(x,x1),...,k(x,xn)]T, Conditioning g(x) on the observations yields a Gaussian predictive distribution g(x)|y ∼ N(µ(x),σ2(x)), where … allowing us to infer function values at unobserved locations. Noise model: In general, observation noise ν is assumed to be homoscedastic, which means its distribution is not dependent on the inputs x. However, many applications present noise with a heteroscedastic behaviour, i.e. the noise distribution varies across the domain X. Under the Gaussian assumption, observation noise is simply another (zero-mean) Gaussian process with covariance function kν : X × X → R, so that [Σν]i,j := kν(xi,xj). In the homoscedastic case, the noise covariance function is simply kν(x,x) := σ2 ν, where σν ∈ R is constant, and kν(x,x) = 0 for x= x, yielding the classic Σν = σ2 νI. More generally, however, kν can be an arbitrary positive-definite covariance function. D. Bayesian Optimisation Consider the problem of searching for the global optimum of a function g : X → R over a given compact search space S ⊂ X such as determining x∗ ∈ argmaxx∈S g(x). Assume that g is possibly non-convex and only partially observable via noisy estimates yt = g(xt) + νt with νt ∼ N(0,σ2 νt ). In addition, we can only observe the function up to N times. Bayesian optimisation [6] assumes that g is a random variable itself and models it as a stochastic process, which is usually a GP, indexed by X. To select points at which to observe g, BO uses an acquisition function h(x) as a guide that incorporates prior information provided by the GP model and the observations. Each query point xt ∈ S is then selected by maximising h. After collecting an observation yt, BO updates the GP model with the pair (xt,yt) and starts the next iteration with an improved belief about f. The BO loop repeats until we reach the given budget of N evaluations of the objective function. See Algorithm 1 for a summary. The acquisition function h determines which values to sample next. A common and simple acquisition function is the upper confidence bound (UCB) [26] … where κ ∈ R+ is a balance factor. UCB allows balancing exploration and exploitation by valuing points where there is high uncertainty (exploration) or where the GP predictive mean is high (exploitation). Keeping the balance factor κ biased towards exploration avoids local minima”).
With respect to claim 5, and similarly claim 19, Guzman in view of Jiang teach that the search of the domain space of hyperparameter inputs is guided by an acquisition function configured to determine the quality of trial sets of hyperparameters based at least in part on a predicted uncertainty of a value of the surrogate function resulting from a trial set of hyperparameters (See at least Guzman Page 3 “D. Bayesian Optimisation Consider the problem of searching for the global optimum of a function g : X → R over a given compact search space S ⊂X such as determining x∗ ∈ argmaxx∈S g(x). Assume that g is possibly non-convex and only partially observable via noisy estimates yt = g(xt) + νt with νt ∼ N(0,σ2 νt ). In addition, we can only observe the function up to N times. Bayesian optimisation [6] assumes that g is a random variable itself and models it as a stochastic process, which is usually a GP, indexed by X. To select points at which to observe g, BO uses an acquisition function h(x) as a guide that incorporates prior information provided by the GP model and the observations. Each query point xt ∈ S is then selected by maximising h. After collecting an observation yt, BO updates the GP model with the pair (xt,yt) and starts the next iteration with an improved belief about f. The BO loop repeats until we reach the given budget of N evaluations of the objective function. See Algorithm 1 for a summary. The acquisition function h determines which values to sample next. A common and simple acquisition function is the upper confidence bound (UCB) [26] … where κ ∈ R+ is a balance factor. UCB allows balancing exploration and exploitation by valuing points where there is high uncertainty (exploration) or where the GP predictive mean is high (exploitation). Keeping the balance factor κ biased towards exploration avoids local minima”).
With respect to claim 6, and similarly claim 20, Guzman in view of Jiang teach that the acquisition function comprises one or more of the following functions: an expected utility function, a probability of improvement function, and an upper confidence bound function (See at least Guzman Page 3 “The acquisition function h determines which values to sample next. A common and simple acquisition function is the upper confidence bound (UCB) [26] … where κ ∈ R+ is a balance factor. UCB allows balancing exploration and exploitation by valuing points where there is high uncertainty (exploration) or where the GP predictive mean is high (exploitation). Keeping the balance factor κ biased towards exploration avoids local minima”).
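For illustration only, the three acquisition functions named in claim 6 may be sketched as follows (a simplified, assumed formulation computed from a surrogate model's predictive mean and standard deviation; “expected improvement” is used here as a common instance of an expected utility criterion, and the numeric inputs are hypothetical):

import numpy as np
from scipy.stats import norm

def upper_confidence_bound(mu, sigma, kappa=2.0):
    # Favors candidates with a high predicted utility or a high predictive uncertainty.
    return mu + kappa * sigma

def probability_of_improvement(mu, sigma, y_best, xi=0.01):
    # Probability that a candidate improves on the best utility score observed so far.
    return norm.cdf((mu - y_best - xi) / sigma)

def expected_improvement(mu, sigma, y_best, xi=0.01):
    # Expected magnitude of improvement over the best utility score observed so far.
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

mu = np.array([0.1, 0.4, 0.3])       # surrogate predictive means at candidate hyperparameters
sigma = np.array([0.2, 0.05, 0.3])   # surrogate predictive standard deviations
print(upper_confidence_bound(mu, sigma))
print(probability_of_improvement(mu, sigma, y_best=0.35))
print(expected_improvement(mu, sigma, y_best=0.35))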
With respect to claim 9, Guzman in view of Jiang teach that the trial outcome is a vehicle trajectory comprising a plurality of vehicle motion decisions corresponding to at least one type of vehicle motion action (See at least Guzman Pages 6-7 “E. Experiments with a physical robot To assess the effects of real heteroscedastic noise in a physical system, we performed experiments on tuning an MPPI controller for a physical robot. The four-wheel-drive skid-steer robot (Fig. 10a) was tasked with following a circular path at a set speed. The cost function was formulated as c(st) = d2 t +(vr −vt)2, where dt represents the robot’s distance to the edge of the circle, vr = 0.2 m/s is a reference linear speed, and vt is the current speed. The robot was localised using a particle filter on a prebuilt map. Internally, MPPI employed a kinematic model of the robot [34] for trajectory rollouts which is challenging for MPC as the model does not simulate the dynamics of skid-steering platforms accurately. The controller was configured with M = 50 rollouts and a time horizon T = 400. Episodes lasted 20 seconds with the robot starting from a fixed initial position. The search space S for BO was set as the box defined by the intervals σ ∈ [0.3,0.5] and λ ∈ [0.01,0.21] … Performance results are in Fig. 10c. We compared BOhetero against BOhomo. Both algorithms are eventually able to find high reward regions. However, due to its uniform noise model, BOhomo is led to a more exploratory behaviour, instead of concentrating on promising regions, as evidenced by the query locations in Fig. 11. As a consequence, we observe a significant drop in performance during the optimisation, as shown in Fig. 10c. In contrast, BOhetero maintains a steady high performance, which means lower tracking error with respect to the circular path specified by the cost function.”) (See at least Jiang Page 785 “) Evaluation and Tuning: Evaluation dataset is different from training and validation. Four types of scenarios (left-turn, right-turn, U-turn and zig-zag) are designed and collected from open-loop driving data. Each lasts around 20 seconds. The model performance is evaluated by the similarity between residual corrected trajectories and the ground truth trajectories under the same control commands, i.e., m-ATE. The pipeline utilizes these evaluation metrics to construct a loss function for tuning. We select Bayesian-based optimization to tune eight hyperparameters, such as kernel size, loss weights. Based on past evaluations, Bayesian optimization constructs a posterior distribution of functions that best describes the objective function, hence, can efficiently find the optimal set of hyperparameters. All these tuned models are tracked and versioned so that appropriate ones can be deployed on the simulation platform to provide a vehicle dynamics model that truthfully reflects real-world dynamic characteristics.”).
With respect to claim 10, Guzman in view of Jiang teach that the corresponding utility score represents the accuracy of a trial outcome compared to the truth outcome data, the truth outcome data comprising human-labelled vehicle motion decisions (See at least Jiang Pages 782-783 “In this paper, we present a learning-based dynamics residual correction mechanism, which corrects the prediction residual of a dynamics base model to increase the overall model prediction accuracy. It achieves a mean average trajectory error (m-ATE) [14] of 3.266 m in different 20s-long scenarios, which is 80.93% trajectory accuracy improvement compared with a rule-based (RB) commercial model widely used in industry, and 52.12% trajectory accuracy improvement compared with our previous learning based (LB) model [15]. This is done by building a residual correction mechanism of vehicle dynamics such as heading angle and speed throughout model structure, training loss design and online inference … In model structure, we extend the output of the residual predictor to include speed difference, heading angle difference and heading angle change rate difference prediction. • In loss function, we include the speed mean squared error (MSE) loss as well. • In online inference, we apply the residual corrections to the additional vehicle dynamics mentioned above instead of applying corrections to vehicle position.”).
With respect to claim 11, Guzman in view of Jiang teach that the corresponding utility score is determined in dependence on an objective function which rewards correct decisions in the trial outcome (See at least Jiang Pages 782-783 “In this paper, we present a learning-based dynamics residual correction mechanism, which corrects the prediction residual of a dynamics base model to increase the overall model prediction accuracy. It achieves a mean average trajectory error (m-ATE) [14] of 3.266 m in different 20s-long scenarios, which is 80.93% trajectory accuracy improvement compared with a rule-based (RB) commercial model widely used in industry, and 52.12% trajectory accuracy improvement compared with our previous learning based (LB) model [15]. This is done by building a residual correction mechanism of vehicle dynamics such as heading angle and speed throughout model structure, training loss design and online inference … In model structure, we extend the output of the residual predictor to include speed difference, heading angle difference and heading angle change rate difference prediction. • In loss function, we include the speed mean squared error (MSE) loss as well. • In online inference, we apply the residual corrections to the additional vehicle dynamics mentioned above instead of applying corrections to vehicle position.”) and/or penalises decisions in the trial outcome which are incorrect and which have previously been determined as correct based on an initial set of hyperparameter inputs in the received data.
With respect to claim 12, Guzman in view of Jiang teach that predetermined journey data comprises a plurality of journeys and the one or more processors further execute the program code to cause the apparatus to determine a plurality of trial outcomes, one per journey, using the trial set of hyperparameters; and determine the corresponding utility score based on the plurality of trial outcomes (See at least Jiang Pages 782-783 “In this paper, we present a learning-based dynamics residual correction mechanism, which corrects the prediction residual of a dynamics base model to increase the overall model prediction accuracy. It achieves a mean average trajectory error (m-ATE) [14] of 3.266 m in different 20s-long scenarios, which is 80.93% trajectory accuracy improvement compared with a rule-based (RB) commercial model widely used in industry, and 52.12% trajectory accuracy improvement compared with our previous learning based (LB) model [15]. This is done by building a residual correction mechanism of vehicle dynamics such as heading angle and speed throughout model structure, training loss design and online inference … In model structure, we extend the output of the residual predictor to include speed difference, heading angle difference and heading angle change rate difference prediction. • In loss function, we include the speed mean squared error (MSE) loss as well. • In online inference, we apply the residual corrections to the additional vehicle dynamics mentioned above instead of applying corrections to vehicle position.”).
With respect to claim 13, Guzman in view of Jiang teach that the one or more processors further execute the program code to cause the apparatus to: determine the trial outcome of the motion planner using a static simulator (See at least Guzman Page 5 “A. Control Problem Simulations We conducted experiments on benchmark control problems from OpenAI Gym1 [27] and Mujoco [28]: Acrobot, Cartpole, Half-Cheetah, Pendulum, and Reacher. Each control problem has a particular state reward function r(s, a) shown in Table I. We made slight modifications in Reacher and Half-Cheetah. We reduced the effect of actions and gave more priority to the distance to the target in the case of the Reacher problem. For Half-Cheetah, we added more priority to the inclination, since Half-Cheetah would tend to turn upside down as its speed increases. The actuation is then set to finish when such inclination is greater than π/2 or lower than −π/2. These modifications make the rewards more informative for MPPI, enabling it to solve these two tasks. We can then focus the analysis on tuning the controller. The expected cumulative reward represents the expected time the pendulum stays in an upright position in the Acrobot, Cartpole, and Pendulum. It represents the distance traversed in Half-Cheetah, and the speed to reach the target in Reacher. High expected cumulative rewards are the result of motions that increased the reward accordingly, e.g. Half-Cheetah would be expected to reach farther distances. Now, to evaluate the expected cumulative rewards for each problem, we determined fixed values for time horizon T, number of trajectory rollouts M, and MPPI hyper-parameter intervals are shown in Table II. These were found by narrowing down large-enough intervals from near-zero values to 500, taking into account usual values for these hyper-parameters that tend to be close to 0. These are typical in several applications [17], [18]. The table also shows optimal values found within these narrowed intervals via grid search.”) (See at least Jiang Pages 785-786 “2) Data Balancing and Labeling: To generate and balance data, features are first aggregated based on the configurable history segment length (N) and over lapping size. The history segment length indicates the number of consecutive data points from the past in a training input, whereas overlapping size specifies the amount of the same data points to use in the next training input. In each training input, there are N historical datapoints, but only inputs from control commands and the initial vehicle states are from real-world driving data (aka. The ground truth data). The rest of the vehicle states are from the vehicle dynamics model that predicts over ground truth control commands sequences, as shown in Figure 5. A base vehicle dynamics model, the yellow block, updates vehicle dynamics under control command sequences at each step. The vehicle dynamics sequence and the control commands sequence form a N×7matrix as model inputs, blue block in Figure 5. Training output is the residual error between ground truth values and the base model predicted outputs … Labeled data are then classified into different categories based on 50 percentile values of those features in speed, throttle, steering, and brake. Each area is broken down into four to seven sub-categories to reflect the characteristics of different scenarios. For instance, different values of speed can map to very low, low, medium, and high speed scenarios. 
And steering angles are broken down into seven kinds of turning: slight left/right-turn, left/right-turn, sharp left/right turn or go straight. The combination of the three control commands plus speed suggests the category of a sample. In order to evenly distribute the samples, 8000 is the maximum number of samples in each category, extra ones are discarded. We obtained the training dataset and validation dataset by randomly splitting these samples in each category with a 9:1 ratio.”).
With respect to claim 14, Guzman in view of Jiang teach that the one or more processors further execute the program code to cause the apparatus to: repeatedly perform the following processes: generating at least one trial set of hyperparameters; determining a new utility score; and wherein the received data comprises a previously generated new data pair (See at least Guzman Pages 4-5 “We optimise the controller with BO, a global optimisation method, by maximising the episodic or cumulative reward g dependent on controller hyper-parameters x to solve x∗ = argmaxg(x). Considering g is stochastic, we maximise the expected cumulative reward ˆgt = E[g(xt)]. The controller hyper-parameters are optimised following Algorithm 2. At each BO iteration, we fit the GP model M with observations collected up to the current iteration t … an optimal action a∗ i is returned by the MPC controller configured with x and sent to the system actuators. This returns a reward ri that is accumulated in gj(xt), where j is the current repetition. Finally, the optimal controller hyper-parameters x* correspond to those with maximum expected cumulative after nBO optimisation iterations”).
With respect to claim 15, Guzman teaches a method for determining improved hyperparameters for use in an autonomous vehicle motion planner, the method applied to an electronic apparatus and comprising:
receive data comprising at least one data pair, each data pair comprising a set of hyperparameters and a utility score defining a corresponding utility of a motion planner outcome resulting from the set of hyperparameters (See at least Guzman Pages 4-5 “We optimise the controller with BO, a global optimisation method, by maximising the episodic or cumulative reward g dependent on controller hyper-parameters x to solve x∗ = argmaxg(x). Considering g is stochastic, we maximise the expected cumulative reward ˆgt = E[g(xt)]. The controller hyper-parameters are optimised following Algorithm 2. At each BO iteration, we fit the GP model M with observations collected up to the current iteration t … an optimal action a∗ i is returned by the MPC controller configured with x and sent to the system actuators. This returns a reward ri that is accumulated in gj(xt), where j is the current repetition. Finally, the optimal controller hyper-parameters x* correspond to those with maximum expected cumulative after nBO optimisation iterations”);
provide a model, based on the at least one data pair, wherein the model defines a relationship between the set of hyperparameters and the corresponding utility score (See at least Guzman Page 4 “The controller hyper-parameters are optimised following Algorithm 2. At each BO iteration, we fit the GP model M with observations collected up to the current iteration t.”);
generate at least one trial set of hyperparameters using a guidance objective configured to evaluate a quality of trial sets of hyperparameters in dependence on the model (See at least Guzman Page 4 “Next, we select controller hyper-parameters xt by maximising the acquisition function h with a global optimisation method.”);
determine a trial outcome of the motion planner in dependence on the trial set of hyperparameters (See at least Guzman Page 4 “We then compute the expected cumulative reward ˆgt empirically by averaging the cumulative rewards obtained after nr episodes of ne time steps each. At each time-step i, an optimal action a∗ i is returned by the MPC controller configured with x and sent to the system actuators.”);
determine a new utility score of the trial set of hyperparameters; generate a new data pair comprising the trial set of hyperparameters and the new utility score (See at least Guzman Pages 4-5 “This returns a reward ri that is accumulated in gj(xt), where j is the current repetition. Finally, the optimal controller hyper-parameters x* correspond to those with maximum expected cumulative after nBO optimisation iterations”).
Guzman fails to explicitly disclose that the trial outcome of the motion planner is determined in dependence on the trial set of hyperparameters and predetermined journey data, and that the new utility score is determined in dependence on a comparison of the trial outcome with truth outcome data associated with the predetermined journey data.
Jiang teaches that a trial outcome of the motion planner is determined in dependence on the trial set of hyperparameters and predetermined journey data (See at least Jiang Page 785 “Evaluation and Tuning: Evaluation dataset is different from training and validation. Four types of scenarios (left-turn, right-turn, U-turn and zig-zag) are designed and collected from open-loop driving data. Each lasts around 20 seconds. The model performance is evaluated by the similarity between residual corrected trajectories and the ground truth trajectories under the same control commands, i.e., m-ATE. The pipeline utilizes these evaluation metrics to construct a loss function for tuning. We select Bayesian-based optimization to tune eight hyperparameters, such as kernel size, loss weights. Based on past evaluations, Bayesian optimization constructs a posterior distribution of functions that best describes the objective function, hence, can efficiently find the optimal set of hyperparameters. All these tuned models are tracked and versioned so that appropriate ones can be deployed on the simulation platform to provide a vehicle dynamics model that truthfully reflects real-world dynamic characteristics.”) and that the new utility score is determined in dependence on a comparison of the trial outcome with truth outcome data associated with the predetermined journey data (See at least Jiang Page 786 “The overall model performance is defined as its trajectory accuracy improvements compared to its plugged-in dynamics base model in all evaluation scenarios. The overall accuracy improvement compared to LB is denoted as IMPLB, and IMPRB when compared to RB … where Js is the total number of scenarios and δj traj,. is the trajectory residual of model predicted trajectory compared to the ground truth trajectory in scenario j. We choose the mean trajectory error, i.e., m-ATE, to represent the trajectory residual, calculated as … where pm,i is a model predicted two dimensional trajectory point at t = i, while pgt,i is the ground truth trajectory point at the same time t = i. The distance between this pair of points is calculated by function dist(.).”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Guzman to include that the trial outcome of the motion planner is determined in dependence on the trial set of hyperparameters and predetermined journey data, and that the new utility score is determined in dependence on a comparison of the trial outcome with truth outcome data associated with the predetermined journey data, as taught by Jiang as disclosed above, in order to ensure accurate updated hyperparameters (Jiang Page 782 “In this paper, we present a learning-based dynamics residual correction mechanism, which corrects the prediction residual of a dynamics base model to increase the overall model prediction accuracy.”).
Claims 4, 7-8, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Guzman (Heteroscedastic Bayesian Optimisation for Stochastic Model Predictive Control) (“Guzman”) (Attached) in view of Jiang (Learning-Based Vehicle Dynamics Residual Correction Model for Autonomous Driving Simulation) (“Jiang”) (Attached) further in view of Zhang (On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning) (“Zhang”) (Attached).
With respect to claim 4, and similarly claim 18, Guzman in view of Jiang fail to explicitly disclose that the probabilistic surrogate function is a gaussian mixture model formed from the combination of a plurality of neural networks.
Zhang, however, teaches that the probabilistic surrogate function is a gaussian mixture model formed from the combination of a plurality of neural networks (See at least Zhang Pages 4-5 “Optimizee To demonstrate the importance of HPO for MBRL we use the current state-of-the-art Probabilistic Ensembles With Trajectory Sampling (PETS) (Chua et al., 2018) algorithm as optimizee. PETS uses an ensemble of neural networks to learn a model of the environment which provides aleatoric and epistemic uncertainty estimates. In PETS, the dynamics model is chosen to be an ensemble of neural networks whose outputs parameterize anisotropic Gaussians. Model Predictive Control (MPC) is then used to get the policy, by directly optimizing the expected sum of rewards over a fixed planning horizon. PETS, in particular, performs model predictive control with a cross-entropy method (CEM) optimizer for action selection. To evaluate an action sequence, PETS first samples a model from the ensemble, and rolls out the action sequence using the selected model, and computes the sum of rewards. Action sequences are then evaluated by performing this process multiple times iteratively and averaging the simulated returns over the ensemble members. Environments For the experiments, we consider four test environments: Pusher, Reacher, Hopper, Halfcheetah from MuJoCo (Todorov et al., 2012) and a simulation environment of Daisy, a robot hexapod to accomplish locomotion tasks. The reward signal for Daisy is similar to Hopper and HalfCheetah, where we use the forward speed as the reward signal. The number of trials is fixed to 80 for pusher and 300 for the rest, with rollout lengths of 150 and 1000 steps, respectively. The actions in the initial trial are sampled randomly to collect data to train the model before it is used by PETS in future trials. We use Hopper and HalfCheetah for illustrative plots in the main paper; quantitative results for the other three tasks can be found in Appendix A.5. Configuration Space We split the hyperparameters of PETS into two groups: (i) Model Training and (ii) CEM Optimizer; to clearly differentiate the influence these parameter spaces have in the MBRL setting. This allows optimizers to learn interaction effects within each group. When optimizing one group of hyperparameters, the others are set to the default value of the best manually tuned PETS hyperparameters as reported by Chua et al. (2018). We also optimized all hyperparameters together in Appendix A.4. The full configuration spaces are given in Appendix A.3. HPO Objective We use the average returns of the 3 most recent trials as the objective for all HPO methods. This gives us a better estimate of the noisy reward in the MBRL tasks. As discussed in Section 4, we consider two scenarios: 1) the transferability of the hyperparameter schedule (static or dynamic) learned by the HPO methods across environments; and 2) where we are interested in the final learned model and policy reward across multiple runs. In the first scenario, we consider the mean performance of the top 5 members during the search. For the latter scenario, we use the best found schedules of each HPO method and report the performance of PETS over 5 seeds”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus of Guzman in view of Jiang to include that the probabilistic surrogate function is a gaussian mixture model formed from the combination of a plurality of neural networks, as taught by Zhang as disclosed above, in order to ensure an accurate generation of the trial set of hyperparameters (Zhang Page 1 “Finally, our experiments provide valuable insights into the effects of several hyperparameters, such as plan horizon or learning rate and their influence on the stability of training and resulting rewards.”).
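For illustration of the cited ensemble teaching, the following sketch shows how per-network Gaussian predictions can be combined into a single Gaussian mixture by moment matching (the numeric values stand in for outputs of trained neural networks and are hypothetical; this is not code from Zhang):

import numpy as np

# Each entry: the (mean, variance) predicted by one ensemble member at the same query point.
means = np.array([0.52, 0.47, 0.55, 0.50])
variances = np.array([0.010, 0.015, 0.008, 0.012])
weights = np.full(len(means), 1.0 / len(means))   # uniform mixture over the ensemble members

mixture_mean = np.sum(weights * means)
# Law of total variance: within-component variance plus between-component spread.
mixture_var = np.sum(weights * (variances + means ** 2)) - mixture_mean ** 2

print("mixture mean:", mixture_mean, "mixture variance:", mixture_var)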
With respect to claim 7, Guzman in view of Jiang fail to explicitly disclose that the search of the domain space of hyperparameter inputs is guided by an evolutionary algorithm.
Zhang, however, teaches that the search of the domain space of hyperparameter inputs is guided by an evolutionary algorithm (See at least Zhang Page 3 “3.1 Population Based Training Population based training (PBT) is an evolutionary approach for dynamic HPO and allows to optimize hyperparameters during the training of the members of its population. PBT starts out with a randomly initialized population. As a result, all members of its population start from different regions in the hyperparameter configuration space. The members are ranked according to their current performance at regular intervals. The worst performing members in the population are then replaced by the best ones, by copying over their parameters (network weights, in case of neural networks NNs)), as well as their hyperparameters in an exploitation step. It is unclear from the original PBT paper, however, whether the data on which the members are trained are also copied over. We discuss ablations of this setting in the experiments section. To allow searching for potentially better performing hyperparameter configurations, the copied hyperparameters are perturbed to allow for small steps in their immediate neighborhoods in an exploration step. Over time, different configurations are evaluated during the training process and potentially kept if they improve performance. PBT comes with its own hyperparameters. In the original paper, members from the population are selected using truncation during the exploitation step. Thereby, agents from the bottom 20% of the population are replaced by agents from the top 20%. Additionally, continuous hyperparameter values are multiplied at random by 0.8 or 1.2 in the exploration step”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus of Guzman in view of Jiang to include that the search of the domain space of hyperparameter inputs is guided by an evolutionary algorithm, as taught by Zhang as disclosed above, in order to ensure an accurate generation of the trial set of hyperparameters (Zhang Page 1 “Finally, our experiments provide valuable insights into the effects of several hyperparameters, such as plan horizon or learning rate and their influence on the stability of training and resulting rewards.”).
With respect to claim 8, Guzman in view of Jiang in view of Zhang teach that the model is the motion planner, and the guidance objective is configured to generate the at least one trial set of hyperparameters using an evolutionary algorithm to stochastically determine new data pairs in dependence on evaluating a utility score of one or more new sets of hyperparameters (See at least Zhang Page 3 “3.1 Population Based Training Population based training (PBT) is an evolutionary approach for dynamic HPO and allows to optimize hyperparameters during the training of the members of its population. PBT starts out with a randomly initialized population. As a result, all members of its population start from different regions in the hyperparameter configuration space. The members are ranked according to their current performance at regular intervals. The worst performing members in the population are then replaced by the best ones, by copying over their parameters (network weights, in case of neural networks NNs)), as well as their hyperparameters in an exploitation step. It is unclear from the original PBT paper, however, whether the data on which the members are trained are also copied over. We discuss ablations of this setting in the experiments section. To allow searching for potentially better performing hyperparameter configurations, the copied hyperparameters are perturbed to allow for small steps in their immediate neighborhoods in an exploration step. Over time, different configurations are evaluated during the training process and potentially kept if they improve performance. PBT comes with its own hyperparameters. In the original paper, members from the population are selected using truncation during the exploitation step. Thereby, agents from the bottom 20% of the population are replaced by agents from the top 20%. Additionally, continuous hyperparameter values are multiplied at random by 0.8 or 1.2 in the exploration step”).
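For illustration of the population based training step quoted above, the following sketch shows the exploit/explore mechanism in which bottom-ranked members copy and then perturb the hyperparameters of top-ranked members (the population size and utility scores are hypothetical; the 20% truncation and the 0.8/1.2 perturbation factors follow the quoted description):

import numpy as np

rng = np.random.default_rng(1)

population = rng.uniform(0.0, 1.0, size=(10, 2))   # 10 members, 2 hyperparameters each
scores = rng.uniform(size=10)                      # stand-in utility scores per member

order = np.argsort(scores)
n_trunc = max(1, len(population) // 5)             # bottom/top 20% truncation
bottom, top = order[:n_trunc], order[-n_trunc:]

for loser, winner in zip(bottom, rng.choice(top, size=n_trunc)):
    copied = population[winner].copy()                     # exploit: copy a top member's hyperparameters
    perturb = rng.choice([0.8, 1.2], size=copied.shape)    # explore: perturb the copied values
    population[loser] = copied * perturb

print(population)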
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IBRAHIM ABDOALATIF ALSOMAIRY whose telephone number is (571)272-5653. The examiner can normally be reached M-F 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Faris Almatrahi, can be reached at 313-446-4821. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/IBRAHIM ABDOALATIF ALSOMAIRY/ Examiner, Art Unit 3667 /KENNETH J MALKOWSKI/Primary Examiner, Art Unit 3667