Prosecution Insights
Last updated: April 18, 2026
Application No. 18/988,972

METHOD FOR ROBOT DAMAGE RECOVERY BASED ON MULTI-OBJECTIVE MAP-ELITES

Status: Non-Final Office Action (§102)
Filed: Dec 20, 2024
Examiner: CAMERON, ATTICUS A
Art Unit: 3658
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: South China University of Technology
OA Round: 1 (Non-Final)

Predictions: 84% grant probability (favorable) · 1-2 expected OA rounds · 2y 10m to grant · 96% grant probability with interview

Examiner Intelligence

Career allowance rate: 84% (49 granted / 58 resolved) — above average, +32.5% vs. Tech Center average
Interview lift: +11.4% (moderate) among resolved cases with interview
Typical timeline: 2y 10m average prosecution
Currently pending: 58 applications
Career history: 116 total applications across all art units

Statute-Specific Performance

§101: 13.6% (−26.4% vs. TC avg)
§103: 48.0% (+8.0% vs. TC avg)
§102: 30.8% (−9.2% vs. TC avg)
§112: 5.9% (−34.1% vs. TC avg)
Tech Center averages are estimates; based on career data from 58 resolved cases.

Office Action

§102
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Joint Inventors

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Priority

Acknowledgment is made of applicant's claim for foreign priority under 35 U.S.C. 119(a)-(d). A certified copy of this document has been placed in the file wrapper. As such, the effective filing date of the instant application is considered 05/17/2024, coinciding with the filing date of the People's Republic of China application to which foreign priority was requested.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-10 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cully et al. ("Robots that can adapt like animals", referred to as Cully).

Regarding claim 1: Cully discloses: A method for robot damage recovery based on multi-objective MAP-Elites, the robot being a multi-legged robot, wherein the method comprises a behavior map construction phase and a damage adaptation phase, which respectively correspond to an undamaged environment and a damaged environment of the robot, both the undamaged environment and the damaged environment are simulated environments, and there is at least one damaged environment; ([pg. 1, col. 1, lines 1-24] As robots leave the controlled environments of factories to autonomously function in more complex, natural environments1,2,3, they will have to respond to the inevitable fact that they will become damaged4,5.
However, while animals can quickly adapt to a wide variety of injuries, current robots cannot “think outside the box” to find a compensatory behavior when damaged: they are limited to their pre-specified self sensing abilities, can diagnose only anticipated failure modes6 , and require a pre-programmed contingency plan for every type of potential damage, an impracticality for complex robots4,5 . Here we introduce an intelligent trial and error algorithm that allows robots to adapt to damage in less than two minutes, without requiring self-diagnosis or pre-specified contingency plans. Before deployment, a robot exploits a novel algorithm to create a detailed map of the space of high-performing behaviors: This map represents the robot’s intuitions about what behaviors it can perform and their value. If the robot is damaged, it uses these intuitions to guide a trial-and-error learning algorithm that conducts intelligent experiments to rapidly discover a compensatory behavior that works in spite of the damage. Experiments reveal successful adaptations for a legged robot injured in five different ways, including damaged, broken, and missing legs, and for a robotic arm with joints broken in 14 different ways.) 
the behavior map construction phase comprises the following steps:
T1, initializing a behavior map, the behavior map comprising a plurality of grids, with at least one grid storing one controller parameter;
T2, picking one of the controller parameters from the behavior map as a parent controller parameter; obtaining a plurality of sample controller parameters by sampling around the parent controller parameter; interacting each of these sample controller parameters with the undamaged environment, respectively, and obtaining an evaluation result by evaluating the interaction, the evaluation result comprising a behavior characteristic, a distance fitness value, and a cost fitness value; obtaining gradient direction information by calculating an evolutionary gradient based on the evaluation result; wherein, the 'interaction' specifically refers to controlling the robot to execute an episode in the simulated environment, which consists of a sequence of steps facilitated by application of the controller parameter;
T3, obtaining a child controller parameter by evolving the parent controller parameter according to the gradient direction information; obtaining an evaluation result by interacting the child controller parameter with the undamaged environment, and identifying an objective grid by locating a position of the child controller parameter within the behavior map based on the behavior characteristic;
T4, comparing a dominance relationship between all the controller parameters within the objective grid and the child controller parameter according to the distance fitness value and the cost fitness value; according to the dominance relationship, storing the child controller parameter to the objective grid, or replacing the controller parameter within the objective grid, or discarding the child controller parameter; and
T5, repeatedly executing steps T2 to T4 until a preset first iteration stop condition is reached;
the damage adaptation phase comprises the following steps:
T6, selecting one damaged environment, initializing a damage recovery model using a map-based Bayesian optimization algorithm and the behavior map; obtaining an optimal controller parameter by adjusting and searching the damage recovery model; causing the multi-legged robot to recover in the damaged environment by using the optimal controller parameter; and
T7, repeating step T6 until all the damaged environments are simulated.
([pg. 15, col. 2, lines 21-31] There are 6 parameters for each leg (αi1, αi2, φi1, φi2, τi1, τi2), therefore each controller is fully described by 36 parameters. Each parameter can have one of these possible values: 0, 0.05, 0.1, ... 0.95, 1. Different values for these 36 parameters can produce numerous different gaits, from purely quadruped gaits to classic tripod gaits. This controller is designed to be simple enough to show the performance of the algorithm in an intuitive setup. Nevertheless, the algorithm will work with any type of controller, including bio-inspired central pattern generators48 and evolved neural networks49,50,51,52.)

Regarding claim 2: Cully discloses: The method for robot damage recovery based on multi-objective MAP-Elites according to claim 1, Cully further discloses: wherein T1 comprises the following specific steps:
T11, initializing a behavior space, which has a storage capacity to store Num controller parameters; converting the behavior space into the behavior map with the plurality of grids according to a preset discrete value Dis; and
T12, obtaining the controller parameters by randomly initializing a neural network model parameters based on a fully connected neural network; controlling the robot to interact with the undamaged environment by using the controller parameters and obtaining an evaluation result by evaluating the interaction; locating the controller parameters according to the behavior characteristic and storing them in the grids. ([pg. 15, col.
2, lines 21-31] There are 6 parameters for each leg (αi1, αi2, φi1, φi2, τi1, τi2), therefore each controller is fully described by 36 parameters. Each parameter can have one of these possible values: 0, 0.05, 0.1, ... 0.95, 1. Different values for these 36 parameters can produce numerous different gaits, from purely quadruped gaits to classic tripod gaits. This controller is designed to be simple enough to show the performance of the algorithm in an intuitive setup. Nevertheless, the algorithm will work with any type of controller, including bio-inspired central pattern generators48 and evolved neural networks49,50,51,52. [pg. 17, col. 2, lines 2-4] For the behavior-performance map, this rectangle is discretized into a grid composed of 20000 square cells (200 × 100).)

Regarding claim 3: Cully discloses: The method for robot damage recovery based on multi-objective MAP-Elites according to claim 2, Cully further discloses: wherein, in T11, a preset dimension value Dim is also comprised, wherein the behavior space is uniformly discretized into Dis parts along each dimension according to the discrete value Dis to obtain the behavior map whose number of the grids is Dis^Dim; the number of the controller parameters that each of the grids accommodates is Num/Dis^Dim. ([pg. 15, col. 2, lines 21-31] There are 6 parameters for each leg (αi1, αi2, φi1, φi2, τi1, τi2), therefore each controller is fully described by 36 parameters. Each parameter can have one of these possible values: 0, 0.05, 0.1, ... 0.95, 1. Different values for these 36 parameters can produce numerous different gaits, from purely quadruped gaits to classic tripod gaits. This controller is designed to be simple enough to show the performance of the algorithm in an intuitive setup. Nevertheless, the algorithm will work with any type of controller, including bio-inspired central pattern generators48 and evolved neural networks49,50,51,52. [pg. 17, col. 2, lines 2-4] For the behavior-performance map, this rectangle is discretized into a grid composed of 20000 square cells (200 × 100).)

Regarding claim 4: Cully discloses: The method for robot damage recovery based on multi-objective MAP-Elites according to claim 1, Cully further discloses: wherein the behavior characteristic is a multi-dimensional vector, each dimension of this vector represents the proportion of time that a given foot of the robot is in contact with the ground during each episode of steps, with a value ranging from 0 to 1; each of the grids corresponds to a unique first identifier, the first identifier is a multi-dimensional array or a multi-dimensional vector and comprises a plurality of index values; and dimensions of the behavior map, the behavior characteristic, and the first identifier are the same; the locating process is performed by dividing the value range into a plurality of intervals according to the dimension, and mapping and converting each parameter of the behavior characteristic into the index value based on its sequential position according to the intervals, thereby obtaining the corresponding first identifier. ([pg. 15, col. 2, lines 21-31] There are 6 parameters for each leg (αi1, αi2, φi1, φi2, τi1, τi2), therefore each controller is fully described by 36 parameters. Each parameter can have one of these possible values: 0, 0.05, 0.1, ... 0.95, 1. Different values for these 36 parameters can produce numerous different gaits, from purely quadruped gaits to classic tripod gaits. This controller is designed to be simple enough to show the performance of the algorithm in an intuitive setup. Nevertheless, the algorithm will work with any type of controller, including bio-inspired central pattern generators48 and evolved neural networks49,50,51,52. [pg. 17, col. 2, lines 2-4] For the behavior-performance map, this rectangle is discretized into a grid composed of 20000 square cells (200 × 100).)
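Claims 2-4 describe discretizing a Dim-dimensional behavior space into Dis^Dim grids and locating a controller by converting each component of its behavior characteristic (a ground-contact duty factor in [0, 1]) into an interval index. A minimal sketch of that locate step (the function and variable names are illustrative, not drawn from the application or from Cully):

```python
def locate(behavior, dis):
    """Map a behavior characteristic (one value in [0, 1] per dimension)
    to a grid identifier: one interval index per dimension."""
    indices = []
    for value in behavior:
        # Split [0, 1] into `dis` equal intervals; clamp 1.0 into the last.
        indices.append(min(int(value * dis), dis - 1))
    return tuple(indices)

# Example: Dim = 6 (one duty factor per leg), Dis = 5 parts per dimension,
# giving Dis**Dim = 5**6 = 15625 grids in the behavior map.
duty_factors = [0.0, 0.12, 0.5, 0.77, 0.95, 1.0]
print(locate(duty_factors, 5))  # -> (0, 0, 2, 3, 4, 4)
```

Under claim 3's reading, a behavior map sized to hold Num controller parameters would then allot Num/Dis^Dim slots to each grid.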
Regarding claim 5: Cully discloses: The method for robot damage recovery based on multi-objective MAP-Elites according to claim 1, Cully further discloses: wherein T2 comprises the following specific steps:
T21, in the behavior map, taking a sum of the distance fitness value and the cost fitness value as an equal-weight overall fitness value; selecting the grid where the controller parameter with the maximum equal-weight overall fitness value is located; or for the most recent a grids where the controller parameters have been stored, ranking in a descending order based on the equal-weight overall fitness value, randomly selecting one of the top b grids, wherein a and b are predefined integer values, and a≥b;
T22, randomly selecting one of the controller parameters from the selected grid as the parent controller parameter;
T23, constructing an isotropic multivariate Gaussian distribution based on the parent controller parameter, generating the plurality of sample controller parameters by randomly sampling in the multivariate Gaussian distribution;
T24, controlling the robot to interact with the undamaged environment using the sample controller parameters and obtaining the evaluation result by performing evaluation; and
T25, assigning different weights to the distance fitness value and the cost fitness value of the sample controller parameter, respectively, calculating a weighted overall fitness value; obtaining the gradient direction information by performing gradient estimation on the weighted overall fitness value using a stochastic gradient ascent method.
([pg. 15, col. 2, lines 21-31] There are 6 parameters for each leg (αi1, αi2, φi1, φi2, τi1, τi2), therefore each controller is fully described by 36 parameters. Each parameter can have one of these possible values: 0, 0.05, 0.1, ... 0.95, 1. Different values for these 36 parameters can produce numerous different gaits, from purely quadruped gaits to classic tripod gaits.
This controller is designed to be simple enough to show the performance of the algorithm in an intuitive setup. Nevertheless, the algorithm will work with any type of controller, including bio-inspired central pattern generators48 and evolved neural networks49,50,51,52 . [pg. 17, col. 2, lines 2-4] For the behavior-performance map, this rectangle is discretized into a grid composed of 20000 square cells (200 × 100). ([pg. 1-2, col. 2-1, lines 47-16] A low confidence is assigned to the predicted performance of behaviors stored in this behavior-performance map because they have not been tried in reality (Fig. 2B and Extended Data Fig. 1). During the robot’s mission, if it senses a performance drop, it selects the most promising behavior from the behavior performance map, tests it, and measures its performance. The robot subsequently updates its prediction for that behavior and nearby behaviors, assigns high confidence to these predictions (Fig. 2C and Extended Data Fig. 1), and continues the selection/test/update process until it finds a satisfactory compensatory behavior (Fig. 2D and Extended Data Fig. 1). All of these ideas are technically captured via a Gaussian process model21, which approximates the performance function with already acquired data, and a Bayesian optimization procedure22,23, which exploits this model to search for the maximum of the performance function (Methods). The robot selects which behaviors to test by maximizing an information acquisition function that balances exploration (selecting points whose performance is uncertain) and exploitation (selecting points whose performance is expected to be high) (Methods). The selected behavior is tested on the physical robot and the actual performance is recorded. The algorithm updates the expected performance of the tested behavior and lowers the uncertainty about it. These updates are propagated to neighboring solutions in the behavioral space by updating the Gaussian process (Methods). 
These updated performance and confidence distributions affect which behavior is tested next. This select-test-update loop repeats until the robot finds a behavior whose measured performance is greater than 90% of the best performance predicted for any behavior in the behavior-performance map (Methods).

[pg. 14, lines 8-22]
procedure MAP-ELITES ALGORITHM
  (P ← ∅, C ← ∅)  ▷ Creation of an empty behavior-performance map (empty N-dimensional grid).
  for iter = 1 → I do  ▷ Repeat during I iterations (here we choose I = 40 million iterations).
    if iter < 400 then
      c′ ← random_controller()  ▷ The first 400 controllers are generated randomly.
    else  ▷ The next controllers are generated using the map.
      c ← random_selection(C)  ▷ Randomly select a controller c in the map.
      c′ ← random_variation(c)  ▷ Create a randomly modified copy of c.
    x′ ← behavioral_descriptor(simu(c′))  ▷ Simulate the controller and record its behavioral descriptor.
    p′ ← performance(simu(c′))  ▷ Record its performance.
    if P(x′) = ∅ or P(x′) < p′ then  ▷ If the cell is empty or if p′ is better than the current stored performance.
      P(x′) ← p′  ▷ Store the performance of c′ in the behavior-performance map according to its behavioral descriptor x′.
      C(x′) ← c′  ▷ Associate the controller with its behavioral descriptor.
return behavior-performance map (P and C))

Regarding claim 6: Cully discloses: The method for robot damage recovery based on multi-objective MAP-Elites according to claim 5, Cully further discloses: wherein the distance fitness value, the cost fitness value and the weighted overall fitness value are represented by D(θ), C(θ) and F(θ), respectively; the corresponding weights for the distance fitness value and the cost fitness value are α and β, respectively; in each iteration, the process of calculating the weighted overall fitness value is as follows: α = α − 2×(w−R)/N, β = 1 − α, F(θ) = α·D(θ) + β·C(θ); wherein, R is a weight range control parameter; w is a distance function initial weight, the weight range corresponding to the distance fitness value is [R−w, w]; N is the number of iterations, θ is the controller parameter. ([pg. 1-2, col. 2-1, lines 47-16] A low confidence is assigned to the predicted performance of behaviors stored in this behavior-performance map because they have not been tried in reality (Fig. 2B and Extended Data Fig. 1). During the robot’s mission, if it senses a performance drop, it selects the most promising behavior from the behavior performance map, tests it, and measures its performance. The robot subsequently updates its prediction for that behavior and nearby behaviors, assigns high confidence to these predictions (Fig. 2C and Extended Data Fig. 1), and continues the selection/test/update process until it finds a satisfactory compensatory behavior (Fig. 2D and Extended Data Fig. 1). All of these ideas are technically captured via a Gaussian process model21, which approximates the performance function with already acquired data, and a Bayesian optimization procedure22,23, which exploits this model to search for the maximum of the performance function (Methods).
The robot selects which behaviors to test by maximizing an information acquisition function that balances exploration (selecting points whose performance is uncertain) and exploitation (selecting points whose performance is expected to be high) (Methods). The selected behavior is tested on the physical robot and the actual performance is recorded. The algorithm updates the expected performance of the tested behavior and lowers the uncertainty about it. These updates are propagated to neighboring solutions in the behavioral space by updating the Gaussian process (Methods). These updated performance and confidence distributions affect which behavior is tested next. This select-test-update loop repeats until the robot finds a behavior whose measured performance is greater than 90% of the best performance predicted for any behavior in the behavior-performance map (Methods). [pg. 14, lines 8-22] procedure MAP-ELITES ALGORITHM (P ← ∅, C ← ∅) . Creation of an empty behavior-performance map (empty N-dimensional grid). for iter = 1 → I do . Repeat during I iterations (here we choose I = 40 million iterations). if iter < 400 then c 0 ← random_controller() . The first 400 controllers are generated randomly. else . The next controllers are generated using the map. c ← random_selection(C) . Randomly select a controller c in the map. c 0 ← random_variation(c) . Create a randomly modified copy of c. x 0 ←behavioral_descriptor(simu(c 0 )) . Simulate the controller and record its behavioral descriptor. p 0 ←performance(simu(c 0 )) . Record its performance. if P(x 0 ) = ∅ or P(x 0 ) < p0 then . If the cell is empty or if p 0 is better than the current stored performance. P(x 0 ) ← p 0 . Store the performance of c 0 in the behavior-performance map according . to its behavioral descriptor x 0 . C(x 0 ) ← c 0 . Associate the controller with its behavioral descriptor. 
return behavior-performance map (P and C))

Regarding claim 7: Cully discloses: The method for robot damage recovery based on multi-objective MAP-Elites according to claim 1, Cully further discloses: wherein, in T4, the dominance relationship comprises a completely dominating relationship, a completely dominated relationship, and a non-dominance relationship; in response to determining that the child controller parameter outperforms one or more controller parameters within the objective grid in terms of both the distance fitness value and the cost fitness value, this situation is classified as the completely dominating relationship; in this case, all the controller parameters within the objective grid that are fully dominated by the child controller parameter are removed, and the child controller parameter is stored in the objective grid; in response to determining that at least one controller parameter within the objective grid completely dominates the child controller parameter, it is considered as the completely dominated relationship, and the child controller parameter is discarded; and in response to determining that the dominance relationship between the child controller parameter and the controller parameters within the objective grid is neither the completely dominating relationship nor the completely dominated relationship, it is considered as the non-dominance relationship; in this case, it is judged whether a storage space of the objective grid has reached the maximum capacity: if not, the child controller parameter is directly stored in the objective grid; otherwise, one controller parameter within the objective grid is randomly selected and replaced with the child controller parameter. ([pg. 1-2, col. 2-1, lines 47-16] A low confidence is assigned to the predicted performance of behaviors stored in this behavior-performance map because they have not been tried in reality (Fig. 2B and Extended Data Fig. 1).
During the robot’s mission, if it senses a performance drop, it selects the most promising behavior from the behavior performance map, tests it, and measures its performance. The robot subsequently updates its prediction for that behavior and nearby behaviors, assigns high confidence to these predictions (Fig. 2C and Extended Data Fig. 1), and continues the selection/test/update process until it finds a satisfactory compensatory behavior (Fig. 2D and Extended Data Fig. 1). All of these ideas are technically captured via a Gaussian process model21, which approximates the performance function with already acquired data, and a Bayesian optimization procedure22,23, which exploits this model to search for the maximum of the performance function (Methods). The robot selects which behaviors to test by maximizing an information acquisition function that balances exploration (selecting points whose performance is uncertain) and exploitation (selecting points whose performance is expected to be high) (Methods). The selected behavior is tested on the physical robot and the actual performance is recorded. The algorithm updates the expected performance of the tested behavior and lowers the uncertainty about it. These updates are propagated to neighboring solutions in the behavioral space by updating the Gaussian process (Methods). These updated performance and confidence distributions affect which behavior is tested next. This select-test-update loop repeats until the robot finds a behavior whose measured performance is greater than 90% of the best performance predicted for any behavior in the behavior-performance map (Methods). [pg. 14, lines 8-22] procedure MAP-ELITES ALGORITHM (P ← ∅, C ← ∅) . Creation of an empty behavior-performance map (empty N-dimensional grid). for iter = 1 → I do . Repeat during I iterations (here we choose I = 40 million iterations). if iter < 400 then c 0 ← random_controller() . The first 400 controllers are generated randomly. else . 
The next controllers are generated using the map. c ← random_selection(C) . Randomly select a controller c in the map. c 0 ← random_variation(c) . Create a randomly modified copy of c. x 0 ←behavioral_descriptor(simu(c 0 )) . Simulate the controller and record its behavioral descriptor. p 0 ←performance(simu(c 0 )) . Record its performance. if P(x 0 ) = ∅ or P(x 0 ) < p0 then . If the cell is empty or if p 0 is better than the current stored performance. P(x 0 ) ← p 0 . Store the performance of c 0 in the behavior-performance map according . to its behavioral descriptor x 0 . C(x 0 ) ← c 0 . Associate the controller with its behavioral descriptor. return behavior-performance map (P and C))

Regarding claim 8: Cully discloses: The method for robot damage recovery based on multi-objective MAP-Elites according to claim 1, Cully further discloses: wherein each leg of the multi-legged robot comprises at least two joints, which are in either a damaged or normal state; in T6, for the controller parameter, a performance value is obtained by adding a product of the distance fitness value with a custom weight and a product of the cost fitness value with another custom weight; wherein the performance values calculated for the controller parameter after interacting with the undamaged environment and the damaged environment are respectively a first performance value and a second performance value; T6 comprises the following specific steps:
T61, setting an arbitrary number of the joints at arbitrary positions to the damaged state to simulate the damaged environment;
T62, in the behavior map, calculating the first performance values for all the controller parameters, wherein the controller parameter with the maximum first performance value is the first objective parameter; constructing a Gaussian process model by adopting the map-based Bayesian optimization algorithm and using the behavior characteristics and the first performance values of all the controller parameters from the
behavior map, wherein the Gaussian process model is used to predict the performance of the controller parameters; the Gaussian process model is structured as a dictionary: its keys are unique second identifiers corresponding one-to-one with the controller parameters within the behavior map; the value of the dictionary is a tuple consisting of a mean μ and a variance σ2, wherein the mean represents an estimated performance of the controller parameter; T63, constructing an acquisition function by utilizing the mean and variance; calculating a function value for the controller parameter using the acquisition function and selecting the controller parameter with the maximum function value as a second objective parameter; T64, obtaining an evaluation result by interacting the second objective parameter with the damaged environment; updating the Gaussian process model by using the characteristic and the second performance value of the second objective parameter; T65, repeating steps T63 and T64 until a preset second iteration stop condition is met; and T66, selecting the controller parameter with the maximum estimated performance as a third objective parameter; obtaining an evaluation result by interacting both the first objective parameter and the third objective parameter with the damaged environment, and selecting the one with the maximum second performance value as the optimal controller parameter. ([pg. 1-2, col. 2-1, lines 47-16] A low confidence is assigned to the predicted performance of behaviors stored in this behavior-performance map because they have not been tried in reality (Fig. 2B and Extended Data Fig. 1). During the robot’s mission, if it senses a performance drop, it selects the most promising behavior from the behavior performance map, tests it, and measures its performance. The robot subsequently updates its prediction for that behavior and nearby behaviors, assigns high confidence to these predictions (Fig. 2C and Extended Data Fig. 
1), and continues the selection/test/update process until it finds a satisfactory compensatory behavior (Fig. 2D and Extended Data Fig. 1). All of these ideas are technically captured via a Gaussian process model [21], which approximates the performance function with already acquired data, and a Bayesian optimization procedure [22, 23], which exploits this model to search for the maximum of the performance function (Methods). The robot selects which behaviors to test by maximizing an information acquisition function that balances exploration (selecting points whose performance is uncertain) and exploitation (selecting points whose performance is expected to be high) (Methods). The selected behavior is tested on the physical robot and the actual performance is recorded. The algorithm updates the expected performance of the tested behavior and lowers the uncertainty about it. These updates are propagated to neighboring solutions in the behavioral space by updating the Gaussian process (Methods). These updated performance and confidence distributions affect which behavior is tested next. This select-test-update loop repeats until the robot finds a behavior whose measured performance is greater than 90% of the best performance predicted for any behavior in the behavior-performance map (Methods). [pg. 14, lines 8-22]

procedure MAP-ELITES ALGORITHM
  (P ← ∅, C ← ∅)  ▷ Creation of an empty behavior-performance map (empty N-dimensional grid).
  for iter = 1 → I do  ▷ Repeat during I iterations (here we choose I = 40 million iterations).
    if iter < 400 then
      c′ ← random_controller()  ▷ The first 400 controllers are generated randomly.
    else  ▷ The next controllers are generated using the map.
      c ← random_selection(C)  ▷ Randomly select a controller c in the map.
      c′ ← random_variation(c)  ▷ Create a randomly modified copy of c.
    x′ ← behavioral_descriptor(simu(c′))  ▷ Simulate the controller and record its behavioral descriptor.
    p′ ← performance(simu(c′))  ▷ Record its performance.
    if P(x′) = ∅ or P(x′) < p′ then  ▷ If the cell is empty or if p′ is better than the currently stored performance.
      P(x′) ← p′  ▷ Store the performance of c′ in the behavior-performance map according to its behavioral descriptor x′.
      C(x′) ← c′  ▷ Associate the controller with its behavioral descriptor.
  return behavior-performance map (P and C).)

Regarding claim 9: Cully further discloses: wherein, in T61, an updated container is also constructed; in T63, the second objective parameter is also stored in the updated container; in T62, mean and variance initialization is performed for all the controller parameters within the behavior map; wherein, the first performance value of each controller parameter is normalized, i.e., converted to a decimal value within the range [0, 1], which serves as an initial value of the mean; an initial value of the variance is calculated as: σᵢ² = M(BCᵢ, BCᵢ); wherein, i is the second identifier, representing the i-th controller parameter within the behavior map; BCᵢ represents the characteristic of the i-th controller parameter; M(x, y) is a kernel function, constructed as: M(x, y) = (1 + √5·d/v + 5d²/(3v²)) × exp(−√5·d/v); wherein, x and y represent two behavior characteristics, d represents the Euclidean distance between x and y, and v is a preset length-scale parameter; exp(−√5·d/v) is an exponential function, representing e raised to the power of −√5·d/v; the kernel function is used to calculate the correlation between two behavior characteristics; in T64, all the means and variances of the Gaussian process model are updated; wherein, the process for updating the means is as follows: T641, constructing a performance difference vector P_diff; for all the second objective parameters in the updated container, calculating the difference between the first performance value and the second performance value of each second objective parameter, and storing the difference into the performance
difference vector; T642, for all the second objective parameters within the updated container, calculating the correlation between the behavior characteristics of any two controller parameters by adopting the kernel function, and obtaining a covariance matrix K after adding Gaussian white noise with a variance of 0.01; T643, for all the controller parameters within the behavior map, obtaining a covariance matrix k by adopting the kernel function to calculate the correlation of the behavior characteristics between all the controller parameters within the behavior map and all the second objective parameters within the updated container; T644, in the Gaussian process model, updating the mean according to the following formula: μᵢ = p_undamaged_i + kᵀ·(K⁻¹·P_diff); wherein, p_undamaged_i represents the first performance value of the i-th controller parameter; the process for updating the variance is as follows: T645, updating the variance corresponding to the i-th controller parameter according to the following formula: σᵢ² = M(BCᵢ, BCᵢ) − kᵀ·K⁻¹·k; that is, an autocorrelation metric of the behavior characteristic minus the product of the covariance matrices. ([pg. 1-2, col. 2-1, lines 47-16] A low confidence is assigned to the predicted performance of behaviors stored in this behavior-performance map because they have not been tried in reality (Fig. 2B and Extended Data Fig. 1). During the robot's mission, if it senses a performance drop, it selects the most promising behavior from the behavior-performance map, tests it, and measures its performance. The robot subsequently updates its prediction for that behavior and nearby behaviors, assigns high confidence to these predictions (Fig. 2C and Extended Data Fig. 1), and continues the selection/test/update process until it finds a satisfactory compensatory behavior (Fig. 2D and Extended Data Fig. 1).
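The posterior-update steps T641-T645 recited in claim 9 correspond to a standard Gaussian-process regression update with a Matérn-5/2 kernel, the same construction used in Cully's Methods. A minimal NumPy sketch, under the assumption that the reconstructed formulas above are the intended ones (function and variable names here are illustrative, not from the application):

```python
import numpy as np

def matern52(x, y, v):
    """Matérn-5/2 kernel on the Euclidean distance d between descriptors x and y."""
    d = np.linalg.norm(np.asarray(x) - np.asarray(y))
    return (1 + np.sqrt(5) * d / v + 5 * d**2 / (3 * v**2)) * np.exp(-np.sqrt(5) * d / v)

def gp_update(map_bcs, map_priors, tested_bcs, p_diff, v=1.0, noise=0.01):
    """Update mean/variance for every map cell from tested (descriptor, residual) pairs.

    map_priors holds the normalized simulator performances (the GP prior mean);
    p_diff holds observed-minus-prior residuals for the tested descriptors.
    """
    # K: correlations among tested descriptors, plus Gaussian white noise (variance 0.01)
    K = np.array([[matern52(a, b, v) for b in tested_bcs] for a in tested_bcs])
    K += noise * np.eye(len(tested_bcs))
    K_inv = np.linalg.inv(K)
    mus, sigmas = [], []
    for bc, prior in zip(map_bcs, map_priors):
        # k: cross-covariances between this map cell and all tested descriptors
        k = np.array([matern52(bc, t, v) for t in tested_bcs])
        mus.append(prior + k @ (K_inv @ np.asarray(p_diff)))   # mu_i = p_prior_i + k^T K^-1 P_diff
        sigmas.append(matern52(bc, bc, v) - k @ K_inv @ k)     # sigma_i^2 = M(bc,bc) - k^T K^-1 k
    return np.array(mus), np.array(sigmas)
```

Note how a cell whose descriptor coincides with a tested descriptor is pulled strongly toward the observed residual and its variance collapses, while distant cells keep roughly the prior mean and a variance near M(BCᵢ, BCᵢ).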
All of these ideas are technically captured via a Gaussian process model [21], which approximates the performance function with already acquired data, and a Bayesian optimization procedure [22, 23], which exploits this model to search for the maximum of the performance function (Methods). The robot selects which behaviors to test by maximizing an information acquisition function that balances exploration (selecting points whose performance is uncertain) and exploitation (selecting points whose performance is expected to be high) (Methods). The selected behavior is tested on the physical robot and the actual performance is recorded. The algorithm updates the expected performance of the tested behavior and lowers the uncertainty about it. These updates are propagated to neighboring solutions in the behavioral space by updating the Gaussian process (Methods). These updated performance and confidence distributions affect which behavior is tested next. This select-test-update loop repeats until the robot finds a behavior whose measured performance is greater than 90% of the best performance predicted for any behavior in the behavior-performance map (Methods). [pg. 7, col. 2, lines 18-30] A Gaussian process is therefore a generalization of an n-variate normal distribution, where n is the number of observations. The covariance matrix is what relates one observation to another: two observations that correspond to nearby values of χ1 and χ2 are likely to be correlated (this is a prior assumption based on the fact that functions tend to be smooth, and is injected into the algorithm via a prior on the likelihood of functions), two observations that correspond to distant values of χ1 and χ2 should not influence each other (i.e. their distributions are not correlated). Put differently, the covariance matrix represents that distant samples are almost uncorrelated and nearby samples are strongly correlated.
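For context on the mapped claim 1/claim 8 limitations, the MAP-ELITES procedure reproduced from Cully (pg. 14) amounts to a short loop over a cell-indexed archive. A sketch in Python, where random_controller, random_variation, and simulate stand in for the application's controller encoding and simulator (illustrative, not the reference implementation):

```python
import random

def map_elites(n_iters, n_random, simulate, random_controller, random_variation):
    """MAP-Elites sketch: keep the best controller found per behavior cell.

    simulate(c) returns (behavioral_descriptor, performance) for controller c;
    descriptors are assumed hashable (e.g. a tuple of discretized values).
    """
    P = {}  # behavior descriptor -> best performance seen for that cell
    C = {}  # behavior descriptor -> controller achieving that performance
    for it in range(n_iters):
        if it < n_random or not C:
            c_new = random_controller()          # bootstrap with random controllers
        else:
            c = random.choice(list(C.values()))  # randomly select an elite from the map
            c_new = random_variation(c)          # create a randomly modified copy
        x, p = simulate(c_new)
        if x not in P or P[x] < p:               # cell empty, or p beats stored performance
            P[x], C[x] = p, c_new
    return P, C
```

The archive P/C is exactly the "behavior-performance map" the rejection relies on: the dictionary keys play the role of the behavioral grid cells.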
This covariance matrix is defined via a kernel function, called k(χ1, χ2), which is usually based on the Euclidean distance between χ1 and χ2 (see the "kernel function" sub-section below).) Regarding claim 10: Cully discloses: The method for robot damage recovery based on multi-objective MAP-Elites according to claim 8, Cully further discloses: wherein the acquisition function is expressed as UCBᵢ = μᵢ + κ·σᵢ², wherein κ is an exploration parameter, and i is the second identifier. ([pg. 1-2, col. 2-1, lines 47-16] A low confidence is assigned to the predicted performance of behaviors stored in this behavior-performance map because they have not been tried in reality (Fig. 2B and Extended Data Fig. 1). During the robot's mission, if it senses a performance drop, it selects the most promising behavior from the behavior-performance map, tests it, and measures its performance. The robot subsequently updates its prediction for that behavior and nearby behaviors, assigns high confidence to these predictions (Fig. 2C and Extended Data Fig. 1), and continues the selection/test/update process until it finds a satisfactory compensatory behavior (Fig. 2D and Extended Data Fig. 1). All of these ideas are technically captured via a Gaussian process model [21], which approximates the performance function with already acquired data, and a Bayesian optimization procedure [22, 23], which exploits this model to search for the maximum of the performance function (Methods). The robot selects which behaviors to test by maximizing an information acquisition function that balances exploration (selecting points whose performance is uncertain) and exploitation (selecting points whose performance is expected to be high) (Methods). The selected behavior is tested on the physical robot and the actual performance is recorded. The algorithm updates the expected performance of the tested behavior and lowers the uncertainty about it.
These updates are propagated to neighboring solutions in the behavioral space by updating the Gaussian process (Methods). These updated performance and confidence distributions affect which behavior is tested next. This select-test-update loop repeats until the robot finds a behavior whose measured performance is greater than 90% of the best performance predicted for any behavior in the behavior-performance map (Methods).) Conclusion The prior art made of record, and not relied upon, considered pertinent to applicant's disclosure or directed to the state of art is listed on the enclosed PTO-892. Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATTICUS A CAMERON whose telephone number is 703-756-4535. The examiner can normally be reached M-F 8:30 am - 4:30 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Thomas Worden, can be reached on 571-272-4876. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /ATTICUS A CAMERON/ Examiner, Art Unit 3658A /JASON HOLLOWAY/ Primary Examiner, Art Unit 3658
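Tying the mapped limitations together: the select-test-update loop that both the claims (T63-T66) and Cully's disclosure describe is UCB-driven Bayesian optimization over the behavior map, stopping when measured performance reaches 90% of the best predicted performance. A minimal sketch, assuming the per-claim-10 acquisition UCBᵢ = μᵢ + κ·σᵢ² (helper names and the kappa value are illustrative):

```python
def adapt(means, variances, evaluate, update, kappa=0.05, max_trials=20):
    """UCB-driven adaptation sketch over a behavior map indexed 0..len(means)-1.

    evaluate(i) runs behavior i on the damaged robot and returns its performance;
    update(i, perf) refreshes the GP posterior (means/variances) in place.
    """
    best_i, best_perf = None, float("-inf")
    for _ in range(max_trials):
        # Acquisition: UCB_i = mu_i + kappa * sigma_i^2 (per claim 10)
        i = max(range(len(means)), key=lambda j: means[j] + kappa * variances[j])
        perf = evaluate(i)                   # test the selected behavior in reality
        if perf > best_perf:
            best_i, best_perf = i, perf
        update(i, perf)                      # propagate the observation into the posterior
        if best_perf >= 0.9 * max(means):    # Cully's 90%-of-best-prediction stopping rule
            break
    return best_i, best_perf
```

The loop first tries the behavior with the highest optimistic estimate, discovers its real (possibly degraded) performance, and lets the posterior update redirect the search, which is the exploration/exploitation balance the Office Action cites.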

Prosecution Timeline

Dec 20, 2024
Application Filed
Apr 04, 2026
Non-Final Rejection — §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12583445
VEHICLE CONTROLLER, METHOD, AND COMPUTER PROGRAM FOR VEHICLE CONTROL
2y 5m to grant Granted Mar 24, 2026
Patent 12586473
SYSTEM AND METHOD TO BUILD A FLYABLE HOLDING PATTERN ENTRY TRAJECTORY WHEN THE AVAILABLE SPACE IS LIMITED
2y 5m to grant Granted Mar 24, 2026
Patent 12544937
ROBOTIC HAND SYSTEM AND METHOD FOR CONTROLLING ROBOTIC HAND
2y 5m to grant Granted Feb 10, 2026
Patent 12528448
HYBRID ELECTRIC VEHICLE ENERGY MANAGEMENT DURING EXTREME OPERATING CONDITIONS
2y 5m to grant Granted Jan 20, 2026
Patent 12521883
SAFETY SYSTEM FOR INTEGRATED HUMAN/ROBOTIC ENVIRONMENTS
2y 5m to grant Granted Jan 13, 2026
Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
84%
Grant Probability
96%
With Interview (+11.4%)
2y 10m
Median Time to Grant
Low
PTA Risk
Based on 58 resolved cases by this examiner. Grant probability derived from career allow rate.
