Prosecution Insights
Last updated: April 19, 2026
Application No. 18/658,153

RAPID DESIGN AND ANIMATION OF FREELY-WALKING ROBOTIC DEVICES

Final Rejection §103
Filed: May 08, 2024
Examiner: CAIN, AARON G
Art Unit: 3656
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Disney Enterprises Inc.
OA Round: 2 (Final)
Grant Probability: 40% (Moderate)
OA Rounds: 3-4
To Grant: 3y 3m
With Interview: 66%
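A note on how the headline figures relate (an assumption about the tool's arithmetic, which this report does not document): the 66% with-interview figure is consistent with the 40% base grant probability plus the examiner's +26.1% interview lift, rounded to a whole percent.

```python
base_grant_probability = 0.40   # examiner's career allow rate / base estimate
interview_lift = 0.261          # examiner's +26.1% interview lift

# Assumed composition (not documented by the tool): "With Interview"
# appears to be the base probability plus the lift, as a whole percent.
with_interview = round((base_grant_probability + interview_lift) * 100)
print(with_interview)  # 66, matching the "With Interview: 66%" figure
```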

Examiner Intelligence

Career Allow Rate: 40% (52 granted / 130 resolved; -12.0% vs TC avg)
Interview Lift: +26.1% (strong; allowance among resolved cases with an interview vs. without)
Avg Prosecution: 3y 3m (42 applications currently pending)
Total Applications: 172 across all art units

Statute-Specific Performance

§101: 4.3% (-35.7% vs TC avg)
§103: 57.4% (+17.4% vs TC avg)
§102: 19.7% (-20.3% vs TC avg)
§112: 17.7% (-22.3% vs TC avg)

Tech Center averages are estimates • Based on career data from 130 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

This Office Action is in response to the amendment filed 02/06/2026. Claims 1-22 are presently pending and are presented for examination.

Response to Arguments

Applicant’s arguments, see pages 10-11, filed 02/06/2026, with respect to the rejection of claims 3, 11, 20, and 22 under 35 U.S.C. 101 and 112(a), (b), have been fully considered and are persuasive. The amendments to the claims, and the removal of the term “perpetual,” have overcome the rejections. The rejection of claims 3, 11, 20, and 22 under 35 U.S.C. 101 and 112(a), (b) has been withdrawn.

Applicant’s arguments, see page 11, filed 02/06/2026, with respect to the rejection(s) of claim(s) 1, 5, 14, and 22 under 35 U.S.C. 102 in view of Cassero et al. US 20230050174 A1 (“Cassero”) have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made under 35 U.S.C. 103 in view of Cassero in combination with Bodnar et al. US 11571809 B1 (“Bodnar”).

Applicant’s arguments, see page 12, filed 02/06/2026, with respect to the rejection(s) of claim(s) 6-8 and 15-16 under 35 U.S.C. 102 in view of Cassero have been fully considered and are persuasive. The amendments to the claims have overcome the rejection. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made under 35 U.S.C. 103 in view of Cassero in combination with Bodnar and Breazeal et al. US 20090319459 A1 (“Breazeal”).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 9-14, and 17-21 are rejected under 35 U.S.C. 103 as being unpatentable over Cassero in combination with Bodnar.

Regarding Claim 1. Cassero teaches a method of training a robotic device comprising: parameterizing, via a processing element, an input to the robotic device, wherein the parameterizing comprises defining a range of values of the input (FIG. 3 shows an example process for generating a specific robotic control plan from a template robotic control plan. The system obtains the template robotic control plan (step 302). The template robotic control plan is configurable for multiple different robotics applications, e.g., multiple different robotic tasks, multiple different robotic execution environments, multiple different sets of robotic components, and/or multiple different sets of execution constraints. The template robotic control plan includes data defining (i) an adaptation procedure and (ii) a set of one or more open parameters [paragraph 94]. The system obtains a user input that defines a respective value or range of values for each open parameter in the set of open parameters (step 304). The user input characterizes a specific robotics application for which the template robotic control plan can be configured. In some implementations, the template robotic control plan defines a set of multiple different adaptation procedures, and the user input identifies a particular adaptation procedure from the set of multiple different adaptation procedures [paragraph 96]); generating, via the processing element, a plurality of samples of the parameterized input from within the range of values (In some implementations, the template robotic control plan defines a default value for a particular open parameter in the set of open parameters. If the user input does not explicitly identify a value or range of values for the particular open parameter, then the system can determine to use the default value for the particular open parameter in the specific robotic control plan [paragraph 97]. Step 306 in particular describes using the obtained values for the set of open parameters and the adaptation procedure to generate the specific robotic control plan from the template robotic control plan [paragraph 98]); training a control policy, via the processing element, wherein the training comprises: providing the plurality of samples to the control policy, wherein the control policy is adapted to operate an actuator of the robotic device (FIG. 4 describes how a learnable robotic plan includes defining a finite state machine that includes one or more learning states (402). The learnable robotic control plan includes data defining a state machine that includes multiple states and multiple transitions between states, where one or more of the states are learnable states. Each learnable state can include data defining (i) one or more learnable parameters of the learnable state and (ii) a machine learning procedure for automatically learning a respective value for each learnable parameter of the learnable state [paragraph 102]. The software stack can include actuator feedback controllers. An actuator feedback controller can include control logic for controlling multiple robot components through their respective motor feedback controllers [paragraph 113]), and generating, via the processing element, a policy action using the control policy; transmitting the policy action to a robotic model (As another example, at least one of the machine learning procedures of the learnable robotic control plan 164 can be a supervised learning procedure. The training system 130 can obtain a labeled training data set that includes multiple training examples that each include (i) a training input to the supervised learning model and (ii) a label that identifies a ground-truth output that the supervised learning model should generate in response to processing the training input. For example, each training input can represent a respective different configuration for the execution environment 170, and the supervised learning model can be configured to generate a model output that identifies one or more parameters for the execution of the specific robotic control plan 124 [paragraph 73]. A joint collection controller can handle issuing of command and status vectors that are exposed as a set of part abstractions. Each part can include a kinematic model, e.g., for performing inverse kinematic calculations, limit information, as well as a joint status vector and a joint command vector. For example, a single joint collection controller can be used to apply different sets of policies to different subsystems in the lower levels [paragraph 117]); and deploying the trained control policy to an on-board controller for the robotic device (step 406 of FIG. 4).

Cassero does not teach: randomly generating a plurality of samples of the parameterized input from within the range of values of the input. However, Bodnar teaches: randomly generating a plurality of samples of the parameterized input from within the range of values of the input (Robots 180A, 180B, and/or other robots may be utilized to perform a large quantity of grasp episodes, and data associated with the grasp episodes can be stored in offline episode data database 150 and/or provided for inclusion in online buffer 112 (of replay buffer(s) 110). Robots 180A and 180B can optionally initially perform grasp episodes (or other task episodes) according to a scripted exploration policy, in order to bootstrap data collection. The scripted exploration policy can be randomized, but biased toward reasonable grasps. Data from such scripted episodes can be stored in offline episode data database 150 and utilized in initial training of critic network 152 to bootstrap the initial training [Column 7, lines 48-60]). It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the invention of Cassero with randomly generating a plurality of samples of the parameterized input from within the range of values of the input, as taught by Bodnar, so that the robot can be trained on a variety of random inputs to better adapt to unpredictable circumstances.

Regarding Claim 2. Cassero in combination with Bodnar teaches the method of claim 1.
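The claim 1 method, as the examiner maps it onto Cassero (user-defined parameter ranges) and Bodnar (random sampling), can be illustrated with a minimal sketch. All names, parameter choices, and the averaging "policy" below are hypothetical stand-ins for illustration only, not code from either reference.

```python
import random

def parameterize(ranges):
    """Define each input to the robotic device as a (low, high) range of values."""
    return dict(ranges)

def sample_inputs(param_ranges, n_samples, seed=None):
    """Randomly draw n_samples points from within each parameter's range
    (the random-sampling step the rejection supplies via Bodnar)."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in param_ranges.items()}
        for _ in range(n_samples)
    ]

def train_policy(samples):
    """Stand-in for policy training: this toy 'policy' simply returns the
    per-parameter mean of the training samples as its action."""
    names = list(samples[0])
    means = {n: sum(s[n] for s in samples) / len(samples) for n in names}
    return lambda name: means[name]

# Hypothetical actuator inputs: torque and velocity limits for a component.
param_ranges = parameterize({"torque": (0.0, 5.0), "velocity": (0.1, 2.0)})
samples = sample_inputs(param_ranges, n_samples=100, seed=0)
policy = train_policy(samples)
# The trained policy would then be deployed to the on-board controller.
```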
Cassero also teaches: wherein the input to the robotic device comprises at least one of: a mass, torque, force, speed, number, type, or range of motion of a component of the robotic device (The input values for parameters for the robot components can include allowable ranges for velocity, torque, and so on [paragraph 51], which also reads on a range of motion for a component of the robotic device); a perturbance imparted to the robotic device (The robotic control system 150 is configured to control the robotic components 170a-n in the execution environment 170 to execute a robotic task, or for brevity, a “task.” In some implementations, the robotic control system 150 is a real-time robotic control system. For example, one of the robots in the execution environment 170 may be required to perform a certain operation at regular intervals, e.g., 10 milliseconds; if the robot ever fails to execute the operation in a given time window, then the robot enters a fault state [paragraph 24], wherein the fault state reads on a perturbance imparted on the robotic device); an operator command (User input [paragraph 51] can be interpreted as an operator command); or an environmental characteristic (in implementations in which the template robotic control plan 162 is configurable for multiple different execution environments, the user input 142 can include data characterizing the current state of the execution environment 170 [paragraph 50]).

Regarding Claim 3. Cassero in combination with Bodnar teaches the method of claim 1. Cassero also teaches: wherein the control policy is adapted to cause the robotic device to perform at least one of a motion without a defined start or end, a periodic motion, or an episodic motion (one of the robots in the execution environment 170 may be required to perform a certain operation at regular intervals, e.g., 10 milliseconds [paragraph 24]).

Regarding Claim 4. Cassero in combination with Bodnar teaches the method of claim 1. Cassero also teaches: wherein the training further comprises: simulating, via the processing element, a motion of the actuator using the robotic model (As another particular example, in implementations in which the template robotic control plan 162 is configurable for multiple different execution environments, the user input 142 can include data characterizing the current state of the execution environment 170. For example, the user input 142 can include one or more of: a three-dimensional virtual model of the execution environment 170; or a respective location and pose for each of one or more objects in the environment 170 (e.g., the robotic components 170a-n, one or more assembly components to be assembled together if the robotic task is an assembly task, and so on). For instance, the user system 140 can display an image of the execution environment 170 to the user, and the user can identify (e.g., by using a computer mouse to click on the image) the location of one or more “targets” of the robotic task, e.g., the location of an electrical cable and the location of a wall socket if the robotic task is an insertion task [paragraph 50]); comparing, via the processing element, the simulated motion of the actuator to a reference motion of the actuator, wherein the reference motion is based on the plurality of samples (As a particular example, the planner 120 can obtain from the training system 130 a measure of the training performance of the learned values for the learnable parameters (e.g., a training loss or training accuracy of the machine learning procedure corresponding to the learnable parameters), and compare the measure of the training performance with a measure of the current performance of the specific robotic control plan 124 executed by the robotic control system 150 using the default values for the learnable parameters [paragraph 79], wherein the training performance of the control plan is a reference motion, and the current performance is the recent simulated motion); and rewarding, via the processing element, the control policy based on the comparison (From the execution data 172, the training system can determine rewards for the actions of the robotic components 170a-n (i.e., the actions driven by the commands 132), and use the determined rewards to update the learnable parameters corresponding to the reinforcement learning procedure. In particular, the reinforcement learning procedure can define a reward function that receives as input the execution data 172 (or an input generated from the execution data 172) and generates a reward as output. Generally, the determined reward is indicative of the extent to which the robotic task has been accomplished. The training system 130 can use any appropriate technique to update the learnable parameters using the determined reward; for example, if the reinforcement learning procedure is parameterized (at least in part) by a neural network, then the training system 130 can perform backpropagation and gradient descent to update the network parameters of the neural network [paragraph 71]).

Regarding Claim 5. Cassero teaches a method of operating a robotic device comprising: receiving, at a processing element, a user input, wherein the processing element is in communication with one or more actuators of the robotic device (To determine values for the user-determined open parameters of the template robotic control plan 162, the planner 120 can obtain a user input 142 from the user system 140 [paragraph 41]. The user system 140 can prompt the user to provide the user input 142 using any appropriate user interface, e.g., a command line interface or a graphical user interface. The user can provide responses to the prompts of the user system 140 in any appropriate way, e.g., by providing a text input using a keyboard, by selecting one or more display options using a computer mouse, by providing a voice input using a microphone, and so on [paragraph 43].
Each learnable state can include data defining (i) one or more learnable parameters of the learnable state and (ii) a machine learning procedure for automatically learning a respective value for each learnable parameter of the learnable state [paragraph 102]. The software stack can include actuator feedback controllers. An actuator feedback controller can include control logic for controlling multiple robot components through their respective motor feedback controllers [paragraph 113]); comparing, via the processing element, the user input to an animation database (For example, a user can physically manipulate the robotic component to demonstrate the movements that should be executed by the robotic component, and the robotic component learns to repeat the movements. In particular, one or more users physically in the execution environment 170 can manipulate one or more of the robotic components 170a-n, which can then send execution data 172 to the training system 130. The execution data 172 can characterize the movements demonstrated by the users. The training system 130 can then process the execution data to generate the commands 152 that can be issued to the robotic components 170a-n to cause them to repeat the movements [paragraph 72]. Similarly, if the designer determines to define a learning-from-demonstration procedure for determining values for learnable parameters of the second learnable state 230, then the designer can import a third-party learning-from-demonstration library into the learnable robotic control plan [paragraph 92], so a database can be built around movement plans (animations) from comparisons to user input, which could then be selected again later based on user input); selecting, via the processing element, an animation from the animation database based on the comparison (the examiner is interpreting “animation” to mean “a robot movement or execution thereof”. In some implementations, the one or more configuration procedures of the template robotic control plan 162 are predetermined; that is, the planner 120 executes each of the configuration procedures to generate the specific robotic control plan 124. In some other implementations, a selection of one or more particular configuration procedures from a set of multiple configuration procedures can itself be an open parameter of the template robotic control plan 162. The planner 120 can then use the one or more particular configuration procedures to determine values for one or more other open parameters of the template. As a particular example, the selection of one or more particular configuration procedures can be a user-determined open parameter [paragraph 38], wherein a “robotic control plan” reads on an animation); activating, via the processing element, a control policy for the selected animation (a joint collection controller can apply different policies to different subsystems of the robot movements as part of a software stack [paragraph 117]), wherein the control policy has been trained by a reinforcement learning method (From the execution data 172, the training system can determine rewards for the actions of the robotic components 170a-n (i.e., the actions driven by the commands 132), and use the determined rewards to update the learnable parameters corresponding to the reinforcement learning procedure [paragraph 71]), the reinforcement learning method comprising: parameterizing, via the processing element, a training input to the robotic device, wherein the parameterizing comprises defining a range of values of the training input (FIG. 3 shows an example process for generating a specific robotic control plan from a template robotic control plan. The system obtains the template robotic control plan (step 302). The template robotic control plan is configurable for multiple different robotics applications, e.g., multiple different robotic tasks, multiple different robotic execution environments, multiple different sets of robotic components, and/or multiple different sets of execution constraints. The template robotic control plan includes data defining (i) an adaptation procedure and (ii) a set of one or more open parameters [paragraph 94]. The system obtains a user input that defines a respective value or range of values for each open parameter in the set of open parameters (step 304). The user input characterizes a specific robotics application for which the template robotic control plan can be configured. In some implementations, the template robotic control plan defines a set of multiple different adaptation procedures, and the user input identifies a particular adaptation procedure from the set of multiple different adaptation procedures [paragraph 96]), and generating, via the processing element, a plurality of samples of the parameterized training input from within the range of values of the training input (In some implementations, the template robotic control plan defines a default value for a particular open parameter in the set of open parameters. If the user input does not explicitly identify a value or range of values for the particular open parameter, then the system can determine to use the default value for the particular open parameter in the specific robotic control plan [paragraph 97]. Step 306 in particular describes using the obtained values for the set of open parameters and the adaptation procedure to generate the specific robotic control plan from the template robotic control plan [paragraph 98]); generating, via the processing element, a low-level control adapted to control a robotic device actuator based on the control policy (paragraph 117, the joint control policies generated by the joint collection controller(s)); controlling, via the low-level control, the robotic device actuator (paragraph 117).

Cassero does not teach: randomly generating a low-level control adapted to control a robotic device actuator based on the control policy. However, Bodnar teaches: randomly generating a low-level control adapted to control a robotic device actuator based on the control policy (Robots 180A, 180B, and/or other robots may be utilized to perform a large quantity of grasp episodes, and data associated with the grasp episodes can be stored in offline episode data database 150 and/or provided for inclusion in online buffer 112 (of replay buffer(s) 110). Robots 180A and 180B can optionally initially perform grasp episodes (or other task episodes) according to a scripted exploration policy, in order to bootstrap data collection. The scripted exploration policy can be randomized, but biased toward reasonable grasps. Data from such scripted episodes can be stored in offline episode data database 150 and utilized in initial training of critic network 152 to bootstrap the initial training [Column 7, lines 48-60]). It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the invention of Cassero with randomly generating a low-level control adapted to control a robotic device actuator based on the control policy, as taught by Bodnar, so that the robot can be trained on a variety of random inputs to better adapt to unpredictable circumstances.

Regarding Claim 9.
Cassero in combination with Bodnar teaches the method of operating the robotic device of claim 5. Cassero also teaches: wherein the reinforcement learning method comprises: parameterizing, via a second processing element, a second training input to the robotic device, wherein the parameterizing comprises defining a range of values of the second training input (FIG. 3 shows an example process for generating a specific robotic control plan from a template robotic control plan. The system obtains the template robotic control plan (step 302). The template robotic control plan is configurable for multiple different robotics applications, e.g., multiple different robotic tasks, multiple different robotic execution environments, multiple different sets of robotic components, and/or multiple different sets of execution constraints. The template robotic control plan includes data defining (i) an adaptation procedure and (ii) a set of one or more open parameters [paragraph 94]. The system obtains a user input that defines a respective value or range of values for each open parameter in the set of open parameters (step 304). The user input characterizes a specific robotics application for which the template robotic control plan can be configured. In some implementations, the template robotic control plan defines a set of multiple different adaptation procedures, and the user input identifies a particular adaptation procedure from the set of multiple different adaptation procedures [paragraph 96]); generating, via the second processing element, a plurality of samples of the parameterized second training input from within the range of values of the second training input (In some implementations, the template robotic control plan defines a default value for a particular open parameter in the set of open parameters. If the user input does not explicitly identify a value or range of values for the particular open parameter, then the system can determine to use the default value for the particular open parameter in the specific robotic control plan [paragraph 97]. Step 306 in particular describes using the obtained values for the set of open parameters and the adaptation procedure to generate the specific robotic control plan from the template robotic control plan [paragraph 98]. Similarly, if the designer determines to define a learning-from-demonstration procedure for determining values for learnable parameters of the second learnable state 230, then the designer can import a third-party learning-from-demonstration library into the learnable robotic control plan [paragraph 92], so a database can be built around movement plans (animations) from comparisons to user input, which could then be selected again later based on user input); providing, via the second processing element, the plurality of samples of the parameterized second training input to the control policy (FIG. 4 describes how a learnable robotic plan includes defining a finite state machine that includes one or more learning states (402). The learnable robotic control plan includes data defining a state machine that includes multiple states and multiple transitions between states, where one or more of the states are learnable states. Each learnable state can include data defining (i) one or more learnable parameters of the learnable state and (ii) a machine learning procedure for automatically learning a respective value for each learnable parameter of the learnable state [paragraph 102]. The software stack can include actuator feedback controllers. An actuator feedback controller can include control logic for controlling multiple robot components through their respective motor feedback controllers [paragraph 113]); generating, via the second processing element, a policy action using the control policy and transmitting the policy action to a robotic model (As another example, at least one of the machine learning procedures of the learnable robotic control plan 164 can be a supervised learning procedure. The training system 130 can obtain a labeled training data set that includes multiple training examples that each include (i) a training input to the supervised learning model and (ii) a label that identifies a ground-truth output that the supervised learning model should generate in response to processing the training input. For example, each training input can represent a respective different configuration for the execution environment 170, and the supervised learning model can be configured to generate a model output that identifies one or more parameters for the execution of the specific robotic control plan 124 [paragraph 73]. A joint collection controller can handle issuing of command and status vectors that are exposed as a set of part abstractions. Each part can include a kinematic model, e.g., for performing inverse kinematic calculations, limit information, as well as a joint status vector and a joint command vector. For example, a single joint collection controller can be used to apply different sets of policies to different subsystems in the lower levels [paragraph 117]. As another example, at least one of the machine learning procedures of the learnable robotic control plan 164 can be a learning-from-demonstration procedure. Learning-from-demonstration is a technique whereby a user of a robotic component physically demonstrates a robotic task to be performed by the robotic component, and the robotic component learns from the physical demonstration how to perform the robotic task independently. For example, a user can physically manipulate the robotic component to demonstrate the movements that should be executed by the robotic component, and the robotic component learns to repeat the movements [paragraph 72]); simulating, via the second processing element, a motion of the robotic device actuator using the robotic model (As another particular example, in implementations in which the template robotic control plan 162 is configurable for multiple different execution environments, the user input 142 can include data characterizing the current state of the execution environment 170. For example, the user input 142 can include one or more of: a three-dimensional virtual model of the execution environment 170; or a respective location and pose for each of one or more objects in the environment 170 (e.g., the robotic components 170a-n, one or more assembly components to be assembled together if the robotic task is an assembly task, and so on). For instance, the user system 140 can display an image of the execution environment 170 to the user, and the user can identify (e.g., by using a computer mouse to click on the image) the location of one or more “targets” of the robotic task, e.g., the location of an electrical cable and the location of a wall socket if the robotic task is an insertion task [paragraph 50]); comparing, via the second processing element, the simulated motion of the robotic device actuator to a reference motion of the robotic device actuator, wherein the reference motion is based on the plurality of samples of the parameterized second training input (As a particular example, the planner 120 can obtain from the training system 130 a measure of the training performance of the learned values for the learnable parameters (e.g., a training loss or training accuracy of the machine learning procedure corresponding to the learnable parameters), and compare the measure of the training performance with a measure of the current performance of the specific robotic control plan 124 executed by the robotic control system 150 using the default values for the learnable parameters [paragraph 79], wherein the training performance of the control plan is a reference motion, and the current performance is the recent simulated motion); and rewarding, via the second processing element, the control policy based on the comparison of the simulated motion of the robotic device actuator to the reference motion of the robotic device actuator (From the execution data 172, the training system can determine rewards for the actions of the robotic components 170a-n (i.e., the actions driven by the commands 132), and use the determined rewards to update the learnable parameters corresponding to the reinforcement learning procedure. In particular, the reinforcement learning procedure can define a reward function that receives as input the execution data 172 (or an input generated from the execution data 172) and generates a reward as output. Generally, the determined reward is indicative of the extent to which the robotic task has been accomplished. The training system 130 can use any appropriate technique to update the learnable parameters using the determined reward; for example, if the reinforcement learning procedure is parameterized (at least in part) by a neural network, then the training system 130 can perform backpropagation and gradient descent to update the network parameters of the neural network [paragraph 71]).

Regarding Claim 10. Cassero in combination with Bodnar teaches the method of operating the robotic device of claim 9.
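The simulate/compare/reward loop recited in claim 9, as read onto Cassero's paragraphs 71 and 79, can be sketched as a toy example. The proportional actuator model and the negative mean-squared-error reward below are illustrative assumptions, not the method of either reference or of the application.

```python
def simulate_motion(policy_gain, timesteps):
    """Toy robotic model: actuator position under a proportional policy
    driving toward a unit setpoint."""
    pos, trajectory = 0.0, []
    for _ in range(timesteps):
        pos += policy_gain * (1.0 - pos)
        trajectory.append(pos)
    return trajectory

def reward(simulated, reference):
    """Reward the policy by comparing the simulated motion to the reference
    motion: negative mean squared tracking error (higher is better)."""
    err = sum((s - r) ** 2 for s, r in zip(simulated, reference))
    return -err / len(reference)

reference = [1.0] * 10          # reference motion derived from the samples
good = reward(simulate_motion(0.5, 10), reference)
bad = reward(simulate_motion(0.05, 10), reference)
# The better-tracking policy earns the higher (less negative) reward, and
# the training procedure updates the policy parameters accordingly.
```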
Cassero also teaches: wherein the training input or the second training input to the robotic device comprises at least one of: a mass, torque, force, speed, number, type, or range of motion of a component of the robotic device (The input values for parameters for the robot components can include allowable ranges for velocity, torque, and so on [paragraph 51], which also reads on a range of motion for a component of the robotic device); a perturbance imparted to the robotic device (The robotic control system 150 is configured to control the robotic components 170a-n in the execution environment 170 to execute a robotic task, or for brevity, a “task.” In some implementations, the robotic control system 150 is a real-time robotic control system. For example, one of the robots in the execution environment 170 may be required to perform a certain operation at regular intervals, e.g. 10 milliseconds; if the robot ever fails to execute the operation in a given time window, then the robot enters a fault state [paragraph 24], wherein the fault state reads on a perturbance imparted on the robotic device); the user input (User input [paragraph 51] can be interpreted as an operator command); or an environmental characteristic (in implementations in which the template robotic control plan 162 is configurable for multiple different execution environments, the user input 142 can include data characterizing the current state of the execution environment 170 [paragraph 50]). Regarding Claim 11. Cassero in combination with Bodnar teaches the method of operating the robotic device of claim 9. Cassero also teaches: wherein the control policy is adapted to cause the robotic device to perform at least one of a motion without a defined start or end, a periodic motion, or an episodic motion (one of the robots in the execution environment 170 may be required to perform a certain operation at regular intervals, e.g. 10 milliseconds [paragraph 24]). Regarding Claim 12. 
Cassero in combination with Bodnar teaches the method of operating the robotic device of claim 5. Cassero also teaches: wherein the selected animation comprises one or more of a background animation or a triggered animation, and the method of operating the robotic device further comprises layering at least one of the background animation or the triggered animation with a remote control animation (Interpreting “animation” as “a robot movement or execution thereof”, any executed robot control plan could read on a triggered animation. “In some other implementations, a selection of one or more particular configuration procedures from a set of multiple configuration procedures can itself be an open parameter of the template robotic control plan 162. The planner 120 can then use the one or more particular configuration procedures to determine values for one or more other open parameters of the template. As a particular example, the selection of one or more particular configuration procedures can be a user-determined open parameter” [paragraph 38], wherein a “robotic control plan” reads on an animation. In some other implementations, the planner 120 is remote to the user system 140, e.g., the user system 140 can be a component of a user device of the user while the planner 120 is hosted by a cloud system [paragraph 42], so the triggered animation can be layered with a remote control animation). Regarding Claim 13. Cassero in combination with Bodnar teaches the method of operating the robotic device of claim 12. Cassero also teaches: wherein the remote control animation is based on the user input received from a remote control (Paragraph 42). Regarding Claim 14. Cassero teaches a robotic device comprising: a plurality of modular hardware components (The template robotic control plan can be configured to perform insertions of different types of hardware [paragraph 28].
The software stack can include multiple levels of increasing hardware specificity in one direction and increasing software abstraction in the other direction. At the lowest level of the software stack are robot components that include devices that carry out low-level actions and sensors that report low-level statuses. For example, robots can include a variety of low-level components including motors, encoders, cameras, drivers, grippers, application-specific sensors, linear or rotary position sensors, and other peripheral devices [paragraph 109], further confirming modularity of hardware components and disclosing examples of said components); a processing element in communication with the plurality of modular hardware components (The system 100 includes a number of functional components, including a planner 120, a training system 130, a user system 140, a robotic control system 150, and a plan database 160. Each of these components can be implemented as computer programs installed on one or more computers in one or more locations that are coupled to each other through any appropriate communications network, e.g., an intranet or the Internet, or combination of networks [paragraph 23]); a plurality of control policies trained by a reinforcement learning method to control the plurality of modular hardware components (a joint collection controller can apply different policies to different subsystems of the robot movements as part of a software stack [paragraph 117]. 
From the execution data 172, the training system can determine rewards for the actions of the robotic components 170a-n (i.e., the actions driven by the commands 132), and use the determined rewards to update the learnable parameters corresponding to the reinforcement learning procedure [paragraph 71]), the reinforcement learning method comprising: parameterizing, via the processing element, a training input to the robotic device, wherein the parameterizing comprises defining a range of values of the training input (FIG. 3 shows an example process for generating a specific robotic control plan from a template robotic control plan. The system obtains the template robotic control plan (step 302). The template robotic control plan is configurable for multiple different robotics applications, e.g., multiple different robotic tasks, multiple different robotic execution environments, multiple different sets of robotic components, and/or multiple different sets of execution constraints. The template robotic control plan includes data defining (i) an adaptation procedure and (ii) a set of one or more open parameters [paragraph 94]. The system obtains a user input that defines a respective value or range of values for each open parameter in the set of open parameters (step 304). The user input characterizes a specific robotics application for which the template robotic control plan can be configured. In some implementations, the template robotic control plan defines a set of multiple different adaptation procedures, and the user input identifies a particular adaptation procedure from the set of multiple different adaptation procedures [paragraph 96]), and generating, via the processing element, a plurality of samples of the parameterized training input from within the range of values of the training input (In some implementations, the template robotic control plan defines a default value for a particular open parameter in the set of open parameters. 
If the user input does not explicitly identify a value or range of values for the particular open parameter, then the system can determine to use the default value for the particular open parameter in the specific robotic control plan [paragraph 97]. Step 306 in particular talks about using the obtained values for the set of open parameters and the adaptation procedure to generate the specific robotic control plan from the template robotic control plan [paragraph 98]). Cassero does not teach: randomly generating a plurality of samples of the parameterized input from within the range of values of the input. However, Bodnar teaches: randomly generating a plurality of samples of the parameterized input from within the range of values of the input (Robots 180A, 180B, and/or other robots may be utilized to perform a large quantity of grasp episodes and data associated with the grasp episodes can be stored in offline episode data database 150 and/or provided for inclusion in online buffer 112 (of replay buffer(s) 110). Robots 180A and 180B can optionally initially perform grasp episodes (or other task episodes) according to a scripted exploration policy, in order to bootstrap data collection. The scripted exploration policy can be randomized, but biased toward reasonable grasps. Data from such scripted episodes can be stored in offline episode data database 150 and utilized in initial training of critic network 152 to bootstrap the initial training [Column 7, lines 48-60]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cassero with randomly generating a plurality of samples of the parameterized input from within the range of values of the input as taught by Bodnar so that the robot can be trained for a variety of random inputs to better adapt to unpredictable circumstances. Regarding Claim 17. Cassero in combination with Bodnar teaches the robotic device of claim 14.
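The Bodnar feature relied on above — randomly generating samples of a parameterized training input from within a user-defined range of values — amounts to uniform sampling over per-parameter bounds. A minimal sketch follows; the parameter names and numeric ranges are invented for illustration and do not appear in either reference:

```python
import random

def sample_parameterized_inputs(ranges, n_samples, seed=0):
    # For each training run, draw every open parameter uniformly from its
    # user-defined range, as in the claimed "randomly generating a plurality
    # of samples ... from within the range of values of the training input".
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n_samples)
    ]

# Hypothetical open parameters with allowable ranges (cf. Cassero's
# "allowable ranges for velocity, torque, and so on", paragraph 51).
ranges = {"velocity": (0.0, 1.5), "torque": (0.0, 10.0)}
samples = sample_parameterized_inputs(ranges, n_samples=5)
```

Each sampled dictionary would then serve as one training input to a control policy, so the policy is exposed to varied conditions during training.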
Cassero also teaches: further comprising selecting, via the processing element, an animation from an animation database, wherein the selected animation comprises one or more of a background animation or a triggered animation, and the robotic device is operable by layering at least one of the background animation or the triggered animation with a remote control animation (Interpreting “animation” as “a robot movement or execution thereof”, any executed robot control plan could read on a triggered animation. “In some other implementations, a selection of one or more particular configuration procedures from a set of multiple configuration procedures can itself be an open parameter of the template robotic control plan 162. The planner 120 can then use the one or more particular configuration procedures to determine values for one or more other open parameters of the template. As a particular example, the selection of one or more particular configuration procedures can be a user-determined open parameter” [paragraph 38], wherein a “robotic control plan” reads on an animation. In some other implementations, the planner 120 is remote to the user system 140, e.g., the user system 140 can be a component of a user device of the user while the planner 120 is hosted by a cloud system [paragraph 42], so the triggered animation can be layered with a remote control animation). Regarding Claim 18. Cassero in combination with Bodnar teaches the robotic device of claim 14. Cassero also teaches: wherein the reinforcement learning method comprises: parameterizing, via a second processing element, a second training input to the robotic device, wherein the parameterizing comprises defining a range of values of the second training input (FIG. 3 shows an example process for generating a specific robotic control plan from a template robotic control plan. The system obtains the template robotic control plan (step 302).
The template robotic control plan is configurable for multiple different robotics applications, e.g., multiple different robotic tasks, multiple different robotic execution environments, multiple different sets of robotic components, and/or multiple different sets of execution constraints. The template robotic control plan includes data defining (i) an adaptation procedure and (ii) a set of one or more open parameters [paragraph 94]. The system obtains a user input that defines a respective value or range of values for each open parameter in the set of open parameters (step 304). The user input characterizes a specific robotics application for which the template robotic control plan can be configured. In some implementations, the template robotic control plan defines a set of multiple different adaptation procedures, and the user input identifies a particular adaptation procedure from the set of multiple different adaptation procedures [paragraph 96]); generating, via the second processing element, a plurality of samples of the parameterized second training input from within the range of values of the second training input (In some implementations, the template robotic control plan defines a default value for a particular open parameter in the set of open parameters. If the user input does not explicitly identify a value or range of values for the particular open parameter, then the system can determine to use the default value for the particular open parameter in the specific robotic control plan [paragraph 97]. Step 306 in particular talks about using the obtained values for the set of open parameters and the adaptation procedure to generate the specific robotic control plan from the template robotic control plan [paragraph 98]); providing, via the second processing element, the plurality of samples of the parameterized second training input to a plurality of control policies (FIG.
4 describes how a learnable robotic plan includes defining a finite state machine that includes one or more learning states (402). The learnable robotic control plan includes data defining a state machine that includes multiple states and multiple transitions between states, where one or more of the states are learnable states. Each learnable state can include data defining (i) one or more learnable parameters of the learnable state and (ii) a machine learning procedure for automatically learning a respective value for each learnable parameter of the learnable state [paragraph 102]. The software stack can include actuator feedback controllers. An actuator feedback controller can include control logic for controlling multiple robot components through their respective motor feedback controllers [paragraph 113]); generating, via the second processing element, a policy action using one of the plurality of control policies and transmitting the policy action to a robotic model (As another example, at least one of the machine learning procedures of the learnable robotic control plan 164 can be a supervised learning procedure. The training system 130 can obtain a labeled training data set that includes multiple training examples that each include (i) a training input to the supervised learning model and (ii) a label that identifies a ground-truth output that the supervised learning model should generate in response to processing the training input. For example, each training input can represent a respective different configuration for the execution environment 170, and the supervised learning model can be configured to generate a model output that identifies one or more parameters for the execution of the specific robotic control plan 124 [paragraph 73]. A joint collection controller can handle issuing of command and status vectors that are exposed as a set of part abstractions.
Each part can include a kinematic model, e.g., for performing inverse kinematic calculations, limit information, as well as a joint status vector and a joint command vector. For example, a single joint collection controller can be used to apply different sets of policies to different subsystems in the lower levels [paragraph 117]. As another example, at least one of the machine learning procedures of the learnable robotic control plan 164 can be a learning-from-demonstration procedure. Learning-from-demonstration is a technique whereby a user of a robotic component physically demonstrates a robotic task to be performed by the robotic component, and the robotic component learns from the physical demonstration how to perform the robotic task independently. For example, a user can physically manipulate the robotic component to demonstrate the movements that should be executed by the robotic component, and the robotic component learns to repeat the movements [paragraph 72]); simulating, via the second processing element, a motion of a robotic device actuator using the robotic model (As another particular example, in implementations in which the template robotic control plan 162 is configurable for multiple different execution environments, the user input 142 can include data characterizing the current state of the execution environment 170. For example, the user input 142 can include one or more of: a three-dimension virtual model of the execution environment 170; or a respective location and pose for each of one or more objects in the environment 170 (e.g., the robotic components 170a-n, one or more assembly components to be assembled together if the robotic task is an assembly task, and so on). 
For instance, the user system 140 can display an image of the execution environment 170 to the user, and the user can identify (e.g., by using a computer mouse to click on the image) the location of one or more “targets” of the robotic task, e.g., the location of an electrical cable and the location of a wall socket if the robotic task is an insertion task [paragraph 50]); comparing, via the second processing element, the simulated motion of the robotic device actuator to a reference motion of the robotic device actuator, wherein the reference motion is based on the plurality of samples of the parameterized second training input (As a particular example, the planner 120 can obtain from the training system 130 a measure of the training performance of the learned values for the learnable parameters (e.g., a training loss or training accuracy of the machine learning procedure corresponding to the learnable parameters), and compare the measure of the training performance with a measure of the current performance of the specific robotic control plan 124 executed by the robotic control system 150 using the default values for the learnable parameters [paragraph 79], wherein the training performance of the control plan is a reference motion, and the current performance is the recent simulated motion); and rewarding, via the second processing element, the one of the plurality of control policies based on the comparison of the simulated motion of the robotic device actuator to the reference motion of the robotic device actuator (From the execution data 172, the training system can determine rewards for the actions of the robotic components 170a-n (i.e., the actions driven by the commands 132), and use the determined rewards to update the learnable parameters corresponding to the reinforcement learning procedure. 
In particular, the reinforcement learning procedure can define a reward function that receives as input the execution data 172 (or an input generated from the execution data 172) and generates a reward as output. Generally, the determined reward is indicative of the extent to which the robotic task has been accomplished. The training system 130 can use any appropriate technique to update the learnable parameters using the determined reward; for example, if the reinforcement learning procedure is parameterized (at least in part) by a neural network, then the training system 130 can perform backpropagation and gradient descent to update the network parameters of the neural network [paragraph 71]). Regarding Claim 19. Cassero in combination with Bodnar teaches the robotic device of claim 18. Cassero also teaches: wherein the input to the robotic device comprises at least one of: a mass, torque, force, speed, number, type, or range of motion of a component of the robotic device (The input values for parameters for the robot components can include allowable ranges for velocity, torque, and so on [paragraph 51], which also reads on a range of motion for a component of the robotic device); a perturbance imparted to the robotic device (The robotic control system 150 is configured to control the robotic components 170a-n in the execution environment 170 to execute a robotic task, or for brevity, a “task.” In some implementations, the robotic control system 150 is a real-time robotic control system. For example, one of the robots in the execution environment 170 may be required to perform a certain operation at regular intervals, e.g. 
10 milliseconds; if the robot ever fails to execute the operation in a given time window, then the robot enters a fault state [paragraph 24], wherein the fault state reads on a perturbance imparted on the robotic device); an operator command (User input [paragraph 51] can be interpreted as an operator command); or an environmental characteristic (in implementations in which the template robotic control plan 162 is configurable for multiple different execution environments, the user input 142 can include data characterizing the current state of the execution environment 170 [paragraph 50]). Regarding Claim 20. Cassero in combination with Bodnar teaches the robotic device of claim 18. Cassero also teaches: wherein the plurality of policies are adapted to cause the robotic device to perform at least one of a motion without a defined start or end, a periodic motion, or an episodic motion (one of the robots in the execution environment 170 may be required to perform a certain operation at regular intervals, e.g. 10 milliseconds [paragraph 24]). Regarding Claim 21. Cassero teaches the robotic device of claim 14. Cassero also teaches: wherein the robotic device is operable by: receiving, at the processing element, a user input (To determine values for the user-determined open parameters of the template robotic control plan 162, the planner 120 can obtain a user input 142 from the user system 140 [paragraph 41]. The user system 140 can prompt the user to provide the user input 142 using any appropriate user interface, e.g., a command line interface or a graphical user interface. The user can provide responses to the prompts of the user system 140 in any appropriate way, e.g., by providing a text input using a keyboard, by selecting one or more display options using a computer mouse, by providing a voice input using a microphone, and so on [paragraph 43].
Each learnable state can include data defining (i) one or more learnable parameters of the learnable state and (ii) a machine learning procedure for automatically learning a respective value for each learnable parameter of the learnable state [paragraph 102]. The software stack can include actuator feedback controllers. An actuator feedback controller can include control logic for controlling multiple robot components through their respective motor feedback controllers [paragraph 113]); comparing, via the processing element, the user input to an animation database (For example, a user can physically manipulate the robotic component to demonstrate the movements that should be executed by the robotic component, and the robotic component learns to repeat the movements. In particular, one or more users physically in the execution environment 170 can manipulate one or more of the robotic components 170a-n, which can then send execution data 172 to the training system 130. The execution data 172 can characterize the movements demonstrated by the users. The training system 130 can then process the execution data to generate the commands 152 that can be issued to the robotic components 170a-n to cause them to repeat the movements [paragraph 72]. Similarly, if the designer determines to define a learning-from-demonstration procedure for determining values for learnable parameters of the second learnable state 230, then the designer can import a third-party learning-from-demonstration library into the learnable robotic control plan [paragraph 92], so a database can be built around movement plans (animations) from comparisons to user input, which could then be selected again later based on user input); selecting, via the processing element, an animation from the animation database based on the comparison (the examiner is interpreting “animation” to mean “a robot movement or execution thereof”. 
In some implementations, the one or more configuration procedures of the template robotic control plan 162 are predetermined; that is, the planner 120 executes each of the configuration procedures to generate the specific robotic control plan 124. In some other implementations, a selection of one or more particular configuration procedures from a set of multiple configuration procedures can itself be an open parameter of the template robotic control plan 162. The planner 120 can then use the one or more particular configuration procedures to determine values for one or more other open parameters of the template. As a particular example, the selection of one or more particular configuration procedures can be a user-determined open parameter [paragraph 38], wherein a “robotic control plan” reads on an animation); activating, via the processing element, a control policy of the plurality of control policies for the selected animation (a joint collection controller can apply different policies to different subsystems of the robot movements as part of a software stack [paragraph 117]. From the execution data 172, the training system can determine rewards for the actions of the robotic components 170a-n (i.e., the actions driven by the commands 132), and use the determined rewards to update the learnable parameters corresponding to the reinforcement learning procedure [paragraph 71]); generating, via the processing element, a low level control adapted to control an actuator of the robotic device (paragraph 117, the joint control policies generated by the joint collection controller(s)); controlling, via the low level control, the plurality of modular hardware components (paragraph 117). Claim(s) 6-8, 15-16, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Cassero et al. US 20230050174 A1 (“Cassero”) in combination with Bodnar et al. US 11571809 B1 (“Bodnar”) as applied to claim 5, and 14 above, and further in view of Breazeal et al. 
US 20090319459 A1 (“Breazeal”). Regarding Claim 6. Cassero in combination with Bodnar teaches the method of operating the robotic device of claim 5. Cassero also teaches: wherein the user input comprises a command to activate a show function of the robotic device (the user system 140 can display an image of the execution environment 170 to the user, and the user can identify (e.g., by using a computer mouse to click on the image) the location of one or more “targets” of the robotic task, e.g., the location of an electrical cable and the location of a wall socket if the robotic task is an insertion task [paragraph 50]. Any display system that the user can alter via input commands would read on a command to activate a show function). Cassero does not teach: wherein the show function provides additional animation or expressiveness to the robotic device without affecting an overall motion of a body of the robotic device. However, Breazeal teaches: wherein the show function provides additional animation or expressiveness to the robotic device without affecting an overall motion of a body of the robotic device (a physically-animated apparatus for improving a user's physical comfort level, comprising a robotic device capable of multiple degree-of-freedom motion and an affective-cognitive system. 
The affective-cognitive system preferably comprises a feature extraction subsystem, adapted for deriving physical information about a user from data obtained from at least one device configured for sensing current physical state data about the user, a perception subsystem, adapted for processing the physical information received from the feature extraction subsystem in order to determine the user's current posture, an action selection subsystem, adapted for determining an action to be taken in response to the determined posture and a set of user postural and movement goals, and a motor system, the motor system comprising at least one device adapted to physically animate the robotic device in accordance with the determined action [paragraph 15]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cassero such that the show function provides additional animation or expressiveness to the robotic device without affecting an overall motion of a body of the robotic device as taught by Breazeal so as to allow the robot to perform physical animations and expressions to improve a user’s comfort level. Regarding Claim 7. Cassero in combination with Bodnar and Breazeal teaches the method of operating the robotic device of claim 6. Cassero also teaches: wherein the show function is independent of the robotic device actuator ([paragraph 50], there is no indication that the display function is dependent on the robotic actuator). Regarding Claim 8. Cassero in combination with Bodnar and Breazeal teaches the method of operating the robotic device of claim 6.
Cassero also teaches: wherein the show function comprises activating at least one of a light, a moveable antenna, an eye, or a sound of the robotic device (To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and pointing device, e.g., a mouse, trackball, or a presence sensitive display or other surface by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback [paragraph 130], which reads on at least a light or sound activated as a display). Regarding Claim 15. Cassero in combination with Bodnar teaches the robotic device of claim 14. Cassero also teaches: wherein the user input comprises a command to activate a show function of the robotic device (the user system 140 can display an image of the execution environment 170 to the user, and the user can identify (e.g., by using a computer mouse to click on the image) the location of one or more “targets” of the robotic task, e.g., the location of an electrical cable and the location of a wall socket if the robotic task is an insertion task [paragraph 50]. Any display system that the user can alter via input commands would read on a command to activate a show function). Cassero does not teach: wherein the show function provides additional animation or expressiveness to the robotic device without affecting an overall motion of a body of the robotic device. 
However, Breazeal teaches: wherein the show function provides additional animation or expressiveness to the robotic device without affecting an overall motion of a body of the robotic device (a physically-animated apparatus for improving a user's physical comfort level, comprising a robotic device capable of multiple degree-of-freedom motion and an affective-cognitive system. The affective-cognitive system preferably comprises a feature extraction subsystem, adapted for deriving physical information about a user from data obtained from at least one device configured for sensing current physical state data about the user, a perception subsystem, adapted for processing the physical information received from the feature extraction subsystem in order to determine the user's current posture, an action selection subsystem, adapted for determining an action to be taken in response to the determined posture and a set of user postural and movement goals, and a motor system, the motor system comprising at least one device adapted to physically animate the robotic device in accordance with the determined action [paragraph 15]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cassero such that the show function provides additional animation or expressiveness to the robotic device without affecting an overall motion of a body of the robotic device as taught by Breazeal so as to allow the robot to perform physical animations and expressions to improve a user’s comfort level. Regarding Claim 16. Cassero in combination with Bodnar and Breazeal teaches the robotic device of claim 15. Cassero also teaches: wherein the show function is independent of a robotic device actuator ([paragraph 50], there is no indication that the display function is dependent on the robotic actuator). Regarding Claim 22.
Cassero teaches a method of controlling a robotic device comprising: generating, via a first trained control policy executed by a processing element (FIG. 4 describes how a learnable robotic plan includes defining a finite state machine that includes one or more learning states (402). The learnable robotic control plan includes data defining a state machine that includes multiple states and multiple transitions between states, where one or more of the states are learnable states. Each learnable state can include data defining (i) one or more learnable parameters of the learnable state and (ii) a machine learning procedure for automatically learning a respective value for each learnable parameter of the learnable state [paragraph 102]. The software stack can include actuator feedback controllers. An actuator feedback controller can include control logic for controlling multiple robot components through their respective motor feedback controllers [paragraph 113]), a first policy action adapted to perform a continuous motion without a defined start or end (As one example, an execution environment of industrial robots can be controlled by a real-time software control system that requires each robot to repeatedly receive commands at a certain frequency, e.g., 1, 10, or 100 kHz [paragraph 3], indicating that these can be continuous motions without a defined start or end, or at least, no defined start or end is described, which would make this element obvious to one of ordinary skill in the art to try);

generating, via a second trained control policy executed by the processing element, a second policy action adapted to perform a periodic motion (In some implementations, the template robotic control plan defines a default value for a particular open parameter in the set of open parameters.
If the user input does not explicitly identify a value or range of values for the particular open parameter, then the system can determine to use the default value for the particular open parameter in the specific robotic control plan [paragraph 97]. Step 306 in particular describes using the obtained values for the set of open parameters and the adaptation procedure to generate the specific robotic control plan from the template robotic control plan [paragraph 98]. One of the robots in the execution environment 170 may be required to perform a certain operation at regular intervals, e.g. 10 milliseconds [paragraph 24], which is a periodic motion);

generating, via a third trained control policy executed by the processing element, a third policy action adapted to perform a third motion (there is technically no limit to the number of control plans that the planner at 120 can obtain from the plan database [paragraph 33]); and

deploying, via the processing element, the first, second, and third policy actions to a plurality of actuators of the robotic device, wherein at least one actuator of the plurality of actuators is adapted to perform the continuous motion, the periodic motion, and the episodic motion (step 406 of FIG. 4), and at least one of the first trained control policy, the second trained control policy, or the third trained control policy is trained using a reinforcement learning method comprising: parameterizing, via the processing element, a training input to the robotic device, wherein the parameterizing comprises defining a range of values of the training input (FIG. 3 shows an example process for generating a specific robotic control plan from a template robotic control plan. The system obtains the template robotic control plan (step 302).
The template robotic control plan is configurable for multiple different robotics applications, e.g., multiple different robotic tasks, multiple different robotic execution environments, multiple different sets of robotic components, and/or multiple different sets of execution constraints. The template robotic control plan includes data defining (i) an adaptation procedure and (ii) a set of one or more open parameters [paragraph 94]. The system obtains a user input that defines a respective value or range of values for each open parameter in the set of open parameters (step 304). The user input characterizes a specific robotics application for which the template robotic control plan can be configured. In some implementations, the template robotic control plan defines a set of multiple different adaptation procedures, and the user input identifies a particular adaptation procedure from the set of multiple different adaptation procedures [paragraph 96]. FIG. 4 describes how a learnable robotic plan includes defining a finite state machine that includes one or more learning states (402). The learnable robotic control plan includes data defining a state machine that includes multiple states and multiple transitions between states, where one or more of the states are learnable states. Each learnable state can include data defining (i) one or more learnable parameters of the learnable state and (ii) a machine learning procedure for automatically learning a respective value for each learnable parameter of the learnable state [paragraph 102]. The software stack can include actuator feedback controllers.
An actuator feedback controller can include control logic for controlling multiple robot components through their respective motor feedback controllers [paragraph 113]), generating, via the processing element, a plurality of samples of the parameterized training input from within the range of values of the training input (In some implementations, the template robotic control plan defines a default value for a particular open parameter in the set of open parameters. If the user input does not explicitly identify a value or range of values for the particular open parameter, then the system can determine to use the default value for the particular open parameter in the specific robotic control plan [paragraph 97]. Step 306 in particular describes using the obtained values for the set of open parameters and the adaptation procedure to generate the specific robotic control plan from the template robotic control plan [paragraph 98]).

Cassero does not teach: the third policy is adapted to perform an episodic motion. However, Breazeal teaches: the third policy is adapted to perform an episodic motion (FIGS. 12A and 12B depict a flowchart outlining an embodiment of a methodology for both improving a user's cognitive performance and building social rapport through the affect-congruent posing of the system, according to one aspect of the invention. As shown in FIGS. 12A and 12B, when a user sits down in front of the physically animated visual display 1205, the system detects the user's identity 1210 and determines whether or not the user is new or it is the user's first use during a particular time period 1215. If so, the system exhibits "greeting behavior" 1220, which may optionally be controlled by user preference setting 1225. The system then monitors 1230 the user's current attention, interest, and posture state, based on data from camera sensors 1235, pressure distribution seat sensors 1240, task accomplishment detection devices 1245 and/or biometric sensors 1250.
If the user is in a non-neutral affective state 1255, the system assesses 1260 whether the user is bored, distracted, blinking, or taking a break. If not, the system displays 1265 attention-following behavior, such as adjusting the distance and angle of the display from the user [paragraph 94]).

It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the invention of Cassero with the third policy is adapted to perform an episodic motion, as taught by Breazeal, so as to allow the robot to perform motions on the basis of a set time period (episodic) as needed to improve a user's comfort level.

Cassero also does not teach: randomly generating, a plurality of samples of the parameterized input from within the range of values of the input. However, Bodnar teaches: randomly generating, a plurality of samples of the parameterized input from within the range of values of the input (Robots 180A, 180B, and/or other robots may be utilized to perform a large quantity of grasp episodes and data associated with the grasp episodes can be stored in offline episode data database 150 and/or provided for inclusion in online buffer 112 (of replay buffer(s) 110). Robots 180A and 180B can optionally initially perform grasp episodes (or other task episodes) according to a scripted exploration policy, in order to bootstrap data collection. The scripted exploration policy can be randomized, but biased toward reasonable grasps. Data from such scripted episodes can be stored in offline episode data database 150 and utilized in initial training of critic network 152 to bootstrap the initial training [Column 7, lines 48-60]).
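The randomized sampling Bodnar is cited for (drawing training inputs at random from within a defined range of parameter values) can be sketched roughly as follows; the parameter names and ranges here are illustrative assumptions, not taken from any cited reference:

```python
# Minimal sketch: a training input is parameterized by per-parameter
# (low, high) ranges, and samples are drawn uniformly at random from
# within those ranges to bootstrap policy training. Names are illustrative.
import random

def sample_training_inputs(param_ranges, n_samples, seed=0):
    """Draw n_samples random settings, one value per parameterized input."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in param_ranges.items()}
        for _ in range(n_samples)
    ]

ranges = {"grasp_approach_angle": (-30.0, 30.0), "gripper_force": (5.0, 20.0)}
samples = sample_training_inputs(ranges, n_samples=100)
assert all(5.0 <= s["gripper_force"] <= 20.0 for s in samples)
```

Each sample stays inside its declared range, so a policy trained against the samples sees only inputs drawn from the parameterized space.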
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the invention of Cassero with randomly generating, a plurality of samples of the parameterized input from within the range of values of the input, as taught by Bodnar, so that the robot can be trained for a variety of random inputs to better adapt to unpredictable circumstances.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AARON G CAIN whose telephone number is (571)272-7009. The examiner can normally be reached Monday to Friday, 7:30am - 4:30pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Wade Miles, can be reached at (571) 270-7777. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AARON G CAIN/
Examiner, Art Unit 3656
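For readers mapping the rejection of claim 22 onto the claimed architecture, the three-policy structure (trained policies producing continuous, periodic, and episodic motion actions, deployed to a shared set of actuators) can be sketched as follows. Everything here is a hypothetical illustration of the claim language, not an implementation from the application or the cited references:

```python
# Rough sketch of the claim 22 architecture: three "policies" whose
# actions are merged and deployed to shared actuators. Policy internals
# are stubbed; all names are illustrative.
import math
from typing import Callable, Dict, List

# Each policy maps a time stamp to per-actuator position targets.
Policy = Callable[[float], Dict[str, float]]

def continuous_policy(t: float) -> Dict[str, float]:
    # e.g. steady forward walking: no defined start or end
    return {"hip": 0.05 * t}

def periodic_policy(t: float) -> Dict[str, float]:
    # e.g. a repeating gesture with a fixed period
    return {"arm": math.sin(math.pi * t)}

def episodic_policy(t: float) -> Dict[str, float]:
    # e.g. a one-shot greeting that runs for the first second only
    return {"head": 0.3} if t < 1.0 else {"head": 0.0}

def deploy(policies: List[Policy], t: float) -> Dict[str, float]:
    """Merge all policy actions into one command per actuator."""
    command: Dict[str, float] = {}
    for policy in policies:
        command.update(policy(t))
    return command

cmd = deploy([continuous_policy, periodic_policy, episodic_policy], t=0.5)
assert set(cmd) == {"hip", "arm", "head"}
```

The merge-by-actuator step is where "at least one actuator ... is adapted to perform the continuous motion, the periodic motion, and the episodic motion" would come into play if one actuator appeared in more than one policy's output.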

Prosecution Timeline

May 08, 2024
Application Filed
Nov 05, 2025
Non-Final Rejection — §103
Feb 05, 2026
Examiner Interview Summary
Feb 05, 2026
Applicant Interview (Telephonic)
Feb 06, 2026
Response Filed
Mar 17, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12573302
METHOD FOR INFRASTRUCTURE-SUPPORTED ASSISTING OF A MOTOR VEHICLE
2y 5m to grant Granted Mar 10, 2026
Patent 12558790
METHOD AND COMPUTING SYSTEMS FOR PERFORMING OBJECT DETECTION
2y 5m to grant Granted Feb 24, 2026
Patent 12552019
MACHINE LEARNING METHOD AND ROBOT SYSTEM
2y 5m to grant Granted Feb 17, 2026
Patent 12544144
DENTAL ROBOT AND ORAL NAVIGATION METHOD
2y 5m to grant Granted Feb 10, 2026
Patent 12541205
MOVEMENT CONTROL SUPPORT DEVICE AND METHOD
2y 5m to grant Granted Feb 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
40%
Grant Probability
66%
With Interview (+26.1%)
3y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 130 resolved cases by this examiner. Grant probability derived from career allow rate.
