Prosecution Insights
Last updated: April 19, 2026
Application No. 18/232,239

SIMULATION OF ROBOTICS DEVICES USING A NEURAL NETWORK SYSTEMS AND METHODS

Non-Final OA — §101, §103
Filed
Aug 09, 2023
Examiner
VISCARRA, RICARDO I
Art Unit
3657
Tech Center
3600 — Transportation & Electronic Commerce
Assignee
ETH ZÜRICH
OA Round
2 (Non-Final)
Grant Probability: 62% (Moderate)
OA Rounds: 2-3
To Grant: 3y 9m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 62% of resolved cases (21 granted / 34 resolved; +9.8% vs TC avg)
Interview Lift: +27.9% (strong), comparing resolved cases with an interview to those without
Avg Prosecution: 3y 9m typical timeline; 23 applications currently pending
Total Applications: 57 career history across all art units
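These headline figures reduce to simple ratios. A minimal sketch of the arithmetic, reconstructed from the numbers shown above (the report does not publish its methodology, so treating "with interview" as base rate plus lift is an assumed reading, and the variable names are ours):

    # Reconstruction of the dashboard arithmetic from the figures shown above.
    # Counts and lift come from the report; the combination is an assumption.
    granted, resolved = 21, 34
    career_allow_rate = granted / resolved          # 0.6176... -> shown as "62%"

    interview_lift = 0.279                          # "+27.9%" from the report
    with_interview = career_allow_rate + interview_lift
    print(f"{career_allow_rate:.1%} base, {with_interview:.1%} with interview")
    # -> 61.8% base, 89.7% with interview (displayed as 62% and 90%)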

Statute-Specific Performance

§101: 13.0% (-27.0% vs TC avg)
§103: 61.9% (+21.9% vs TC avg)
§102: 16.4% (-23.6% vs TC avg)
§112: 6.2% (-33.8% vs TC avg)
Deltas are measured against the Tech Center average estimate • Based on career data from 34 resolved cases
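The deltas are plain differences against the Tech Center baseline, and working backwards every row implies the same baseline of 40.0% (for example, 61.9% - 21.9% = 13.0% + 27.0% = 40.0%). A quick consistency check (the baseline is inferred from the table, not reported directly):

    # Per-statute rates and deltas from the table above. Subtracting each
    # delta from its rate recovers the TC baseline -- 40.0% in every row.
    rates  = {"101": 13.0, "103": 61.9, "102": 16.4, "112": 6.2}
    deltas = {"101": -27.0, "103": 21.9, "102": -23.6, "112": -33.8}
    for s in rates:
        print(f"§{s}: {rates[s]:.1f}% ({deltas[s]:+.1f}% vs {rates[s] - deltas[s]:.1f}% TC avg)")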

Office Action

Grounds of rejection: §101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant’s arguments, filed 11/05/2025, with respect to the rejection(s) of claim(s) 1, 6, and 16 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. Additionally, Applicant’s arguments with respect to the rejection of claims 4 and 19 have been considered but they are not persuasive. Applicant argues that Bai in view of Li fails to disclose a state augmentation transformer model. However, Applicant’s specification specifically discloses that the state augmentation transformer “can be a Temporal Fusion Transformer and/or another kind of neural network or other model configured to predict and/or augment state information of a robotics device” (at least as in paragraph 0037). Bai specifically teaches the machine learning model to predict the state of the robot system (at least as in col. 15, ln. 35-54); therefore, Bai teaches the additional limitation of claims 4 and 19. Examiner notes that if Applicant amends the claim language to precisely define the State Augmentation Transformer and to accurately reflect Applicant’s arguments, then the prior art would likely be overcome.

Claim Objections

Claims 16-20 is/are objected to because of the following informalities: In line 1 of claim 16, “At least one computer-readable medium” should be simply “a computer-readable medium”; in line 1 of claims 17-20, “The at least one computer-readable medium” should be “The computer-readable medium”. Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 16-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim(s) does not fall within at least one of the four categories of patent eligible subject matter. Specifically, the claims are directed toward “at least one machine-readable medium carrying instructions” which, under its broadest reasonable interpretation, encompasses software per se. Software per se is not patentable under 35 U.S.C. 101; therefore, the claimed invention does not fall within a statutory class of patent eligible subject matter.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-5 and 8-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bai (US 11494632 B1) in view of Li (US 20230297068 A1), and further in view of Handa et al. (US 20200306960 A1, hereinafter Handa).
Regarding claim 1, Bai discloses: A computer-implemented method of training a neural network to predict states of a robotics device, the method comprising: receiving robotics data for at least one robotics device, wherein the robotics data includes (at least as in col. 18, ln. 32-34, wherein “At block 352, the system selects an episode data instance, such as an episode data instance generated using method 200 of FIG. 2”; at least as in col. 17, ln. 9-20, wherein “Turning now to FIG. 2, an example method 200 is illustrated of performing real episodes of a robotic task using a real physical robot, and storing real episode data instances based on the real episodes”; at least as in col. 17, ln. 28-61, wherein “At block 254, the system stores an episode data instance for the episode… At block 2541, the system stores trajectory data for the episode data instance… The trajectory data can be generated based on sensor data from sensors associated with one or more actuators of the real physical robot, such as positional sensor data from positional sensors associated with the actuators… At block 2543, the system stores a real episode success measure for the episode data instance… in determining whether a grasping task is successful for an episode, the system can monitor torque, position and/or other sensors of an end effector of the robot during the episode and/or after the episode to determine whether an object is likely successfully grasped in the episode”; at least as in col. 18, ln. 35-61, wherein “At block 354, the system configures a simulated environment for a simulated episode based on environmental data of the episode data instance… At block 356, the system causes the simulated robot to traverse a simulated trajectory based on trajectory data of the episode data instance”; at least as in col. 12, ln. 37-59, wherein “The simulator 120 includes a configuration engine 122. The configuration engine 122 dictates various parameters of the simulator 120 that are utilized by the simulator 120 in performing simulated robot episodes. As described in more detail below, the parameters dictated by the configuration engine 122 during a given simulated episode can be adapted based on feedback from the sim modification system 130, which causes the configuration engine 122 to iteratively adapt one or more parameters based on determinations of reality measures as described herein. Various parameters can be dictated by the configuration engine 122, such as simulated robot parameters of the simulated robot and/or environmental parameters that dictate one or more properties of one or more simulated environmental objects. Simulated robot parameters can include, for example, friction coefficients for simulated gripper(s) of the simulated robot, modeling (e.g., number of joint(s)) of simulated gripper(s) of the simulated robot, control parameter(s) for the simulated gripper(s), control parameter(s) for simulated actuator(s) of the simulated robot, etc.”); generating, using the received robotics data, a training dataset, wherein generating the training dataset includes comparing the measurement data with simulated measurement data based on the digital simulation (at least as in col. 19, ln. 34-38, wherein “At block 362, the system determines a reality measure based on comparison of: (i) simulated episode success measures determined in the iterations of block 358 since a last iteration (if any) of block 362; and (ii) their corresponding real success measures”; at least as in col. 17, ln. 
55-62, wherein “in determining whether a grasping task is successful for an episode, the system can monitor torque, position and/or other sensors of an end effector of the robot during the episode and/or after the episode to determine whether an object is likely successfully grasped in the episode”; at least as in col. 13, ln. 33-36, wherein “The simulated success measure for a simulated episode can have the same format as the real success measure for the corresponding episode data instance on which the simulated episode is based”; at least as in col. 18-19, ln. 65-5, wherein “in determining whether a grasping task is successful for a simulated episode, the system can utilize access to the ground truth state of the simulated environmental object(s) and/or the simulated robot to determine success/failure (e.g., based on the aperture of the simulated gripper and/or height of a simulated object)”; at least as in col. 20, ln. 13-43, wherein “After block 362, the system proceeds to block 364. At block 364, the system determines whether the reality measure satisfies a threshold and/or other criterion/criteria… If, at an iteration of block 364, the system determines the reality measure satisfies the threshold and/or other criteria/criterion, the system proceeds to block 368… At block 368, the system uses the simulator with the most recently modified parameters to generate simulated training examples based on new simulated episodes”); training, using the generated training dataset, a neural network (at least as in col. 20, ln. 35-43, wherein “At block 370, the system trains a machine learning model based on the simulated training examples generated at block 368” and block 370 may include block 371 and method 500; at least as in col. 20, ln. 44-59, wherein “the trained machine learning model can be trained to be used in processing of data to generate a predicted output, such as processing sensor data from one or more sensors of a robot (e.g., vision sensor(s), position sensor(s), torque sensor(s)), to generate predicted output that dictates one or more future movements of the robot”; at least as in col. 21, ln. 13-15, wherein “At block 556, the system generates a prediction based on processing of the training example input using the machine learning model”; at least as in col. 1, ln. 9-14, “Some of those approaches train a machine learning model (e.g., a deep neural network) that can be utilized generate one or more predictions that are utilized in control of a robot, and train the machine learning model using training examples that are based only on data from real-world physical robots”; at least as in col. 15, ln. 44-47, “determine an error based on the comparison and update the machine learning model by backpropagating the error over all or portions of the machine learning mode”); However, Bai does not explicitly disclose “indications of a set of components comprising at least one actuator and at least one structural element… to modify the digital simulation of the at least one robotics device… and modifying the digital simulation of the at least one robotics device using the trained neural network.” Li discloses a robot system configured to, based on various pick-up conditions for grasping operations, grasp a plurality of objects, generate training data, and train a machine learning model. 
Li specifically teaches “indications of a set of components comprising at least one actuator and at least one structural element” (at least as in paragraph 0052, “The receiving unit 110 may receive the pick-up condition, which includes the information on the type of pick-up hand 31, the shape and size of the portion contacting the workpiece 50, etc., input by the user via the input unit 12, and may store the pick-up condition in the later-described storage unit 14. That is, the receiving unit 110 may receive information and store such information in the storage unit 14, the information including information on whether the pick-up hand 31 is of the air suction type or the gripping type, information on the shape and size of a suction pad contact portion where the pick-up hand 31 contacts the workpiece 50, information on the number of suction pads, information on the interval and distribution of a plurality of pads in a case where the pick-up hand 31 has the plurality of suction pads, and information on the shape and size of a portion where a gripping finger of the pick-up hand 31 contacts the workpiece 50, the number of gripping fingers, and the interval and distribution of the gripping fingers in a case where the pick-up hand 31 is of the gripping type. Note that the receiving unit 110 may receive such information in the form of a numerical value, but may receive the information in the form of a two-dimensional or three-dimensional graph (e.g., CAD data) or receive the information in the form of both a numerical value and a graph”). Handa, in the same field of endeavor of robot control system trained to perform a task using a simulation, specifically teaches “to modify the digital simulation of the at least one robotics device… and modifying the digital simulation of the at least one robotics device using the trained neural network” (at least as in paragraph 0031, “a simulation utilizing the new values is generated to train the machine learning control system 124 to re-determine controls for the computer-controlled robot 110 to re-attempt to perform the bag placing task. In an embodiment, following the attempt, data relating to the inputs, outputs, and results of the re-attempted performance of the bag placing task is gathered and utilized by the control computer 122 and machine learning control system 124 to determine new values of the parameters for an updated simulation”; at least as in paragraph 0035, “a performance of the task is simulated via a simulation utilizing values determined for the parameters. In an embodiment, the simulation is generated utilizing a control computer comprising a machine learning control system and interface. In an embodiment, the control computer is a system like the control computer 122 described in connection with FIG. 1. In an embodiment, the control computer comprises various systems that can generate simulations, such as the simulation engine 408 described in connection with FIG. 4. 
In an embodiment, the simulation can be utilized to determine controls for a real world performance of the task”; at least as in paragraph 0066 and 0069, wherein the system utilizes a machine learning model trained by using real-world attempt data and simulation data to adjust the parameters of the simulation to approximately or exactly match the real-world attempt; at least as in paragraph 0025, “the machine learning control system 124 can utilize various structures such as a neural network, structured prediction system, anomaly detection system, supervised learning system, artificial intelligence system, and/or variations thereof, to manage the various control schemes”). Therefore, it would have been obvious to one of ordinary skill in the art at the effective filing date of the instant invention to modify the teachings of Bai, to include Li’s teaching of an information processing device configured to generate training data for training a machine learning model for specifying a position for retrieving a bulk-loaded work piece by utilizing retrieval conditions including information about the hand or the work piece and Handa’s teaching of a machine learning control system adjusting the parameters of the simulation, since Li teaches wherein the manipulator system with a pick-up condition including the type of pick-up hand avoids collision with a surrounding obstacle, thus improving operation safety and efficiency, and Handa teaches wherein the control system provides for increased accuracy of the simulation by tuning the simulation parameters based on failed real-world attempts.

Regarding claim 2, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 1, wherein training the neural network includes training a first model associated with a first component class and training a second model associated with a second component class (at least as in col. 15, ln. 35-54, wherein “The training engine 145 utilizes the simulated training examples 152 to train one or more machine learning models 160… The training engine 145 can also optionally train one or more of the machine learning model(s) 160 utilizing one or more real training examples that are based on output from real vision sensors and/or other components of (and/or associated with) a real robot during performance of episodes by the real robot”; at least as in col. 20, ln. 44-58, wherein “a control system of a robot can use the machine learning model in controlling one or more actuators and/or other component(s) of the robot. For instance, the trained machine learning model can be trained to be used in processing of data to generate a predicted output, such as processing sensor data from one or more sensors of a robot (e.g., vision sensor(s), position sensor(s), torque sensor(s)), to generate predicted output that dictates one or more future movements of the robot”).

Regarding claim 3, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 2, wherein the first component class is an actuator class and the second component class is a structural element class (at least as in col. 20, ln. 44-58, wherein “a control system of a robot can use the machine learning model in controlling one or more actuators and/or other component(s) of the robot.
For instance, the trained machine learning model can be trained to be used in processing of data to generate a predicted output, such as processing sensor data from one or more sensors of a robot (e.g., vision sensor(s), position sensor(s), torque sensor(s)), to generate predicted output that dictates one or more future movements of the robot. The robot can process such data, using the trained machine learning model, to generate predicted output, and generate one or more control commands to provide to actuator(s), based at least in part on the predicted output”). Regarding claim 4, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 2, wherein the first model and the second model each comprise a State Augmentation Transformer (at least as in col. 15, ln. 35-54, wherein “The training engine 145 utilizes the simulated training examples 152 to train one or more machine learning models 160. For example, the training engine 145 can process training example input of a simulated training example using one of the machine learning model(s) 160, generate a predicted output based on the processing, compare the predicted output to training example output of the simulated training example, and update the machine learning model based on the comparison. For instance, determine an error based on the comparison and update the machine learning model by backpropagating the error over all or portions of the machine learning model”; at least as in col. 21, ln. 15-20, wherein “the system can determine an error based on the comparison, and backpropagate the error over all or portions of the machine learning model”). Regarding claim 5, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 1, wherein predicting the state of the at least one robotics device includes receiving an initial estimate of the state and generating an additive residual value for the state (at least as in col. 14, ln. 12-24, “a reality measure of 1.5 can be determined based on a comparison of the quantity of simulated episodes where the real and sim success measures “agree” (60) to the quantity of simulated episodes where the real and sim success measures “disagree” (40). As other examples, a reality measure can be determined based on one or more of: a true positive rate, a true negative rate, a positive prediction value, a negative prediction value, a false negative rate, a false positive rate, a false discovery rate, a false omission rate, an F1 score, a Matthews correlation coefficient, an Informedness measure, and a Markedness measure”; at least as in col. 14, ln. 24-62, “When the reality measure determined by the reality measure engine 132 fails to satisfy a threshold and/or other criterion/criteria, the sim modification engine 134 of the sim modification system 130 can modify one or more parameters utilized by the configuration engine 122 during the simulated episodes utilized to determine the reality measure, and provide feedback to the configuration engine 122 to cause the configuration engine 122 to modify the parameters. 
Various parameters can be modified, such as simulated robot parameters of the simulated robot and/or environmental parameters that dictate one or more properties of one or more simulated environmental objects… a derivative free optimization technique can be utilized to iteratively adjust a friction coefficient parameter for the gripper, and the extent to which it is adjusted in a given iteration can be directly correlated to the reality measure (i.e., a relatively greater adjustment for a reality measure indicative of a relatively larger reality gap, and a relatively lesser adjustment for a reality measure indicative of a relatively smaller reality gap”). Regarding claim 8, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 1, wherein the predicted state of the at least one robotics device comprises a movement or position of the at least one actuator, the at least one structural element, or both the at least one actuator and the at least one structural element (at least as in col. 20, ln. 35-43, wherein “At block 370, the system trains a machine learning model based on the simulated training examples generated at block 368” and block 370 may include block 371 and method 500; at least as in col. 20, ln. 44-59, wherein “the trained machine learning model can be trained to be used in processing of data to generate a predicted output, such as processing sensor data from one or more sensors of a robot (e.g., vision sensor(s), position sensor(s), torque sensor(s)), to generate predicted output that dictates one or more future movements of the robot”; at least as in col. 21, ln. 13-15, wherein “At block 556, the system generates a prediction based on processing of the training example input using the machine learning model”; at least as in col. 22, ln. 30-37, wherein “The robot control system 660 may process the current image, optionally the additional image, and the candidate motion vector utilizing a trained machine learning model to generate a prediction of successful grasp, and based on the prediction can generate one or more end effector control commands for controlling the movement and/or grasping of an end effector of the robot”). Regarding claim 9, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 1, wherein the at least one sensor comprises a sensor included in the at least one actuator (at least as in col. 17, ln. 34-40, wherein “The trajectory data can be generated based on sensor data from sensors associated with one or more actuators of the real physical robot, such as positional sensor data from positional sensors associated with the actuators. The trajectory data can define the trajectory in joint space, task space, and/or other space(s).”). Regarding claim 10, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 1, wherein the set of components includes a mechanical element, an electrical element, or both (at least as in col. 14, ln. 
24-43, “When the reality measure determined by the reality measure engine 132 fails to satisfy a threshold and/or other criterion/criteria, the sim modification engine 134 of the sim modification system 130 can modify one or more parameters utilized by the configuration engine 122 during the simulated episodes utilized to determine the reality measure, and provide feedback to the configuration engine 122 to cause the configuration engine 122 to modify the parameters. Various parameters can be modified, such as simulated robot parameters of the simulated robot and/or environmental parameters that dictate one or more properties of one or more simulated environmental objects. The various parameters can be modified manually (e.g., based on input from a human) and/or utilizing one or more automated techniques, such as derivative free optimization (e.g., CMA-ES and/or Bayesian optimization). In manual and/or automated techniques, a quantity of parameters modified and/or extent(s) of the modification(s) can optionally be based on the reality measure”). Regarding claim 11, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 1, wherein the set of components includes an electrical element, and wherein the measurement data includes electrical measurements associated with the electrical element (at least as in col. 18, ln. 32-34, wherein “At block 352, the system selects an episode data instance, such as an episode data instance generated using method 200 of FIG. 2”; at least as in col. 17, ln. 9-20, wherein “Turning now to FIG. 2, an example method 200 is illustrated of performing real episodes of a robotic task using a real physical robot, and storing real episode data instances based on the real episodes”; at least as in col. 17, ln. 28-61, wherein “At block 254, the system stores an episode data instance for the episode… At block 2541, the system stores trajectory data for the episode data instance… The trajectory data can be generated based on sensor data from sensors associated with one or more actuators of the real physical robot, such as positional sensor data from positional sensors associated with the actuators… At block 2543, the system stores a real episode success measure for the episode data instance… in determining whether a grasping task is successful for an episode, the system can monitor torque, position and/or other sensors of an end effector of the robot during the episode and/or after the episode to determine whether an object is likely successfully grasped in the episode”). Regarding claim 12, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 1, wherein the at least one robotics device includes a plurality of subsystems and the neural network comprises a model for each subsystem of the plurality of subsystems (at least as in col. 20, ln. 46-54, “a control system of a robot can use the machine learning model in controlling one or more actuators and/or other component(s) of the robot. For instance, the trained machine learning model can be trained to be used in processing of data to generate a predicted output, such as processing sensor data from one or more sensors of a robot (e.g., vision sensor(s), position sensor(s), torque sensor(s)), to generate predicted output that dictates one or more future movements of the robot”). 
Regarding claim 13, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 1, further comprising: providing a control signal to the at least one robotics device, the control signal indicating a command to be executed by the at least one robotics device (at least as in col. 17, ln. 20-23, “At block 252, a real physical robot performs an episode of a robotic task. The robotic task can be, for example, a manipulation task”; at least as in col. 10, ln. 38-42, “During each episode, the robot 180 (or another robot) is controlled to cause the robot to attempt performance of a robotic task. The control of the robot 180 during an episode can be random, pseudo-random, and/or dictated by one or more control policies”); generating the measurement data based on performance of the command (at least as in col. 10, ln. 58-62, “As the robot 180 moves during an episode, sensor data is generated by sensors of the robot that indicate movement of the robot during the episode. The robot data engine 112 of system 110 utilizes such sensor data to generate robot data for the episode”); providing the control signal to the digital simulation of the at least one robotics device (at least as in col. 13, ln. 13-26, “In performing each such simulated episode, the simulated episode engine 124 further attempts to control a simulated robot to mimic the movements of the robot 180 in the corresponding episode data instance. For example, the simulated episode engine 124 can control a simulated robot to cause the simulated robot to traverse the trajectory defined by the trajectory data of the episode data instance. In these manners, in performing such a simulated episode, the simulated episode engine 124 attempts to simulate a corresponding one of the real episodes by configuring the environment based on the environmental data of the episode data instance of the real episode and by controlling the simulated robot in conformance with the robot data of the episode data instance of the real episode”); and generating the simulated measurement data based on simulated performance of the command by the digital simulation of the at least one robotics device (at least as in col. 13, ln. 26-55, “For each simulated episode that is based on a corresponding episode data instance, a sim success engine 126 of the simulator 120 evaluates the success of the robotic task for the simulated episode, and generates a simulated success measure based on the evaluation. Each simulated success measure indicates a degree of success of the robotic task for the corresponding simulated episode. The simulated success measure for a simulated episode can have the same format as the real success measure for the corresponding episode data instance on which the simulated episode is based… The sim success engine 126 can utilize one or more techniques to determine simulated success measures for simulated episodes. For example, in determining whether a grasping task is successful for a simulated episode, the sim success engine 126 can consider the grasping task is successful if the simulator 120 indicates that, after actuating the simulated grasping members of the simulated robot, the simulated grasping members are both contacting a simulated environmental object”). 
Regarding claim 14, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 1, wherein the digital simulation of the at least one robotics device comprises a graphic representation (at least as in col. 16, ln. 18-50, “For each new simulated grasp episode, the sim training example generation system 140 can utilize buffered (or otherwise stored) data from the simulator 120 for the new simulated grasp episode to generate a plurality of simulated training examples. Each training example can include a rendered image (and/or other rendered vision frame(s)) for a time step of the simulated new grasp episode and a task-space motion vector from a pose of a simulated end effector at that time step to the final pose of the simulated end effector at the final time step of the simulated new grasp episode. For example, a rendered image can be rendered from a point of view of a simulated camera of the simulated robot, such as a simulated stationary camera—or a simulated non-stationary camera, such as a simulated non-stationary camera attached to one of the links of the simulated robot. Further, the rendered images for each time step can be based on data from the simulator 120 for that time step (e.g., taken from the pose of the simulated camera at that time step, and capturing the simulated robot and simulated environment at that time step). The rendered images can be, for example, two-dimensional (“2D”) images with multiple color channels (e.g., red, green, and blue (“RGB”)). Also, for example, the images can instead be two-and-a-half dimensional (“2.5D”) images with RGB and depth channels. For the motion vector for a time step, the motion vector can be based on a transformation between the current pose of the simulated end effector at the time step and the final pose of the simulated end effector for the simulated new grasp episode. The training example output for each training example can be based on whether the corresponding new simulated grasp episode was successful (e.g., “0” or other value if not successful, and “1” or other value if successful)”). Regarding claim 15, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 1, further comprising: generating a testing dataset; evaluating accuracy of the trained neural network using the testing dataset; and retraining the trained neural network based on comparing the accuracy of the trained neural network to a threshold value (at least as in col. 15, ln. 35-55, “The training engine 145 utilizes the simulated training examples 152 to train one or more machine learning models 160. For example, the training engine 145 can process training example input of a simulated training example using one of the machine learning model(s) 160, generate a predicted output based on the processing, compare the predicted output to training example output of the simulated training example, and update the machine learning model based on the comparison. For instance, determine an error based on the comparison and update the machine learning model by backpropagating the error over all or portions of the machine learning model. The training engine 145 can also optionally train one or more of the machine learning model(s) 160 utilizing one or more real training examples that are based on output from real vision sensors and/or other components of (and/or associated with) a real robot during performance of episodes by the real robot. 
Such real episodes can include those utilized to generate the episode data instances 150 and/or other episodes”). Regarding claim 16, Bai discloses: At least one computer-readable medium carrying instructions that, when executed by a processor (at least as in col. 8, ln. 15-22, “Other implementations may include a non-transitory computer readable storage medium storing instructions executable by one or more processors (e.g., central processing unit(s) (CPU(s)), graphics processing unit(s) (GPU(s)), and/or tensor processing unit(s) (TPU(s)) to perform a method such as one or more of the methods described above and/or elsewhere herein”), cause the processor to perform operations comprising: receive robotics data for at least one robotics device, wherein the robotics data includes indications of a set of components comprising at least one actuator and at least one structural element, a digital simulation of the at least one robotics device, and measurement data received from at least one sensor associated with the at least one robotics device (at least as in col. 18, ln. 32-34, wherein “At block 352, the system selects an episode data instance, such as an episode data instance generated using method 200 of FIG. 2”; at least as in col. 17, ln. 9-20, wherein “Turning now to FIG. 2, an example method 200 is illustrated of performing real episodes of a robotic task using a real physical robot, and storing real episode data instances based on the real episodes”; at least as in col. 17, ln. 28-61, wherein “At block 254, the system stores an episode data instance for the episode… At block 2541, the system stores trajectory data for the episode data instance… The trajectory data can be generated based on sensor data from sensors associated with one or more actuators of the real physical robot, such as positional sensor data from positional sensors associated with the actuators… At block 2543, the system stores a real episode success measure for the episode data instance… in determining whether a grasping task is successful for an episode, the system can monitor torque, position and/or other sensors of an end effector of the robot during the episode and/or after the episode to determine whether an object is likely successfully grasped in the episode”; at least as in col. 18, ln. 35-61, wherein “At block 354, the system configures a simulated environment for a simulated episode based on environmental data of the episode data instance… At block 356, the system causes the simulated robot to traverse a simulated trajectory based on trajectory data of the episode data instance”; at least as in col. 12, ln. 37-59, wherein “The simulator 120 includes a configuration engine 122. The configuration engine 122 dictates various parameters of the simulator 120 that are utilized by the simulator 120 in performing simulated robot episodes. As described in more detail below, the parameters dictated by the configuration engine 122 during a given simulated episode can be adapted based on feedback from the sim modification system 130, which causes the configuration engine 122 to iteratively adapt one or more parameters based on determinations of reality measures as described herein. Various parameters can be dictated by the configuration engine 122, such as simulated robot parameters of the simulated robot and/or environmental parameters that dictate one or more properties of one or more simulated environmental objects. 
Simulated robot parameters can include, for example, friction coefficients for simulated gripper(s) of the simulated robot, modeling (e.g., number of joint(s)) of simulated gripper(s) of the simulated robot, control parameter(s) for the simulated gripper(s), control parameter(s) for simulated actuator(s) of the simulated robot, etc.”); generate, using the received robotics data, a training dataset, wherein generating the training dataset includes comparing the measurement data with simulated measurement data based on the digital simulation (at least as in col. 19, ln. 34-38, wherein “At block 362, the system determines a reality measure based on comparison of: (i) simulated episode success measures determined in the iterations of block 358 since a last iteration (if any) of block 362; and (ii) their corresponding real success measures”; at least as in col. 17, ln. 55-62, wherein “in determining whether a grasping task is successful for an episode, the system can monitor torque, position and/or other sensors of an end effector of the robot during the episode and/or after the episode to determine whether an object is likely successfully grasped in the episode”; at least as in col. 13, ln. 33-36, wherein “The simulated success measure for a simulated episode can have the same format as the real success measure for the corresponding episode data instance on which the simulated episode is based”; at least as in col. 18-19, ln. 65-5, wherein “in determining whether a grasping task is successful for a simulated episode, the system can utilize access to the ground truth state of the simulated environmental object(s) and/or the simulated robot to determine success/failure (e.g., based on the aperture of the simulated gripper and/or height of a simulated object)”; at least as in col. 20, ln. 13-43, wherein “After block 362, the system proceeds to block 364. At block 364, the system determines whether the reality measure satisfies a threshold and/or other criterion/criteria… If, at an iteration of block 364, the system determines the reality measure satisfies the threshold and/or other criteria/criterion, the system proceeds to block 368… At block 368, the system uses the simulator with the most recently modified parameters to generate simulated training examples based on new simulated episodes”); train, using the generated training dataset, a neural network (at least as in col. 20, ln. 35-43, wherein “At block 370, the system trains a machine learning model based on the simulated training examples generated at block 368” and block 370 may include block 371 and method 500; at least as in col. 20, ln. 44-59, wherein “the trained machine learning model can be trained to be used in processing of data to generate a predicted output, such as processing sensor data from one or more sensors of a robot (e.g., vision sensor(s), position sensor(s), torque sensor(s)), to generate predicted output that dictates one or more future movements of the robot”; at least as in col. 21, ln. 13-15, wherein “At block 556, the system generates a prediction based on processing of the training example input using the machine learning model”; at least as in col. 1, ln. 9-14, “Some of those approaches train a machine learning model (e.g., a deep neural network) that can be utilized generate one or more predictions that are utilized in control of a robot, and train the machine learning model using training examples that are based only on data from real-world physical robots”; at least as in col. 15, ln. 
44-47, “determine an error based on the comparison and update the machine learning model by backpropagating the error over all or portions of the machine learning mode”); However, Bai does not explicitly disclose “indications of a set of components comprising at least one actuator and at least one structural element… to modify the digital simulation of the at least one robotics device… and modify the digital simulation of the at least one robotics device using the trained neural network.” Li discloses a robot system configured to, based on various pick-up conditions for grasping operations, grasp a plurality of objects, generate training data, and train a machine learning model. Li specifically teaches “indications of a set of components comprising at least one actuator and at least one structural element” (at least as in paragraph 0052, “The receiving unit 110 may receive the pick-up condition, which includes the information on the type of pick-up hand 31, the shape and size of the portion contacting the workpiece 50, etc., input by the user via the input unit 12, and may store the pick-up condition in the later-described storage unit 14. That is, the receiving unit 110 may receive information and store such information in the storage unit 14, the information including information on whether the pick-up hand 31 is of the air suction type or the gripping type, information on the shape and size of a suction pad contact portion where the pick-up hand 31 contacts the workpiece 50, information on the number of suction pads, information on the interval and distribution of a plurality of pads in a case where the pick-up hand 31 has the plurality of suction pads, and information on the shape and size of a portion where a gripping finger of the pick-up hand 31 contacts the workpiece 50, the number of gripping fingers, and the interval and distribution of the gripping fingers in a case where the pick-up hand 31 is of the gripping type. Note that the receiving unit 110 may receive such information in the form of a numerical value, but may receive the information in the form of a two-dimensional or three-dimensional graph (e.g., CAD data) or receive the information in the form of both a numerical value and a graph”). Handa, in the same field of endeavor of robot control system trained to perform a task using a simulation, specifically teaches “to modify the digital simulation of the at least one robotics device… and modify the digital simulation of the at least one robotics device using the trained neural network” (at least as in paragraph 0031, “a simulation utilizing the new values is generated to train the machine learning control system 124 to re-determine controls for the computer-controlled robot 110 to re-attempt to perform the bag placing task. In an embodiment, following the attempt, data relating to the inputs, outputs, and results of the re-attempted performance of the bag placing task is gathered and utilized by the control computer 122 and machine learning control system 124 to determine new values of the parameters for an updated simulation”; at least as in paragraph 0035, “a performance of the task is simulated via a simulation utilizing values determined for the parameters. In an embodiment, the simulation is generated utilizing a control computer comprising a machine learning control system and interface. In an embodiment, the control computer is a system like the control computer 122 described in connection with FIG. 1. 
In an embodiment, the control computer comprises various systems that can generate simulations, such as the simulation engine 408 described in connection with FIG. 4. In an embodiment, the simulation can be utilized to determine controls for a real world performance of the task”; at least as in paragraph 0066 and 0069, wherein the system utilizes a machine learning model trained by using real-world attempt data and simulation data to adjust the parameters of the simulation to approximately or exactly match the real-world attempt; at least as in paragraph 0025, “the machine learning control system 124 can utilize various structures such as a neural network, structured prediction system, anomaly detection system, supervised learning system, artificial intelligence system, and/or variations thereof, to manage the various control schemes”). Therefore, it would have been obvious to one of ordinary skill in the art at the effective filing date of the instant invention to modify the teachings of Bai, to include Li’s teaching of an information processing device configured to generate training data for training a machine learning model for specifying a position for retrieving a bulk-loaded work piece by utilizing retrieval conditions including information about the hand or the work piece and Handa’s teaching of a machine learning control system adjusting the parameters of the simulation, since Li teaches wherein the manipulator system with a pick-up condition including the type of pick-up hand avoids collision with a surrounding obstacle, thus improving operation safety and efficiency, and Handa teaches wherein the control system provides for increased accuracy of the simulation by tuning the simulation parameters based on failed real-world attempts.

Regarding claim 17, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The at least one computer-readable medium of claim 16, wherein training the neural network includes training a first model associated with a first component class and training a second model associated with a second component class (at least as in col. 15, ln. 35-54, wherein “The training engine 145 utilizes the simulated training examples 152 to train one or more machine learning models 160… The training engine 145 can also optionally train one or more of the machine learning model(s) 160 utilizing one or more real training examples that are based on output from real vision sensors and/or other components of (and/or associated with) a real robot during performance of episodes by the real robot”; at least as in col. 20, ln. 44-58, wherein “a control system of a robot can use the machine learning model in controlling one or more actuators and/or other component(s) of the robot. For instance, the trained machine learning model can be trained to be used in processing of data to generate a predicted output, such as processing sensor data from one or more sensors of a robot (e.g., vision sensor(s), position sensor(s), torque sensor(s)), to generate predicted output that dictates one or more future movements of the robot”).

Regarding claim 18, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The at least one computer-readable medium of claim 17, wherein the first component class is an actuator class and the second component class is a structural element class (at least as in col. 20, ln.
44-58, wherein “a control system of a robot can use the machine learning model in controlling one or more actuators and/or other component(s) of the robot. For instance, the trained machine learning model can be trained to be used in processing of data to generate a predicted output, such as processing sensor data from one or more sensors of a robot (e.g., vision sensor(s), position sensor(s), torque sensor(s)), to generate predicted output that dictates one or more future movements of the robot. The robot can process such data, using the trained machine learning model, to generate predicted output, and generate one or more control commands to provide to actuator(s), based at least in part on the predicted output”). Regarding claim 19, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The at least one computer-readable medium of claim 17, wherein the first model and the second model each comprise a State Augmentation Transformer (at least as in col. 15, ln. 35-54, wherein “The training engine 145 utilizes the simulated training examples 152 to train one or more machine learning models 160. For example, the training engine 145 can process training example input of a simulated training example using one of the machine learning model(s) 160, generate a predicted output based on the processing, compare the predicted output to training example output of the simulated training example, and update the machine learning model based on the comparison. For instance, determine an error based on the comparison and update the machine learning model by backpropagating the error over all or portions of the machine learning model”; at least as in col. 21, ln. 15-20, wherein “the system can determine an error based on the comparison, and backpropagate the error over all or portions of the machine learning model”). Regarding claim 20, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The at least one computer-readable medium of claim 16, wherein predicting the state of the at least one robotics device includes receiving an initial estimate of the state and generating an additive residual value for the state (at least as in col. 14, ln. 12-24, “a reality measure of 1.5 can be determined based on a comparison of the quantity of simulated episodes where the real and sim success measures “agree” (60) to the quantity of simulated episodes where the real and sim success measures “disagree” (40). As other examples, a reality measure can be determined based on one or more of: a true positive rate, a true negative rate, a positive prediction value, a negative prediction value, a false negative rate, a false positive rate, a false discovery rate, a false omission rate, an F1 score, a Matthews correlation coefficient, an Informedness measure, and a Markedness measure”; at least as in col. 14, ln. 24-62, “When the reality measure determined by the reality measure engine 132 fails to satisfy a threshold and/or other criterion/criteria, the sim modification engine 134 of the sim modification system 130 can modify one or more parameters utilized by the configuration engine 122 during the simulated episodes utilized to determine the reality measure, and provide feedback to the configuration engine 122 to cause the configuration engine 122 to modify the parameters. 
Various parameters can be modified, such as simulated robot parameters of the simulated robot and/or environmental parameters that dictate one or more properties of one or more simulated environmental objects… a derivative free optimization technique can be utilized to iteratively adjust a friction coefficient parameter for the gripper, and the extent to which it is adjusted in a given iteration can be directly correlated to the reality measure (i.e., a relatively greater adjustment for a reality measure indicative of a relatively larger reality gap, and a relatively lesser adjustment for a reality measure indicative of a relatively smaller reality gap”).

Claim(s) 6-7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bai (US 11494632 B1) in view of Li (US 20230297068 A1) and Handa et al. (US 20200306960 A1, hereinafter Handa), and further in view of Beckman et al. (US 10800040 B1, hereinafter Beckman).

Regarding claim 6, in view of the above combination of Bai, Li, and Handa, Bai further discloses: The computer-implemented method of claim 1, further comprising: applying the trained neural network (at least as in col. 20, ln. 44-58, wherein “At block 372, the system provides the machine learning model, as trained, to one or more robots for use by the robot(s). For example, a control system of a robot can use the machine learning model in controlling one or more actuators and/or other component(s) of the robot. For instance, the trained machine learning model can be trained to be used in processing of data to generate a predicted output, such as processing sensor data from one or more sensors of a robot (e.g., vision sensor(s), position sensor(s), torque sensor(s)), to generate predicted output that dictates one or more future movements of the robot. The robot can process such data, using the trained machine learning model, to generate predicted output, and generate one or more control commands to provide to actuator(s), based at least in part on the predicted output”). However, Bai does not explicitly disclose “to a different robotics device to predict a future state of the different robotics device.” Beckman, in the same field of endeavor of machine learning system for controlling robots using simulations and real-world trial data, specifically teaches “to a different robotics device to predict a future state of the different robotics device” (at least as in col. 6, ln. 6-30, wherein “the robotic control system 220 can operate as the machine learning training system that generates the robotic control policy. During both real-world training and implementation, the controller 250 can provide programmatic control of the robotic system 110, for example by maintaining robotic position data, determining a sequence of actions needed to perform tasks based on a current iteration of the control policy, and causing actuation of the various components of the robotic system 110. The robotic control system 220 is illustrated graphically as a server system, and the server system can be configured to control (via a network) a number of remote robotic systems that are the same or different from one another that are performing the same task or different tasks”). Therefore, it would have been obvious to one of ordinary skill in the art at the effective filing date of the instant invention to modify the teachings of Bai, to include Beckman’s teaching of implementing a control policy to different robots, since Beckman teaches wherein the robotic control system may be refined to achieve a desired level of success or accuracy, thus reducing the time required to train multiple individual control systems.

Regarding claim 7, the above combination of Bai, Li, Handa, and Beckman discloses the computer-implemented method of claim 6; however, Bai does not explicitly disclose “wherein the different robotics device includes a different set of components.” Li discloses a robot system configured to, based on various pick-up conditions for grasping operations, grasp a plurality of objects, generate training data, and train a machine learning model. Li specifically teaches “wherein the different robotics device includes a different set of components” (at least as in paragraph 0052, “The receiving unit 110 may receive the pick-up condition, which includes the information on the type of pick-up hand 31, the shape and size of the portion contacting the workpiece 50, etc., input by the user via the input unit 12, and may store the pick-up condition in the later-described storage unit 14. That is, the receiving unit 110 may receive information and store such information in the storage unit 14, the information including information on whether the pick-up hand 31 is of the air suction type or the gripping type, information on the shape and size of a suction pad contact portion where the pick-up hand 31 contacts the workpiece 50, information on the number of suction pads, information on the interval and distribution of a plurality of pads in a case where the pick-up hand 31 has the plurality of suction pads, and information on the shape and size of a portion where a gripping finger of the pick-up hand 31 contacts the workpiece 50, the number of gripping fingers, and the interval and distribution of the gripping fingers in a case where the pick-up hand 31 is of the gripping type”). Therefore, it would have been obvious to one of ordinary skill in the art at the effective filing date of the instant invention to modify the teachings of Bai, to include Li’s teaching of an information processing device configured to generate training data for training a machine learning model for specifying a position for retrieving a bulk-loaded work piece by utilizing retrieval conditions including information about the hand or the work piece, since Li teaches wherein the manipulator system with a pick-up condition including the type of pick-up hand avoids collision with a surrounding obstacle, thus improving operation safety and efficiency.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICARDO ICHIKAWA VISCARRA whose telephone number is (571)270-0154. The examiner can normally be reached M-F 9-12 & 2-4 PST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Adam Mott, can be reached on (571) 270-5376.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RICARDO I VISCARRA/
Examiner, Art Unit 3657

/ADAM R MOTT/
Supervisory Patent Examiner, Art Unit 3657
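Technical context for the cited art: the Bai passages quoted throughout the rejection describe a simulator-calibration loop in which real and simulated episode outcomes are compared, a "reality measure" is computed, and simulator parameters are adjusted until that measure clears a threshold. A minimal sketch of the loop, for orientation only (the simulator object, its methods, and the update rule are illustrative assumptions, not Bai's implementation; Bai itself points to derivative-free optimizers such as CMA-ES for the parameter update):

    # Illustrative sketch of the calibration loop described in the quoted Bai
    # passages (cols. 13-14 and 19-20). All names here are hypothetical.

    def reality_measure(real_success, sim_success):
        """Ratio of agreeing to disagreeing episode outcomes.
        Bai's worked example: 60 agreements vs. 40 disagreements -> 1.5."""
        agree = sum(r == s for r, s in zip(real_success, sim_success))
        disagree = len(real_success) - agree
        return agree / max(disagree, 1)

    def calibrate(simulator, episodes, threshold=1.5, max_iters=50):
        """Perturb simulator parameters until simulated outcomes track real ones."""
        for _ in range(max_iters):
            sim_success = [simulator.replay(ep.trajectory) for ep in episodes]
            measure = reality_measure([ep.success for ep in episodes], sim_success)
            if measure >= threshold:
                break  # simulator is close enough to generate training examples
            # Larger reality gap (lower measure) -> larger adjustment, per Bai.
            simulator.perturb_parameters(scale=1.0 / max(measure, 1e-3))
        return simulator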

Prosecution Timeline

Aug 09, 2023
Application Filed
Jul 26, 2025
Non-Final Rejection — §101, §103
Nov 05, 2025
Response Filed
Feb 17, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12558719
BINDING DEVICE, BINDING SYSTEM, METHOD FOR CONTROLLING BINDING DEVICE, AND COMPUTER READABLE STORAGE MEDIUM STORING PROGRAM
2y 5m to grant • Granted Feb 24, 2026
Patent 12545356
MICROMOBILITY ELECTRIC VEHICLE WITH WALK-ASSIST MODE
2y 5m to grant • Granted Feb 10, 2026
Patent 12528400
MOBILE FULFILLMENT CONTAINER APPARATUS, SYSTEMS, AND RELATED METHODS
2y 5m to grant • Granted Jan 20, 2026
Patent 12502781
ROBOT OFFSET SIMULATION METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM
2y 5m to grant • Granted Dec 23, 2025
Patent 12487602
IMPROVED NAVIGATION FOR A ROBOTIC WORK TOOL
2y 5m to grant • Granted Dec 02, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 62%
With Interview: 90% (+27.9%)
Median Time to Grant: 3y 9m
PTA Risk: Moderate
Based on 34 resolved cases by this examiner. Grant probability derived from career allow rate.
