Prosecution Insights
Last updated: April 19, 2026
Application No. 17/446,347

INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD

Non-Final OA (§101, §103, §112)
Filed
Aug 30, 2021
Examiner
WECHSELBERGER, ALFRED H.
Art Unit
2187
Tech Center
2100 — Computer Architecture & Software
Assignee
Preferred Networks Inc.
OA Round
3 (Non-Final)
Grant Probability
58% (Moderate)
Est. OA Rounds
3-4
Est. Time to Grant
3y 8m
Grant Probability With Interview
94%

Examiner Intelligence

Career Allow Rate
58% (122 granted / 212 resolved; +2.5% vs TC avg)
Interview Lift
+36.5% on resolved cases with an interview
Avg Prosecution
3y 8m typical; 42 applications currently pending
Total Applications
254 across all art units (career history)

Statute-Specific Performance

§101
30.0% (-10.0% vs TC avg)
§103
38.9% (-1.1% vs TC avg)
§102
3.8% (-36.2% vs TC avg)
§112
24.0% (-16.0% vs TC avg)
Tech Center averages are estimates. Based on career data from 212 resolved cases.

Office Action

§101 §103 §112
DETAILED ACTION

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/01/2025 has been entered.

Claims 1-6, 10-11, 13, and 15-32 have been presented for examination. Claims 1, 3-5, 10-11, 13, and 15-18 are currently amended. Claims 7-9, 12, and 14 are cancelled. Claims 25-32 are new.

Response to Rejections Under 35 U.S.C. § 101

Applicant's arguments have been fully considered. However, the Office does not consider them to be persuasive. Applicant argues: "As is described below, claim 10 does not include an abstract idea and distinguishes over the cited references. The Applicant respectfully submits that amended claim 10 encapsulates an inventive concept. Thus, claim 11 as a whole provides an inventive concept. Therefore, claim 11 is patent eligible" (emphasis added).

Applicant appears to argue that the claims are eligible at Step 2B by providing a conclusory argument that claim 10 encapsulates an inventive concept (see MPEP 2106.05(I), The Search for an Inventive Concept). Examiner notes that claim 10 recites similar "input," "perform," and "train" steps as the previous claims, which were rejected under Step 2B (see Claim Rejections - 35 USC § 101 below for the detailed rejection). Further, Applicant has not specifically argued any of the claim 10 limitations.

Response to Rejections Under 35 U.S.C. § 103

Applicant's arguments have been fully considered. However, the Office does not consider them to be persuasive. Applicant argues: "As is described above, Li (496) merely discloses that the simulated output for controlling how the physical process performs a task in the virtual environment is input to a machine learning model. Li (496) does not disclose a differentiable simulator. Thus, nowhere in the specification and drawings, does Li (496) teach or suggest the feature of 'perform, based on input information and an environment variable into a differentiable simulator, a simulation with respect to a state of a virtual world'" (emphasis in original).

This argument attacks the references piecemeal: the differentiable simulator is taught by Degrave (Thesis), not Li (496).

Applicant argues: "As is described above, Beckman (008) merely discloses parameters that dictate what action or sequence of actions the robotic system 110 should take. Thus, nowhere in the specification and drawings, does Beckman (008) teach or suggest the feature of 'wherein the environment variable includes information related to a physical quantity of an object in the real world'" (emphasis in original).

The broadest reasonable interpretation is discussed in MPEP 2111. Examiner notes that the sequence of actions the robotic system should take (i.e., the control policy) depends directly on the object being controlled through its interaction with the robotic system, said object representing a physical object having a weight (see Beckman (008), Col. 12, Lines 56-62, where the refined control policy reflects the weight distribution of the object being controlled: "As such, a weight distribution in the object 105 can be measured and this weight distribution used to model the object 105 in the revised simulated environment. The robotic control system 220 can run additional simulations in the revised simulated environment and use evaluations of these simulations to refine the policy and/or build a new policy at block 301."). Therefore, the control policy itself reflects the weight of the controlled object (i.e., includes information related to a physical quantity of an object in the real world).

Applicant argues: "As is described above, Beckman (008) discloses that the control policy is the model or model parameters. Thus, Beckman (008) merely discloses that the robotic control system 220 trains the model or model parameters. The Applicant respectfully submits that the simulated output of Li (496) is input into a machine learning model and the simulated output of Li (496) is not a model or model parameters. Therefore, it is not obvious for those skilled in the art to modify Li (496) to update the simulated output, which is not a model or model parameters and which is an input into the machine learning model" (emphasis added).

The claims are given their broadest reasonable interpretation in light of the specification (see MPEP 2111). Examiner notes that Beckman (008) explicitly teaches using simulated outputs to train the control policy; the simulated output of Li (496) can therefore be used directly as an input to the control policy of Beckman (008) for training purposes. Applicant's contention that the simulated output of Li (496) "is not a model or model parameters" addresses a position that is not alleged or relied upon in the instant Office action.

Applicant argues: "As is described above, Beckman (008) merely discloses evolution strategies to train an artificial neural network control policy. Thus, nowhere in the specification and drawings, does Beckman (008) teach or suggest the feature of 'train, without updating parameters of the differentiable simulator itself, the first neural network based on the second state of the virtual world'."

Beckman (008) explicitly teaches that evolution strategies or backpropagation is used to train the neural network, which merely uses the simulation results as part of the training and does not require the simulator itself to be modified after obtaining the simulation results (see Beckman (008), Col. 8, Lines 50-55, where the control policy is trained with backpropagation for optimum performance in the virtual simulation; and Col. 11, Lines 15-30, where a previously generated simulated environment can be used to iterate on the control policy without changing any parameters of the simulator used to generate the results: "At block 301, the robotic control system 220 trains the control policy based on evaluating simulated robotic performance of the task. As described above, this can involve the physics simulation engine 205 generating a simulated environment 230 that is a high-fidelity (though perhaps more simplified in terms of how certain parameters are modeled) copy of the real-world environment and system of the robotic task performance, and then running a number of trials of the robotic system 110 performing the task 100 within the simulated environment 230. Evaluations of each trial from the feedback engine 210, via a human evaluator and/or via a computerized reward functions or machine learning classifier evaluation, can guide the machine learning system 215 to generate and iterate a simulated control policy, for example using evolution strategies or reinforcement learning.").

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more. Dependent claim 11 recites the same statutory category at Step 1 as the parent claim, and further recites: wherein the at least one processor calculates a reward based on the information related to the second state of the virtual world.

At Step 2A, Prong I, the recited limitations, alone or in combination, amount to steps that, under their broadest reasonable interpretation, cover mathematical concepts (see MPEP 2106.04(a)(2)(I)). For example, the "calculates" step recites performing calculations of a desired quantity. Accordingly, the claim recites an abstract idea.

At Step 2A, Prong II, this judicial exception is not integrated into a practical application. The claimed invention further claims, as incorporated from parent claim 10: at least one memory; and at least one processor configured to: input information related to a first state of a virtual world and an environment variable into a first neural network to output information related to a control method; perform, by inputting the information related to the first state of the virtual world, the environment variable, and the information related to the control method into a differentiable simulator, a simulation with respect to a state of the virtual world to obtain information related to a second state of the virtual world, the second state of the virtual world being a state after a target is controlled based on the information related to the control method, the second state of the virtual world being subsequent in time to the first state of the virtual world; and train, without updating parameters of the differentiable simulator itself, the first neural network based on a result of the second state of the virtual world, wherein the environment variable includes information related to a physical quantity of an object. Claim 11 adds: train the first neural network based on the reward without updating parameters of the differentiable simulator.

For example, the "memory" and "processor" are recited at a high level of generality such that they amount to no more than mere application of the judicial exception using generic computer components, which does not amount to an improvement in computer functionality (see MPEP 2106.04(d)(I)). The "input" step amounts to insignificant data gathering, since it is recited at a high level of generality with regard to how the data is gathered, and since the remaining steps merely utilize the obtained data in a generic manner (see MPEP 2106.05(g)). The "perform" step amounts to insignificant data gathering, since the output of the simulation is merely utilized by the "train" and "calculates" steps in a generic manner (see MPEP 2106.05(g)). The "train" step amounts to reciting the words "apply it," since it recites the idea of an outcome that is merely based on a previous simulation or mental-process result. The claim is directed to an abstract idea.

At Step 2B, the claim does not recite additional elements that, alone or in an ordered combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the recited "memory" and "processor" amount to no more than mere instructions to apply the judicial exception using generic computer components. The additional elements do not amount to a particular machine (see MPEP 2106.05(b)(I)); mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The "input" step comprises well-understood, routine, and conventional activity, since it is generic with regard to how the data is obtained, which reasonably includes any electronic means (see MPEP 2106.05(d)(II): "i. Receiving or transmitting data over a network … iv. Storing and retrieving information in memory"). The "perform" step comprises well-understood, routine, and conventional activity, since it requires no more than generic computer components executing generic neural network models (see the instant application, Paragraphs 23-24: "A simulation program is installed in the simulation device 120, and when the simulation program is executed, the simulation device 120 functions as a simulation unit 121. The simulation unit 121 includes a differentiable physical simulator for reproducing the real world. The simulation unit 121 includes a model of 'a neural network (NN) for realization'"). The "train" step amounts to reciting the words "apply it," at least since it requires no more than generic computer components. Considering the additional elements in combination adds nothing more than considering them individually, since the data-gathering steps necessarily occur before the data-outputting steps, and all the additional elements require no more than generic computer functions. For at least these reasons, the claim is not patent eligible.
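To make the claim language discussed above concrete, the "input," "perform," and "train" steps incorporated from claim 10 can be pictured as one iteration of a training loop. The following PyTorch sketch is purely illustrative: the network sizes, the toy point-mass dynamics, the stand-in environment variable, and the loss are invented for orientation and are not taken from the application or the cited art.

    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):  # the "first neural network" (hypothetical shape)
        def __init__(self, state_dim=4, env_dim=2, action_dim=2):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(state_dim + env_dim, 32),
                                     nn.Tanh(), nn.Linear(32, action_dim))

        def forward(self, state, env_var):
            # "input" step: first state + environment variable -> control method
            return self.net(torch.cat([state, env_var], dim=-1))

    def differentiable_simulator(state, env_var, action, dt=0.1):
        # "perform" step: a toy differentiable point-mass update; env_var[0]
        # plays the role of a physical quantity (a mass) of an object.
        mass = env_var[..., :1]
        pos, vel = state[..., :2], state[..., 2:]
        vel = vel + dt * action / mass
        return torch.cat([pos + dt * vel, vel], dim=-1)  # second state

    policy = PolicyNet()
    opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

    first_state = torch.randn(1, 4)
    env_var = torch.tensor([[1.5, 0.2]])  # invented values

    action = policy(first_state, env_var)
    second_state = differentiable_simulator(first_state, env_var, action)
    loss = second_state.pow(2).mean()  # "train" on a result of the second state
    opt.zero_grad(); loss.backward(); opt.step()  # only policy weights change

Note that the simulator here is a fixed function with no trainable parameters, which is one way of reading "without updating parameters of the differentiable simulator itself."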
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: determining the scope and contents of the prior art; ascertaining the differences between the prior art and the claims at issue; resolving the level of ordinary skill in the pertinent art; and considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 10-11, 15-26, and 28-32 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US 2018/0345496) (henceforth "Li (496)") in view of Beckman et al. (US 11584008) (henceforth "Beckman (008)"), and further in view of Degrave, J., "Incorporating Prior Knowledge into Deep Neural Network Controllers of Legged Robots" (henceforth "Degrave (Thesis)"). Li (496), Beckman (008), and Degrave (Thesis) are analogous art because they solve the same problem of simulating a virtual world with changes, and because they are from the same field of simulating virtual worlds.

With regard to claim 1, Li (496) teaches an information processing device comprising: at least one memory; and at least one processor configured to (Paragraphs 84-85: computer-readable medium and processor): perform, by inputting input information and an environment variable into a simulator, a simulation with respect to a state of a virtual world, the input information being related to a first state of the virtual world and being based on an observation result of a state of a real world (Li (496), Paragraph 55: a virtual simulator is instantiated with output on how to perform a task (environment variable) and real data collected from a process (input information): "As shown, adaptation engine 122 initially receives, as input to a machine learning model that adapts a simulation of a physical process executing in a virtual environment to a physical world, simulated output for controlling how the physical process performs a task in the virtual environment and real-world data collected from the physical process performing the task in the physical world (operation 402). The simulated output may include a predicted next state of the physical process, and the real-world data may include a set of previous states of the physical process and/or a set of previous actions performed by the physical process"); a result of the simulation including information related to a second state of the virtual world, the second state of the virtual world being subsequent in time to the first state of the virtual world (Li (496), Paragraph 59: the virtual world is updated again, to a second state subsequent in time, to reflect commands and changes applied in the process after the real world has changed: "After the command is carried out by the robot, the action performed by the robot in response to the command and the current state of the robot (e.g., the actual positions and orientations of the robot and objects) may be fed back into the machine learning model as the most recent previous action and most recent previous state of the robot, respectively. The positions and orientations of the robot and objects may thus continue to be updated based on previous actions, previous states, and output from the additional machine learning model until the task is complete (e.g., the execution of the robot in performing the task in the physical world reaches a final or satisfactory state)."); and information related to a changed state of the virtual world, the information related to the changed state of the virtual world being based on an observation result of the real world that is observed after the real world has changed (Li (496), Paragraph 59, as quoted above).

Li (496) does not appear to explicitly disclose: update the environment variable so that a difference between the result of the simulation and information related to a changed state of the virtual world is reduced; or wherein the environment variable includes information related to a physical quantity of an object in the real world.

However, Beckman (008) teaches: update an environment variable so that a difference between a result of a simulation and information related to a changed state of a virtual world is reduced (Beckman (008), Col. 11, Lines 15-21: the control algorithm of a robot is adjusted to perform well in a simulation that matches the existing process: "At block 301, the robotic control system 220 trains the control policy based on evaluating simulated robotic performance of the task. As described above, this can involve the physics simulation engine 205 generating a simulated environment 230 that is a high-fidelity (though perhaps more simplified in terms of how certain parameters are modeled) copy of the real-world environment and system of the robotic task performance,"); and wherein the environment variable includes information related to a physical quantity of an object in the real world (Beckman (008), Col. 8, Lines 50-55: the control policy includes information on the robot arm and the task being implemented; and Col. 12, Lines 56-62: the refined control policy reflects the weight distribution (a physical quantity) of the object being controlled: "As such, a weight distribution in the object 105 can be measured and this weight distribution used to model the object 105 in the revised simulated environment. The robotic control system 220 can run additional simulations in the revised simulated environment and use evaluations of these simulations to refine the policy and/or build a new policy at block 301.").

It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) with the updating and evaluating of real-world performance of a robot control policy disclosed by Beckman (008). One of ordinary skill in the art would have been motivated to make this modification in order to improve robot performance (Beckman (008), Col. 11, Lines 15-21).

Li (496) in view of Beckman (008) does not appear to explicitly disclose that the simulator is a differentiable simulator. However, Degrave (Thesis) teaches: perform, by inputting input information into a differentiable simulator, a simulation with respect to a state of a virtual world (Degrave (Thesis), Page 161: the differentiable simulator models a robot's fitness: "The goal is to implement a modern 3D Rigid body engine, in which parameters can be differentiated with respect to the fitness a robot achieves in a simulation, such that these parameters can be optimized with methods based on gradient descent."; and Page 160: the fitness is with regard to a control system (based on input information and an environment variable) in relation to a task, where Li (496) and Beckman (008) similarly relate to a robot controlled to perform desired tasks: "To solve tasks efficiently, robots require an optimization of their control system").

It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) in view of Beckman (008) with the differentiable physics simulation for interacting with a virtual robot disclosed by Degrave (Thesis). One of ordinary skill in the art would have been motivated to make this modification in order to make the optimization process more efficient (Degrave (Thesis), Page 161).
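The claim 1 limitation Beckman (008) is cited against, updating the environment variable so that the difference between the simulation result and the observed changed state is reduced, reads like gradient-based system identification once the simulator is differentiable. Here is a minimal sketch of that reading; the one-dimensional decay dynamics, the observation target, and the learning rate are all invented for illustration:

    import torch

    def differentiable_simulator(state, env_var, dt=0.1):
        # env_var stands in for a physical quantity (e.g., a damping factor)
        return state * torch.exp(-env_var * dt)

    env_var = torch.tensor([0.5], requires_grad=True)  # initial guess
    opt = torch.optim.SGD([env_var], lr=5.0)

    first_state = torch.tensor([1.0])       # from observing the real world
    observed_changed = torch.tensor([0.9])  # observed after the world changed

    for _ in range(200):
        simulated = differentiable_simulator(first_state, env_var)
        difference = (simulated - observed_changed).pow(2).sum()
        opt.zero_grad(); difference.backward(); opt.step()

    # env_var is nudged toward the value implied by the observation, i.e.,
    # the difference between simulation and observation is reduced.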
With regard to claim 10, Li (496) teaches an information processing device comprising: at least one memory; and at least one processor configured to (Paragraphs 84-85: computer-readable medium and processor): perform, by inputting information related to the first state of the virtual world, the environment variable, and the information related to the control method into a simulator, a simulation with respect to a state of the virtual world to obtain information related to a second state of the virtual world (Paragraph 55, as quoted above with regard to claim 1: a virtual simulator is instantiated with output on how to perform a task and real data collected from a process); the second state of the virtual world being a state after a target is controlled based on the information related to the control method, the second state of the virtual world being subsequent in time to the first state of the virtual world (Paragraph 59, as quoted above with regard to claim 1: the virtual world is updated again to reflect commands and changes applied in the process after the real world has changed).

Li (496) does not appear to explicitly disclose: input information related to a first state of a virtual world and an environment variable into a first neural network to output information related to a control method; train, without updating parameters of the simulator itself, the first neural network based on the second state of the virtual world; or wherein the environment variable includes information related to a physical quantity of an object.

However, Beckman (008) teaches: input information related to a first state of a virtual world and an environment variable into a first neural network to output information related to a control method (Col. 8, Lines 53-55: the control policy is a neural network; and Col. 10, Lines 1-6: the control policy takes in the needed inputs: "The control policy includes parameters that dictate, given certain inputs (e.g., a designation of the task to be performed and/or data from sensors, including the robot's sensors), what action or sequence of actions the robotic system 110 should take."); train, without updating parameters of the simulator itself, the first neural network based on a second state of the virtual world (Beckman (008), Col. 8, Lines 50-55: the control policy is trained with backpropagation for optimum performance in the virtual simulation; and Col. 11, Lines 15-30: a previously generated simulated environment can be used to iterate on the control policy, where the previous simulation results are utilized without changing any parameters of the simulator used to generate those results: "At block 301, the robotic control system 220 trains the control policy based on evaluating simulated robotic performance of the task. As described above, this can involve the physics simulation engine 205 generating a simulated environment 230 that is a high-fidelity (though perhaps more simplified in terms of how certain parameters are modeled) copy of the real-world environment and system of the robotic task performance, and then running a number of trials of the robotic system 110 performing the task 100 within the simulated environment 230. Evaluations of each trial from the feedback engine 210, via a human evaluator and/or via a computerized reward functions or machine learning classifier evaluation, can guide the machine learning system 215 to generate and iterate a simulated control policy, for example using evolution strategies or reinforcement learning."); and wherein the environment variable includes information related to a physical quantity of an object (Beckman (008), Col. 8, Lines 50-55; and Col. 12, Lines 56-62, as quoted above with regard to claim 1: the refined control policy reflects the weight distribution (a physical quantity) of the object being controlled).

It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) with the updating and evaluating of real-world performance of a robot control policy disclosed by Beckman (008). One of ordinary skill in the art would have been motivated to make this modification in order to improve robot performance (Beckman (008), Col. 11, Lines 15-21).

Li (496) in view of Beckman (008) does not appear to explicitly disclose that the simulator is a differentiable simulator. However, Degrave (Thesis) teaches: perform, by inputting input information into a differentiable simulator, a simulation with respect to a state of a virtual world (Degrave (Thesis), Pages 160-161, as quoted above with regard to claim 1). It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) in view of Beckman (008) with the differentiable physics simulation for interacting with a virtual robot disclosed by Degrave (Thesis). One of ordinary skill in the art would have been motivated to make this modification in order to make the optimization process more efficient (Degrave (Thesis), Page 161).
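The disputed limitation in claim 10, training the first neural network "without updating parameters of the differentiable simulator itself," can be pictured as freezing the simulator's parameters while still letting gradients flow through it to the policy. A hypothetical PyTorch sketch, with modules and dimensions assumed purely for illustration:

    import torch
    import torch.nn as nn

    # A learned (hence parameterized) differentiable dynamics model, frozen:
    simulator = nn.Linear(6, 4)
    for p in simulator.parameters():
        p.requires_grad_(False)  # simulator parameters are never updated

    policy = nn.Sequential(nn.Linear(6, 16), nn.Tanh(), nn.Linear(16, 2))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    state, env_var = torch.randn(8, 4), torch.randn(8, 2)
    action = policy(torch.cat([state, env_var], dim=-1))
    second_state = simulator(torch.cat([state, action], dim=-1))

    loss = second_state.pow(2).mean()  # based on the second state
    opt.zero_grad(); loss.backward(); opt.step()  # only the policy changes

Gradients still pass through the frozen simulator to reach the policy, which is what distinguishes this setup from merely evaluating a black-box simulator.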
With regard to claim 15, Li (496) teaches an information processing device comprising: at least one memory; and at least one processor configured to (Paragraphs 84-85: computer-readable medium and processor): perform, by inputting input information and an environment variable into a simulator, a simulation with respect to a state of a virtual world, the input information being related to a first state of the virtual world and being based on an observation result of a state of a real world (Paragraph 55, as quoted above with regard to claim 1); a result of the simulation including information related to a second state of the virtual world, the second state of the virtual world being subsequent in time to the first state of the virtual world (Li (496), Paragraph 59, as quoted above with regard to claim 1); and a changed state of the virtual world being based on an observation result of a changed state of the real world that is observed after the real world has changed from the state of the real world (Paragraph 59, as quoted above).

Li (496) does not appear to explicitly disclose: wherein the environment variable has been updated so that a difference between the result of the simulation and information related to a changed state of the virtual world is reduced; or wherein the environment variable includes information related to a physical quantity of an object in the real world.

However, Beckman (008) teaches: wherein an environment variable has been updated so that a difference between a result of the simulation and information related to a changed state of a virtual world is reduced (Col. 11, Lines 15-21: the control algorithm of a robot is adjusted to perform well in a simulation that matches the existing process: "At block 301, the robotic control system 220 trains the control policy based on evaluating simulated robotic performance of the task. As described above, this can involve the physics simulation engine 205 generating a simulated environment 230 that is a high-fidelity (though perhaps more simplified in terms of how certain parameters are modeled) copy of the real-world environment and system of the robotic task performance,"); and wherein the environment variable includes information related to a physical quantity of an object in the real world (Beckman (008), Col. 8, Lines 50-55; and Col. 12, Lines 56-62, as quoted above with regard to claim 1). It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) with the updating and evaluating of real-world performance of a robot control policy disclosed by Beckman (008). One of ordinary skill in the art would have been motivated to make this modification in order to improve robot performance (Beckman (008), Col. 11, Lines 15-21).
With regard to claim 18, Li (496) in view of Beckman (008) teaches an information processing device comprising: at least one memory; and at least one processor configured to (Paragraphs 84-85: computer-readable medium and processor): perform, by inputting the information related to the first state of the virtual world, the environment variable, and the information related to the control method into a simulator, a simulation with respect to a state of the virtual world to obtain information related to a second state of the virtual world, the second state of the virtual world being a state after a target is controlled based on the information related to the control method, the second state of the virtual world being subsequent in time to the first state of the virtual world (Paragraph 55, as quoted above with regard to claim 1).

Li (496) does not appear to explicitly disclose, in the same embodiment: input information related to a first state of a virtual world and an environment variable into a first neural network to output information related to a control method; or wherein the environment variable includes information related to a physical quantity of an object.

However, Beckman (008) teaches: input information related to a first state of a virtual world and an environment variable into a first neural network to output information related to a control method (Col. 8, Lines 53-55: the control policy is a neural network; and Col. 10, Lines 1-6, as quoted above with regard to claim 10: the control policy takes in the needed inputs); and wherein the environment variable includes information related to a physical quantity of an object (Beckman (008), Col. 8, Lines 50-55; and Col. 12, Lines 56-62, as quoted above with regard to claim 1). It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) with the updating and evaluating of real-world performance of a robot control policy disclosed by Beckman (008). One of ordinary skill in the art would have been motivated to make this modification in order to improve robot performance (Beckman (008), Col. 11, Lines 15-21).

Li (496) in view of Beckman (008) does not appear to explicitly disclose that the simulator is a differentiable simulator. However, Degrave (Thesis) teaches: perform, by inputting input information into a differentiable simulator, a simulation with respect to a state of a virtual world (Degrave (Thesis), Pages 160-161, as quoted above with regard to claim 1). It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) in view of Beckman (008) with the differentiable physics simulation for interacting with a virtual robot disclosed by Degrave (Thesis). One of ordinary skill in the art would have been motivated to make this modification in order to make the optimization process more efficient (Degrave (Thesis), Page 161).

With regard to claim 2, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis), teaches all the elements of parent claim 1, and further teaches: updates the environment variable by performing backpropagation so that the difference is reduced (Beckman (008), Col. 8, Lines 50-55: the control policy is trained with backpropagation for optimum performance in the virtual simulation). It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) with the updating and evaluating of real-world performance of a robot control policy disclosed by Beckman (008). One of ordinary skill in the art would have been motivated to make this modification in order to improve robot performance (Beckman (008), Col. 11, Lines 15-21).
With regard to claim 3, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis), teaches all the elements of parent claim 1, and further teaches: obtain an output from the differentiable simulator (Degrave (Thesis), Pages 160-161, as quoted above with regard to claim 1); and input the output into a neural network to generate the result of the simulation (Beckman (008), Col. 8, Lines 53-55: the control policy is a neural network; and Col. 10, Lines 1-6, as quoted above with regard to claim 10: the control policy takes in the needed inputs). It would have been obvious to one of ordinary skill in the art to combine the teachings of Li (496) and Beckman (008) in order to improve robot performance (Beckman (008), Col. 11, Lines 15-21), and to further combine them with the differentiable physics simulation disclosed by Degrave (Thesis) in order to make the optimization process more efficient (Degrave (Thesis), Page 161).

With regard to claim 4, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis), teaches all the elements of parent claim 1, and further teaches: performs the simulation by inputting the input information, the environment variable, and information related to a control method in the real world (Li (496), Paragraph 55, as quoted above with regard to claim 1) into the differentiable simulator (Degrave (Thesis), Pages 160-161, as quoted above with regard to claim 1); wherein the information related to the changed state of the virtual world is based on the observation result of the changed state of the real world that is observed after the real world has changed from the state of the real world (Li (496), Paragraph 59, as quoted above with regard to claim 1) by controlling, based on the control method, a robot in the real world (Li (496), Paragraph 59: the virtual world is updated again to reflect commands and changes applied in the process based on actual responses of a robot in the real world, as quoted above). It would have been obvious to one of ordinary skill in the art to combine the teachings of Li (496) and Beckman (008) in order to improve robot performance (Beckman (008), Col. 11, Lines 15-21), and to further combine them with the differentiable physics simulation disclosed by Degrave (Thesis) in order to make the optimization process more efficient (Degrave (Thesis), Page 161).

With regard to claim 5, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis), teaches all the elements of parent claim 4, and further teaches: input the input information and the environment variable into a neural network to obtain the information related to the control method in the real world (see Claim Rejections - 35 USC § 112) (Beckman (008), Col. 8, Lines 50-55: the control policy is trained with backpropagation for optimum performance in the virtual simulation). It would have been obvious to one of ordinary skill in the art to combine the teachings of Li (496) and Beckman (008) in order to improve robot performance (Beckman (008), Col. 11, Lines 15-21).

With regard to claim 6, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis), teaches all the elements of parent claim 5, and further teaches: train the neural network based on the result of the simulation (see Claim Rejections - 35 USC § 112) (Beckman (008), Col. 8, Lines 50-55: the control policy is trained with backpropagation for optimum performance in the virtual simulation). It would have been obvious to one of ordinary skill in the art to combine the teachings of Li (496) and Beckman (008) in order to improve robot performance (Beckman (008), Col. 11, Lines 15-21).
With regard to claim 11, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis), teaches all the elements of parent claim 10, and further teaches: wherein the at least one processor calculates a reward based on the information related to the second state of the virtual world, and trains the first neural network based on the reward without updating parameters of the simulator itself (Beckman (008), Col. 8, Lines 50-55: the control policy is trained with backpropagation for optimum performance in the virtual simulation; and Col. 11, Lines 15-30, as quoted above with regard to claim 10: evaluations of each trial, via a human evaluator and/or a computerized reward function or machine learning classifier, guide the machine learning system to iterate the control policy without changing any parameters of the simulator used to generate the results).
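Claim 11's additions, calculating a reward from the second state and training the first neural network on that reward while leaving the simulator untouched, can be sketched by treating the negative reward as the training loss. Everything concrete below (dynamics, reward shape, sizes) is invented for illustration:

    import torch
    import torch.nn as nn

    policy = nn.Linear(4, 2)  # stand-in for the first neural network
    opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

    def simulator(state, action):  # fixed, parameter-free differentiable step
        return state + 0.1 * torch.cat([action, action], dim=-1)

    state = torch.randn(8, 4)
    action = policy(state)
    second_state = simulator(state, action)

    reward = -(second_state ** 2).sum(dim=-1)  # reward from the second state
    loss = -reward.mean()                      # train on the reward
    opt.zero_grad(); loss.backward(); opt.step()  # simulator stays unchanged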
With regard to claim 16, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis), teaches all the elements of parent claim 15, and further teaches: wherein the at least one processor is further configured to: obtain an output from the differentiable simulator (Degrave (Thesis), Pages 160-161, as quoted above with regard to claim 1); and input the output into a neural network, wherein the neural network has been trained so that the difference is reduced (Beckman (008), Col. 8, Lines 50-55: the control policy is a neural network trained with backpropagation for optimum performance in the virtual simulation; and Col. 11, Lines 15-30, as quoted above with regard to claim 10: a previously generated simulated environment can be used to iterate on the control policy without changing any parameters of the simulator used to generate the results). It would have been obvious to one of ordinary skill in the art to combine the teachings of Li (496) and Beckman (008) in order to improve robot performance (Beckman (008), Col. 11, Lines 15-21), and to further combine them with the differentiable physics simulation for interacting with a virtual robot disclosed by Degrave (Thesis) in order to make the optimization process more efficient (Degrave (Thesis), Page 161).
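Claim 16's arrangement, feeding the differentiable simulator's output into a further neural network that has been trained so that the difference is reduced, amounts to a learned correction on top of the simulator, in the spirit of the adaptation engine Li (496) describes. A toy sketch under invented dynamics and data:

    import torch
    import torch.nn as nn

    def simulator(state):
        return 0.95 * state  # crude differentiable model of the world

    correction = nn.Linear(4, 4)  # network applied to the simulator output
    opt = torch.optim.Adam(correction.parameters(), lr=1e-2)

    state = torch.randn(64, 4)
    observed = 0.9 * state + 0.05  # stand-in for real-world observations

    for _ in range(100):
        result = correction(simulator(state))        # sim output -> network
        difference = (result - observed).pow(2).mean()
        opt.zero_grad(); difference.backward(); opt.step()

    # After training, the corrected result tracks the observations more
    # closely than the raw simulator output, i.e., the difference is reduced.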
With regard to claim 19, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) teaches all the elements of the parent claim 18, and further teaches: a control device comprising: at least one memory; and at least one processor configured to: (Li (496) Paragraphs 84 – 85 computer readable medium and processor) transmit information related to an observation result of a real world to the information processing device as claimed in claim 18; (Li (496) Abstract real sensor data is received: “real-world data collected from the physical process performing the task in the physical world.”) receive the information related to the control method from the information processing device; and control an object in the real world based on the information related to the control method. (Li (496) Abstract a simulated output is used to control a physical object: “The technique further includes transmitting the augmented output to the physical process to control how the physical process performs the task in the physical world.”)

With regard to claim 20, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) teaches all the elements of the parent claim 19, and further teaches: the control device as claimed in claim 19; (Li (496) Paragraphs 84 – 85 teach the control device of claim 19) a sensor device configured to acquire the observation result of the real world, the sensor device including at least one of a camera or a sensor; (Li (496) Paragraph 55 data is collected, which would be from a sensor: “real-world data collected from the physical process performing the task in the physical world (operation 402)”, and Paragraph 50 “The real-world images may be captured by cameras located on assemblies for carrying out the physical process, such as cameras on robots that perform grasping and/or assembly tasks by interacting with the objects.”) a drive device configured to perform drive in the real world, wherein the drive device is operated based on the information related to the control method that is obtained by the control device (Li (496) Paragraph 3 a car can be commanded: “self-driving car may utilize computer vision, control systems, and/or artificial intelligence to drive along a route between two points while avoiding obstacles and obeying traffic signals and signs without requiring human input”), the drive device including at least one of an actuator or a motor (Beckman (008) Col. 10, Top “The robotic system 110 can be a robot having a number of linkages coupled by a number of joints (motorized or passive) and one or more end effectors configured to interact with the robot's environment.”)

It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) with the motorized robot disclosed by Beckman (008). One of ordinary skill in the art would have been motivated to make this modification in order to describe a desired robot performance (Beckman (008) Col. 11, Lines 15 – 21).
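Read together, claims 19 and 20 as mapped describe a closed loop between the control device and the information processing device. A schematic sketch of that loop follows; every interface named in it (sensor.read, info_device.send_observation, and so on) is a hypothetical stand-in, not an API from any cited reference.

```python
# Schematic sketch of the claim 19/20 loop as mapped to Li (496); every
# interface named here is a hypothetical stand-in, not a cited API.
def control_loop(info_device, sensor, actuator, max_steps=100):
    for _ in range(max_steps):
        # Acquire the observation result of the real world (camera/sensor).
        observation = sensor.read()
        # Transmit it to the information processing device.
        info_device.send_observation(observation)
        # Receive the information related to the control method.
        control = info_device.receive_control()
        # Drive the real-world device (actuator or motor) accordingly.
        actuator.apply(control)
        if info_device.task_complete():
            break
```

In the mapped references, Li (496) supplies the transmit/receive behavior and Beckman (008) supplies the motorized drive; the loop above is only meant to make that division of labor concrete.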
With regard to claims 21 and 23, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) teaches all the elements of parent claims 1 and 10, and further teaches: wherein the environment variable includes information of at least one of a weight or a size of the object. (Beckman (008) Col. 10, Top the control policy includes the task to be performed, including data from sensors: “The control policy includes parameters that dictate, given certain inputs (e.g., a designation of the task to be performed and/or data from sensors, including the robot's sensors), what action or sequence of actions the robotic system 110 should take. These parameters can be learned by the machine learning system 215 as described herein.”, and Col. 12, Bottom the weight distribution of the object related to the task can be input into the simulator as a task-relevant parameter (a weight of the object): “As such, a weight distribution in the object 105 can be measured and this weight distribution used to model the object 105 in the revised simulated environment. The robotic control system 220 can run additional simulations in the revised simulated environment and use evaluations of these simulations to refine the policy and/or build a new policy at block 301.”)

It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) with the updating and evaluating of real-world performance of a robot control policy disclosed by Beckman (008). One of ordinary skill in the art would have been motivated to make this modification in order to improve robot performance (Beckman (008) Col. 11, Lines 15 – 21).

With regard to claim 22, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) teaches all the elements of the parent claim 5, and further teaches: wherein the control method is a method for controlling a robot in the real world. (Li (496) Paragraph 59 the virtual world is updated again to reflect commands and changes applied in the process (after the real world has changed): “After the command is carried out by the robot, the action performed by the robot in response to the command and the current state of the robot (e.g., the actual positions and orientations of the robot and objects) may be fed back into the machine learning model as the most recent previous action and most recent previous state of the robot, respectively. The positions and orientations of the robot and objects may thus continue to be updated based on previous actions, previous states, and output from the additional machine learning model until the task is complete (e.g., the execution of the robot in performing the task in the physical world reaches a final or satisfactory state).”)

With regard to claim 24, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) teaches all the elements of the parent claim 20, and further teaches: wherein the device is a robot (Li (496) Paragraph 59, quoted above with regard to claim 22, describing the robot carrying out commands and feeding its resulting state back into the machine learning model).
With regard to claim 25, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) teaches all the elements of the parent claim 1, and further teaches: wherein the at least one processor is configured to update, without updating parameters of the differentiable simulator itself, the environment variable so that the difference is reduced (Beckman (008) Col. 8, Lines 50 – 55 the control policy is trained with backpropagation for optimum performance in the virtual simulation, and Col. 11, Lines 15 – 30 a previously generated simulated environment can be used to iterate on the control policy, where the previous simulation result(s) are utilized without changing any parameters of the simulator used to generate said results: “At block 301, the robotic control system 220 trains the control policy based on evaluating simulated robotic performance of the task. As described above, this can involve the physics simulation engine 205 generating a simulated environment 230 that is a high-fidelity (though perhaps more simplified in terms of how certain parameters are modeled) copy of the real-world environment and system of the robotic task performance, and then running a number of trials of the robotic system 110 performing the task 100 within the simulated environment 230. Evaluations of each trial from the feedback engine 210, via a human evaluator and/or via a computerized reward functions or machine learning classifier evaluation, can guide the machine learning system 215 to generate and iterate a simulated control policy, for example using evolution strategies or reinforcement learning.”)

It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) with the updating and evaluating of real-world performance of a robot control policy disclosed by Beckman (008). One of ordinary skill in the art would have been motivated to make this modification in order to improve robot performance (Beckman (008) Col. 11, Lines 15 – 21).

With regard to claim 26, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) teaches all the elements of the parent claim 3, and further teaches: wherein the at least one processor is configured to train the neural network so that the difference is reduced (Beckman (008) Col. 11, Lines 15 – 21 the control algorithm of a robot is adjusted to perform well in a simulation that matches the existing process (so that the difference is reduced), and Col. 8, Lines 50 – 55 the control policy is a neural network trained with backpropagation for optimum performance in the virtual simulation).

It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) with the updating and evaluating of real-world performance of a robot control policy disclosed by Beckman (008). One of ordinary skill in the art would have been motivated to make this modification in order to improve robot performance (Beckman (008) Col. 11, Lines 15 – 21).
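The claim 25 mapping turns on adjusting the environment variable, not the simulator, so that the simulation-to-reality difference shrinks. A minimal sketch of that kind of update follows, assuming a toy differentiable simulator; the dynamics, the choice of mass as the environment variable, and all names are hypothetical rather than taken from the record.

```python
# Illustrative sketch only: gradient descent updates the environment
# variable (here, an object's mass) so the simulated second state matches
# an observed real-world state; the simulator itself is never modified.
import jax
import jax.numpy as jnp

def sim_step(mass, state, action):
    # Differentiable toy simulator: acceleration = force / mass.
    pos, vel = state
    vel = vel + (action / mass) * 0.01
    return jnp.stack([pos + vel * 0.01, vel])

def difference(mass, state, action, observed_next):
    # Squared gap between the simulated and observed second states.
    return jnp.sum((sim_step(mass, state, action) - observed_next) ** 2)

mass = jnp.array(1.0)                       # environment variable to identify
state = jnp.array([0.0, 0.0])
action = jnp.array(5.0)
observed_next = jnp.array([0.00025, 0.025])  # consistent with mass = 2.0
for _ in range(300):
    # Update the environment variable only, so the difference is reduced;
    # the step size is an illustrative choice for this toy problem.
    mass = mass - 50.0 * jax.grad(difference)(mass, state, action, observed_next)
```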
With regard to claims 28 and 31, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) teaches all the elements of parent claims 1 and 15, and further teaches: wherein the input information includes data acquired by a sensor device in the real world. (Beckman (008) Col. 3, Lines 15 – 19 “the environment in which the objects and robot exist, sensors that furnish inputs (e.g., cameras, microphones, radar, lidar, joint-position sensors, strain gauges, barometers, airspeed sensors, thermometers, and hygrometers),”)

With regard to claims 29 and 32, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) teaches all the elements of parent claims 1 and 15, and further teaches: wherein the result of the simulation includes image data. (Li (496) Paragraph 34 “The trained machine learning models 208 may then be used to produce, from simulated images 206 of objects, augmented images 220 that mimic real-world images of the same objects.”, and Figure 2A)

With regard to claim 30, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) teaches all the elements of the parent claim 4, and further teaches: wherein the result of the simulation includes information related to a position and an angle of an arm of the robot. (Li (496) Paragraph 59, quoted above with regard to claim 22, describing the actual positions and orientations of the robot and objects being fed back into the machine learning model and updated until the task is complete.)

Claims 13 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Li (496) in view of Beckman (008), and further in view of Degrave (Thesis), and further in view of Popov et al., “Data-efficient Deep Reinforcement Learning for Dexterous Manipulation” (henceforth “Popov”). Li (496), Beckman (008), Degrave (Thesis), and Popov are analogous art because they solve the same problem of simulating a virtual world with changes, and because they are from the same field of simulating virtual worlds.
With regard to claims 13 and 27, Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) teaches all the elements of parent claims 10 and 18, and further teaches: obtain an output from the differentiable simulator (Degrave (Thesis) Page 161 the differentiable simulator models a robot fitness: “The goal is to implement a modern 3D Rigid body engine, in which parameters can be differentiated with respect to the fitness a robot achieves in a simulation, such that these parameters can be optimized with methods based on gradient descent.”, and Page 160 the fitness is with regard to the control system (based on input information and an environment variable) in relation to a task, where Li (496) and Beckman (008) similarly relate to a robot controlled to perform desired tasks: “To solve tasks efficiently, robots require an optimization of their control system”)

It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) in view of Beckman (008) with the differentiable physics simulation for interacting with a virtual robot disclosed by Degrave (Thesis). One of ordinary skill in the art would have been motivated to make this modification in order to make the optimization process more efficient (Degrave (Thesis) Page 161).

Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) does not appear to explicitly disclose: wherein the at least one processor inputs the output into a second neural network to generate the information related to the second state of the virtual world.

However, Popov teaches: inputting a result from a world into a second neural network to generate information related to a second state of a virtual world (Popov Page 3, Left the value network in a reinforcement learning agent can be a neural network: “One recent work [5], closely related to the ideas followed in this paper, provides a proof of concept demonstration that value-based methods using neural network approximators can be used for robotic manipulation in the real world”, and Figure 1)

It would have been obvious to one of ordinary skill in the art to combine the synchronizing of a virtual world simulator of a robot to a physical process disclosed by Li (496) in view of Beckman (008), and further in view of Degrave (Thesis) with the value network in reinforcement learning disclosed by Popov. One of ordinary skill in the art would have been motivated to make this modification in order to approximate the value function (Popov Page 3, Left).
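For context on the Popov mapping, the second neural network at issue is a value-style approximator that consumes the simulator's output state. A minimal sketch follows; the layer sizes and every name in it are invented for illustration and are not taken from Popov or the application.

```python
# Illustrative sketch only: a second neural network (a value-style
# approximator, per the Popov citation) takes the simulator's output state
# and produces information about the second state. All names are invented.
import jax
import jax.numpy as jnp

def second_network(params, sim_output):
    # Two-layer MLP: simulator output in, scalar value estimate out.
    hidden = jnp.tanh(params["w1"] @ sim_output + params["b1"])
    return params["w2"] @ hidden + params["b2"]

key1, key2 = jax.random.split(jax.random.PRNGKey(0))
params = {
    "w1": 0.1 * jax.random.normal(key1, (16, 2)), "b1": jnp.zeros(16),
    "w2": 0.1 * jax.random.normal(key2, (1, 16)), "b2": jnp.zeros(1),
}
sim_output = jnp.array([0.3, -0.1])      # e.g., a simulated position/velocity
value_estimate = second_network(params, sim_output)
```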
Examiner General Comments

With regard to the prior art rejection(s), any cited portion of the relied-upon reference(s), whether cited to specific areas or as direct language, is intended to be interpreted in the context of the reference as a whole, as would be understood by one of ordinary skill in the art. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner. The entire reference is considered to provide disclosure relating to the claimed invention.

The claims, and only the claims, form the metes and bounds of the invention. Office personnel are to give the claims their broadest reasonable interpretation in light of the supporting disclosure. Unclaimed limitations appearing in the specification are not read into the claims. Prior art was referenced using terminology familiar to one of ordinary skill in the art. Such an approach is broad in concept and can be either explicit or implicit in meaning. Examiner's Notes are provided with the cited references to assist the applicant in better understanding how the examiner interprets the applied prior art. Such comments are entirely consistent with the intent and spirit of compact prosecution.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALFRED H. WECHSELBERGER, whose telephone number is (571) 272-8988. The examiner can normally be reached M - F, 10am to 6pm.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Emerson Puente, can be reached at 571-272-3652. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALFRED H. WECHSELBERGER/
Examiner, Art Unit 2187

/EMERSON C PUENTE/
Supervisory Patent Examiner, Art Unit 2187

Prosecution Timeline

Aug 30, 2021
Application Filed
Mar 22, 2025
Non-Final Rejection — §101, §103, §112
May 27, 2025
Response Filed
Sep 06, 2025
Final Rejection — §101, §103, §112
Dec 01, 2025
Request for Continued Examination
Dec 08, 2025
Response after Non-Final Action
Mar 04, 2026
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561501
SYSTEM AND METHOD FOR EXCESS GAS UTILIZATION
2y 5m to grant Granted Feb 24, 2026
Patent 12517804
GENERATING TECHNOLOGY ENVIRONMENTS FOR A SOFTWARE APPLICATION
2y 5m to grant Granted Jan 06, 2026
Patent 12468581
INTER-KERNEL DATAFLOW ANALYSIS AND DEADLOCK DETECTION
2y 5m to grant Granted Nov 11, 2025
Patent 12462075
RESOURCE PREDICTION SYSTEM FOR EXECUTING MACHINE LEARNING MODELS
2y 5m to grant Granted Nov 04, 2025
Patent 12450145
ADVANCED SIMULATION MANAGEMENT TOOL FOR A MEDICAL RECORDS SYSTEM
2y 5m to grant Granted Oct 21, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
58%
Grant Probability
94%
With Interview (+36.5%)
3y 8m
Median Time to Grant
High
PTA Risk
Based on 212 resolved cases by this examiner. Grant probability derived from career allow rate.
