DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This Office Action is in response to the application filed on October 24, 2025. Claims 1-20 are presently pending and are presented for examination.
Response to Amendment
In response to the Applicant’s response filed October 24, 2025, Examiner withdraws the previous claim objections; maintains the previous claim interpretation; and withdraws the previous 35 U.S.C. 102 prior art rejections.
Response to Arguments
Applicant’s arguments filed October 24, 2025, with respect to the rejection(s) of the claim(s) have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of US-20240189994 (hereinafter, “PG”).
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-20 are rejected under 35 U.S.C. 102(a)(2) as anticipated by US-20240189994 (hereinafter, “PG”).
Regarding claim 1 PG discloses one or more processors (see at least [0141]; “more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions”), comprising: circuitry to:
use one or more images of a simulated environment to generate one or more actions to be performed by an autonomous device (see at least [0049]; “the agent may be an autonomous or semiautonomous land, air, or sea vehicle navigating through the environment to a specified destination in the environment”) to perform one or more tasks (see at least [0064]; “At each of the plurality of time steps, the policy system 200 obtains an observation image 206 characterizing a state of the environment at the time step. In the example of FIG. 2, the agent performs a single action in response to each observation image 206,” the first observation image used corresponds to applicant’s image); and
train one or more transformer neural networks to control the autonomous device to perform the one or more tasks based, at least in part, on one or more other images of the autonomous device in the simulated environment performing the one or more tasks using the one or more actions (see at least [0044]; “After having generated the sequence of input tokens 132, the policy system 100 then processes the sequence of input tokens 132 using a Transformer neural network 140 to generate a policy output 142 that defines an action to be performed by the agent 102 in response to the observation image 106 received at the time step,” and [0056-0057]; “the policy system 100 can be used to control the interactions of the agent with a simulated environment, and the policy system 100 (or another training system) can train the set of neural networks used to control the agent 102 based on the interactions of the agent 102 (or another agent) with the simulated environment to determine trained values of the parameters of the set of neural networks. Training the set of neural networks will be described in more detail below with reference to FIGS. 5-6. After the set of neural networks are trained based on the interactions of the agent 102 (or another agent) with a simulated environment, the trained neural networks can be used by the policy system 100 to control the interactions of a real-world agent with the real-world environment, i.e., to control the agent that was being simulated in the simulated environment,” the interactions to complete the task correspond to applicant’s actions, and [0064]; “a new observation image is obtained by the policy system 200 after each action that the agent performs,” the new observation image corresponds to Applicant’s other image, each new image would be used by the neural network to generate a policy output based on said image).
Regarding claim 2 PG discloses all of the limitations of claim 1. Additionally, PG discloses wherein the one or more actions are generated by a task and motion planning module (see at least [0064]; “At each of the plurality of time steps, the policy system 200 obtains an observation image 206 characterizing a state of the environment at the time step. In the example of FIG. 2, the agent performs a single action in response to each observation image 206,” the policy system corresponds to the task and motion planning module).
Regarding claim 3 PG discloses all of the limitations of claim 2. Additionally, PG discloses wherein the task and motion planning module is to access an initial state of an environment (see at least Fig. 4, step 402 – Obtain an observation image, the observation image corresponds to an initial state of an environment, and [0032]; “the policy system 100 obtains an observation image 106 characterizing a state of the environment 104 at the time step.”).
Regarding claim 4 PG discloses all of the limitations of claim 2. Additionally, PG discloses wherein the task and motion planning module is to access an initial state of the autonomous device (see at least Fig. 4, step 402 – Obtain an observation image, the observation image corresponds to an initial state of an environment, and [0035]; “While this specification generally describes that the observations are images, in some cases the observations can include additional data in addition to image, e.g., proprioceptive data characterizing the agent or other data captured by other sensor of the agent. In these cases, the other data can be encoded jointly with the observation image 106 by the image encoder neural network 120.”).
Regarding claim 5 PG discloses all of the limitations of claim 1. Additionally, PG discloses wherein the one or more other images of the autonomous device in the simulated environment performing the one or more tasks are determined using an image sensor (see at least [0027]; “the observation images 106 can be images captured by a camera sensor of the agent 102 or by a camera sensor located in the environment 104. The camera sensor can for example be a still camera or a video camera”).
Regarding claim 6 PG discloses all of the limitations of claim 1. Additionally, PG discloses wherein the one or more neural networks are trained using the one or more images (see at least [0044]; “After having generated the sequence of input tokens 132, the policy system 100 then processes the sequence of input tokens 132 using a Transformer neural network 140 to generate a policy output 142 that defines an action to be performed by the agent 102 in response to the observation image 106 received at the time step,”) and one or more simulations of the performance of the one or more tasks (see at least [0054]; “Generally, when the environment 104 is a simulated environment, the actions 144 may include simulated versions of one or more of the previously described actions or types of actions.”).
Regarding claim 7 PG discloses all of the limitations of claim 1. Additionally, PG discloses wherein the circuitry is to use the one or more neural networks to identify one or more control inputs of the autonomous device to control the autonomous device to perform the one or more tasks (see at least [0113-0114]; “The system selects an action to be performed by the agent using the policy output (step 410). This selection can be made by selecting a respective value for one or more of the plurality of action dimensions using the respective categorical distributions that are defined by the policy output of the Transformer neural network. The system causes the agent to perform the selected action (step 412), e.g., by directly submitting the control input to the agent or by transmitting instructions or other data, e.g., over a data communication network, to a control system for the agent that will cause the agent to perform the selected action.”).
Regarding claim 8 PG discloses a system (see at least Fig. 1) comprising: one or more processors (see at least [0141]; “more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions”) to:
use one or more images of a simulated environment to generate one or more actions to be performed by an autonomous device (see at least [0049]; “the agent may be an autonomous or semiautonomous land, air, or sea vehicle navigating through the environment to a specified destination in the environment”) to perform one or more tasks (see at least [0064]; “At each of the plurality of time steps, the policy system 200 obtains an observation image 206 characterizing a state of the environment at the time step. In the example of FIG. 2, the agent performs a single action in response to each observation image 206,” the first observation image used corresponds to applicant’s image); and
train one or more transformer neural networks to control the autonomous device (see at least [0023]; “The physical process may include an industrial robot, three-dimensional (3D) printer, machine tool, self-driving car, and/or another type of automated technology”) to perform the one or more tasks based, at least in part, on one or more other images of the autonomous device in the simulated environment performing the one or more tasks using the one or more actions (see at least [0044]; “After having generated the sequence of input tokens 132, the policy system 100 then processes the sequence of input tokens 132 using a Transformer neural network 140 to generate a policy output 142 that defines an action to be performed by the agent 102 in response to the observation image 106 received at the time step,” and [0056-0057]; “the policy system 100 can be used to control the interactions of the agent with a simulated environment, and the policy system 100 (or another training system) can train the set of neural networks used to control the agent 102 based on the interactions of the agent 102 (or another agent) with the simulated environment to determine trained values of the parameters of the set of neural networks. Training the set of neural networks will be described in more detail below with reference to FIGS. 5-6. 
After the set of neural networks are trained based on the interactions of the agent 102 (or another agent) with a simulated environment, the trained neural networks can be used by the policy system 100 to control the interactions of a real-world agent with the real-world environment, i.e., to control the agent that was being simulated in the simulated environment,” the interactions to complete the task correspond to applicant’s actions, and [0064]; “a new observation image is obtained by the policy system 200 after each action that the agent performs,” the new observation image corresponds to Applicant’s other image, each new image would be used by the neural network to generate a policy output based on said image).
Regarding claim 9 PG discloses all of the limitations of claim 8. Additionally, PG discloses wherein the one or more actions are generated by a task and motion planning module (see at least [0064]; “At each of the plurality of time steps, the policy system 200 obtains an observation image 206 characterizing a state of the environment at the time step. In the example of FIG. 2, the agent performs a single action in response to each observation image 206,” the policy system corresponds to the task and motion planning module).
Regarding claim 10 PG discloses all of the limitations of claim 9. Additionally, PG discloses wherein the task and motion planning module is to access an initial state of an environment (see at least Fig. 4, step 402 – Obtain an observation image, the observation image corresponds to an initial state of an environment, and [0032]; “the policy system 100 obtains an observation image 106 characterizing a state of the environment 104 at the time step.”).
Regarding claim 11 PG discloses all of the limitations of claim 9. Additionally, PG discloses wherein the task and motion planning module is to access an initial state of the autonomous device (see at least Fig. 4, step 402 – Obtain an observation image, the observation image corresponds to an initial state of an environment, and [0035]; “While this specification generally describes that the observations are images, in some cases the observations can include additional data in addition to image, e.g., proprioceptive data characterizing the agent or other data captured by other sensor of the agent. In these cases, the other data can be encoded jointly with the observation image 106 by the image encoder neural network 120.”).
Regarding claim 12 PG discloses all of the limitations of claim 8. Additionally, PG discloses wherein the one or more images of the autonomous device in the simulated environment performing the one or more tasks are determined using an image sensor (see at least [0027]; “the observation images 106 can be images captured by a camera sensor of the agent 102 or by a camera sensor located in the environment 104. The camera sensor can for example be a still camera or a video camera”).
Regarding claim 13 PG discloses all of the limitations of claim 8. Additionally, PG discloses wherein the one or more neural networks are trained using the one or more images (see at least [0044]; “After having generated the sequence of input tokens 132, the policy system 100 then processes the sequence of input tokens 132 using a Transformer neural network 140 to generate a policy output 142 that defines an action to be performed by the agent 102 in response to the observation image 106 received at the time step,”) and the one or more simulations of the performance of the one or more tasks (see at least [0054]; “Generally, when the environment 104 is a simulated environment, the actions 144 may include simulated versions of one or more of the previously described actions or types of actions.”).
Regarding claim 14 PG discloses all of the limitations of claim 8. Additionally, PG discloses wherein the one or more processors are to use the one or more neural networks to identify one or more control inputs of the autonomous device to control the autonomous device to perform the one or more tasks (see at least [0113-0114]; “The system selects an action to be performed by the agent using the policy output (step 410). This selection can be made by selecting a respective value for one or more of the plurality of action dimensions using the respective categorical distributions that are defined by the policy output of the Transformer neural network. The system causes the agent to perform the selected action (step 412), e.g., by directly submitting the control input to the agent or by transmitting instructions or other data, e.g., over a data communication network, to a control system for the agent that will cause the agent to perform the selected action.”).
Regarding claim 15 PG discloses a method (see at least Fig. 4) comprising: using one or more images of a simulated environment to generate one or more actions to be performed by an autonomous device (see at least [0049]; “the agent may be an autonomous or semiautonomous land, air, or sea vehicle navigating through the environment to a specified destination in the environment”) to perform one or more tasks (see at least [0064]; “At each of the plurality of time steps, the policy system 200 obtains an observation image 206 characterizing a state of the environment at the time step. In the example of FIG. 2, the agent performs a single action in response to each observation image 206,” the first observation image used corresponds to applicant’s image); and
training one or more transformer neural networks to control the autonomous device to perform the one or more tasks based, at least in part, on one or more other images of the autonomous device in the simulated environment performing the one or more tasks using the one or more actions (see at least [0044]; “After having generated the sequence of input tokens 132, the policy system 100 then processes the sequence of input tokens 132 using a Transformer neural network 140 to generate a policy output 142 that defines an action to be performed by the agent 102 in response to the observation image 106 received at the time step,” and [0056-0057]; “the policy system 100 can be used to control the interactions of the agent with a simulated environment, and the policy system 100 (or another training system) can train the set of neural networks used to control the agent 102 based on the interactions of the agent 102 (or another agent) with the simulated environment to determine trained values of the parameters of the set of neural networks. Training the set of neural networks will be described in more detail below with reference to FIGS. 5-6. After the set of neural networks are trained based on the interactions of the agent 102 (or another agent) with a simulated environment, the trained neural networks can be used by the policy system 100 to control the interactions of a real-world agent with the real-world environment, i.e., to control the agent that was being simulated in the simulated environment,” the interactions to complete the task correspond to applicant’s actions, and [0064]; “a new observation image is obtained by the policy system 200 after each action that the agent performs,” the new observation image corresponds to Applicant’s other image, each new image would be used by the neural network to generate a policy output based on said image).
Regarding claim 16 PG discloses all of the limitations of claim 15. Additionally, PG discloses wherein the one or more actions are generated by a task and motion planning module (see at least [0064]; “At each of the plurality of time steps, the policy system 200 obtains an observation image 206 characterizing a state of the environment at the time step. In the example of FIG. 2, the agent performs a single action in response to each observation image 206,” the policy system corresponds to the task and motion planning module).
Regarding claim 17 PG discloses all of the limitations of claim 16. Additionally, PG discloses wherein the task and motion planning module is to access an initial state of an environment (see at least Fig. 4, step 402 – Obtain an observation image, the observation image corresponds to an initial state of an environment, and [0032]; “the policy system 100 obtains an observation image 106 characterizing a state of the environment 104 at the time step.”).
Regarding claim 18 PG discloses all of the limitations of claim 16. Additionally, PG discloses wherein the task and motion planning module is to access an initial state of the autonomous device (see at least Fig. 4, step 402 – Obtain an observation image, the observation image corresponds to an initial state of an environment, and [0035]; “While this specification generally describes that the observations are images, in some cases the observations can include additional data in addition to image, e.g., proprioceptive data characterizing the agent or other data captured by other sensor of the agent. In these cases, the other data can be encoded jointly with the observation image 106 by the image encoder neural network 120.”).
Regarding claim 19 PG discloses all of the limitations of claim 15. Additionally, PG discloses wherein the one or more other images of the autonomous device in the simulated environment performing the one or more tasks are determined using an image sensor (see at least [0027]; “the observation images 106 can be images captured by a camera sensor of the agent 102 or by a camera sensor located in the environment 104. The camera sensor can for example be a still camera or a video camera”).
Regarding claim 20 PG discloses all of the limitations of claim 15. Additionally, PG discloses wherein the one or more neural networks are trained using the one or more images (see at least [0044]; “After having generated the sequence of input tokens 132, the policy system 100 then processes the sequence of input tokens 132 using a Transformer neural network 140 to generate a policy output 142 that defines an action to be performed by the agent 102 in response to the observation image 106 received at the time step,”) and one or more simulations of the performance of the one or more tasks (see at least [0054]; “Generally, when the environment 104 is a simulated environment, the actions 144 may include simulated versions of one or more of the previously described actions or types of actions.”).
Conclusion
Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ASHLEIGH NICOLE TURNBAUGH whose telephone number is (703) 756-1982. The examiner can normally be reached Monday - Friday 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Helal Algahaim can be reached at (571) 270-5227. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ASHLEIGH NICOLE TURNBAUGH/Examiner, Art Unit 3666
/HELAL A ALGAHAIM/SPE, Art Unit 3666