DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 11 and 18 are objected to because of the following informalities: Claims 11 and 18 appear to contain typographical errors in the final clause of each claim, and should be amended to “…and encodes the proprioceptive observation into the body velocity and the latent state through the encoder.” Appropriate correction is required.
Drawings
The drawings are objected to. Figure 5 appears to contain a typographical error. The legend entry for “Next partial observations” is currently shown as “ot−1”, but it appears that it should be “ot+1”. The proposed change to Fig. 5 is shown in the markup below.
[Examiner's markup showing the proposed correction to Fig. 5 (greyscale image)]
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim(s) 1-4, 6-7, 10-11, and 14-16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ji (G. Ji, J. Mun, H. Kim and J. Hwangbo, "Concurrent Training of a Control Policy and a State Estimator for Dynamic and Robust Legged Locomotion," in IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4630-4637, April 2022).
Claim 1
Ji teaches
A legged robot control method performed by a computer device,
wherein the computer device comprises at least one processor configured to execute computer-readable instructions included in a memory,
(Ji - [p.4635, col 2, Computational Cost] … Using a single core of Ryzen9 5950x, the estimator network takes 7 μs for a forward pass, …)
EXAMINER NOTE: The use of a Ryzen9 processor indicates a computing device executing computer-readable instructions included in a memory, as the Ryzen9 5950x is a processor designed for desktop computers, which include a memory.
the legged robot control method comprises inferring, by the at least one processor,
an action of a quadrupedal robot from proprioception through a deep reinforcement learning-legged robot model,
(Ji - [p.4632, col 2, Network Architecture] Our neural network structure consists of 3 components: an actor, a critic, and an estimator. All of them are designed as a Multi-Layer Perceptron (MLP) network, … The actor maps an observation to an action and the critic [20] estimates the value of the current state. The estimator network is to estimate states of the robot such as the base linear velocity. Those values are estimated by taking an observation ot as an input, and fed to the actor.)
and a locomotion policy that implicitly infers properties of terrains through which the quadrupedal robot moves is learned in the legged robot model.
(Ji - [Abstract] … The trained policy and state estimator are capable of traversing diverse terrains such as a hill, slippery plate, and bumpy road.
[p.4630, col 2, ln 20 thru p.4631, col 1, ln 11] In addition, to walk and run on challenging terrains blind, information about the terrain must be estimated. … In our proposed approach, this information is indirectly estimated as a distance from the terrain to the foot. … To address the aforementioned shortcomings of the existing methods, we present a learning-based state estimation network, which is concurrently trained with the policy network.)
Claim 2
Ji teaches the limitations of claim 1 as outlined above. Ji further teaches
wherein a locomotion policy that enables a blind locomotion of the quadrupedal robot using an asymmetric actor-critic architecture is learned in the legged robot model.
(Ji - [p.4632, col 2, Network Architecture] Our neural network structure consists of 3 components: an actor, a critic, and an estimator. All of them are designed as a Multi-Layer Perceptron (MLP) network, … The actor maps an observation to an action and the critic [20] estimates the value of the current state. The estimator network is to estimate states of the robot such as the base linear velocity. Those values are estimated by taking an observation ot as an input, and fed to the actor.
[p.4633, col 1, ln 13-14] - Our system takes sensor data as an input, and outputs desired joint positions for each actuator.
[p. 4633, col 1, ln 20-27] - The observation tuple is defined as
ot = ⟨φ, ω, q, q̇, qdest−1, qdest−2, Qhist, Q'hist, bpf, cmd⟩

where φ and ω are the base orientation and angular velocity, q and q̇ are the joint positions and velocities, qdest−1 and qdest−2 are the desired joint position targets for the two previous time steps, Qhist and Q'hist are the joint position error history and joint velocity history, bpf is the Cartesian positions of the feet relative to the center of mass expressed in the body frame, and cmd is the given velocity command.)
EXAMINER NOTE: The observations do not include visual input; the locomotion is therefore "blind."
Claim 3
Ji teaches the limitations of claim 1 as outlined above. Ji further teaches
wherein a context-aided estimator that estimates surrounding environmental information during a learning process of the locomotion policy is jointly learned in the legged robot model.
(Ji - [p.4630, col 2, ln 20 thru p.4631, col 1, ln 11] In addition, to walk and run on challenging terrains blind, information about the terrain must be estimated. … In our proposed approach, this information is indirectly estimated as a distance from the terrain to the foot. … To address the aforementioned shortcomings of the existing methods, we present a learning-based state estimation network, which is concurrently trained with the policy network.)
EXAMINER NOTE: The state estimation network (context-aided estimator) is concurrently trained (learned) with the policy network (locomotion policy). The state estimation network implicitly estimates environmental information as a distance from terrain to foot.
Claim 4
Ji teaches the limitations of claim 3 as outlined above. Ji further teaches
wherein the legged robot model is a neural network that infers the action when a proprioceptive observation, a body velocity, and a latent state are given as a policy network configured as an actor network in an asymmetric actor-critic network.
(Ji - [p.4633, col 1, ln 23-31] The estimator network is designed to predict the state of the robot without utilizing a dedicated estimation algorithm. In this paper, the linear velocity, foot height, and contact probability are estimated. The linear velocity estimate is essential in following velocity command. By removing the necessity of sophisticated state estimation algorithms, the implementation on the robot becomes much simpler. It also has an advantage that the controllers become robust against inevitable errors of the state estimator.)
EXAMINER NOTE: The estimator network estimates foot height and contact probability (latent state) and linear velocity (body velocity) from proprioceptive observations. See also annotated Fig. 2. Note that the policy network is an actor in an actor-critic network.
[Examiner's annotated version of Ji, Fig. 2 (greyscale image)]
Claim 6
Ji teaches the limitations of claim 4 as outlined above. Ji further teaches
wherein the proprioceptive observation is measured using a joint encoder and an inertial measurement unit (IMU),
(Ji - Our goal is to develop an RL-based control framework that can follow the given velocity command, which consists of desired base linear velocities in the forward and lateral directions, and the desired yaw rate. We assume that the robot is equipped with an Inertial Measurement Unit (IMU) and joint encoders.
[p.4633, col 1, ln 13-19] Our system takes sensor data as an input, and outputs desired joint positions for each actuator. Our framework still uses an analytical estimate of the gravity vector expressed in the body frame because it is computed by the IMU sensor. Furthermore, the estimation algorithms for the orientation are simple and reliable. Joint velocities are computed on the motor controllers by applying the finite difference method on joint positions.)
EXAMINER NOTE: The proprioceptive observations ot are collected via IMU and joint encoders. See also Fig. 2 (annotated version reproduced above in the rejection of claim 4), where the joint state is shown to come from encoders.
and the body velocity and the latent state are estimated using the context-aided estimator.
(Ji - [p. 4633, col 1, para. 3] The estimator network is designed to predict the state of the robot without utilizing a dedicated estimation algorithm. In this paper, the linear velocity, foot height, and contact probability are estimated.)
EXAMINER NOTE: The estimator network estimates foot height and contact probability (latent state) and linear velocity (body velocity) from proprioceptive observations. See also annotated Fig. 2 (reproduced in rejection of claim 4).
Claim 7
Ji teaches the limitations of claim 6 as outlined above. Ji further teaches
wherein the proprioceptive observation includes at least one of
a body angular velocity, a gravity vector in a body frame, a body velocity command, a joint angle, a joint angular velocity, and a previous action.
(Ji - [p.4633, col 1, ln 20-27] The observation tuple is defined as
ot = ⟨φ, ω, q, q̇, qdest−1, qdest−2, Qhist, Q'hist, bpf, cmd⟩

where φ and ω are the base orientation and angular velocity, q and q̇ are the joint positions and velocities, qdest−1 and qdest−2 are the desired joint position targets for the two previous time steps, Qhist and Q'hist are the joint position error history and joint velocity history, bpf is the Cartesian positions of the feet relative to the center of mass expressed in the body frame, and cmd is the given velocity command.)
Claim 10
Ji teaches the limitations of claim 4 as outlined above. Ji further teaches
wherein the context-aided estimator includes a body velocity estimation model and an auto-encoder model that shares a unified encoder.
EXAMINER NOTE: See Fig. 2 of Ji. The estimator network encodes the proprioceptive observation ot into linear velocity (body velocity) and contact probability (latent state).
Claim 11
Ji teaches the limitations of claim 4 as outlined above. Ji further teaches
wherein the context-aided estimator includes a single encoder and a multi-head decoder and encodes the proprioceptive observation into the body velocity the latent state through the encoder.
EXAMINER NOTE: See Fig. 2 of Ji. The estimator network encodes the proprioceptive observation ot into linear velocity (body velocity) and contact probability (latent state).
Claim 14
Ji teaches
A non-transitory computer-readable recording medium storing instructions that, when executed by a processor, cause the processor to perform a legged robot control method
(Ji - [p.4635, col 2, Computational Cost] … Using a single core of Ryzen9 5950x, the estimator network takes 7 μs for a forward pass, …)
EXAMINER NOTE: The use of a Ryzen9 processor indicates a computing device executing computer-readable instructions included in a memory, as the Ryzen9 5950x is a processor designed for desktop computers.
the legged robot control method comprising
inferring an action of a quadrupedal robot from proprioception through a deep reinforcement learning-legged robot model,
(Ji - [p.4632, col 2, Network Architecture] Our neural network structure consists of 3 components: an actor, a critic, and an estimator. All of them are designed as a Multi-Layer Perceptron (MLP) network, … The actor maps an observation to an action and the critic [20] estimates the value of the current state. The estimator network is to estimate states of the robot such as the base linear velocity. Those values are estimated by taking an observation ot as an input, and fed to the actor.)
wherein a locomotion policy that implicitly infers properties of terrains through which the quadrupedal robot moves is learned in the legged robot model.
(Ji - [Abstract] … The trained policy and state estimator are capable of traversing diverse terrains such as a hill, slippery plate, and bumpy road.
[p.4630, col 2, ln 20 thru p.4631, col 1, ln 11] In addition, to walk and run on challenging terrains blind, information about the terrain must be estimated. … In our proposed approach, this information is indirectly estimated as a distance from the terrain to the foot. … To address the aforementioned shortcomings of the existing methods, we present a learning-based state estimation network, which is concurrently trained with the policy network.)
Claim 15
Ji teaches
A computer-implemented legged robot control system comprising: at least one processor configured to execute computer-readable instructions included in a memory,
(Ji - [p.4635, col 2, Computational Cost] … Using a single core of Ryzen9 5950x, the estimator network takes 7 μs for a forward pass, …)
EXAMINER NOTE: The use of a Ryzen9 processor indicates a computing device executing computer-readable instructions included in a memory, as the Ryzen9 5950x is a processor designed for desktop computers.
inferring an action of a quadrupedal robot from proprioception through a deep reinforcement learning-legged robot model,
(Ji - [p.4632, col 2, Network Architecture] Our neural network structure consists of 3 components: an actor, a critic, and an estimator. All of them are designed as a Multi-Layer Perceptron (MLP) network, … The actor maps an observation to an action and the critic [20] estimates the value of the current state. The estimator network is to estimate states of the robot such as the base linear velocity. Those values are estimated by taking an observation ot as an input, and fed to the actor.)
and a locomotion policy that implicitly infers properties of terrains through which the quadrupedal robot moves is learned in the legged robot model.
(Ji - [Abstract] … The trained policy and state estimator are capable of traversing diverse terrains such as a hill, slippery plate, and bumpy road.
[p.4630, col 2, ln 20 thru p.4631, col 1, ln 11] In addition, to walk and run on challenging terrains blind, information about the terrain must be estimated. … In our proposed approach, this information is indirectly estimated as a distance from the terrain to the foot. … To address the aforementioned shortcomings of the existing methods, we present a learning-based state estimation network, which is concurrently trained with the policy network.)
Claim 16
Ji teaches the limitations of claim 15 as outlined above. Ji further teaches
wherein a locomotion policy that enables a blind locomotion of the quadrupedal robot using an asymmetric actor-critic architecture is learned in the legged robot model,
(Ji - [p.4632, col 2, Network Architecture] Our neural network structure consists of 3 components: an actor, a critic, and an estimator. All of them are designed as a Multi-Layer Perceptron (MLP) network, … The actor maps an observation to an action and the critic [20] estimates the value of the current state. The estimator network is to estimate states of the robot such as the base linear velocity. Those values are estimated by taking an observation ot as an input, and fed to the actor.
[p.4633, col 1, ln 13-14] - Our system takes sensor data as an input, and outputs desired joint positions for each actuator.
[p. 4633, col 1, ln 20-27] - The observation tuple is defined as
ot = ⟨φ, ω, q, q̇, qdest−1, qdest−2, Qhist, Q'hist, bpf, cmd⟩

where φ and ω are the base orientation and angular velocity, q and q̇ are the joint positions and velocities, qdest−1 and qdest−2 are the desired joint position targets for the two previous time steps, Qhist and Q'hist are the joint position error history and joint velocity history, bpf is the Cartesian positions of the feet relative to the center of mass expressed in the body frame, and cmd is the given velocity command.)
EXAMINER NOTE: The observations do not include visual input; the locomotion is therefore "blind."
and a context-aided estimator that estimates surrounding environmental information during a learning process of the locomotion policy is jointly learned in the legged robot model.
(Ji - [p.4630, col 2, ln 20 thru p.4631, col 1, ln 11] In addition, to walk and run on challenging terrains blind, information about the terrain must be estimated. … In our proposed approach, this information is indirectly estimated as a distance from the terrain to the foot. … To address the aforementioned shortcomings of the existing methods, we present a learning-based state estimation network, which is concurrently trained with the policy network.)
EXAMINER NOTE: The state estimation network (context-aided estimator) is concurrently trained (learned) with the policy network (locomotion policy). The state estimation network implicitly estimates environmental information as a distance from terrain to foot.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 5, 17-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ji in view of Ren (L. Ren, C. Wang, Y. Yang and Z. Cao, "A learning-based control approach for blind quadrupedal locomotion with guided-DRL and hierarchical-DRL," 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO), Sanya, China, 2021, pp. 881-886).
Claim 5
Ji teaches the limitations of claim 4 as outlined above. Ji further teaches
wherein the policy network is trained with an interplay with a value network configured as a critic network in the asymmetric actor-critic network,
EXAMINER NOTE: See annotated Fig. 2 above. The interplay is shown between the actor and critic networks, with the policy network being the actor.
and the value network is trained using … height information of the robot's surrounding environment.
(Ji - [p.4633, col 2, ln 16-21] First, foot contact states are obtainable from joint position errors. Second, a terrain slope becomes observable from the foot contact states, orientation, and joint positions. Therefore, as the slope is observable, the estimator network can compute the foot height under the assumption that the terrain is even.)
EXAMINER NOTE: The estimator determines height information. Per Fig. 2, the output xt of the estimator is also used as input to the value network (critic network), which indicates that the value network is trained using height information.
Ren also teaches the above interplay (see Fig. 1, cited below). Ji may not explicitly teach the following limitations in combination. However, Ren teaches
and the value network is trained using a disturbance force randomly applied to a robot's body
(Ren - [p.883, col 1, ln 15-23] The robot senses information about the environment through force sensors and an inertial measurement unit (IMU). The leg phase signals Φt and the touchdown signal ct are detected by pressure sensor, the robot quaternion qt , the robot angular velocity wt and the robot accelerated velocity at are calculated by IMU. During the entire locomotion, no external sensors provide information about the terrain for robot. The state represents the proprioceptive measurements set, which is denoted as St = [Φt, ct, qt, wt, at ].)
EXAMINER NOTE: See Ren, Fig. 1. The state is used as an input to the value net, which indicates that the value net is trained using the state St.
[Ren, Fig. 1 (greyscale image)]
(Ren - [p.885, col 1, ln 4-8] … The robot was commanded to trot in the forward direction and an external force was applied in the lateral direction by a pendulum, which is about 60N. Although the force is strong enough to cause the robot to slip, the robot can readjust At to return to steady locomotion.
[p.885, col 2, ln 1-7] It is worth noting, despite such disturbance was never specified in training, the policy automatically learned to balance the robot body. Similar to the adaptability of different terrains, the reason is that the pendulum striking causes changes in the values of St detected by the IMU and force sensors, and policy can adjust the values of At in time to recover the normal locomotion.)
EXAMINER NOTE: Ren's experiment discussed on p.885 indicates that random disturbance forces are implicitly learned through the state observed through the IMU and force sensors, and the values of the action At are adjusted according to the observed values of the state. This is similar to applicant's description of detecting external disturbance force dt on p.12, paragraph 2 ("… In the DreamWaQ model 200, the policy network may be trained to implicitly infer dt and ht from proprioception.")
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Ji’s quadrupedal robot control with Ren’s suggestion to train the model to account for external disturbances in order to increase robustness.
(Ren - [p.885, col 1, ln 1-3] The robot will inevitably be subject to external disturbances during blind locomotion in the real environment, so it must be robust to maintain its stability and balance.)
Claim 17
Ji teaches the limitations of claim 15 as outlined above. Ji further teaches
the legged robot model is a neural network that infers the action when a proprioceptive observation, a body velocity, and a latent state are given as a policy network configured as an actor network in an asymmetric actor-critic network,
(Ji - [p.4633, col 1, ln 23-31] The estimator network is designed to predict the state of the robot without utilizing a dedicated estimation algorithm. In this paper, the linear velocity, foot height, and contact probability are estimated. The linear velocity estimate is essential in following velocity command. By removing the necessity of sophisticated state estimation algorithms, the implementation on the robot becomes much simpler. It also has an advantage that the controllers become robust against inevitable errors of the state estimator.)
EXAMINER NOTE: The estimator network estimates foot height and contact probability (latent state) and linear velocity (body velocity) from proprioceptive observations. See also annotated Fig. 2 (shown above with respect to claim 4). Note that the policy network is an actor in an actor-critic network.
the policy network is trained with an interplay with a value network configured as a critic network in the asymmetric actor-critic network,
EXAMINER NOTE: See annotated Fig. 2 above. The interplay is shown between the actor and critic networks, with the policy network being the actor.
and the value network is trained using … height information of the robot's surrounding environment.
(Ji - [p.4633, col 2, ln 16-21] First, foot contact states are obtainable from joint position errors. Second, a terrain slope becomes observable from the foot contact states, orientation, and joint positions. Therefore, as the slope is observable, the estimator network can compute the foot height under the assumption that the terrain is even.)
EXAMINER NOTE: The estimator determines height information. Per Fig. 2, the output xt of the estimator is also used as input to the value network (critic network), which indicates that the value network is trained using height information.
Ren also teaches the above interplay (see Fig. 1, cited below). Ji may not explicitly teach the following limitations in combination. However, Ren teaches
and the value network is trained using a disturbance force randomly applied to a robot's body
(Ren - [p.883, col 1, ln 15-23] The robot senses information about the environment through force sensors and an inertial measurement unit (IMU). The leg phase signals Φt and the touchdown signal ct are detected by pressure sensor, the robot quaternion qt , the robot angular velocity wt and the robot accelerated velocity at are calculated by IMU. During the entire locomotion, no external sensors provide information about the terrain for robot. The state represents the proprioceptive measurements set, which is denoted as St = [Φt, ct, qt, wt, at ].)
EXAMINER NOTE: See Ren, Fig. 1. The state is used as an input to the value net, which indicates that the value net is trained using the state St.
[Ren, Fig. 1 (greyscale image)]
(Ren - [p.885, col 1, ln 4-8] … The robot was commanded to trot in the forward direction and an external force was applied in the lateral direction by a pendulum, which is about 60N. Although the force is strong enough to cause the robot to slip, the robot can readjust At to return to steady locomotion.
[p.885, col 2, ln 1-7] It is worth noting, despite such disturbance was never specified in training, the policy automatically learned to balance the robot body. Similar to the adaptability of different terrains, the reason is that the pendulum striking causes changes in the values of St detected by the IMU and force sensors, and policy can adjust the values of At in time to recover the normal locomotion.)
EXAMINER NOTE: Ren's experiment discussed on p.885 indicates that random disturbance forces are implicitly learned through the state observed through the IMU and force sensors, and the values of the action At are adjusted according to the observed values of the state. This is similar to applicant's description of detecting external disturbance force dt on p.12, paragraph 2 ("… In the DreamWaQ model 200, the policy network may be trained to implicitly infer dt and ht from proprioception.")
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Ji’s quadrupedal robot control with Ren’s suggestion to train the model to account for external disturbances in order to increase robustness.
(Ren - [p.885, col 1, ln 1-3] The robot will inevitably be subject to external disturbances during blind locomotion in the real environment, so it must be robust to maintain its stability and balance.)
Claim 18
The combination of Ji and Ren teaches the limitations of claim 17 as outlined above. Ji further teaches
the proprioceptive observation is measured using a joint encoder and an inertial measurement unit (IMU),
(Ji - Our goal is to develop an RL-based control framework that can follow the given velocity command, which consists of desired base linear velocities in the forward and lateral directions, and the desired yaw rate. We assume that the robot is equipped with an Inertial Measurement Unit (IMU) and joint encoders.
[p.4633, col 1, ln 13-19] Our system takes sensor data as an input, and outputs desired joint positions for each actuator. Our framework still uses an analytical estimate of the gravity vector expressed in the body frame because it is computed by the IMU sensor. Furthermore, the estimation algorithms for the orientation are simple and reliable. Joint velocities are computed on the motor controllers by applying the finite difference method on joint positions.)
EXAMINER NOTE: The proprioceptive observations ot are collected via IMU and joint encoders. See also Fig. 2 (annotated version reproduced above in the rejection of claim 4), where the joint state is shown to come from encoders.
the body velocity and the latent state are estimated using the context-aided estimator,
(Ji - [p. 4633, col 1, para. 3] The estimator network is designed to predict the state of the robot without utilizing a dedicated estimation algorithm. In this paper, the linear velocity, foot height, and contact probability are estimated.)
EXAMINER NOTE: The estimator network estimates foot height and contact probability (latent state) and linear velocity (body velocity) from proprioceptive observations. See also annotated Fig. 2 (reproduced in rejection of claim 4).
and the context-aided estimator includes a single encoder and a multi-head decoder and encodes the proprioceptive observation into the body velocity the latent state through the encoder.
EXAMINER NOTE: See Fig. 2 of Ji. The estimator network encodes the proprioceptive observation ot into linear velocity (body velocity) and contact probability (latent state).
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Ji in view of Lee (Lee, Joonho, Jemin Hwangbo, and Marco Hutter. "Robust recovery controller for a quadrupedal robot using deep reinforcement learning." arXiv preprint arXiv:1901.07517 (2019)).
Claim 8
Ji teaches the limitations of claim 4 as outlined above. Ji may not explicitly teach the following limitations in combination. However, Lee teaches
wherein the policy network is trained to infer a joint angle around a robot's stand still pose.
(Lee - [p.5, col 2, para. 4] The output of a policy network is mapped to joint position targets differently depending on the task. For locomotion, the desired joint position φd is defined as φd = kot +φn where k is a scaling parameter, ot is the output, and φn is a nominal joint configuration (standing). It is designed such that the distribution of the target positions has a standard deviation of approximately 1 and mean at the nominal configuration at the beginning of training. It accelerates the learning because the agent explores trajectories near the standing configuration more frequently.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Ji’s quadruped robot control with Lee’s suggestion to infer joint angle around standing pose in order to accelerate learning.
Claims 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ji in view of Fu (Fu, Zipeng, et al. "Minimizing energy consumption leads to the emergence of gaits in legged robots." arXiv preprint arXiv:2111.01674 (2021)).
Claim 12
Ji teaches the limitations of claim 4 as outlined above. Ji may not explicitly teach the following limitations in combination. However, Fu teaches
wherein a power distribution reward for a motor used on the robot is included in a reward function to train the policy network.
(Fu - [p.4, 2.4 Energy Consumption-Based Reward] Let’s denote the linear velocity as v and the angular velocity as ω, both in the robot’s base frame. We additionally define joint torques as τ and joint velocities as q̇. We define our reward as sum of the following three terms:

r = rforward + renergy + ralive
…
rforward rewards the agent for walking straight at the specified speed, renergy penalizes energy consumption and ralive is the survival bonus.
…
Notice that the actual energy consumption on the robot depending on the low-level hardware design is not directly measurable. We estimate the unit energy consumption per time step by summing the instantaneous power of the 12 motors by multiplying the torque and the joint velocity at each motor.)
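For clarity of record, the per-step energy estimate Fu describes (summing the instantaneous power, torque times joint velocity, of each motor) may be sketched as follows. This is an illustrative sketch only; the function name is hypothetical, and the handling of negative (regenerative) motor power is not specified in the quoted passage, so the raw sum of products is used.

```python
# Illustrative sketch of Fu's per-step energy estimate: the instantaneous
# power of each motor is torque * joint velocity, and the estimate is the
# sum over all motors (12 on the robot Fu describes). Whether negative
# (regenerative) power is clipped or taken in magnitude is not stated in
# the quoted passage, so the raw sum is used here.

def estimated_energy_per_step(torques, joint_velocities):
    """Estimate unit energy consumption for one time step from motor torques
    and joint velocities (element-wise product, summed over motors)."""
    return sum(tau * qdot for tau, qdot in zip(torques, joint_velocities))
```

A reward term penalizing this quantity (renergy in Fu's formulation) discourages wasteful actuation during training.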
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Ji’s robot control with Fu’s suggestion to incorporate energy consumption rewards in order to create a policy which results in natural locomotion patterns in various terrains.
(Fu - [p.3, ln 6-9] The main contributions of this paper include:
Show that minimizing energy consumption plays a key role in the emergence of natural locomotion patterns in both flat as well as complex terrains at different speeds without relying on demonstrations or predefined motion heuristics.)
Claim 19
Ji teaches the limitations of claim 17 as outlined above. Ji may not explicitly teach the following limitations in combination. However, Fu teaches
wherein a power distribution reward for a motor used on the robot is included in a reward function to train the policy network.
(Fu - [p.4, 2.4 Energy Consumption-Based Reward] Let’s denote the linear velocity as v and the angular velocity as ω, both in the robot’s base frame. We additionally define joint torques as τ and joint velocities as q˙. We define our reward as sum of the following three terms:
[Equation image, media_image5.png: the reward defined as the sum of rforward, renergy, and ralive]
…
rforward rewards the agent for walking straight at the specified speed, renergy penalizes energy consumption and ralive is the survival bonus.
…
Notice that the actual energy consumption on the robot depending on the low-level hardware design is not directly measurable. We estimate the unit energy consumption per time step by summing the instantaneous power of the 12 motors by multiplying the torque and the joint velocity at each motor.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Ji’s robot control with Fu’s suggestion to incorporate energy consumption rewards in order to create a policy which results in natural locomotion patterns in various terrains.
(Fu - [p.3, ln 6-9] The main contributions of this paper include:
Show that minimizing energy consumption plays a key role in the emergence of natural locomotion patterns in both flat as well as complex terrains at different speeds without relying on demonstrations or predefined motion heuristics.)
Allowable Subject Matter
Claims 9, 13 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter: When considered in combination with the other limitations, Examiner is unable to find an explicit teaching or reasonable combination of references in the prior art which would suggest the limitations of “wherein the context-aided estimator is optimized using a hybrid loss function that includes body velocity estimation loss and variational auto-encoder (VAE) loss” (claim 9) or “wherein adaptive bootstrapping for adaptively tuning a bootstrapping probability is performed according to a reward coefficient of variation by the context-aided estimator during training of the policy network” (claims 13 and 20). Examiner notes that these limitations alone do not make the claims patent eligible. Rather, the claimed combination of these limitations with the limitations in the preceding claims appears to be a non-obvious improvement.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US-20240116170-A1
US-20210397961-A1
US-12151380-B2
US-11868882-B2
KR-20090126090-A
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES MILLER WATTS whose telephone number is (703)756-1249. The examiner can normally be reached 7:30-5:30 M-TH.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Adam Mott can be reached at 571-270-5376. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JAMES MILLER WATTS III/Examiner, Art Unit 3657
/ADAM R MOTT/Supervisory Patent Examiner, Art Unit 3657