Prosecution Insights
Last updated: April 19, 2026
Application No. 18/240,124

ROBOT NAVIGATION IN DEPENDENCE ON GESTURE(S) OF HUMAN(S) IN ENVIRONMENT WITH ROBOT

Final Rejection (§102, §103)
Filed: Aug 30, 2023
Examiner: MILES, JONATHAN WADE
Art Unit: 3656
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Gdm Holding LLC
OA Round: 2 (Final)

Grant Probability: 70% (Favorable)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 3y 2m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 70% (above average; 406 granted / 578 resolved; +18.2% vs TC avg)
Interview Lift: +48.1% (strong; measured across resolved cases with an interview)
Typical Timeline: 3y 2m average prosecution; 5 applications currently pending
Career History: 583 total applications across all art units

Statute-Specific Performance

§101: 0.6% (-39.4% vs TC avg)
§103: 38.1% (-1.9% vs TC avg)
§102: 33.1% (-6.9% vs TC avg)
§112: 21.8% (-18.2% vs TC avg)

Tech Center averages are estimates. Based on career data from 578 resolved cases.
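The headline figures above are simple ratios of the raw counts shown. A quick sketch of the arithmetic, assuming the "vs TC avg" deltas are percentage-point differences (the dashboard does not state its methodology):

# Sanity check of the examiner statistics shown above.
granted, resolved = 406, 578

allow_rate = granted / resolved                # career allowance rate
print(f"Career allow rate: {allow_rate:.1%}")  # -> 70.2%, displayed as 70%

# If "+18.2% vs TC avg" is a percentage-point gap, the implied
# Tech Center average allowance rate is about 52%.
tc_avg = allow_rate - 0.182
print(f"Implied TC average: {tc_avg:.1%}")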

Office Action

Rejections: §102, §103

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

This Office action is in response to the amendment filed September 10, 2025. Claim 13 is amended, and claims 1-20 are pending and addressed below.

Response to Arguments

Applicant's arguments filed September 10, 2025, have been fully considered but they are not persuasive.

In the arguments presented on pages 9-10, Applicant argues that the rejection of the “sequence of state data” in the limitation “processing the sequence of state data, using a sequential neural network (NN) model, to generate a sequence of position deltas” is improper because the Office action relied on the “wheel distance information” and “angular rate information” disclosed in [0094] of Zhang 1. Applicant further alleges that the Office action has failed to demonstrate that the images disclosed in [0090] of Zhang 1 are processed using an NN model to generate a sequence of position deltas as claimed. However, the rejection below, like the rejection previously issued in the non-final rejection (see page 4), does identify [0090] of Zhang 1 as disclosing “AI core processor 221 implements selected ensemble neural networks implementing trained classifiers to determine a situation state of the environment encountered by the robot using sensory input.” [0090] of Zhang 1 further discloses “[a]pplication processor 222, in conjunction with AI core 221, gathers information including images captured from RGBD sensors 203”, wherein “[a]pplication processor 222 and AI core processor 221 process these information inputs to utilize it to understand the environment surrounding the robot, the robot's own internal operations and health, and desires and requests being made of the robot by its human companion and makes output of commands to utility processor system 301 to control the robot's own functions”. The color images captured by the RGBD cameras are “sensory inputs”, and the “information inputs” processed by the AI core processor include the images captured by the RGBD cameras. [0094] of Zhang 1 further discloses that “color images from visual-spectrum sensitive camera 204…are provided to…RBG camera-based SLAM 285,” thus clearly tying the information together for processing. Therefore, the rejection is maintained.

Applicant further alleges on pages 12-13 that Zhang 1 fails to teach “generating…an intermediate target position” and further “causing the low-level navigation policy to supplant a current goal position…with the intermediate target position.” However, as indicated in [0114], Zhang 1 “commands to travel to a location” based on the environment. Such a command inherently involves path planning, which in turn inherently includes a series of intermediate target positions. Zhang 1 constantly updates its environmental data based on sensory inputs ([0090]) and may adjust the path in response to those inputs to avoid obstacles as needed, as further disclosed in [0219] of Zhang 1. Note that Applicant is responsible for considering the entirety of the cited prior art.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 5, 8, 9, 10, 11, 13, 14, 15, and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Zhang et al., US Publication 2023/0363610 (hereinafter “Zhang 1”).

As per claim 1, Zhang 1 discloses a method implemented by one or more processors of a mobile robot in an environment. Zhang 1 in paragraph 89 states “FIG. 2A depicts a representative robot architecture 200A suitable for implementing a multiple sensory capable home robot. Architecture 200A includes a higher-level cognitive level processor system 201 cooperatively coupled with a utility level processor system 202. Cognitive processor system 201 is preferably equipped to process nodes of one or more deep neural networks trained with responses to sensed inputs (e.g., 241, 251, 261, 271, 243, 244, 254, 264, 274, 284) from the robot's environment and commands from human or other supervisory users.” The passage describes both the cognitive processor system 201 and utility processor system 202, which include an application processor 222 and an AI core processor 221. These clearly constitute one or more processors. The processors are described as actively processing sensed inputs from the environment, understanding the environment, and controlling robot functions. These activities reflect implementation of computational methods. The entire excerpt is situated in the context of a mobile robot.

Zhang 1 further discloses the method comprising: during navigation of the mobile robot using a low-level navigation policy that generates low-level robot actions in dependence on a corresponding goal position. Zhang 1 in paragraph 114 states “At a step 480, the method includes responsive to receiving a command to travel to a location, using the occupancy grid to plan a path of travel to a location commanded to avoid colliding with obstructions.” The excerpt from step 480 discloses that the robot receives a command to travel to a location, which aligns with the concept of having a goal position. It uses the occupancy grid to plan a path of travel, which corresponds to generating low-level robot actions. The path planning is performed “to a location commanded”, clearly tying the robot’s low-level motion planning and action execution to a specific goal. The description of using an occupancy grid to generate a movement plan reflects a low-level, reactive navigation mechanism based on spatial data, consistent with a low-level navigation policy.
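Step 480's occupancy-grid path planning, which the rejection maps to the claimed low-level navigation policy, follows a well-known pattern. Below is a minimal illustrative sketch of goal-directed planning over a 2D occupancy grid — a generic breadth-first planner, not the algorithm actually disclosed by Zhang 1, which the excerpt leaves unspecified:

from collections import deque

def plan_path(grid, start, goal):
    """Breadth-first search over a 2D occupancy grid.

    grid[r][c] is True where a cell is occupied (an obstruction).
    Returns a list of (row, col) cells from start to goal, or None.
    """
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            # Reconstruct the path by walking parent links back to start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and not grid[nr][nc] and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None  # goal unreachable

# A commanded goal position drives the planning, mirroring the claim's
# "low-level robot actions in dependence on a corresponding goal position".
grid = [[False, False, False],
        [True,  True,  False],
        [False, False, False]]
print(plan_path(grid, (0, 0), (2, 0)))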
Zhang 1 also discloses identifying a sequence of state data, the sequence of state data comprising: a sequence of color images that each include one or more color channels and that are each captured by a camera of the robot, wherein the sequence of color images includes a current color image and one or more previous color images. Zhang 1 in paragraph 90 states “RGBD Sensors 203 include nominally one or more Red Green Blue (RGB) visual range cameras 204 configured to capture images of the environment surrounding the robot…Application processor 222, in conjunction with AI core 221, gathers information including images captured from RGBD sensors 203…AI core processor 221 implements selected ensemble neural networks implementing trained classifiers to determine a situation state of the environment encountered by the robot using sensory input”. Paragraph 90 discloses that “RGBD sensors 203” include “Red Green Blue (RGB) visual range cameras 204,” which directly supports the requirement for color images with color channels. It explains that these cameras are “configured to capture images of the environment,” confirming that images are actively acquired by the robot. The application processor and AI core gather “information including images captured from RGBD sensors”, and these images are used as sensory input to determine a “situation state”. This indicates that images are part of a state data sequence.

Zhang 1 discloses the sequence of state data comprising…a sequence of normalized position data instances that each reflects a corresponding position already encountered by the mobile robot during navigation of the mobile robot, wherein the sequence of normalized position data instances includes a current position data instance and one or more previous position data instances. Zhang 1 in paragraph 114 states “At a step 430, the method includes, determining, by a processor, a three-dimensional 3D point cloud of points having 3D information including location information from the depth camera and the at least one visual spectrum-capable camera. The points correspond to the features in the environment as extracted…At a step 480, the method includes responsive to receiving a command to travel to a location, using the occupancy grid to plan a path of travel to a location commanded to avoid colliding with obstructions.” Step 430 explicitly mentions that the robot determines a “3D point cloud of points having 3D information including location information” from sensor data, which inherently contains positional data gathered at various times and positions during the robot’s operation. The process of creating a point cloud using sensory input from multiple positions, coupled with an occupancy map, implies the robot stores and updates a spatial representation as it navigates. This history of observed positions forms a sequence of position data instances, as required by the limitation. Step 480 indicates that the robot uses the occupancy grid, which is populated by previously encountered positions, to plan a current navigation path. This requires accessing both current position data (to determine the robot’s present location) and previously recorded position data (to evaluate possible paths and obstructions). This effectively forms a sequence of normalized position data instances including a current and one or more previous instances. The transformation of raw sensor data into a structured 3D point cloud and occupancy grid inherently involves normalization or standardization of data formats and coordinate frames.

Zhang 1 discloses the method comprising…processing the sequence of state data, using a sequential neural network (NN) model, to generate a sequence of position deltas. Zhang 1 in paragraph 90 states “In some implementations, AI core processor 221 implements selected ensemble neural networks implementing trained classifiers to determine a situation state of the environment encountered by the robot using sensory input of the robot and training of the classifiers.” Further, Zhang 1 in paragraph 94 states “wheel distance information from wheel encoder 251, and angular rate information from Inertial Measuring Unit (IMU) 261 are provided to a plurality of perception processors 280 including RBG camera-based SLAM 285, Gyro/wheel encoder odometry positions 286… These outputs are provided to decision logics 290 of vSLAM based path planning 295, odometry based path planning 296”. The sequence of state data is shown by the combination of “wheel distance information” and “angular rate information from Inertial Measuring Unit (IMU) 261”. These inputs (continuous over time) inherently form a sequence representing the robot’s motion state. Processing using a sequential neural network model is indicated by the term “ensemble neural networks implementing trained classifiers”; its use in the context of processing time-varying sensory inputs suggests sequential data processing. Generating a sequence of position deltas is indicated by “Gyro/wheel encoder odometry positions 286”; this shows the model or processing block outputs position deltas over time, i.e., the relative changes in position calculated from the state data sequence.

Zhang 1 discloses generating, based on the sequence of position deltas generated using the sequential NN model, an intermediate target position. Zhang 1 in paragraph 94 states “In FIG. 2B, multiple sensory inputs including distance information from depth sensing camera 202, color images from visual-spectrum sensitive camera 204, wheel distance information from wheel encoder 251, and angular rate information from Inertial Measuring Unit (IMU) 261 are provided to a plurality of perception processors 280 including RBG camera-based SLAM 285, Gyro/wheel encoder odometry positions 286, depth camera SLAM 287, and scene understanding and obstacle detection 288. These outputs are provided to decision logics 290 of vSLAM based path planning 295, odometry based path planning 296 and scan based semantic path planning 297, respectively.” Zhang 1 in paragraph 90 states “In some implementations, AI core processor 221 implements selected ensemble neural networks implementing trained classifiers to determine a situation state of the environment encountered by the robot using sensory input of the robot and training of the classifiers. Implementations may use supervised machine learning (i.e., the machine learning task of learning a function that maps an input to an output based on example input-output pairs), un-supervised machine learning (i.e., the system discovers features of the input population without a prior set of categories defined), or combinations thereof to train classifiers…In one deep learning system implementation, a training stage of a deep neural network that trains the deep neural network to submit hundreds of training sensory input samples to multiple sensory input recognition engines”. The system collects sequences of sensory data including position changes (position deltas) from wheel encoders and IMU (pose sensors 241, wheel encoders 251, IMU 261), which reflect incremental changes in the robot’s position. These sequences of data are fed into deep neural networks embedded in the cognitive processor system 201 and perception processors 280. The neural network processes this sequential sensory data, effectively using a form of sequential neural network to generate a perceptual understanding of the robot’s position and environment. The processed outputs, such as odometry-based positions and semantic understandings, are input into decision logic 290, which generates path planning outputs. This path planning inherently involves determining intermediate target positions along the route the robot should take. These intermediate target positions are generated based on the position deltas interpreted by the neural network, thus meeting the limitation.

Zhang 1 discloses the method comprising…in response to generating the intermediate target position: causing the low-level navigation policy to supplant a current goal position, being used by the low-level navigation policy as the corresponding goal position, with the intermediate target position. Zhang 1 in paragraph 114 states “At a step 480, the method includes responsive to receiving a command to travel to a location, using the occupancy grid to plan a path of travel to a location commanded to avoid colliding with obstructions.” This step describes a system that, upon receiving a command to travel to a location (analogous to generating an intermediate target position), uses an occupancy grid to plan a path to that location. This inherently involves updating the goal of the navigation system, i.e., supplanting the current goal with the new commanded location as the active target for movement. This mirrors the claimed limitation where the intermediate target position (newly commanded location) is generated. The low-level navigation policy (the component executing path planning based on the occupancy grid) is caused to use this new target instead of a prior one. Thus, the system supplants a previously used goal position with the newly generated intermediate target position as input to the low-level policy.

As per claim 5, Zhang 1 discloses the method of claim 1; accordingly, the rejection of claim 1 above is incorporated. Zhang 1 further discloses a method wherein the low-level navigation policy generates low-level robot actions further in dependence on a corresponding goal orientation for the corresponding goal position. Zhang 1 in paragraph 114 states “At a step 480, the method includes responsive to receiving a command to travel to a location, using the occupancy grid to plan a path of travel to a location commanded to avoid colliding with obstructions.” The excerpt from step 480 discloses that the robot receives a command to travel to a location, which aligns with the concept of having a goal position. It uses the occupancy grid to plan a path of travel, which corresponds to generating low-level robot actions. The path planning is performed “to a location commanded”, clearly tying the robot’s low-level motion planning and action execution to a specific goal. The description of using an occupancy grid to generate a movement plan reflects a low-level, reactive navigation mechanism based on spatial data, consistent with a low-level navigation policy.
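The excerpts above do not tie the claimed “sequential NN model” to any particular architecture. As a point of reference, here is a minimal PyTorch sketch of the recited pattern: a recurrent network consumes a sequence of state vectors, emits per-step position deltas, and the deltas are accumulated into an intermediate target position. The 16-dimensional state vector and 2D positions are assumptions for illustration only:

import torch
import torch.nn as nn

class DeltaPredictor(nn.Module):
    """Sequential model: a sequence of state vectors in, per-step
    (dx, dy) position deltas out. Architecture is illustrative only."""

    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.gru = nn.GRU(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # one (dx, dy) per timestep

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # states: (batch, seq_len, state_dim)
        features, _ = self.gru(states)
        return self.head(features)            # (batch, seq_len, 2)

model = DeltaPredictor(state_dim=16)
states = torch.randn(1, 8, 16)                # 8 timesteps of state data
deltas = model(states)

# An intermediate target position can be derived by accumulating the
# predicted deltas from the robot's current position.
current_position = torch.tensor([0.0, 0.0])
intermediate_target = current_position + deltas[0].sum(dim=0)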
Zhang 1 also discloses a method further comprising: generating, based on at least some of the sequence of position deltas, an intermediate target orientation for the intermediate target position. Paragraph 233 describes performing robot movements (linear or rotational) and accumulating sensor data, followed by calculating reference poses using encoder data. This involves determining changes in position and orientation over each segment. Paragraphs 245-247, in the context of computing calibration parameters, explicitly describe “storing angle data from encoder, calculated reference pose, and OFS readings for each segment”. This suggests that the robot’s pose, which includes orientation, is calculated from sensor data over a series of position deltas (i.e., movement data).

Zhang 1 discloses a method comprising causing the low-level navigation policy to supplant a current goal orientation, being used by the low-level navigation policy as the corresponding goal orientation for the current goal position, with the intermediate target orientation. Zhang 1 in paragraph 114 states “At a step 480, the method includes responsive to receiving a command to travel to a location, using the occupancy grid to plan a path of travel to a location commanded to avoid colliding with obstructions.” This step describes a system that, upon receiving a command to travel to a location (analogous to generating an intermediate target position), uses an occupancy grid to plan a path to that location. This inherently involves updating the goal of the navigation system, i.e., supplanting the current goal with the new commanded location as the active target for movement. This mirrors the claimed limitation where the intermediate target position (newly commanded location) is generated. The low-level navigation policy (the component executing path planning based on the occupancy grid) is caused to use this new target instead of a prior one. Thus, the system supplants a previously used goal position with the newly generated intermediate target position as input to the low-level policy.

As per claim 8, Zhang 1 discloses the method of claim 1; accordingly, the rejection of claim 1 above is incorporated. Zhang 1 discloses a method wherein the low-level navigation policy generates the low-level robot actions independent of any color images and independent of any data derived from any color images. Zhang 1 in paragraph 91 states “Utility processor system 202 includes a mobile platform processor 242 coupled to a set of pose sensors 241, a set of terrain sensors 243, power system 293 and a set of motor drivers (not shown in FIG. 2A for clarity's sake) that in turn drive various motors and actuators. In one representative example in which robot is equipped with an integrated cleaning system, processor 242 can control Drive motors 231, and cleaning system motors”, and Zhang 1 in paragraph 92 states “Pose sensors 241 include wheel encoders 251… Pose sensor 241 also includes an Inertial measurement Unit (IMU) 261… Optical flow sensors 271… Other types of sensors not shown in FIG. 2A can also provide pose information to processor 241.” The utility processor system 202 is described as the module responsible for low-level robot functions such as motion control, motor driving, and interaction with pose and terrain sensors. Low-level robot actions are directly managed by this utility processor system 202 and its mobile platform processor 242. The pose sensors 241 and terrain sensors 243 used by the utility processor include encoders, IMUs, optical flow sensors, and contact/infrared sensors, all of which provide non-visual, non-color-based data. The utility processor system 202 and its sensors are independent of color images or data derived from color images for controlling motors and generating low-level actions. Thus, the low-level navigation policy embodied by the utility processor system generates robot control commands solely based on non-visual sensors, which inherently excludes dependence on color images.

As per claim 9, Zhang 1 discloses the method of claim 1; accordingly, the rejection of claim 1 above is incorporated. Zhang 1 further discloses a method wherein the low-level navigation policy generates the low-level robot actions further in dependence on corresponding occupancy maps each reflecting, for a corresponding area of the environment, occupied and/or unoccupied spaces of the corresponding area. Zhang 1 in paragraph 326 states “The method can include annotating, by a processor, the occupancy map with annotations of object identities at locations corresponding to at least some of the points in the 3D point cloud. The method can include using the occupancy map as annotated to plan paths to avoid certain ones of objects based upon identity and location.” The excerpt teaches the generation of low-level robot actions (path planning or avoidance behaviors) based on an occupancy map that identifies occupied and unoccupied spaces in the robot’s environment. The excerpt confirms that the robot relies on an occupancy map that has been enriched with object/location data (indicating unoccupied regions). It describes how the robot uses this map to determine navigation behavior (generating low-level robot actions). Since the map is annotated with object locations, it reflects occupied (obstacles) and unoccupied (pathways) spaces of the environment.

As per claim 10, Zhang 1 discloses the method of claim 9; accordingly, the rejection of claim 9 above is incorporated. Zhang 1 further discloses a method wherein each of the corresponding occupancy maps, of the sequence, is a corresponding two-dimensional projection of a point cloud generated by a light detection and ranging (LiDAR) scanner of the mobile robot. Zhang 1 in paragraph 218 states “A person or entity can also command the robot to go to a specific point in the occupancy grid 2155. While traveling, the robot uses the descriptive point cloud 2145 to localize itself within the map as described herein above in the Tracking sections. The robot can update the map using the techniques described herein above in the Deep Learning Architecture sections. Further, some implementations equipped with active sensors (e.g., sonar, LIDAR) can update the map using information from these sensors as well.” The excerpt explicitly states that the robot is equipped with a LiDAR sensor and that data from this sensor is used to update the map. This satisfies the requirement that the LiDAR scanner of the mobile robot is the source of data. It also confirms that the robot utilizes a point cloud as part of its navigation and localization system. The excerpt describes the robot updating a map, which includes the occupancy grid 2155, using information from LiDAR. In mapping and robotics, it is standard practice for occupancy grids to be two-dimensional projections derived from three-dimensional point cloud data. Therefore, this implies that the occupancy map is a 2D projection of the point cloud generated by the LiDAR scanner.

As per claim 11, Zhang 1 discloses the method of claim 9; accordingly, the rejection of claim 9 above is incorporated. Zhang 1 further discloses a method wherein the sequence of state data further comprises: a sequence of the corresponding occupancy maps, including a current occupancy map and one or more previous occupancy maps previously utilized by the low-level navigation policy during the navigation of the mobile robot. Zhang 1 in paragraph 218 states “When the robot is activated in a previously mapped environment, the robot uses the technology described herein above in the Tracking sections to self-locate within the descriptive point cloud 2145. In cases where the robot finds itself in an unmapped environment, the occupancy grid and path planning can be used without a previously built map by using the SLAM system described herein above to build a map in real-time, thereby enabling the robot to localize itself in the unmapped environment.” The excerpt describes previously mapped scenarios, which indicates the robot uses pre-existing occupancy maps stored from earlier navigation sessions (i.e., previous occupancy maps). The excerpt also describes an unmapped environment scenario in which the robot generates and updates occupancy maps in real time, meaning there is a temporal sequence of occupancy maps created and updated during navigation. The current map is continuously updated, and recent past maps influence SLAM-based localization and planning decisions. The reference to path planning and localization in conjunction with the occupancy grid and SLAM indicates active decision-making during navigation, tasks handled by a low-level navigation policy. Thus, the occupancy maps, both current and past, are utilized during navigation by the system controlling movement.

As per claim 13, Zhang 1 discloses the method of claim 1; accordingly, the rejection of claim 1 above is incorporated. Zhang 1 further discloses a method wherein processing the sequence of state data, using the sequential NN model, to generate the sequence of position deltas, comprises: processing the sequence of color images, using an image processing tower of the sequential NN model, to generate a sequence of color image embeddings. Zhang 1 in paragraph 114 states “The process starts at a step 410 which includes receiving sets of image data from a tightly coupled pairing of at least one visual spectrum capable camera and at least one depth sensing camera, including feature points and depth information. The depth information includes a distance to objects relative to the depth camera. At a step 420, the method includes extracting, by a processor, from the images content to include in a point cloud of features in the environment. At a step 430, the method includes, determining, by a processor, a three-dimensional 3D point cloud of points having 3D information including location information from the depth camera and the at least one visual spectrum-capable camera.” and in paragraph 117 states “In one example, a neural network ensemble can implement a set of classifiers that are trained to classify situation states according to input data gathered from robot's sensors and to trigger learned behaviors based upon the situation state classification.” The described deep learning system includes a neural network ensemble structure to interpret sequential sensor data and generate behavioral triggers.
In paragraph 114 it states “receiving sets of image data from…visual spectrum capable camera”; this details that the system gathers multiple image frames from a visual camera that is capable of capturing images in the visible light spectrum (color images), which inherently creates a sequence over time. The image processing tower from the limitation corresponds to the part of the model processing visual inputs, i.e., CNN-like subnetworks that extract features from images before passing them downstream. Step 420, “extracting…content to include in a point cloud data” from paragraph 114, and the step of “input data gathered from robot’s sensors…encoded into fields of a vector (or tensor) representation” from paragraph 117 describe transforming raw image data into lower-dimensional representations (embeddings). Step 480 from paragraph 114 describes “using the occupancy grid to plan a path of travel”; this describes that the system outputs positional changes (deltas) over time, derived from the sequence of visual inputs and their processed features.

Zhang 1 discloses a method wherein processing the sequence of state data, using the sequential NN model, to generate the sequence of position deltas, comprises…processing the sequence of normalized position data instances, using a position processing tower of the sequential NN model, to generate a sequence of position data vectors. Zhang 1 in paragraph 117 states “In one example, a neural network ensemble can implement a set of classifiers that are trained to classify situation states according to input data gathered from robot's sensors and to trigger learned behaviors based upon the situation state classification.” Zhang 1 in paragraph 116 states “Inputs whether structured or unstructured data type data points, can be encoded into fields of a vector (or tensor) representation.” These excerpts describe a neural network ensemble that receives sensor data inputs, processes them into vector representations, and uses them to classify situational states. Input data gathered from the robot’s sensors corresponds to a sequence of normalized position data instances. The position processing tower of the sequential neural network model is represented by the trained neural network classifiers that classify situational states based on input sensor data. Processing the sequence of normalized position data instances to generate a sequence of position data vectors is shown in paragraph 116, which explains that inputs are encoded into vector representations. These vectors can be understood as the position data vectors resulting from processing the input instances. The use of time-series inputs (sensor data over time) implies a sequence, and the generated vectors correspond to the internal representations (encodings) used for downstream classification tasks.

Zhang 1 further discloses a method wherein processing the sequence of state data, using the sequential NN model, to generate the sequence of position deltas, comprises…processing a sequence of fusions, using fusion layers of the sequential NN model, to generate the sequence of position deltas, each of the sequence of fusions comprising a corresponding one of the color image embeddings and a corresponding one of the position data vectors. The sequential neural network model is disclosed as “an ensemble of trained neural network classifiers” in paragraph 114. The deep learning architecture and ensemble of trained neural network classifiers suggest a structured model with ordered data flow, matching the concept of a sequential model composed of stacked layers or branches (e.g., processing towers). Color image embeddings are disclosed as the “set of image data…from visual spectrum capable cameras” which is processed into a “point cloud of features in the environment” and used as a neural network input. The sequence of position data vectors corresponds to “input data gathered from robot’s sensors” from paragraph 117. The sequence of position deltas maps to behavioral outputs from the classification process. Paragraph 125 describes a sequence of layers processing progressively complex fusions of features, starting with small local patterns and building into higher-order representations. Paragraph 126 suggests a sequential model structure with interdependent layers, indicating that fused inputs are processed through this sequence to generate outputs like position deltas. It describes a layered architecture where fused input data is processed across multiple types of layers, consistent with a sequence of fusion layers. This paragraph also indicates that each layer receives fused features from the previous stage, supporting the notion of sequential fusion processing.

As per claim 14, Zhang 1 discloses the method of claim 13; accordingly, the rejection of claim 13 above is incorporated. Zhang 1 discloses a method wherein the sequence of color image embeddings are each a corresponding lower-dimensional encoding of a corresponding one of the color images of the sequence of color images. Zhang 1 in paragraph 126 states “A convolutional neural network learns highly non-linear mappings by interconnecting layers of artificial neurons arranged in many different layers with activation functions that make the layers dependent. It includes one or more convolutional layers, interspersed with one or more sub-sampling layers and non-linear layers, which are typically followed by one or more fully connected layers. Each element of the convolutional neural network receives inputs from a set of features in the previous layer.” Paragraph 114 describes that the system includes a “visual spectrum capable camera,” which indicates that the captured input images are in the visual (color) spectrum. Paragraph 126 explains that a convolutional neural network includes one or more convolutional layers, subsampling layers, and non-linear layers, which are typically followed by one or more fully connected layers. The reference indicates that when multi-dimensional input (such as images) enters the network, the CNN transforms this data into learned representations through layers of abstraction. These representations correspond to the “color image embeddings” as compact, numerical representations of images. The transformation into such embeddings inherently involves dimensionality reduction, aligning with the concept of “lower-dimensional encoding” as defined in the application. Furthermore, since the CNN processes image data through sequential layers, the reference indicates a sequence of input color images, each being encoded into a corresponding embedding.

Zhang 1 further discloses a method wherein the sequence of position data vectors are each a corresponding higher-dimensional projection of a corresponding one of the normalized position data instances of the sequence of normalized position data instances. Zhang 1 in paragraph 122 states “some neural networks implementing AI core 221 are implemented as an ensemble of subnetworks trained using datasets widely chosen from appropriate conclusions about environmental conditions and incorrect conclusions about environmental conditions, with outputs including classifications of anomalies based upon the input sensed data”. Paragraph 122 of the reference explains that neural networks implementing the AI core are trained using datasets derived from environmental conditions and sensor input, with outputs including classifications and triggered remedial actions. These neural networks, when implemented as convolutional neural networks, are described in paragraphs 123-126 as learning hierarchical and abstract representations from raw input data. This includes transformation of sensory input data into increasingly abstract representations through successive neural network layers. The sequence of normalized position data instances, from the limitation, corresponds to the structured sensor input data, such as the position, activity, or condition data collected by the robot’s sensors, that is used as input to the neural network. The sequence of position data vectors, as explained in the limitation, corresponds to the learned representations derived from the input data, as the network processes and projects this input into higher-dimensional feature spaces through convolution and fully connected layers. Accordingly, the reference teaches that each higher-dimensional learned representation (position data vector) is generated from and corresponds to a lower-dimensional, normalized input (position data instance), thus disclosing the claimed relationship between the normalized position data and the projected higher-dimensional vectors.

As per claim 15, Zhang 1 discloses the method of claim 14; accordingly, the rejection of claim 14 above is incorporated. Zhang 1 further discloses a method wherein the sequence of color image embeddings are each of a given dimension and wherein the sequence of position data vectors are also each of the given dimension. Zhang 1 in paragraph 116 states “An exemplary deep neural network implementation selects an appropriate classification from a set of environmental conditions using a set of inputs to the neural network-based classifier(s). Inputs whether structured or unstructured data type data points, can be encoded into fields of a vector (or tensor) representation. Implementations will employ various levels of abstraction in configuring, classification and anomaly detection tasks, e.g., in an elder home care application, data can be selected to describe detected condition of the cared for person, potentially medically significant changes to the cared for person, emergency as well as non-emergency changes to the environment and so forth.” In paragraphs 123-126, the convolutional neural network (CNN) is described as processing input images and transforming them through a hierarchy of convolution and nonlinear layers. This processing results in compact feature representations, commonly known as embeddings, in later layers, especially after the fully connected layers in CNNs. Thus, the CNN produces fixed-length color image embeddings as output features from raw image inputs. The output from the CNN layers corresponds to the sequence of color image embeddings each of a given dimension, which is inherent to the architecture of CNNs. In paragraph 116, the disclosure explains that structured and unstructured inputs are encoded into fields of a vector representation. For time-series or positional data, the system would normalize and encode these sequences into vectors, which are also of fixed length, constituting position data vectors. Each normalized position instance is represented as a vector of a given dimension, forming a sequence of position data vectors. The sequence of position data vectors of the limitation corresponds to vector-encoded representations of normalized, time-ordered positional inputs (e.g., sensor data), as described in paragraph 116.

As per claim 19, Zhang 1 discloses a method implemented by one or more processors of a mobile robot during navigation of the mobile robot in an environment. Zhang 1 in paragraph 89 states “FIG. 2A depicts a representative robot architecture 200A suitable for implementing a multiple sensory capable home robot. Architecture 200A includes a higher-level cognitive level processor system 201 cooperatively coupled with a utility level processor system 202. Cognitive processor system 201 is preferably equipped to process nodes of one or more deep neural networks trained with responses to sensed inputs (e.g., 241, 251, 261, 271, 243, 244, 254, 264, 274, 284) from the robot's environment and commands from human or other supervisory users.” The passage describes both the cognitive processor system 201 and utility processor system 202, which include an application processor 222 and an AI core processor 221. These clearly constitute one or more processors. The processors are described as actively processing sensed inputs from the environment, understanding the environment, and controlling robot functions. These activities reflect implementation of computational methods. The entire excerpt is situated in the context of a mobile robot.

Zhang 1 further discloses the method comprising: at each of a plurality of low-level iterations: processing, using a low-level navigation policy, a corresponding occupancy map and a corresponding goal position, to generate corresponding low-level robot actions, and providing the corresponding low-level robot actions to actuators of the mobile robot. Zhang 1 in paragraph 114 states “At a step 460, the method includes populating at least one layer of the occupancy map (e.g., a 2D occupancy grid) with points from the point cloud of features within a height range using ray tracing from an observed location of a point in the images aligned to a corresponding point in the occupancy grid and a location of a corresponding point reprojected on the layer of the occupancy grid. At a step 470, the method includes finding cells along a ray between the aligned observed point and the corresponding point reprojected on the layer and marking the found cells as empty. At a step 480, the method includes responsive to receiving a command to travel to a location, using the occupancy grid to plan a path of travel to a location commanded to avoid colliding with obstructions.” Step 480 describes using the occupancy grid to plan a path of travel. This shows the application of a local planning policy. The outcome of this path planning step would be low-level robot actions, such as incremental linear and angular velocity commands or waypoint-following instructions.
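Step 460's population of an occupancy-map layer with point-cloud points “within a height range” is a standard projection operation. Here is a numpy sketch of the height-band filtering and 2D binning; the grid parameters are illustrative assumptions, and the ray-tracing step that marks intervening cells empty (steps 460-470) is omitted:

import numpy as np

def project_to_occupancy(points, cell_size=0.05, z_min=0.1, z_max=1.5,
                         grid_shape=(200, 200), origin=(-5.0, -5.0)):
    """Project a 3D point cloud onto a 2D occupancy grid layer.

    points: (N, 3) array of (x, y, z) in meters. Points whose height
    falls within [z_min, z_max] mark their (x, y) cell as occupied.
    Parameter values are illustrative, not from the cited publications.
    """
    grid = np.zeros(grid_shape, dtype=bool)
    in_band = (points[:, 2] >= z_min) & (points[:, 2] <= z_max)
    cols = ((points[in_band, 0] - origin[0]) / cell_size).astype(int)
    rows = ((points[in_band, 1] - origin[1]) / cell_size).astype(int)
    valid = (rows >= 0) & (rows < grid_shape[0]) & \
            (cols >= 0) & (cols < grid_shape[1])
    grid[rows[valid], cols[valid]] = True
    return grid

cloud = np.random.uniform([-5, -5, 0], [5, 5, 2], size=(1000, 3))
occupancy = project_to_occupancy(cloud)
print(occupancy.sum(), "cells marked occupied")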
Zhang 1 discloses the method comprising…at each of a plurality of high-level iterations: identifying a corresponding sequence of state data, the corresponding sequence of state data comprising: a corresponding sequence of color images that are each captured by a camera of the robot, wherein the sequence of color images includes a current color image and one or more previous color images. Zhang 1 in paragraph 90 states “RGBD Sensors 203 include nominally one or more Red Green Blue (RGB) visual range cameras 204 configured to capture images of the environment surrounding the robot…Application processor 222, in conjunction with AI core 221, gathers information including images captured from RGBD sensors 203…AI core processor 221 implements selected ensemble neural networks implementing trained classifiers to determine a situation state of the environment encountered by the robot using sensory input”. Paragraph 90 discloses that “RGBD sensors 203” include “Red Green Blue (RGB) visual range cameras 204,” which directly supports the requirement for color images with color channels. It explains that these cameras are “configured to capture images of the environment,” confirming that images are actively acquired by the robot. The application processor and AI core gather “information including images captured from RGBD sensors”, and these images are used as sensory input to determine a “situation state”. This indicates that images are part of a state data sequence.

Zhang 1 discloses the corresponding sequence of state data comprising…processing the corresponding sequence of state data, using a neural network (NN) model, to generate a corresponding sequence of position deltas. Zhang 1 in paragraph 90 states “In some implementations, AI core processor 221 implements selected ensemble neural networks implementing trained classifiers to determine a situation state of the environment encountered by the robot using sensory input of the robot and training of the classifiers.” and in paragraph 94 states “wheel distance information from wheel encoder 251, and angular rate information from Inertial Measuring Unit (IMU) 261 are provided to a plurality of perception processors 280 including RBG camera-based SLAM 285, Gyro/wheel encoder odometry positions 286…These outputs are provided to decision logics 290 of vSLAM based path planning 295, odometry based path planning 296”. The sequence of state data is shown by the combination of “wheel distance information” and “angular rate information from Inertial Measuring Unit (IMU) 261”. These inputs (continuous over time) inherently form a sequence representing the robot’s motion state. Processing using a sequential neural network model is indicated by the term “ensemble neural networks implementing trained classifiers”; its use in the context of processing time-varying sensory inputs suggests sequential data processing. Generating a sequence of position deltas is indicated by “Gyro/wheel encoder odometry positions 286”; this shows the model or processing block outputs position deltas over time, i.e., the relative changes in position calculated from the state data sequence.

Zhang 1 discloses the corresponding sequence of state data comprising…generating, based on the corresponding sequence of position deltas generated using the sequential NN model, a corresponding intermediate target position. Zhang 1 in paragraph 94 states “In FIG. 2B, multiple sensory inputs including distance information from depth sensing camera 202, color images from visual-spectrum sensitive camera 204, wheel distance information from wheel encoder 251, and angular rate information from Inertial Measuring Unit (IMU) 261 are provided to a plurality of perception processors 280 including RBG camera-based SLAM 285, Gyro/wheel encoder odometry positions 286, depth camera SLAM 287, and scene understanding and obstacle detection 288. These outputs are provided to decision logics 290 of vSLAM based path planning 295, odometry based path planning 296 and scan based semantic path planning 297, respectively.” and in paragraph 90 states “In some implementations, AI core processor 221 implements selected ensemble neural networks implementing trained classifiers to determine a situation state of the environment encountered by the robot using sensory input of the robot and training of the classifiers. Implementations may use supervised machine learning (i.e., the machine learning task of learning a function that maps an input to an output based on example input-output pairs), un-supervised machine learning (i.e., the system discovers features of the input population without a prior set of categories defined), or combinations thereof to train classifiers…In one deep learning system implementation, a training stage of a deep neural network that trains the deep neural network to submit hundreds of training sensory input samples to multiple sensory input recognition engines”. The system collects sequences of sensory data including position changes (position deltas) from wheel encoders and IMU (pose sensors 241, wheel encoders 251, IMU 261), which reflect incremental changes in the robot’s position. These sequences of data are fed into deep neural networks embedded in the cognitive processor system 201 and perception processors 280. The neural network processes this sequential sensory data, effectively using a form of sequential neural network to generate a perceptual understanding of the robot’s position and environment. The processed outputs, such as odometry-based positions and semantic understandings, are input into decision logic 290, which generates path planning outputs. This path planning inherently involves determining intermediate target positions along the route the robot should take. These intermediate target positions are generated based on the position deltas interpreted by the neural network, thus meeting the limitation.

Zhang 1 discloses the corresponding sequence of state data comprising…causing the low-level navigation policy to, in at least a corresponding next of the low-level iterations, utilize the corresponding intermediate target position as the corresponding goal position. Zhang 1 in paragraph 114 states “At a step 480, the method includes responsive to receiving a command to travel to a location, using the occupancy grid to plan a path of travel to a location commanded to avoid colliding with obstructions.” This step describes a system that, upon receiving a command to travel to a location (analogous to generating an intermediate target position), uses an occupancy grid to plan a path to that location. This inherently involves updating the goal of the navigation system, i.e., supplanting the current goal with the new commanded location as the active target for movement. This mirrors the claimed limitation where the intermediate target position (newly commanded location) is generated. The low-level navigation policy (the component executing path planning based on the occupancy grid) is caused to use this new target instead of a prior one. Thus, the system supplants a previously used goal position with the newly generated intermediate target position as input to the low-level policy.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., US Publication 2023/0363610 (hereinafter “Zhang 1”) in view of Zhang, US Publication 2021/0311469 (hereinafter “Zhang 2”).

As per claim 2, Zhang 1 discloses the method of claim 1; accordingly, the rejection of claim 1 above is incorporated. Zhang 2 further teaches a method wherein at least some of the color images of the sequence of color images collectively capture a human, in the environment, providing a particular gesture. Zhang 2 in paragraph 46 states “In some other implementations, determining the category and the direction of the gesture may also include…determining the category and the direction corresponding to the target gesture as the category and the direction of the gesture. As shown in FIG. 6 (c), the gesture direction is upward, and the gesture category is to raise the thumb.” The excerpt explains the presence of a human performing a particular gesture within the environment, which is captured in an image. The overall context of the reference shows the system processes a series of input images (paragraphs 42 and 57), which indicates a sequence.

Zhang 2 teaches a method wherein the sequence of position deltas, and the intermediate target position, correspond to the particular gesture. Zhang 2 in paragraph 32 states “An instruction corresponding to the pose information is sent to the intelligent vehicle to adjust the motion state of the intelligent vehicle.
The motion state of the intelligent vehicle includes a stationary state, a steering state, a backward state, a forward state, and the like… An instruction corresponding to the category of the gesture is sent to the controller of the intelligent vehicle to control the motion direction of the intelligent vehicle.” The sequence of position deltas is reflected in how the intelligent vehicle receives motion and control instructions (e.g., move forward, steer, or stop). Each instruction changes the vehicle’s position incrementally over time, which is effectively a sequence of position deltas. The intermediate target position is disclosed by the instruction that directs the vehicle’s motion direction. These directional commands act as waypoints that guide the robot step by step in response to the recognized gesture, helping it progress toward an overall behavior goal. The requirement that position deltas and intermediate target position correspond to the particular gesture is satisfied because the system maps specific gestures (as classified by the neural network) to distinct control instructions that define how the robot should move and in which direction. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang 1 in view of Zhang 2 with a reasonable expectation of success, as both references are directed to methods and systems for robotic navigation and human-robot interaction in dynamic environments. Zhang 1 discloses a method for processing a sequence of state data using a sequential neural network model to generate position deltas and an intermediate target position during robot navigation. Zhang 2 teaches gesture based interaction between a human and a robot, wherein a human provides a gesture that influences the robot’s navigation behavior, including modifying the robot’s planned target position based on the gesture. The resulting combination would be a navigation method in which the robot processes a sequence of color images that capture a human gesture and correlates the gesture to an adjustment in its navigation trajectory, including updated position deltas and intermediate target positions. A person of ordinary skill in the art would have been motivated to combine the inventions because gesture based control enhances natural and intuitive human-robot interaction, particularly in home or service environments, allowing users to guide or influence robot behavior without requiring verbal commands or physical interfaces. The additional limitations of claim 2 would have been obvious to implement in the method disclosed by Zhang 1. Integrating Zhang 1’s disclosure with the teachings of Zhang 2 would yield predictable results and require no inventive effort. Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., US Publication 2023/0363610 (hereinafter “Zhang 1”) in view of Zhang, US Publication 2021/0311469 (hereinafter “Zhang 2”) further in view of Yung et al., WIPO 2019/028075 (hereinafter “Yung”). As per claim 3, Zhang 1 in view of Zhang 2 discloses the method of claim 2, accordingly, the rejection of claim 2 above is incorporated. 
Yung teaches wherein the sequential machine learning model has been previously trained based on supervised training data instances generated from imitation learning episodes in which a corresponding human operator controlled a corresponding mobile robot in dependence on a corresponding gesture provided by a corresponding human captured by a corresponding camera of the corresponding mobile robot. Yung in paragraph 67 states “In some embodiments, the robot can also have the capability of imitation learning, where a single assembly task or a sequence of assembly tasks can be modeled and learned by the robot. In a further embodiment, the robot can include an AI-enabled HMI that allows the robot to communicate with a human worker. For example, the human worker can input verbal or gesture-based commands to control movements of the robot. Moreover, a human worker can also demonstrate to the robot how to perform a certain task.” The excerpt describes a robot capable of imitation learning, wherein a human worker demonstrates tasks that the robot model includes an AI enabled human machine interface allowing it to receive and respond to gesture based command from a human worker to control its movements. This setup indicates that the robot captures human gestures and uses them to guide its behavior during demonstration episodes. These episodes constitute imitation learning scenarios where the robot observes and records the human operator’s gesture-based control of the robot, generating supervised training data from which the robot learns. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang 1 and Zhang 2 in view of Yung with a reasonable expectation of success, as all references are directed to methods and systems for robotic navigation and human-robot interaction in dynamic environments. Zhang 1 as modified discloses a method in which a mobile robot navigates to a goal location using low level navigation policies based on states data comprising sequences of color images and normalized position data, which are processed by a sequential neural network model to generate position deltas and an intermediate target position. Yung teaches training a sequential neural network model using supervised learning data generated from imitation learning episodes, where a human operator controls a mobile robot based on gestures captures by a camera of the robot. The resulting combination would be a method in which the mobile robot processes color images and position data using a previously trained sequential neural network model, the model having been trained via imitation learning episodes in which human operators guided robot behavior in response to gesture inputs. A person of ordinary skill in the art would have been motivated to combine the inventions because training sequential models using imitation learning improves the model’s ability to generalize from human demonstrations, especially for human-interactive mobile robot tasks. The additional limitations of claim 3 would have been obvious to implement in the method disclosed by Zhang 1 as modified. Integrating the disclosure of Zhang 1 as modified with the teachings of Yung would yield predictable results and require no inventive effort. Claim 4 is rejected under 35 U.S.C. 
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., US Publication 2023/0363610 (hereinafter “Zhang 1”) in view of Zhang, US Publication 2021/0311469 (hereinafter “Zhang 2”) and Yung et al., WIPO 2019/028075 (hereinafter “Yung”), further in view of Hajash et al., US Publication 2022/0314444 (hereinafter “Hajash”) and Plikynas et al., US Publication 2023/0050825 (hereinafter “Plikynas”). As per claim 4, Zhang 1 as modified discloses the method of claim 3; accordingly, the rejection of claim 3 above is incorporated.

Hajash teaches a method wherein a given supervised training data instance, of the supervised training data instances, is generated based on only a segment of one of the imitation learning episodes. Hajash in paragraph 43 states “In particular embodiments, the IL model may be trained through imitation learning or behavioral cloning with expert demonstrations. In imitation learning, the loss function may be KL-divergence between the expert and learned policies, the total sum of squared error for each episode between the expert and learned policies, or many other IL loss functions.” Although the excerpt states that the loss function can be computed “for each episode”, which suggests the use of full episodes, it also mentions that the imitation learning model is trained using behavioral cloning with expert demonstrations. Behavioral cloning typically involves collecting observation-action pairs from demonstrations and using them as individual supervised training instances. This inherently supports the idea that only segments (e.g., individual steps or short sequences) from the expert demonstrations (episodes) are used as supervised data points, indicating that supervised data instances are generated from segments of longer imitation learning episodes. This interpretation aligns with common practice in behavioral cloning, where each supervised training data instance corresponds to a moment within a longer trajectory.

Hajash teaches a method wherein the segment consists of an earlier in time portion and a later in time portion that follows the earlier in time portion. Hajash in paragraph 45 states “Then, for each new episode, the parametric trajectory 330 may take an action from the ML agent 320 and observation 312 from the environment 310. In particular embodiments, the output of the parametric trajectory 330 may be a trajectory of waypoints defined by pose (position and orientation), referred as pose-based trajectory.” Hajash in paragraph 47 states “The aforementioned further specification may result in a fully specified trajectory 550 of waypoint poses (e.g., position and orientation)”. The disclosure in paragraphs 45 and 47 discusses a trajectory of waypoints which are defined by pose and describes them as sequentially processed in time for each new episode, with actions and observations shaping the output. These waypoints form a trajectory, which is inherently ordered in time: a trajectory made of waypoints represents a series of poses that the robot will follow in a specific temporal order. Because these waypoints are arranged along a path that the robot follows step by step (i.e., earlier and later positions in time), the trajectory segment clearly consists of an earlier in time portion (initial waypoints) and a later in time portion (subsequent waypoints that follow).
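For illustration, segmenting a demonstration episode into an earlier in time portion and a later in time portion can be expressed as a short Python sketch; the window lengths are hypothetical and chosen only to make the example concrete.

    # Illustrative sketch only: slide over one episode's trajectory and
    # emit (earlier, later) segment pairs, so each supervised instance
    # covers only a segment of the full episode. Window sizes are
    # hypothetical.
    def segment_episode(trajectory, input_len=8, target_len=4):
        pairs = []
        for i in range(len(trajectory) - input_len - target_len + 1):
            earlier = trajectory[i : i + input_len]                         # model input
            later = trajectory[i + input_len : i + input_len + target_len]  # target
            pairs.append((earlier, later))
        return pairs

    # A 14-step trajectory yields 3 overlapping (8-step, 4-step) pairs.
    print(len(segment_episode(list(range(14)))))  # 3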
Hajash further teaches a method wherein the given supervised training data instance comprises: training instance input that includes…an imitation earlier sequence of normalized position data instances that each reflect a corresponding earlier imitation position encountered by the corresponding mobile robot during the earlier in time portion of the segment. Hajash in paragraph 43 states “In yet another alternative embodiment, the second component may be an imitation learning (IL) agent comprising an IL model. The IL model may use loss as the learning signal 314. In particular embodiments, the IL model may be trained through imitation learning or behavioral cloning with expert demonstrations. In imitation learning, the loss function may be KL-divergence between the expert and learned policies, the total sum of squared error for each episode between the expert and learned policies, or many other IL loss functions.” The excerpt describes an imitation learning agent trained using expert demonstrations, which corresponds to the concept of imitation in the limitation. The earlier sequence of normalized position data instances corresponds to the “expert demonstrations” or “earlier imitation positions” that the mobile robot tries to mimic; since imitation learning is based on expert trajectories, and those are typically normalized for training consistency, this supports the inclusion of normalized position data instances. The IL model learns from a sequence of expert policy data (which are earlier in time trajectories or positions encountered by the robot or expert), consistent with the limitation that the robot imitates earlier encountered positions. The loss functions measure the difference between the robot’s learned trajectory and the expert’s earlier trajectory, ensuring the model imitates the earlier sequence of positions. This training method effectively captures the robot’s imitation of a sequence of positions encountered during prior demonstrations (the earlier in time portion of the segment).

Hajash teaches a training instance output that includes: an imitation later sequence of normalized position data instances that each reflect a corresponding later imitation position encountered by the corresponding mobile robot during the later in time portion of the segment. Hajash in paragraph 43 states “In yet another alternative embodiment, the second component may be an imitation learning (IL) agent comprising an IL model. The IL model may use loss as the learning signal 314. In particular embodiments, the IL model may be trained through imitation learning or behavioral cloning with expert demonstrations. In imitation learning, the loss function may be KL-divergence between the expert and learned policies, the total sum of squared error for each episode between the expert and learned policies, or many other IL loss functions.” The excerpt teaches the limitation by describing an imitation learning model, training via behavioral cloning or expert demonstrations, and loss functions that compare sequences of actions or policies. The reference to “expert and learned policies” over “each episode” indicates a time-sequenced comparison, which matches the requirement that the training instance includes a sequence of later imitation positions (i.e., actions or positions taken at a later portion of the segment).
Furthermore, since imitation learning is based on expert trajectories, and those are typically normalized for training consistency, this supports the inclusion of normalized position data instances.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang 1 as modified in view of Hajash with a reasonable expectation of success, as all references are directed to methods and systems for robotic navigation and human-robot interaction in dynamic environments. Zhang 1 as modified discloses a method in which a mobile robot navigates to a goal location using low-level navigation policies based on state data comprising sequences of color images and normalized position data, which are processed by a sequential neural network model to generate position deltas and an intermediate target position. Hajash teaches further refinement of the supervised training data generation process by segmenting imitation learning episodes into temporally ordered segments and generating a given supervised training data instance from such a segment. The resulting combination would be a robot learning framework in which supervised training data is constructed from temporally segmented portions of imitation learning episodes. A person of ordinary skill in the art would have been motivated to combine the inventions because segmenting episodes into earlier and later portions to generate more granular supervised data instances enhances the quality and density of training data. This segmentation approach improves the model’s ability to learn time-dependent behaviors and make temporally extended predictions, thereby increasing the robustness and generalizability of the navigation model. The limitations of claim 4 taught by Hajash would have been obvious to implement in the method disclosed by Zhang 1 as modified. Integrating the disclosure of Zhang 1 as modified with the teachings of Hajash would yield predictable results and require no inventive effort.

Plikynas teaches a method wherein the given supervised training data instance comprises: training instance input that includes: an imitation sequence of color images, captured by a corresponding camera of the corresponding mobile robot during the earlier in time portion of the segment. Plikynas in paragraph 163 states “The imitation-learning component allows to record and learn (modality#1, see FIG. 3) trajectory-conditioned BVI movement command controller (modality#2, see FIG. 4), which accepts camera image as input, and outputs the motion command (e.g., forward, left or right turn). In this case, to prepare a navigation module for a particular trajectory, sighted users collect training data of the corresponding trajectory (see FIG. 1 and FIG. 3), which consists of a set of images paired with the motion information (forward, left or right turn)”. The excerpt explicitly describes supervised training data: data collected by sighted users that includes a set of images (i.e., a sequence of color images from a camera) paired with corresponding motion commands (labels), which is classic supervised learning. The reference to the “imitation-learning component” that uses this data to train a trajectory-conditioned movement command controller further supports the imitation-based nature of the training data. The phrase “set of images paired with motion information” clearly maps to training instance input that includes an imitation sequence of color images.
Paragraphs 157 through 160 establish that the camera input used throughout the system is in the form of color images, and paragraph 163 refers to training on “a set of images” captured from a camera. Because the system consistently uses color image input, as stated in paragraphs 157 through 160, the “set of images” in paragraph 163 is interpreted as a sequence of color images. The collected training data corresponds to a trajectory segment recorded earlier, as the system is trained on this prior data before deployment.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang 1 as modified in view of Plikynas with a reasonable expectation of success, as all references are directed to methods and systems for robotic navigation and human-robot interaction in dynamic environments. Zhang 1 as modified discloses a method in which a mobile robot navigates to a goal location using low-level navigation policies based on state data comprising sequences of color images and normalized position data, which are processed by a sequential neural network model to generate position deltas and an intermediate target position. Plikynas teaches using prior sequences of robot state data as training inputs and corresponding future sequences as training targets in supervised learning. The resulting combination would be a training method for a mobile robot wherein supervised training instances are structured such that the input includes earlier in time color image sequences and normalized position data, while the output includes corresponding later in time position sequences, with each instance extracted from a segment of an imitation learning episode. A person of ordinary skill in the art would have been motivated to combine the inventions because using temporally segmented training data improves the model’s ability to learn relationships between earlier sensor states and future outcomes, thereby enhancing the accuracy of motion prediction and decision making in robot navigation. This structure simplifies the training process by aligning input-output pairs in a consistent, forward-looking temporal format. The limitation of claim 4 taught by Plikynas would have been obvious to implement in the method disclosed by Zhang 1 as modified. Integrating the disclosure of Zhang 1 as modified with the teachings of Plikynas would yield predictable results and require no inventive effort.
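For illustration, the claim 4 training instance discussed above (earlier images and normalized positions in, later normalized positions out) can be assembled with a short Python sketch; the goal-relative normalization convention and all names are hypothetical.

    # Illustrative sketch only: build one supervised instance from an
    # episode segment. Input = earlier-in-time images plus normalized
    # positions; target = later-in-time normalized positions. The
    # goal-relative normalization shown here is an assumption.
    def make_training_instance(images, positions, goal, split):
        normalized = [(x - goal[0], y - goal[1]) for (x, y) in positions]
        instance_input = {
            "images": images[:split],         # earlier color images
            "positions": normalized[:split],  # earlier normalized positions
        }
        instance_target = normalized[split:]  # later normalized positions
        return instance_input, instance_target

    imgs = ["f0", "f1", "f2", "f3"]
    poss = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    inp, tgt = make_training_instance(imgs, poss, goal=(4.0, 0.0), split=3)
    print(tgt)  # [(-1.0, 0.0)]: the later portion, goal-relative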
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., US Publication 2023/0363610 (hereinafter “Zhang 1”) in view of Zak et al., US Publication 2023/0001587 (hereinafter “Zak”). As per claim 6, Zhang 1 discloses the method of claim 5; accordingly, the rejection of claim 5 above is incorporated.

Zak further teaches a method wherein generating the intermediate target orientation comprises generating the intermediate target orientation based on a trajectory of the sequence of position deltas. Zak in paragraph 56 states “According to an aspect, the step of performing adaptive trajectory planning may include the sub-steps of planning a plurality of possible paths 40 between the source location 26 and the destination incorporating geometrical information of the robot and source location 26 and the pick orientation and the destination 28 which may include the target location and the target orientation; and determining a best path 40 between the source location 26 and the destination 28”. The excerpt discusses generating paths between a source and a destination, which indicates a sequence of positional changes, i.e., a trajectory of position deltas. Since the planning considers both target orientation and geometry, and these are used during the generation and evaluation of paths, this indicates the target orientation (which would include intermediate target orientations along the way) is being generated based on the trajectory. The simulation of multiple paths and selection of a best path inherently involves determining how the robot should move (trajectory) and what orientation it must maintain along the way, using geometric and positional information (position deltas) to derive those orientations.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang 1 in view of Zak with a reasonable expectation of success, as both references are directed to navigation and control of autonomous mobile robots using sensory input and neural network based processing. Zhang 1 discloses a method implemented by a mobile robot that includes navigation using a low-level navigation policy, capturing sequences of color images and position data, processing this data using sequential neural networks to generate position deltas, and determining intermediate target positions for navigation. Zak teaches adjusting a current goal position used by a navigation controller with a newly determined intermediate target position to enhance local navigation responsiveness and adaptability in dynamic environments. The resulting combination would be a mobile robot system that uses neural network based position delta generation to compute an intermediate target position and dynamically supplants the original goal position with this intermediate target position during execution of the low-level navigation policy. A person of ordinary skill in the art would have been motivated to combine the inventions because integrating Zak’s technique allows the robot to dynamically adapt its navigation behavior by updating its immediate navigation goals, improving responsiveness, collision avoidance, and local path optimization, especially in environments with moving obstacles. The additional limitation of claim 6 would have been obvious to implement in the method disclosed by Zhang 1. Integrating the disclosure of Zhang 1 with the teachings of Zak would yield predictable results and require no inventive effort.
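For illustration, one way to derive an intermediate target orientation from a trajectory of position deltas, consistent with the combination described above, is to take the heading of the summed displacement; this particular formula is an assumption, not a formula given in Zak.

    import math

    # Illustrative sketch only: sum the position deltas and use the
    # heading of the resulting displacement as the intermediate target
    # orientation (radians). This heading rule is an assumption.
    def orientation_from_deltas(deltas):
        dx = sum(d[0] for d in deltas)
        dy = sum(d[1] for d in deltas)
        return math.atan2(dy, dx)

    # Deltas trending equally up and right give a 45-degree heading.
    print(math.degrees(orientation_from_deltas([(0.3, 0.2), (0.2, 0.3)])))  # 45.0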
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., US Publication 2023/0363610 (hereinafter “Zhang 1”) in view of Yang et al., US Publication 2024/0175702 (hereinafter “Yang”). As per claim 7, Zhang 1 discloses the method of claim 1; accordingly, the rejection of claim 1 above is incorporated.

Yang teaches a method wherein the low-level navigation policy is a model predictive control (MPC) policy. Yang in paragraph 62 states “In an exemplary embodiment, the navigation control instruction corresponding to the final navigation path in the next control step of the robot may be generated by using the local planning path, the first robot state, the second robot state, the preset robot model corresponding to the robot, and the control step based on a model predictive control algorithm…The MPC algorithm has a hard constraint satisfaction characteristic and a forward-looking closed-loop control characteristic. The preset robot model corresponding to the robot is introduced into the MPC algorithm, and collision-free trajectory generation and optimal control signal generation are implemented in conjunction with the A* algorithm and the MPC algorithm.” The excerpt directly discloses that the control commands (navigation and control instructions) are generated for each control step using a model predictive control algorithm. The MPC algorithm operates using a robot model which captures the robot’s dynamics, a local planning path which the robot should follow, robot state information, and a control step, aligning with the typical inputs to an MPC-based controller. This satisfies the functional definition of a low-level navigation policy implemented using model predictive control; the excerpt therefore describes a system whose low-level control, responsible for real-time navigation decisions, is governed by an MPC-based policy.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang 1 in view of Yang with a reasonable expectation of success, as both references are directed to navigation and control of autonomous mobile robots using sensory input and neural network based processing. Zhang 1 discloses a method implemented by a mobile robot that includes navigation using a low-level navigation policy, capturing sequences of color images and position data, processing this data using sequential neural networks to generate position deltas, and determining intermediate target positions for navigation. Yang teaches the use of model predictive control as the low-level navigation policy to execute actions that achieve short-term goals in mobile robot navigation. The resulting combination would be a mobile robot system that employs the high-level perception and planning framework while utilizing MPC as the underlying low-level control policy that generates robot actions in response to updated goal positions. A person of ordinary skill in the art would have been motivated to combine the inventions because MPC offers well-known advantages for robotic navigation, including the ability to handle dynamic environments, optimize trajectories over short horizons, and explicitly consider system constraints, which would improve the execution of intermediate goals. The additional limitation of claim 7 would have been obvious to implement in the method disclosed by Zhang 1. Integrating the disclosure of Zhang 1 with the teachings of Yang would yield predictable results and require no inventive effort.
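For illustration, the receding-horizon character of an MPC-style low-level policy can be captured in a toy Python sketch; the point-mass model, candidate command set, and quadratic cost are hypothetical simplifications, not Yang's algorithm.

    # Illustrative sketch only: at each control step, roll a simple model
    # forward over a short horizon for each candidate command, score the
    # end state against the goal, and apply the first action of the best
    # rollout. Dynamics and cost are hypothetical.
    def mpc_step(state, goal, candidates, horizon=5, dt=0.1):
        best_cmd, best_cost = None, float("inf")
        for vx, vy in candidates:
            x, y = state
            for _ in range(horizon):               # forward-simulate the model
                x, y = x + vx * dt, y + vy * dt
            cost = (x - goal[0]) ** 2 + (y - goal[1]) ** 2
            if cost < best_cost:
                best_cmd, best_cost = (vx, vy), cost
        return best_cmd                            # apply only the first action

    print(mpc_step((0.0, 0.0), (1.0, 0.0), [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]))
    # -> (1.0, 0.0): drive toward the goal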
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., US Publication 2023/0363610 (hereinafter “Zhang 1”) in view of Fujinawa et al., US Publication 2021/0064122 (hereinafter “Fujinawa”). As per claim 12, Zhang 1 discloses the method of claim 1; accordingly, the rejection of claim 1 above is incorporated.

Fujinawa further teaches a method wherein the corresponding normalized positions are each generated based on a difference between: a corresponding non-normalized position already encountered by the mobile robot at a corresponding time, and an overall goal position for the navigation. Fujinawa in paragraph 164 states “To move the shoot ball robot 200a to a desired position through the command, the control information generation unit 114 may use P (proportional) control that feeds back an output in accordance with the difference between a target position and a position before movement.” The excerpt references the “position before movement”, which is the robot’s current or previously encountered position at a specific moment in time; this directly corresponds to the non-normalized position already encountered. The “target position” mentioned in the excerpt is the desired endpoint for the robot’s movement, which aligns with the overall goal position of the limitation. The excerpt further explains that a proportional control method is used to generate feedback output based on the difference between the target position and the position before movement. In control systems, this feedback is a transformed value representing how far the current position is from the goal. The feedback serves as a normalized position signal, since it encodes positional error relative to a fixed target, which can be used in navigation policies.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang 1 in view of Fujinawa with a reasonable expectation of success, as both references are directed to navigation and control of autonomous mobile robots using sensory input and neural network based processing. Zhang 1 discloses a method implemented by a mobile robot that includes navigation using a low-level navigation policy, capturing sequences of color images and position data, processing this data using sequential neural networks to generate position deltas, and determining intermediate target positions for navigation. Fujinawa teaches generating normalized position data by computing the difference between previously encountered positions and an overall goal position. The resulting combination would be a system in which normalized positions are generated by calculating differences between prior robot positions and an overall goal position, and where those normalized values form part of the sequence of state data used by the robot’s neural network based navigation system. A person of ordinary skill in the art would have been motivated to combine the inventions because using a difference-based normalization technique improves the consistency and interpretability of positional inputs to machine learning models and simplifies downstream computations for target selection and trajectory planning. The additional limitation of claim 12 would have been obvious to implement in the method disclosed by Zhang 1. Integrating the disclosure of Zhang 1 with the teachings of Fujinawa would yield predictable results and require no inventive effort.
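For illustration, the difference-based normalization discussed for claim 12 is simple arithmetic; the Python sketch below, including the sign convention and the proportional gain, is an assumption rather than Fujinawa's disclosed implementation.

    # Illustrative sketch only: a normalized position as the difference
    # between a position already encountered and the overall goal; a
    # P controller would feed back k_p times this error. Sign convention
    # and gain are hypothetical.
    def normalized_position(encountered, goal):
        return (goal[0] - encountered[0], goal[1] - encountered[1])

    def p_control_output(encountered, goal, k_p=0.5):
        ex, ey = normalized_position(encountered, goal)
        return (k_p * ex, k_p * ey)

    # Robot at (2, 3), goal (5, 3): normalized position (3.0, 0.0),
    # proportional output (1.5, 0.0).
    print(normalized_position((2.0, 3.0), (5.0, 3.0)))
    print(p_control_output((2.0, 3.0), (5.0, 3.0)))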
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., US Publication 2023/0363610 (hereinafter “Zhang 1”) in view of Gradoussoff et al., US Publication 2024/0061401 (hereinafter “Gradoussoff”). As per claim 16, Zhang 1 discloses the method of claim 1; accordingly, the rejection of claim 1 above is incorporated.

Gradoussoff further teaches a method wherein the low-level robot actions are torque commands. Gradoussoff in paragraph 113 states “The respective wiring diagrams 100, 102 involve a low-level robot control meta-component 110, which manages a set of low-level commands, for example, the interfacing 114 with the hardware, the monitoring of setpoints, the control-command of the motor torques 116”. The excerpt explicitly discloses that the low-level robot control component manages low-level commands, and among the examples of those commands is the “control-command of the motor torques”. This shows that low-level commands include control over motor torques. The control-command of torques refers to instructions or signals that control the torque output of motors. Since torque commands are a form of motor control at a low level, this supports the requirement that low-level robot actions include torque commands.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang 1 in view of Gradoussoff with a reasonable expectation of success, as both references are directed to methods and control architectures for mobile robots navigating in an environment using low-level motion policies based on sensory input. Zhang 1 discloses a method implemented by a mobile robot that includes navigation using a low-level navigation policy, capturing sequences of color images and position data, processing this data using sequential neural networks to generate position deltas, and determining intermediate target positions for navigation. Gradoussoff teaches that low-level robot actions may include direct control signals such as torque commands issued to robot actuators, which are generated based on sensory input and navigation goals. The resulting combination would be a mobile robot system that processes image and position sequences to generate navigation targets using neural networks and uses torque commands as the low-level robot actions for physically achieving those targets. A person of ordinary skill in the art would have been motivated to combine the inventions because using torque commands provides a more direct and flexible control interface for physical movement, enabling smoother and more adaptive low-level navigation behavior, especially in dynamic environments. The additional limitation of claim 16 would have been obvious to implement in the method disclosed by Zhang 1. Integrating the disclosure of Zhang 1 with the teachings of Gradoussoff would yield predictable results and require no inventive effort.
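For illustration, a torque command as a low-level robot action can be sketched as a per-wheel proportional velocity-error controller with actuator limits; the two-wheel layout, gain, and limit below are hypothetical.

    # Illustrative sketch only: low-level actions expressed as motor
    # torque commands (N*m), one per wheel, derived from velocity error
    # and clamped to an actuator limit. Gains and limits are hypothetical.
    def torque_commands(target_vel, measured_vel, k_p=0.8, limit=2.0):
        torques = []
        for tgt, meas in zip(target_vel, measured_vel):
            tau = k_p * (tgt - meas)              # proportional torque
            torques.append(max(-limit, min(limit, tau)))
        return torques

    # Left wheel lagging its setpoint, right wheel on target.
    print(torque_commands([1.0, 1.0], [0.5, 1.0]))  # [0.4, 0.0]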
Claims 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., US Publication 2023/0363610 (hereinafter “Zhang 1”) in view of Spector et al., US Publication 2022/0331964 (hereinafter “Spector”). As per claim 17, Zhang 1 discloses the method of claim 1; accordingly, the rejection of claim 1 above is incorporated.

Spector further teaches a method wherein generating, based on the sequence of position deltas generated using the sequential NN model, the intermediate target position, comprises: generating the intermediate target position based on a sum of the sequence of position deltas. Spector in paragraph 94 states “a neural network which is trained to derive, from camera images, movement vectors which specify movements from the positions at which the camera images are taken to insert objects into insertions.” and in paragraph 96 states “an insertion task is formulated as a regression task (determination of a movement vector) which is solved by a neural network. The movement vector may comprise translational and rotational components which may be represented using any suitable representation, e.g. coordinate system or system of angles.” The excerpts teach that the neural network is trained to derive motion vectors (i.e., position deltas). These motion vectors represent the movement needed from the current position to the target insertion position, and may include translation components (position deltas). The process is framed as a regression task, which commonly involves computing a series of predictions that are summed or accumulated to reach a final outcome. The interpretation of motion planning via multiple neural-network-derived deltas (especially in a sequential control or continuous feedback loop) suggests that the target or intermediate position is computed by accumulating (summing) a sequence of motion vectors over time or frames.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang 1 in view of Spector with a reasonable expectation of success, as both references are directed to robot navigation and control using sensor data and machine learning models. Zhang 1 discloses a method implemented by a mobile robot that includes navigation using a low-level navigation policy, capturing sequences of color images and position data, processing this data using sequential neural networks to generate position deltas, and determining intermediate target positions for navigation. Spector teaches generating an output position by summing a sequence of position delta vectors in a recurrent neural network based navigation model. The resulting combination would be a robot navigation method in which the intermediate target position is generated by computing the sum of position deltas output by a sequential neural network. A person of ordinary skill in the art would have been motivated to combine the inventions because summing a sequence of delta vectors to generate a position estimate or navigation target is a well-known and computationally efficient technique for trajectory integration, which improves the accuracy and simplicity of estimating the robot’s future position. The additional limitation of claim 17 would have been obvious to implement in the method disclosed by Zhang 1. Integrating the disclosure of Zhang 1 with the teachings of Spector would yield predictable results and require no inventive effort.

As per claim 18, Zhang 1 discloses the method of claim 1; accordingly, the rejection of claim 1 above is incorporated. Spector further teaches a method wherein generating the intermediate target position based on the sum of the sequence of position deltas comprises: generating the intermediate target position as an overall sum of: the sum of the sequence of position deltas, and the current position data instance. Spector in paragraph 6 states “The movement vector may for example be a vector of the differences between the pose of the current position (position at which camera image is taken) and the pose of the insertion (e.g.
in terms of Cartesian coordinates for the position and angles for the orientation).” and in paragraph 94 states “In 605, the camera image is fed into a neural network which is trained to derive, from camera images, movement vectors which specify movements from the positions at which the camera images are taken to insert objects into insertions.” The excerpts teach generating an intermediate target position (i.e., a next step in the robot’s movement) by adding: a motion vector, which represents the delta or change from the current position to the target position, and the current position, which is explicitly stated to be the “position where the camera image was taken.” The motion vector is derived by the neural network based on input images and possibly additional sensor data, and it effectively encapsulates the sum of position deltas needed to reach the insertion point. Adding this motion vector to the current position yields the intermediate target position. Thus, the excerpts teach the limitation because they involve generating a motion vector (i.e., a cumulative or derived delta), applying that delta vector to the current position, and arriving at the position the robot should move to, i.e., an intermediate target position computed by summing a sequence of deltas and the current position.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang 1 in view of Spector with a reasonable expectation of success, as both references are directed to robot navigation and control using sensor data and machine learning models. Zhang 1 discloses a method implemented by a mobile robot that includes navigation using a low-level navigation policy, capturing sequences of color images and position data, processing this data using sequential neural networks to generate position deltas, and determining intermediate target positions for navigation. Spector teaches generating an output position by summing a sequence of position delta vectors in a recurrent neural network based navigation model. The resulting combination would be a robot navigation method in which the intermediate target position is generated by computing the sum of position deltas output by a sequential neural network. A person of ordinary skill in the art would have been motivated to combine the inventions because summing a sequence of delta vectors to generate a position estimate or navigation target is a well-known and computationally efficient technique for trajectory integration, which improves the accuracy and simplicity of estimating the robot’s future position. The additional limitation of claim 18 would have been obvious to implement in the method disclosed by Zhang 1. Integrating the disclosure of Zhang 1 with the teachings of Spector would yield predictable results and require no inventive effort.
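For illustration, claims 17 and 18 reduce to vector arithmetic: claim 17 sums the sequence of position deltas, and claim 18 adds that sum to the current position data instance. The Python sketch below uses hypothetical values.

    # Illustrative sketch only: the intermediate target as (claim 17) the
    # sum of the sequence of position deltas and (claim 18) that sum plus
    # the current position. Values are hypothetical.
    def sum_of_deltas(deltas):
        return tuple(sum(axis) for axis in zip(*deltas))

    def intermediate_target_position(deltas, current):
        dx, dy = sum_of_deltas(deltas)
        return (current[0] + dx, current[1] + dy)

    deltas = [(0.25, 0.0), (0.25, 0.25)]
    print(sum_of_deltas(deltas))                             # (0.5, 0.25)
    print(intermediate_target_position(deltas, (1.0, 1.0)))  # (1.5, 1.25)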
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., US Publication 2023/0363610 (hereinafter “Zhang 1”) in view of Bitan et al., US Publication 2020/0349728 (hereinafter “Bitan”). As per claim 20, Zhang 1 discloses the method of claim 19; accordingly, the rejection of claim 19 above is incorporated.

Bitan further teaches a method wherein the low-level iterations are performed at a low-level frequency that is more frequent than a high-level frequency at which the high-level iterations are performed. Bitan in paragraph 37 states “A first iteration uses a low frequency with an associated large maximal range, to obtain a coarse depth measurement. Since depth quality in ToF systems is proportional to the modulation frequency as just mentioned, this first measurement may have low quality for the RoI in the observed scene. The second iteration selects a higher frequency such that its corresponding ambiguity range is derived from the precision of the previous iteration, to cover a measurement error of the previous iteration.” The first iteration is described as using a low frequency to obtain a coarse depth measurement. This matches the idea of a low-level iteration; it is the initial pass that gives a rough estimate using a low frequency. The second iteration uses a higher frequency, selected based on the outcome of the first (coarse) iteration. The excerpt clearly contrasts two different frequencies; this distinction teaches the concept that low-level and high-level iterations operate at different frequencies.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang 1 in view of Bitan with a reasonable expectation of success, as both references are directed to navigation and control of mobile robots using machine learning and policy-based architectures. Zhang 1 discloses a method implemented by a mobile robot that includes navigation using a low-level navigation policy, capturing sequences of color images and position data, processing this data using sequential neural networks to generate position deltas, and determining intermediate target positions for navigation. Bitan teaches a hierarchical control framework for robotic navigation in which low-level iterations are executed at a higher frequency than high-level iterations. The resulting combination would be a hierarchical navigation system for a mobile robot that uses a sequential neural network for perceptual and positional processing while structuring the execution of high-level and low-level iterations at different frequencies, with the low-level navigation policy being executed more frequently than the high-level policy. A person of ordinary skill in the art would have been motivated to combine the inventions because using different execution frequencies for hierarchical policies improves system responsiveness and computational efficiency, allowing the robot to react quickly to dynamic environmental changes while still following long-term objectives. The additional limitation of claim 20 would have been obvious to implement in the method disclosed by Zhang 1. Integrating the disclosure of Zhang 1 with the teachings of Bitan would yield predictable results and require no inventive effort.
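For illustration, the two-rate hierarchy discussed for claim 20 can be sketched as a single loop in which the high-level goal is refreshed only every N low-level steps; the 10:1 ratio and all names below are hypothetical.

    # Illustrative sketch only: low-level iterations run every step while
    # high-level iterations run every `ratio` steps, so the low-level
    # policy executes 10x more frequently here. The ratio is hypothetical.
    def run(total_steps, ratio=10):
        goal = None
        actions = []
        for t in range(total_steps):
            if t % ratio == 0:
                goal = f"goal set at t={t}"                 # slow, deliberative update
            actions.append(f"t={t}: act toward ({goal})")   # fast, reactive step
        return actions

    print(run(25)[-1])  # t=24: act toward (goal set at t=20)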
Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Wade Miles, whose telephone number is (571) 270-7777. The examiner can normally be reached Monday-Friday, 10:00 am - 7:00 pm ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trammel, can be reached at (571) 272-6712. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WADE MILES/
Supervisory Patent Examiner, Art Unit 3656

Prosecution Timeline

Aug 30, 2023
Application Filed
May 29, 2025
Non-Final Rejection — §102, §103
Sep 09, 2025
Response Filed
Jan 05, 2026
Final Rejection — §102, §103
Apr 08, 2026
Examiner Interview Summary
Apr 08, 2026
Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12446892
DUAL SIDE SPRING V-CLIP FOR SURGICAL TREATMENT OF LEFT ATRIAL APPENDAGE
2y 5m to grant Granted Oct 21, 2025
Patent 12193935
PULL-THROUGH CHORDAE TENDINEAE SYSTEM
2y 5m to grant Granted Jan 14, 2025
Patent 12193680
OCCLUSION CLIP
2y 5m to grant Granted Jan 14, 2025
Patent 12185959
ASPIRATION THROMBECTOMY SYSTEM AND METHODS FOR THROMBUS REMOVAL WITH ASPIRATION CATHETER
2y 5m to grant Granted Jan 07, 2025
Patent 12161343
Implant Detachment
2y 5m to grant Granted Dec 10, 2024
Based on the 5 most recent grants by this examiner.


Prosecution Projections

3-4
Expected OA Rounds
70%
Grant Probability
99%
With Interview (+48.1%)
3y 2m
Median Time to Grant
Moderate
PTA Risk
Based on 578 resolved cases by this examiner. Grant probability derived from career allow rate.
