Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner's Note
The Examiner respectfully requests that the Applicant, in preparing responses, fully consider the entirety of the reference(s) as potentially teaching all or part of the claimed invention. It is noted that REFERENCES ARE RELEVANT AS PRIOR ART FOR ALL THEY CONTAIN. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). A reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including non-preferred embodiments (see MPEP 2123). The Examiner has cited particular locations in the reference(s) as applied to the claim(s) above for the convenience of the Applicant. Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claim(s), other passages and figures will typically apply as well.
Response to Arguments
Applicant’s arguments pertaining to the rejection under 35 U.S.C. 103, pages 9-11 of the REMARKS filed 09/22/2025, have been considered but are moot in view of the new grounds of rejection necessitated by the amendment.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 2-8, 10-16, and 18-21 are rejected on the ground of nonstatutory double patenting (anticipation), and claims 9 and 17 are rejected on the ground of nonstatutory double patenting (obviousness), as being unpatentable over claims 1-5, 7, 9-12, 14-15, 17, and 19-21 of US Patent 11853876. Although the claims at issue are not identical, the limitations are directed to similar subject matter. Both the instant application and the US patent are directed toward machine learning. One of ordinary skill in the art would conclude, upon examination of the claims, that the two claimed inventions are obvious variants of each other. This is a non-provisional nonstatutory double patenting rejection.
Instant Application | US Patent 11853876
Claim 2 | Claim 1
However, US Patent 11853876 fails to particularly teach the limitation of claim 2 in the instant application: “predict different motion directions for different pixels on a same object of the one or more objects given one or more actions to be performed by the robotic agent.”
On the other hand, JAIN teaches “predict different motion directions for different pixels on a same object of the one or more objects given one or more actions to be performed by the robotic agent” ([0074] The frame may be a two-dimensional (2D) slice or three-dimensional (3D) slide of spatio-temporal data. Furthermore, the frame may be referred to as an image. The attention map provides the recurrent neural network with the location of action in a frame using appearance motion information, such as optical flow. In one configuration, the optical flow may be described by a field of vectors that indicate a predicted movement of a pixel from one frame to the next frame. That is, the optical flow tracks action in the pixels and considers the motion to predict the features with the greatest saliency within the sequence of frames. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified US Patent 11853876 to incorporate predicting different motion directions for different pixels on a same object of the one or more objects given one or more actions to be performed by the robotic agent, as taught by JAIN [0074], to preserve the spatial nature of an input, such as the spatial correlation of pixels in an image [0073].)
Claim 3 | Claim 2
Claim 4 | Claim 3
Claim 5 | Claim 4
Claim 6 | Claims 1 and 5
Claim 7 | Claim 1
Claim 8 | Claim 7
Claim 9 | Claim 1
However, US Patent 11853876 fails to particularly teach the limitation of claim 9 in the instant application determining the respective likelihood for a given candidate sequence comprises recursively feeding as input to the neural network the actions in the sequence and the next images generated by the neural network for the actions.
On the other hand, Walker teaches determining the respective likelihood for a given candidate sequence comprises recursively feeding as input to the neural network the actions in the sequence and the next images generated by the neural network for the actions ([Page 2449, Section 5] a modified Recurrent Neural Network, RNN, is proposed for multi-frame prediction. The Examiner notes that, under the broadest reasonable interpretation, recursively feeding as input to the neural network the actions in the sequence and the next images generated reads on the recurrent feedback of an RNN: through its loops and recurrence, a recurrent network inherently feeds each generated output and its hidden state back as input for the next prediction step. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified US Patent 11853876 to incorporate wherein determining the respective likelihood for a given candidate sequence comprises recursively feeding as input to the neural network the actions in the sequence and the next images generated by the neural network for the actions as taught by Walker [Page 2449, Section 5] in order to predict not just the next frame but a few seconds into the future [Page 2449, Section 5].)
Claim 10 | Claim 9
Claim 11 | Claim 10
Claim 12 | Claim 11
Claim 13 | Claim 14
Claim 14 | Claims 11 and 15
Claim 15 | Claim 11
Claim 16 | Claim 17
Claim 17 | Claim 11
However, US Patent 11853876 fails to particularly teach the limitation of claim 17 in the instant application determining the respective likelihood for a given candidate sequence comprises recursively feeding as input to the neural network the actions in the sequence and the next images generated by the neural network for the actions.
On the other hand, Walker teaches determining the respective likelihood for a given candidate sequence comprises recursively feeding as input to the neural network the actions in the sequence and the next images generated by the neural network for the actions ([Page 2449, Section 5] a modified Recurrent Neural Network, RNN, is proposed for multi-frame prediction. The Examiner notes that, under the broadest reasonable interpretation, recursively feeding as input to the neural network the actions in the sequence and the next images generated reads on the recurrent feedback of an RNN: through its loops and recurrence, a recurrent network inherently feeds each generated output and its hidden state back as input for the next prediction step. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified US Patent 11853876 to incorporate wherein determining the respective likelihood for a given candidate sequence comprises recursively feeding as input to the neural network the actions in the sequence and the next images generated by the neural network for the actions as taught by Walker [Page 2449, Section 5] in order to predict not just the next frame but a few seconds into the future [Page 2449, Section 5].)
Claim 18 | Claim 19
Claim 19 | Claim 20
Claim 20 | Claim 12
Claim 21 | Claim 21
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2-3, 5-10, 12-18, and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (Visual measurement and prediction of ball trajectory for table tennis robot), in view of Tominaga (Image sequence prediction for remote robot control), further in view of Fasola (Fast goal navigation with obstacle avoidance using a dynamic local visual model), further in view of Walker (Dense optical flow prediction from a static image), further in view of Smith (US20160096272A1), and further in view of JAIN (US20170262996A1).
Regarding claim 2, Zhang teaches: receiving data identifying, for each of one or more objects, a respective target location to which a robotic agent interacting with a real-world environment should move the object; and causing the robotic agent to move the one or more objects to one or more target locations ([Page 3200, Section VI] The two smart cameras were mounted on the ceiling behind the table, as shown in Fig. 4. Their view fields were given in Fig. 5, which were only half of the table. The prediction result must be given before the ball flies through the middle line of the table to let the robot have enough time to move and hit the ball back. Therefore, the view fields of the two cameras were limited to the area of half the table. The Examiner notes that Zhang teaches a robot that receives data about a ball it is interacting with in the real-world environment and hits the ball back so that it lands within the limits of a ping pong table.)
However, Zhang is not relied upon to explicitly teach any of the following limitations:
receiving a current image of a current state of the real-world environment, determining, from the current image, a next sequence of actions to be performed by the robotic agent.
using a next image prediction neural network that is configured to predict different motion directions for different pixels on a same object of the one or more objects given one or more actions to be performed by the robotic agent.
wherein the next sequence is a candidate sequence from a plurality of candidate sequences that, if performed by the robotic agent starting from when the environment is in the current state, would be most likely to result in the one or more objects being moved to the respective target locations
directing the robotic agent to perform the next sequence of actions
wherein the next image prediction neural network has been trained on unlabeled training data to receive as input at least a given image and a given input action and to process the input to generate a next image that is an image of a predicted next state of the environment if the robotic agent performs the given input action when the environment is in the current state.
On the other hand, Tominaga teaches receiving a current image of a current state of the real-world environment, determining, from the current image, a next sequence of actions to be performed by the robotic agent ([Page 1135-1136, Section III] using a predicted image sequence in order to avoid time gaps between transmission and arrival… when operator makes moves or rotations, received images are manipulated via changing and/or scaling to construct a predicted image. The examiner notes that Zhang and Tominaga are both directed to robotics control and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate receiving a current image of a current state of the real-world environment, determining, from the current image, a next sequence of actions to be performed by the robotic agent as taught by Tominaga [Page 1135-1136, Section III] to avoid the time gap between image transmission and arrival time in case there was an image transmission delay [Page 1136, Section III.B].)
Furthermore, JAIN teaches using a next image prediction neural network that is configured to predict different motion directions for different pixels on a same object of the one or more objects given one or more actions to be performed by the robotic agent ([0074] The frame may be a two-dimensional (2D) slice or three-dimensional (3D) slide of spatio-temporal data. Furthermore, the frame may be referred to as an image. The attention map provides the recurrent neural network with the location of action in a frame using appearance motion information, such as optical flow. In one configuration, the optical flow may be described by a field of vectors that indicate a predicted movement of a pixel from one frame to the next frame. That is, the optical flow tracks action in the pixels and considers the motion to predict the features with the greatest saliency within the sequence of frames. The examiner notes that Zhang and JAIN are both directed to motion based prediction and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate using a next image prediction neural network that is configured to predict different motion directions for different pixels on a same object of the one or more objects given one or more actions to be performed by the robotic agent as taught by JAIN [0074] to preserve the spatial nature of an input, such as the spatial correlation of pixels in an image [0073].)
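For illustration, the optical-flow behavior cited from JAIN [0074], a field of vectors predicting the movement of each pixel to the next frame, can be sketched as follows. The rigid-rotation model, the function name, and the image dimensions are illustrative assumptions, not content of the cited references; the sketch merely shows that pixels on the same rigid object can receive different motion directions, as the claim language requires.

```python
import numpy as np

def rigid_rotation_flow(h, w, center, omega):
    """Flow field (dy, dx) for a rigid body rotating at angular rate
    omega (radians per frame) about `center`: v = omega x r in 2D."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    ry, rx = ys - center[0], xs - center[1]
    # 2D cross product with the angular velocity: (dy, dx) = (omega*rx, -omega*ry)
    return np.stack([omega * rx, -omega * ry], axis=-1)

flow = rigid_rotation_flow(64, 64, center=(32, 32), omega=0.1)
# Pixels on opposite sides of the same rotating object move in
# opposite directions, even though they belong to one object:
top = flow[10, 32]     # above the center of rotation
bottom = flow[54, 32]  # below the center of rotation
```

Under this toy model, the predicted vectors at `top` and `bottom` are equal and opposite, which is exactly the "different motion directions for different pixels on a same object" situation.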
Furthermore, Fasola teaches wherein the next sequence is a candidate sequence from a plurality of candidate sequences that, if performed by the robotic agent starting from when the environment is in the current state, would be most likely to result in the one or more objects being moved to the respective target locations ([Page 3-4, Section 5 and Fig. 6] 3 different walking angles evaluated every 300ms… multiple states in navigation algorithm, of which one is chosen at a time to eventually reach the goal. The examiner notes that Zhang and Fasola are both directed to robotics control and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate wherein the next sequence is a candidate sequence from a plurality of candidate sequences that, if performed by the robotic agent starting from when the environment is in the current state, would be most likely to result in the one or more objects being moved to the respective target locations as taught by Fasola [Page 3-4, Section 5 and Fig. 6] to avoid obstacles which is a challenging problem for mobile robots [Page 1, Section 1].)
Furthermore, Fasola teaches directing the robotic agent to perform the next sequence of actions ([Fig. 6] Finite state machine description of the navigation algorithm. The robot starts out in the Walk to Goal state. States such as Localize and Turn in Place may transition to multiple different states depending on the situation and so these states are duplicated in the figure for the sake of clarity. The examiner notes that Fasola teaches that AIBO is directed to follow the algorithm, deciding to take one action after another. The examiner further notes that Zhang and Fasola are both directed to robotics control and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate directing the robotic agent to perform the next sequence of actions as taught by Fasola [Fig. 6] to avoid obstacles which is a challenging problem for mobile robots [Page 1, Section 1].)
Furthermore, Walker teaches the next image prediction neural network has been trained on unlabeled training data ([Page 2450, Section 6] In this paper we have presented an approach to generalized prediction in static scenes. By using an optical flow algorithm to label the data, we can train this model on a large number of unlabeled videos. Furthermore, our framework utilizes the success of deep networks to outperform contemporary approaches to motion prediction. We find that our network successfully predicts motion based on the context of the scene and the stage of the action taking place. The examiner notes that Zhang and Walker are both directed to image prediction and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate the next image prediction neural network has been trained on unlabeled training data as taught by Walker [Page 2450, Section 6] to outperform contemporary approaches to motion prediction [Page 2450, Section 6].)
Furthermore, SMITH teaches receive as input at least a given image and a given input action ([0017-0019] In some implementations, the control signal may be configured to cause the robot to execute the action. The first input type may comprise a digital image comprising a plurality of pixel values. The second input type may comprise a binary indication associated with the action being executed. In some implementations, the training may comprise a plurality of iterations configured based on the training signal. A given iteration may be characterized by a control command and a performance measure associated with the action execution based on the control command. In some implementations, the plurality of pixels may comprise at least 10 pixels. The random selection may be performed based on a random number generation operation. The examiner notes that Zhang and SMITH are both directed to robotics control and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate receive as input at least a given image and a given input action as taught by SMITH [0017-0019] to configure a control signal to cause the robot to execute the action [0017].)
Furthermore, Tominaga teaches process the input to generate a next image that is an image of a predicted next state of the environment if the robotic agent performs the given input action when the environment is in the current state ([Page 1135-1136, Section III] images are manipulated via changing and/or scaling when the operator makes a forward/backward or rotational movement. The examiner notes that Zhang and Tominaga are both directed to robotics control and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate process the input to generate a next image that is an image of a predicted next state of the environment if the robotic agent performs the given input action when the environment is in the current state as taught by Tominaga [Page 1135-1136, Section III] to avoid the time gap between image transmission and arrival time in case there was an image transmission delay [Page 1136, Section III.B].)
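The action-conditioned image manipulation attributed to Tominaga (changing and/or scaling the current image according to the operator's movement to approximate the next view) can be sketched as follows. The crop-and-resize zoom, the action labels, and the function name are illustrative assumptions, not the reference's actual procedure.

```python
import numpy as np

def predict_next_image(image, action):
    """Toy sketch of predicting the next camera view by geometrically
    manipulating the current image, rather than waiting for the next
    frame to arrive. `action` is a hypothetical label."""
    h, w = image.shape
    if action == "forward":
        # Moving forward is approximated by zooming in: crop the center
        # and resample back to full size (nearest neighbor, to keep the
        # sketch dependency-free).
        m = h // 8
        crop = image[m:h - m, m:w - m]
        ys = np.arange(h) * crop.shape[0] // h
        xs = np.arange(w) * crop.shape[1] // w
        return crop[np.ix_(ys, xs)]
    if action == "rotate":
        # An in-plane rotation, shown here as a quarter turn purely
        # for illustration.
        return np.rot90(image)
    return image.copy()

img = np.arange(64 * 64, dtype=float).reshape(64, 64)
pred = predict_next_image(img, "forward")
```

The point of the sketch is only that a predicted next image can be produced from the current image plus the input action, without waiting on transmission.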
Regarding claim 3, Zhang teaches: the current image is an image captured by a camera of the robotic agent ([Page 3196, Section II] The two smart cameras (cameras A and B) and the PC are connected with a local area network based on the TCP/IP protocol [14]. In addition, the images captured by the cameras are also displayed on monitors A and B via the video output lines. The scheme diagram is shown in Fig. 1. The examiner notes that Zhang teaches the use of two cameras that are part of the robotic agent to capture current images.)
Regarding claim 5, Zhang teaches the method of claim 2. However, Zhang is not relied upon to explicitly teach wherein directing the robotic agent to perform the next sequence of actions comprises: directing the robotic agent to interrupt a current sequence of actions being performed by the robotic agent and to begin performing the next sequence of actions. However, Fasola teaches wherein directing the robotic agent to perform the next sequence of actions comprises: directing the robotic agent to interrupt a current sequence of actions being performed by the robotic agent and to begin performing the next sequence of actions ([Page 3-4, Section 5 and Fig. 6] AIBO switches between goal-navigation mode and contour-following mode… and is directed to turn in place if obstacle is in front, with the possibility of additional instructions. The examiner notes that Zhang and Fasola are both directed to robotics control and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate wherein directing the robotic agent to perform the next sequence of actions comprises: directing the robotic agent to interrupt a current sequence of actions being performed by the robotic agent and to begin performing the next sequence of actions as taught by Fasola [Page 3-4, Section 5 and Fig. 6] to avoid obstacles which is a challenging problem for mobile robots [Page 1, Section 1].)
Regarding claim 6, Zhang teaches the method of claim 2. However, Zhang is not relied upon to explicitly teach:
the next image prediction neural network is a recurrent neural network that has been trained to
receive as input at least a current image and an input action, and process the input to generate a next image that is an image of a predicted next state of the environment if the robotic agent performs the input action when the environment is in the current state
as part of generating the next image, the recurrent neural network generates a flow map that identifies, for each of a plurality of pixels in the next image, a respective predicted likelihood of the pixel having moved from each of a plurality of pixels in the current image
However, Walker teaches the next image prediction neural network is a recurrent neural network that has been trained to ([Page 2449, Section 5] We present a proof-of-concept network to predict 6 future frames. In order to predict multiple frames into the future, we take our pre-trained single frame network and output the seventh feature layer into a ”temporally deep” network, using the implementation of [2]. This network architecture is the same as an unrolled recurrent neural network with some important differences. The examiner notes that Walker teaches modifying a recurrent neural network to predict multiple images. The examiner further notes that Zhang and Walker are both directed to image prediction and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate the next image prediction neural network is a recurrent neural network that has been trained to as taught by Walker [Page 2449, Section 5] in order to predict multiple frames into the future [Page 2449, Section 5].)
Furthermore, Tominaga teaches receive as input at least a current image and an input action ([Page 1135-1136, Section III] captured image sequences are sent to the operator side computer… operator makes a forward/backward or rotation movement. The examiner notes that Zhang and Tominaga are both directed to robotics control and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate receive as input at least a current image and an input action as taught by Tominaga [Page 1135-1136, Section III] to avoid the time gap between image transmission and arrival time in case there was an image transmission delay [Page 1136, Section III.B].)
Furthermore, Tominaga teaches process the input to generate a next image that is an image of a predicted next state of the environment if the robotic agent performs the input action when the environment is in the current state ([Page 1135-1136, Section III] images are manipulated via changing and/or scaling when the operator makes a forward/backward or rotational movement. The examiner notes that Zhang and Tominaga are both directed to robotics control and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate process the input to generate a next image that is an image of a predicted next state of the environment if the robotic agent performs the input action when the environment is in the current state as taught by Tominaga [Page 1135-1136, Section III] to avoid the time gap between image transmission and arrival time in case there was an image transmission delay [Page 1136, Section III.B].)
Furthermore, Walker teaches as part of generating the next image, the recurrent neural network generates a flow map that identifies ([Page 2445, Fig. 2] network similar to standard 7-layer architecture… for every pixel in the image, a distribution of motions is predicted, with directions and magnitudes, etc. The examiner notes that Zhang and Walker are both directed to image prediction and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate as part of generating the next image, the recurrent neural network generates a flow map that identifies as taught by Walker [Page 2445, Fig. 2] in order to perform many recognition tasks [Page 2445, Fig. 2].)
Furthermore, Walker teaches for each of a plurality of pixels in the next image, a respective predicted likelihood of the pixel having moved from each of a plurality of pixels in the current image ([Page 2445, Section 3.1, Fig. 2] optical flow vectors quantized… probability distribution over flow vectors for each pixel, average of the vectors taken to produce the final prediction output for each pixel. The examiner notes that Walker teaches the use of flow vectors for each pixel, giving a probability distribution that a pixel will move in a certain direction. This is done for every pixel, in chronological order (using a current image to predict a future image/frame based on movement probability). The examiner further notes that Zhang and Walker are both directed to image prediction and both are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate for each of a plurality of pixels in the next image, a respective predicted likelihood of the pixel having moved from each of a plurality of pixels in the current image as taught by Walker [Page 2445, Section 3.1, Fig. 2] in order to learn a mapping between the input RGB image and the output space, which corresponds to reformulating structured regression as a classification problem [Page 2444-2445, Section 3].)
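Walker's reformulation of structured regression as classification, a per-pixel probability distribution over quantized flow vectors averaged into a final motion prediction, can be sketched as follows. The codebook, the logits, and the function name are illustrative assumptions rather than Walker's actual network outputs.

```python
import numpy as np

# Hypothetical codebook of quantized flow vectors (dy, dx): stay, and
# unit moves in the four cardinal directions.
codebook = np.array([[0, 0], [0, 1], [0, -1], [1, 0], [-1, 0]], dtype=float)

def expected_flow(logits):
    """logits: (H, W, K) scores over the K codebook vectors.
    Returns the per-pixel probability-weighted average flow, (H, W, 2)."""
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = z / z.sum(axis=-1, keepdims=True)   # per-pixel softmax
    return probs @ codebook                     # expectation over codebook

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, len(codebook)))  # stand-in network output
flow = expected_flow(logits)
```

When the distribution at a pixel is sharply peaked on one codebook entry, the expected flow collapses to that single quantized vector, matching the classification reading of the prediction.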
Regarding claim 7, Zhang teaches the method of claim 6. However, Zhang is not relied upon to explicitly teach determining, using flow maps generated by the next image prediction neural network, a respective likelihood for each of the candidate sequences that performance of the actions in the candidate sequence by the robotic agent would result in the objects being moved to the target locations. However, Walker teaches determining, using flow maps generated by the next image prediction neural network, a respective likelihood for each of the candidate sequences that performance of the actions in the candidate sequence by the robotic agent would result in the objects being moved to the target locations ([Page 2447, Fig. 4] network finds active elements in the scene and correctly predicts future motion based on context. The Examiner notes that the network taught by Walker predicts not only the overall motion of the image but also the motion of individual elements in the scene, which can be mapped to the object being moved in the claim language. For example, in multiple scenes, such as the surfing scene in the second row, the directions of the overall wave, the surfer, and the crashing waves are all predicted. In addition, in the archery scene, the motion of the archer’s hand and of the bow are both predicted. The object being moved to the target location in the claim language could be mapped to an individual element, such as the bow, and the object that moves the bow could be the human. Both of these elements are successfully predicted by Walker. The Examiner further notes that Zhang and Walker are both directed to image prediction and both are reasonably analogous to each other.
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate determining, using flow maps generated by the next image prediction neural network, a respective likelihood for each of the candidate sequences that performance of the actions in the candidate sequence by the robotic agent would result in the objects being moved to the target locations, as taught by Walker [Page 2447, Fig. 4], in order to learn a mapping between the input RGB image and the output space which corresponds to the predicted motion, reformulating structured regression as a classification problem [Page 2444-2445, Section 3].)
Regarding claim 8, Zhang teaches: wherein determining the next sequence of actions comprises: determining one or more pixels in the current image that depict the one or more objects as currently located in the environment ([Page 3197, Section III, Subsection A] for ball recognition, pixel values are analyzed to recognize the ball from the background and players)
Regarding claim 9, Zhang teaches the method of claim 7. However, Zhang is not relied upon to explicitly teach wherein determining the respective likelihood for a given candidate sequence comprises recursively feeding as input to the neural network the actions in the sequence and the next images generated by the neural network for the actions. However, Walker teaches wherein determining the respective likelihood for a given candidate sequence comprises recursively feeding as input to the neural network the actions in the sequence and the next images generated by the neural network for the actions ([Page 2449, Section 5] a modified Recurrent Neural Network (RNN) is proposed for multi-frame prediction. The examiner notes that the broadest reasonable interpretation of recursively feeding as input to the neural network the actions in the sequence and the next images generated is back propagation. Recurrent Neural Networks inherently exercise back propagation through their loops and recurrence. The examiner further notes that Zhang and Walker are both directed to image prediction and are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate wherein determining the respective likelihood for a given candidate sequence comprises recursively feeding as input to the neural network the actions in the sequence and the next images generated by the neural network for the actions, as taught by Walker [Page 2449, Section 5], in order to predict not just the next frame but a few seconds into the future [Page 2449, Section 5].)
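For clarity of record, the recursive rollout recited in claim 9 can be illustrated with a minimal sketch. This is not the applicant's or Walker's implementation; `predict_next` is a hypothetical placeholder standing in for the next-image prediction network, and the toy dynamics exist only to show each predicted image being fed back as input together with the next action in the candidate sequence:

```python
# Hypothetical sketch of multi-frame prediction: each predicted image is
# fed back into the predictor along with the next action in the sequence.

def predict_next(image, action):
    # Placeholder for the next-image prediction network; here the "image"
    # is a list of pixel values shifted by the action.
    return [pixel + action for pixel in image]

def rollout(initial_image, actions):
    """Recursively apply the predictor along a candidate action sequence."""
    image = initial_image
    frames = []
    for action in actions:
        image = predict_next(image, action)  # prediction becomes the next input
        frames.append(image)
    return frames

frames = rollout([0, 1, 2], [1, 1, 1])
print(frames[-1])  # [3, 4, 5]
```

The final frame of the rollout is what a planner would compare against the target configuration to score the candidate sequence.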
Regarding claim 10, Zhang teaches the method of claim 2. However, Zhang is not relied upon to explicitly teach sampling the candidate sequences from a distribution over possible action sequences. However, Fasola teaches sampling the candidate sequences from a distribution over possible action sequences ([Page 3-4, Section 5 and Fig. 6] multiple states in the navigation algorithm, and at each step, one of the candidate steps is selected. The examiner notes that Zhang and Fasola are both directed to robotics control and are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate sampling the candidate sequences from a distribution over possible action sequences, as taught by Fasola [Page 3-4, Section 5 and Fig. 6], to avoid obstacles, which is a challenging problem for mobile robots [Page 1, Section 1].)
Claims 12-18 are rejected based upon the same rationale as the rejection of claims 2 and 5-10, since they are the system claims corresponding to the method claims.
Claims 20-21 are rejected based upon the same rationale as the rejection of claims 2 and 5, since they are the non-transitory computer storage media claims corresponding to the method claims.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang (Visual measurement and prediction of ball trajectory for table tennis robot), in view of Tominaga (Image sequence prediction for remote robot control), further in view of Fasola (Fast goal navigation with obstacle avoidance using a dynamic local visual model), further in view of Walker (Dense optical flow prediction from a static image), further in view of Smith (US20160096272A1), and further in view of JAIN (US20170262996A1), and further in view of Hohl (Aibo and Webots Simulation wireless remote control and controller transfer - 2006).
Regarding claim 4, Zhang teaches the method of claim 2. However, Zhang is not relied upon to explicitly teach providing, for presentation to a user, a user interface that allows the user to specify the objects to be moved and the target locations. However, Hohl teaches providing, for presentation to a user, a user interface that allows the user to specify the objects to be moved and the target locations ([Page 474-475, Section 3, Fig. 3] GUI runs on a host computer and commands can be given to the AIBO robot via the GUI. The examiner further notes that Zhang and Hohl are both directed to robotic control and are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate providing, for presentation to a user, a user interface that allows the user to specify the objects to be moved and the target locations, as taught by Hohl [Page 474-475, Section 3, Fig. 3], to allow a Webots user to control both a real robot and its physical simulation [Page 474, Section 3].)
Claims 11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (Visual measurement and prediction of ball trajectory for table tennis robot), in view of Tominaga (Image sequence prediction for remote robot control), further in view of Fasola (Fast goal navigation with obstacle avoidance using a dynamic local visual model), further in view of Walker (Dense optical flow prediction from a static image), further in view of Smith (US20160096272A1), and further in view of JAIN (US20170262996A1), and further in view of Bartels (Smoothness constraints in recursive search motion estimation for picture rate conversion).
Regarding claim 11, Zhang teaches the method of claim 10. However, Zhang is not relied upon to explicitly teach wherein sampling the candidate sequences comprises: performing multiple iterations of sampling using a cross-entropy technique. However, Bartels teaches wherein sampling the candidate sequences comprises: performing multiple iterations of sampling using a cross-entropy technique ([Page 1312, Section III, Algorithm 1] vector candidate likelihood algorithm may be repeated over the same frame pair until convergence is reached… multiple iterations over the same frame pair improve convergence. The examiner notes that Bartels teaches the use of an algorithm that selects vector candidate sets and determines the likelihood of each candidate in the set. Bartels further teaches that the algorithm can be repeated over the same pixels to improve convergence rather than running it only once. This also takes into account differences in estimation, otherwise known as a cross-entropy technique. The examiner further notes that Zhang and Bartels are both directed to motion detection and estimation and are reasonably analogous to each other. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhang’s robotic controller to incorporate wherein sampling the candidate sequences comprises: performing multiple iterations of sampling using a cross-entropy technique, as taught by Bartels [Page 1312, Section III, Algorithm 1], in order to maximize the local conditional probabilities by minimizing a variant of EL + ES [Page 1312, Section III, Algorithm 1].)
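For clarity of record, the iterated sampling recited in claim 11 can be illustrated with a minimal sketch of the cross-entropy method. This is not Bartels' algorithm; the `score` function and all parameter values are hypothetical stand-ins. In the claimed context, the score would be the likelihood that a candidate action sequence moves the objects to the target locations:

```python
import numpy as np

def score(seq, target):
    # Stand-in objective: how closely the cumulative actions track the target.
    return -np.sum((np.cumsum(seq) - target) ** 2)

def cem(target, horizon=3, iters=10, pop=100, elite_frac=0.2, seed=0):
    """Cross-entropy method: sample sequences from a Gaussian, keep the
    elite fraction, refit the distribution, and repeat."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(horizon), np.ones(horizon)
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, horizon))
        scores = np.array([score(s, target) for s in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]  # top-scoring samples
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

best = cem(target=np.array([1.0, 2.0, 3.0]))
# best should approximate actions whose cumulative sum tracks the target.
```

Each iteration re-samples from the distribution fitted to the previous elites, which is the "multiple iterations of sampling" recited in the claim.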
Claim 19 is rejected based upon the same rationale as the rejection of claim 11 since it is the system claim corresponding to the method claim.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
The following references have been determined to be related to the application, but were not applied in any specific rejection. They are nonetheless listed below for reference.
Meier (US20150217449A1)
“MEIER teaches a method that responds to users' corrective commands to generate and refine a policy for determining appropriate robotic actions based on sensor-data input”
STEIN (US20160325753A1)
“STEIN teaches a method for determining a road profile along a predicted path”
Shirado (US 2011/0238211 Al)
“Shirado teaches a robot device including a behavior plan unit that detects a current status of the robot device or an external environment and that decides one behavior plan of a plurality of behavior plan candidates as a future behavior plan”
Benaim (US 2017/0136621 Al)
“Benaim teaches an adaptive learning interface system for end-users for controlling one or more machines or robots to perform a given task, combining identification of gaze patterns, EEG channel's signal patterns, voice commands and/or touch commands”
Powers (US 2012/0072052 Al)
“Powers teaches a control unit and a user interface that allow a user to identify a mode of display and interaction that narrows the user's options for his next interaction with the user interface”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAMCY ALGHAZZY whose telephone number is (571) 272-8824. The examiner can normally be reached Monday-Friday 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS can be reached on (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHAMCY ALGHAZZY/Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128