Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed on 11/24/2025 has been entered. Claims 1-10 and 12-20 remain pending in the application. Based on the amendment, the rejection under 35 U.S.C. 101 has been withdrawn.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 11/24/2025 and 11/07/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 5-8, 10, 15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen (US20220184806) in view of Holson (US20230015796) and Aparicio (US20240335941).
Regarding claim 1, Chen teaches a processing system for controlling a device using machine learning models, comprising:
at least one memory having executable instructions stored thereon; and one or more processors configured to execute the executable instructions to cause the processing system to ([0020], [0027]-[0028]):
access data characterizing a physical environment in which the device is operating ([0029] disclosing images of environment);
generate a first set of affordable actions based on processing the data via ([0030]-[0033] disclosing a set of potential actions “affordable” based on the current state of the environment, at least [0037] disclosing locations for the desired goal and the desired pickup);
generate, via a second set of machine learning ([0030]-[0033] disclosing based on the set of affordable actions to select via a prediction model an action from the set of actions); and
cause the device to execute the first selected action ([0032] disclosing an actuator performing the action).
Chen does not explicitly disclose a machine learning model that generates candidate actions.
Holson teaches a neural network that generates candidate actions and selects from those actions ([0040]-[0043], [0087]-[0090] disclosing machine learning that generates an operation to be carried out and a position to be selected based on a task, and selecting the task based on a confidence value).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teaching of Chen to incorporate the machine learning model of Holson in order to generate the output that defines the operation to be carried out at the interaction point or under the environmental conditions, as taught by Holson [0041]. It would likewise have been obvious to combine the neural network of Holson with, or substitute it for, the model of Chen in order to generate tasks, such as a pickup task for a specific object such as a cup or another physical object, based on pre-trained data, thus improving the efficiency and accuracy of actions tailored to objects (Holson [0041]).
Chen as modified by Holson does not teach wherein to generate the first set of affordable actions, the one or more processors are configured to cause the processing system to generate a set of output affordance maps, wherein each output affordance map of the set of output affordance maps corresponds to a respective affordable action of the first set of affordable actions, and an affordance map of the set of output affordance maps comprises a pixel encoding indicating a region of a first object in the physical environment.
Aparicio teaches wherein to generate the first set of affordable actions, the one or more processors are configured to cause the processing system to generate a set of output affordance maps, wherein each output affordance map of the set of output affordance maps corresponds to a respective affordable action of the first set of affordable actions ([0005], [0020]-[0030] disclosing affordance maps that define actions to be performed, i.e., the respective grasps at pixels with their success metrics).
Aparicio further teaches that an affordance map of the set of output affordance maps comprises a pixel encoding indicating a region of a first object in the physical environment (at least [0011]-[0030] disclosing the labeling of a region to be grasped on the physical object at a pixel, i.e., pixel encoding of the region of a physical object; in fact, Aparicio teaches that the affordance is determined for a first object, a second object, and a plurality of objects, and the control to grasp at least a first and a second object).
It would have been obvious to one of ordinary skill in the art to incorporate the teaching of Aparicio in order to define actions that are feasible based on the affordance map, as taught by Aparicio [0005]. Combining the affordance map of Aparicio with Chen as modified by Holson would have yielded predictable results, namely determining the grasp metrics and probabilities associated with the candidate grasps, thereby improving the grasping of objects.
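For illustration only, the limitation discussed above (one output affordance map per affordable action, with a pixel encoding that indicates a region of an object) can be pictured with the following minimal sketch. It is not taken from Aparicio, Chen, Holson, or the application; the action names, the score values, the 0.5 threshold, and the helper `region_pixels` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: one score per pixel, higher meaning "more actionable".
AffordanceMap = List[List[float]]


def region_pixels(amap: AffordanceMap, threshold: float = 0.5) -> Set[Tuple[int, int]]:
    """Return the (row, col) pixels whose score meets the threshold."""
    return {
        (r, c)
        for r, row in enumerate(amap)
        for c, score in enumerate(row)
        if score >= threshold
    }


# One map per affordable action (action names and values are made up).
affordance_maps: Dict[str, AffordanceMap] = {
    "grasp": [[0.0, 0.9, 0.8],
              [0.0, 0.7, 0.1],
              [0.0, 0.0, 0.0]],
    "push":  [[0.0, 0.0, 0.0],
              [0.6, 0.0, 0.0],
              [0.6, 0.0, 0.0]],
}

# The pixel region that each action's map marks on the imaged object.
regions = {action: region_pixels(m) for action, m in affordance_maps.items()}
```

In this sketch the correspondence between maps and actions is simply the dictionary key, and the "pixel encoding" is the set of above-threshold pixels; a real system could encode regions quite differently.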
Regarding claim 5, Chen as modified by Holson and Aparicio teaches the processing system of claim 1, wherein the one or more processors are further configured to cause the processing system to:
determine that the device has executed the first selected action (Chen [0043]-[0044] disclosing the object has been removed);
access second data characterizing the physical environment after execution of the first selected action ([0043]-[0044] disclosing the physical state is characterized as the object has been removed);
generate a second set of affordable actions based on processing the second data via the first set of machine learning models, wherein each respective affordable action of the second set of affordable actions indicates an action that can be performed conditioned on the device having executed the first selected action ([0043]-[0045] disclosing the second actions are determined for each state taking consideration of any previous state and action that has been performed);
generate, via the second set of machine learning models, a second selected action to be performed with a second object in the physical environment based on the task and the second set of affordable actions ([0043]-[0045] disclosing determining the action on an object; herein the second object is interpreted as simply the object, since there is no previous action on any first object, and Aparicio, as applied to claim 1, already teaches the affordances for a plurality of objects); and
cause the device to execute the second selected action (at least [0042]-[0045] disclosing the actuator performing the action).
Holson teaches the neural network to generate candidate actions ([0040]-[0043], [0087]-[0090] disclosing machine learning that generates an operation to be carried out and a position to be selected based on a task, and selecting the task based on a confidence value).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teaching of Chen as modified by Holson and Aparicio to incorporate the machine learning model of Holson in order to generate the output that defines the operation to be carried out at the interaction point or under the environmental conditions, as taught by Holson [0041]. It would likewise have been obvious to combine the neural network of Holson with, or substitute it for, the model of Chen in order to generate tasks, such as a pickup task for a specific object such as a cup or another physical object, based on pre-trained data, thus improving the efficiency and accuracy of actions tailored to objects (Holson [0041]).
Regarding claim 6, Chen as modified by Holson and Aparicio teaches the processing system of claim 5, wherein the second set of affordable actions includes at least one action that is not included in the first set of affordable actions, and wherein the first set of affordable actions includes at least one action that is not included in the second set of affordable actions (Chen [0043]-[0046] disclosing the second action is selected after the object is removed thus any action for the object is not included).
Regarding claim 7, Chen as modified by Holson and Aparicio teaches the processing system of claim 5, wherein:
the device comprises a robot, to cause the device to execute the first selected action, the one or more processors are configured to cause the processing system to cause the robot to grasp the second object located in the physical environment, and the second selected action comprises an action performed by the robot with the second object (Chen [0043]-[0044] disclosing an action of picking and grasping the object and placing the object as another action).
Regarding claim 8, Chen as modified by Holson and Aparicio teaches the processing system of claim 7, wherein the one or more processors are further configured to cause the processing system to: generate, via the second set of machine learning models, a third selected action to be performed based on the task and a third set of affordable actions, wherein each respective affordable action of the third set of affordable actions indicates an action that can be performed conditioned on the device having executed the second selected action; and cause the device to execute the third selected action (Chen [0043]-[0047] disclosing a plurality of actions that are performed one after another and that depend on the current state, the task, and the previous state, which is interpreted to include a third action from which the next action is selected; it is further interpreted that a next action is conditioned on the completion of the previous action).
Regarding claim 10, Chen as modified by Holson and Aparicio teaches the processing system of claim 1, wherein each respective affordable action of the first set of affordable actions further indicates a respective probability that the affordable action can be performed at the location in the physical environment, and wherein the first selected action comprises an affordable action having a highest probability of the first set of affordable actions (Chen [0038] at least disclosing the probability that the first action has the most success to achieve the desired task).
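For illustration only, the selection recited in claim 10 (choosing the affordable action with the highest probability of being performable at its location) can be sketched as follows. This is a hypothetical sketch, not the method of Chen or of the application; the `AffordableAction` fields, the candidate names, and the probability values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AffordableAction:
    """Hypothetical record: an action, where it applies, and how likely it is to succeed."""
    name: str
    location: Tuple[int, int]
    probability: float


def select_action(actions: List[AffordableAction]) -> AffordableAction:
    """Return the candidate with the highest probability."""
    if not actions:
        raise ValueError("no affordable actions to select from")
    return max(actions, key=lambda a: a.probability)


# Made-up candidates for a single scene.
candidates = [
    AffordableAction("push_box", (4, 2), 0.35),
    AffordableAction("grasp_cup", (1, 7), 0.82),
    AffordableAction("open_drawer", (9, 3), 0.61),
]

selected = select_action(candidates)
```

Ties are resolved here by list order, because `max` returns the first maximal element; the claim language does not address ties.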
Regarding claim 15, Chen as modified by Holson and Aparicio teaches the processing system of claim 1, wherein each affordable action of the first set of affordable actions corresponds to a respective set of action parameters, and wherein each set of action parameters corresponds to at least one of a location of a second object in the physical environment, an orientation in which to interact with the second object in the physical environment, or a force to be applied to the second object in the physical environment (Chen [0027]-[0044] disclosing the parameter includes a location of an object in the environment).
Regarding claim 17, Chen as modified by Holson and Aparicio teaches the processing system of claim 1, wherein to generate the first set of affordable actions, the one or more processors are further configured to cause the processing system to:
generate, via a first generative artificial intelligence model, an execution plan including a plurality of sub-actions to complete a task in the physical environment, each respective sub-action identifying a respective operation to perform on a respective object in the physical environment ([0027]-[0044] disclosing a plurality of actions and sub actions to perform on an object in an environment); and
Aparicio further teaches for each respective sub-action in the execution plan, generate a respective affordable action based on an object map identifying a location of the respective object in the physical environment and an identified actionable point associated with the respective object ([0022]-[0026] disclosing selecting an action of grasping an object in an environment based on its location and a grasping point).
It would have been obvious to one of ordinary skill in the art to have modified the teaching of Chen as modified by Holson to incorporate the teaching of Aparicio of generating, for each respective sub-action in the execution plan, a respective affordable action based on an object map identifying a location of the respective object in the physical environment and an identified actionable point associated with the respective object, in order to securely grasp an object at a position indicated in an image and at an affordable point, thus securing the grasping action, as taught by Aparicio [0022]-[0026]. Combining the teaching of Aparicio would also have yielded predictable results, since the depth map aids in identifying the pose of an object, which improves object manipulation.
Regarding claim 18, Chen as modified by Holson and Aparicio teaches the processing system of claim 17, wherein to generate the first selected action to be performed in the physical environment, the one or more processors are configured to cause the processing system to generate, via a second generative artificial intelligence model, executable code for performing the first selected action based on a first sub-action in the execution plan (Chen [0027]-[0044] disclosing each action is performed before the other action).
Claim 19 is rejected for similar reasons as claim 1.
Claim 20 is rejected for similar reasons as claim 17, see above rejection.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Chen (US20220184806) in view of Holson (US20230015796) and Aparicio (US20240335941) and Nagarajan (US11097418).
Regarding claim 2, Chen as modified by Holson and Aparicio teaches the processing system of claim 1, wherein the one or more sub-tasks comprises a sequence of sub-tasks, wherein the one or more processors are further configured to cause the processing system to decompose, via the second set of machine learning models, the task into the sequence of sub-tasks (Chen [0036]-[0037] disclosing predicting a promising action sequence; [0038] disclosing a task that includes more than one motion; [0041]-[0042] disclosing an action sequence).
Nagarajan teaches a neural network decomposing a task (col. 3-col. 4 disclosing machine learning selecting a grasp strategy including sliding a plate to an edge and then grasping, i.e., decomposing the task of grasping; at least col. 16, lines 50-60 discloses that the machine learning is a neural network).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teaching of Chen as modified by Holson and Aparicio to incorporate the teaching of Nagarajan of a neural network decomposing a task in order to determine the best strategy, such as decomposing the grasp into a pregrasp and then a grasp based on classification of an object, thus improving robotic action and facilitating an action that would otherwise be difficult, as taught by Nagarajan col. 4. It would also have been obvious to try using the neural network to predict the motion instead of an algorithm, with a reasonable expectation of success, improving the action for objects by tailoring a decomposed action to them based on classification.
Claims 3, 4, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Chen (US20220184806) in view of Holson (US20230015796) and Aparicio (US20240335941) and Su (US20210122039).
Regarding claim 3, Chen as modified by Holson and Aparicio does not teach the processing system of claim 1, wherein: each affordable action of the first set of affordable actions corresponds to a respective set of action parameters; to generate, via the second set of machine learning models, the first selected action, the one or more processors are configured to cause the processing system to: select a first affordable action included in the first set of affordable actions; and modify the set of action parameters associated with the first affordable action to generate a set of modified action parameters; and the device executes the first selected action based on the set of modified action parameters.
Su teaches each affordable action of the first set of affordable actions corresponds to a respective set of action parameters ([0158] disclosing a set of grasps that can be applied with adjusted parameters, such as increasing a force and reorienting the object);
to generate, via the second set of machine learning models, the first selected action, the one or more processors are configured to cause the processing system to: select a first affordable action included in the first set of affordable actions; and modify the set of action parameters associated with the first affordable action to generate a set of modified action parameters ([0129] disclosing the machine learning; [0157]-[0159] disclosing selecting a first way to hold the object, such as from the sides or the top, i.e., an orientation “parameter”, and deciding to reorient the object to avoid slip and/or to apply or reduce force, which are adjusted parameters of an action; the adjusted orientation or grasp is likewise an adjusted parameter); and
the device executes the first selected action based on the set of modified action parameters ([0157]-[0159] disclosing changing an orientation; see figure 6c showing the change of orientation and the points of grasp).
It would have been obvious to one of ordinary skill in the art to have modified the teaching of Chen as modified by Holson and Aparicio to incorporate the teaching of Su of generating, via the second set of machine learning models, the first selected action by selecting a first affordable action included in the first set of affordable actions and modifying the set of action parameters associated with the first affordable action to generate a set of modified action parameters. The combination is obvious to one of ordinary skill in the art in order to modify the action to reduce slip, thus improving an action performed by a robot, such as securely holding an object, with predictable results; it also helps the robot carry the object in a way that is currently possible given limitations in the environment and further adjust the grasp, as taught by Su [0157]-[0159].
Regarding claim 4, Chen as modified by Holson and Aparicio and Su teaches the processing system of claim 3.
Specifically, Su teaches wherein to cause the device to execute the first selected action, the one or more processors are configured to cause the processing system to convert the set of modified action parameters into one or more control signals for output to the device in the physical environment (at least [0129] disclosing the processor implements machine learning; [0158]-[0159] disclosing the robotic system carrying out the maneuver based on the parameter).
The combination is obvious to one of ordinary skill in the art to modify the action in order to reduce slip thus improving an action performed by a robot such as securely holding an object yielding predictable results.
Regarding claim 9, Chen as modified by Holson and Aparicio teaches the processing system of claim 1.
Su teaches wherein the one or more processors are configured to cause the processing system to, while the device is executing the first selected action:
monitor, via the second set of machine learning models, a state of the device; and in response to a determination, via the second set of machine learning models, that the first selected action should be modified: adjust, via the second set of machine learning models, one or more action parameters corresponding to the first selected action, or cause the device to stop performing the first selected action ([0158]-[0159] disclosing a set of grasps that can be applied with adjusted parameters, such as increasing a force and reorienting the object, while the computer monitors the forces and moments and performs the action).
The combination is obvious to one of ordinary skill in the art to modify the action in order to reduce slip thus improving an action performed by a robot such as securely holding an object yielding predictable results.
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Chen (US20220184806) in view of Holson (US20230015796) and Aparicio (US20240335941) and Garg (US20240386733).
Regarding claim 12, Chen as modified by Holson and Aparicio teaches the processing system of claim 1, but does not teach wherein the one or more processors are further configured to cause the processing system to decompose the set of output affordance maps into a plurality of patches.
Garg further teaches decompose the set of output affordance maps into a plurality of patches ([0018] disclosing the affordances, i.e., a map decomposed into affordances, are given to the machine learning);
generate a plurality of embeddings based on the plurality of the patches; and output the plurality of embeddings to the second set of machine learning models ([0018] disclosing generating knowledge as embeddings).
It would have been obvious to one of ordinary skill in the art to have modified the teaching of Chen as modified by Holson and Aparicio so that the affordance map of Aparicio is decomposed into the affordances of Garg in order to facilitate the action planning, as taught by Garg [0018].
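For illustration only, the steps recited for claim 12 (decomposing an affordance map into patches and generating an embedding per patch) can be sketched as follows. This is a hypothetical sketch and is not the technique of Garg, Aparicio, or the application; the non-overlapping patch layout, the fixed projection `weights`, and the function names are assumptions made for the example.

```python
from typing import List

Grid = List[List[float]]


def split_into_patches(grid: Grid, size: int) -> List[Grid]:
    """Split a 2D map into non-overlapping size x size patches, in row-major order."""
    rows, cols = len(grid), len(grid[0])
    if rows % size or cols % size:
        raise ValueError("grid dimensions must be multiples of the patch size")
    return [
        [row[c:c + size] for row in grid[r:r + size]]
        for r in range(0, rows, size)
        for c in range(0, cols, size)
    ]


def embed_patch(patch: Grid, weights: List[List[float]]) -> List[float]:
    """Flatten a patch and apply a linear projection (one output per weight row)."""
    flat = [v for row in patch for v in row]
    return [sum(w * v for w, v in zip(w_row, flat)) for w_row in weights]


# Made-up 4x4 affordance map and a made-up 2x4 projection matrix.
affordance_map: Grid = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
weights = [
    [1.0, 1.0, 1.0, 1.0],  # first component: sum of the patch
    [1.0, 0.0, 0.0, 0.0],  # second component: top-left pixel of the patch
]

patches = split_into_patches(affordance_map, size=2)
embeddings = [embed_patch(p, weights) for p in patches]
```

The resulting list of embeddings is what would be handed to a downstream model in this sketch; in practice the projection would be learned rather than fixed by hand.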
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Chen (US20220184806) in view of Holson (US20230015796) and Aparicio (US20240335941) and Fulda (NPL, “What can you do with a rock? Affordance extraction via word embeddings”).
Regarding claim 13, Chen as modified by Holson and Aparicio teaches the processing system of claim 1.
Fulda teaches wherein to generate the first set of affordable actions, the one or more processors are configured to cause the processing system to generate a set of word tokens, wherein each word token of the set of word tokens corresponds to a textual description of a respective affordable action of the first set of affordable actions (at least abstract disclosing affordable actions to devices based on the object as text enabling an agent to perform tasks).
It would have been obvious to incorporate the teaching of Fulda in order to facilitate robotic acquisition of actionable insights without the prerequisite of offline reinforcement learning or a large amount of data and, in addition, to make the agent's selection resemble human selection, as taught by Fulda (abstract).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Chen (US20220184806) in view of Holson (US20230015796) and Aparicio (US20240335941) and Garg (US20240386733) and Fulda (NPL, “What can you do with a rock? Affordance extraction via word embeddings”).
Regarding claim 14, Chen as modified by Holson and Aparicio and Fulda teaches the processing system of claim 13.
Garg teaches wherein the set of word tokens is based on a set of features extracted from the set of output affordance maps (Garg [0018] disclosing the text is based on affordances generated by machine learning).
It would have been obvious to one of ordinary skill in the art to have modified the teaching of Chen as modified by Holson and Aparicio and Fulda so that the affordance map of Aparicio is decomposed into the affordances of Garg in order to facilitate the action planning, as taught by Garg [0018].
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Chen (US20220184806) in view of Holson (US20230015796) and Aparicio (US20240335941) and Claussen (US20200254609).
Regarding claim 16, Chen as modified by Holson and Aparicio teaches the processing system of claim 1.
Claussen teaches wherein the first set of machine learning models comprises a set of convolutional neural networks, and wherein each convolutional neural network included in the set of convolutional neural networks corresponds to a different type of action that can be performed by the device (abstract disclosing a plurality of task specific neural networks to perform a task for the robot, see claim 12 disclosing making a full neural network task specific).
It would have been obvious to have modified the teaching of Chen as modified by Holson and Aparicio to incorporate the teaching of Claussen in order to reduce the training load, since training a single algorithm can become prohibitive, as taught by Claussen [0003].
Response to Arguments
Applicant’s arguments filed on 11/24/2025 have been fully considered but they are not persuasive.
With respect to applicant’s arguments regarding the 101 rejection on record, the rejection has been withdrawn.
With respect to applicant’s arguments regarding the 103 rejection, specifically regarding amended claim 1 incorporating the subject matter of cancelled claim 11, Aparicio in at least [0011]-[0020] teaches “generating the affordable action maps wherein the affordance map comprises a pixel encoding indicating a region of a first object in the physical space.” Aparicio teaches the labeling of a region to be grasped on the physical object at a pixel (i.e., pixel encoding of the region of a physical object). The combination with Aparicio is obvious in order to label regions and indicate a success metric for each region, which aids in selecting the best candidate grasp and improves the grasping ability of robots.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art cited in PTO-892 and not mentioned above discloses related devices and methods.
NPL “Robot-enabled construction assembly with automated sequence planning based on ChatGPT: ROBOGPT” disclosing the use of an LLM to simplify robot programming into natural-language sequential tasks.
US20240351218 disclosing the LLM machine learning generates a sequence of tasks to accomplish a task.
US20230080768 disclosing changing the sequence of tasks on the fly based on the current state of the object.
US20230104775 disclosing a convolutional neural network (CNN) decomposing tasks from videos of humans.
US20240131698 disclosing specific neural network for each task and subtask.
US20230278213 disclosing adjusting a regrasp in order to perform an assembly operation.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMAD O EL SAYAH whose telephone number is (571)270-7734. The examiner can normally be reached on M-Th 6:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ramon Mercado, can be reached on (571) 270-5744. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MOHAMAD O EL SAYAH/Examiner, Art Unit 3658B