DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/29/2026 has been entered.
Response to Arguments
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Dan et al. (US 20230373098 A1, hereinafter Dan) in view of Sun (US 10789543 B1).
Regarding claim 1, Dan discloses:
A non-transitory machine-readable medium including instructions, which when executed by processing circuitry, cause the processing circuitry to perform operations to:
identify a current context of a robotic device, the current context including identification of at least one object present in a field of view (at least as in paragraph 0021, “Information about one or more objects in the workspace of the robot 18, such as identification and/or location information, may also be received from one or more environmental sensors 12, such as a robot vision system 12 (22), or captured by an environmental sensor module 12… The method may then determine whether the operator input is sufficient to translate the operator input into software commands that will direct the robot 10 to perform the desired robot movement (24)”; at least as in paragraph 0015, “the environment sensor module 12 may utilize different sensors including a camera for object identification”; at least as in paragraph 0012, “several choices may be offered (e.g., by showing the objects that are visible to the robot's camera)”);
receive a user input comprising user motion corresponding to the robotic device or the current context (at least as in paragraph 0021, “In the method, an operator input which represents a desired robot movement is received (20), for example, from an operator communication module 10”; at least as in paragraph 0015, “It is understood that the list of operator communication channels (i.e., inputs to the operator module 10) is not limited to the examples in FIG. 1, but depending on the preferred means of operator input and feedback, may include a microphone and speakers for audio, cameras (either stand-alone or integrated into augmented-/virtual reality glasses) for gesture recognition or superposition of robotic tasks in space (without the robot being present or having been programmed), haptic gloves for tactile feedback of manipulation of robotic tasks, monitors for the implementation of user interfaces (UI), touch screens or a tablet for touch selection, etc.”);
automatically extract, from a basic skills library, one or more possible actions for the at least one object from video data of the at least one object being manipulated (at least as in paragraph 0022, “The operator inputs (and additional operator inputs) are then translated into software commands that will direct the robot 18 to perform the desired robot movement (26), for example, by an actuation module 10… It may be desirable to store the operator inputs, additional operator inputs and/or translated software commands in a library storage with data links therebetween so that future operator inputs may be able to be autonomously linked to such additional operator inputs and/or software commands (28). The library storage may be included as part of the actuation module 14 and may be a skill set library that recognizes operator inputs”; at least as in paragraph 0017, “The set of operator commands that are recognized by the controller module 16 may be called a skill set and may be dynamically extended over the robot's lifetime”; at least as in paragraph 0020, “The controller module 16 registers the additional operator inputs in an appropriate form for future use... Importantly, the format allows learned skills to be transferrable between robots 18”);
determine from the current context and the user input, a set of actions performable by the robotic device corresponding to the at least one object and predicted to correspond to the user input based on a trajectory of the user motion, the set of actions including the one or more possible actions (at least as in paragraph 0018, “If the skill (i.e., pick) and object (i.e., tube) are known to the controller module 16, the controller module 16 may proceed to request a task (or set of tasks) execution(s) to the actuation module 14 which will translate the tasks into low-level robot software commands for the robot 18 controller”; at least as in paragraph 0021, “The method may then determine whether the operator input is sufficient to translate the operator input into software commands that will direct the robot 10 to perform the desired robot movement (24)”);
output control signals that cause the robotic device to perform the action to interact with the at least one object (at least as in paragraph 0023, “Once the operator inputs have been translated into software commands understandable to the robot 18, the robot 18 may be programmed with such software instructions (30). Subsequent to such programming, the robot 18 may be operated using the software commands in order to perform the desired robot movement (32)”).
But, Dan does not explicitly teach:
automatically select an action of the set of actions based on an acyclic graph describing action paths.
However, Sun, in the same field of endeavor of controlling robots to perform on objects to complete a desired task based on functional relationships between objects, specifically teaches:
automatically select an action of the set of actions based on an acyclic graph describing action paths (at least as in col. 13-14, ln. 62-40, wherein after receiving a command to perform a given task, “the robot can search the FOON for the object node associated with the goal of the task and add it to a task tree, as indicated in block 32… Referring next to block 34, the child motion node associated with the goal object node is identified and added to the task tree… Next, as indicated in block 36, the one or more child object nodes associated with the motion node is/are identified and added to the task tree… Flow from this point depends upon whether or not each of the child objects associated with the motion node are available or not, as indicated in decision block 38… flow can continue from decision block 38 to block 40 at which each manipulation motion of the task tree can be performed in reverse order using the various required objects” thus the robot continues to add nodes to the task tree until all objects associated with the node are available in the environment; see also Fig. 10 and 13; at least as in col. 5, ln. 9-14, “A FOON may be a directed semi-acyclic graph as there may be some instances of loops when an object does not change states from taking part in some action. A motion does not necessarily cause a change in an object, as certain objects will remain in the same state”; at least as in col. 12, ln. 28-31, “Once the task tree has been generated, it can then be used to generate a task sequence that contains a series of motions, objects, and their states, which provides step-by-step commands for a robot to execute”; at least as in col. 12, ln. 3-27, wherein multiple task trees can be generated and the cost of each tree can be compared to determine the most efficient tree).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Dan to include Sun's teaching of a robot system generating a task sequence using a task tree and FOON, since Sun teaches that the system can provide robots with the ability to understand a task in terms of object states and object-related functional motions and to generate motions based on the learned motions to manipulate the objects properly for a desired task, thus improving object and action recognition.
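For orientation, the following is a minimal sketch of the kind of backward task-tree retrieval over a directed acyclic action graph that the cited portions of Sun describe. It is illustrative only: the names (MotionNode, ActionGraph, retrieve_task_tree) and the example data are hypothetical and are not drawn from either reference.

```python
# Hypothetical sketch of backward task-tree retrieval over a directed
# acyclic action graph, loosely patterned on Sun's FOON search: start at
# the goal object state, add the motion that produces it, then recurse on
# the motion's input objects until every required object is available.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class MotionNode:
    motion: str      # e.g., "pick", "place"
    inputs: tuple    # object states consumed by the motion
    output: str      # object state produced by the motion

@dataclass
class ActionGraph:
    motions: list = field(default_factory=list)

    def producers_of(self, obj_state):
        # Motion nodes whose output is the requested object state.
        return [m for m in self.motions if m.output == obj_state]

def retrieve_task_tree(graph, goal, available):
    """Walk backward from the goal state, collecting motions in the order
    they must be executed; return None if the goal is unreachable."""
    if goal in available:
        return []                    # object already exists; nothing to do
    for motion in graph.producers_of(goal):
        plan = []
        for needed in motion.inputs:
            sub = retrieve_task_tree(graph, needed, available)
            if sub is None:
                break                # this producer cannot be satisfied
            plan.extend(sub)
        else:
            return plan + [motion]   # prerequisites first, then this motion
    return None

# Example: placing a tube in a bin requires picking the tube first.
g = ActionGraph([
    MotionNode("pick", ("tube on table",), "tube in gripper"),
    MotionNode("place", ("tube in gripper",), "tube in red bin"),
])
steps = retrieve_task_tree(g, "tube in red bin", {"tube on table"})
print([m.motion for m in steps])     # ['pick', 'place']
```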
Regarding claim 2, in view of the above combination of Dan and Sun, Dan further discloses:
The non-transitory machine-readable medium of claim 1, wherein the basic skills library includes one or more dynamic movement actions including at least one of grasping an object, moving an object, or releasing an object (at least as in paragraph 0018, “the operator may desire the robot to “pick the tube and place it in the red bin” or “after a tube is available in the blue bin, pick it and shake it”… If the skill (i.e., pick) and object (i.e., tube) are known to the controller module 16, the controller module 16 may proceed to request a task (or set of tasks) execution(s) to the actuation module 14 which will translate the tasks into low-level robot software commands for the robot 18 controller”; at least as in paragraph 0012, “A first step of the interaction involves the operator instructing the robot to perform a sequence of operations by providing high level concepts (e.g., pick, move, place)”).
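As a minimal, hypothetical sketch of such a basic skills library, the mapping below pairs each dynamic movement primitive with a low-level command; the robot interface methods (close_gripper_on, move_object, open_gripper) are assumed names, not taken from Dan.

```python
# Hypothetical basic skills library: each high-level primitive maps to a
# callable that issues low-level commands on an assumed robot interface.
SKILL_LIBRARY = {
    "grasp":   lambda robot, obj: robot.close_gripper_on(obj),
    "move":    lambda robot, obj, pose: robot.move_object(obj, pose),
    "release": lambda robot, obj: robot.open_gripper(),
}

def execute_skill(robot, name, *args):
    """Translate a skill name into a robot command, analogous to Dan's
    actuation module translating operator inputs."""
    try:
        return SKILL_LIBRARY[name](robot, *args)
    except KeyError:
        # Unknown skill: in Dan's scheme the controller module would
        # prompt the operator for complementary input here.
        raise ValueError(f"skill '{name}' not in library; "
                         "additional operator input required")
```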
Regarding claim 3, in view of the above combination of Dan and Sun, Dan further discloses:
The non-transitory machine-readable medium of claim 1, wherein the current context includes identification of all objects that are in a field of view, a trajectory of the robotic device, and a task to be completed (at least as in paragraph 0021, “operator input which represents a desired robot movement is received (20), for example, from an operator communication module 10. As described above, the operator input may be received as a voice input 10A that is transcribed to text or may be received from a display screen 10B, such as a touchscreen, or any other possible user interface. Information about one or more objects in the workspace of the robot 18, such as identification and/or location information, may also be received from one or more environmental sensors 12, such as a robot vision system 12 (22), or captured by an environmental sensor module 12”; at least as in paragraph 0015, “It is understood that the list of operator communication channels (i.e., inputs to the operator module 10) is not limited to the examples in FIG. 1, but depending on the preferred means of operator input and feedback, may include a microphone and speakers for audio, cameras (either stand-alone or integrated into augmented-/virtual reality glasses) for gesture recognition or superposition of robotic tasks in space (without the robot being present or having been programmed), haptic gloves for tactile feedback of manipulation of robotic tasks, monitors for the implementation of user interfaces (UI), touch screens or a tablet for touch selection, etc. Likewise, the environment sensor module 12 may utilize different sensors including a camera for object identification, robotic skin or force sensors for contact, gripping or collision detection, lidar/radar for distance estimation, thermal sensors to identify possible operating hazards or process steps, etc. Each channel and sensor may be handled by a specific module 10, 12.”).
Regarding claim 4, in view of the above combination of Dan and Sun, Dan further discloses:
The non-transitory machine-readable medium of claim 1, wherein the operations further include operations to receive user confirmation of the action before outputting the control signals (at least as in paragraph 0018, “If a command is insufficient, e.g., the action is not understood or any information is missing, e.g. the object is unknown to the system, the controller module 16 may prompt one or more additional operator inputs to provide complementary input”; at least as in paragraph 0019, “Feedback requests and operator input may be provided by any of the operator communication and sensing modules 10, 12 best suited for a natural and intuitive information flow”).
Regarding claim 5, in view of the above combination of Dan and Sun, Dan further discloses:
The non-transitory machine-readable medium of claim 1, wherein the set of actions are generated by automatically extracting tasks from videos captured of the robotic device (at least as in paragraph 0015, “It is understood that the list of operator communication channels (i.e., inputs to the operator module 10) is not limited to the examples in FIG. 1, but depending on the preferred means of operator input and feedback, may include a microphone and speakers for audio, cameras (either stand-alone or integrated into augmented-/virtual reality glasses) for gesture recognition or superposition of robotic tasks in space (without the robot being present or having been programmed), haptic gloves for tactile feedback of manipulation of robotic tasks, monitors for the implementation of user interfaces (UI), touch screens or a tablet for touch selection, etc.”).
Regarding claim 6, in view of the above combination of Dan and Sun, Dan further discloses:
The non-transitory machine-readable medium of claim 1, wherein the robotic device is controlled using a gesture via an augmented reality interface, and wherein the current context includes identification of the gesture (at least as in paragraph 0015, “It is understood that the list of operator communication channels (i.e., inputs to the operator module 10) is not limited to the examples in FIG. 1, but depending on the preferred means of operator input and feedback, may include a microphone and speakers for audio, cameras (either stand-alone or integrated into augmented-/virtual reality glasses) for gesture recognition or superposition of robotic tasks in space (without the robot being present or having been programmed), haptic gloves for tactile feedback of manipulation of robotic tasks, monitors for the implementation of user interfaces (UI), touch screens or a tablet for touch selection, etc.”).
Regarding claim 7, in view of the above combination of Dan and Sun, Sun further discloses:
The non-transitory machine-readable medium of claim 1, wherein the operations to automatically select the action of the set of actions include operations to determine that two or more actions of the acyclic graph are selectable based on the current context, and select the action based on a probability of each of the two or more actions (at least as in col. 12, ln. 3-27, wherein multiple task trees can be generated and the cost of each tree can be compared to determine the most efficient tree).
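A minimal sketch of this selection step, assuming hypothetical per-action probabilities (Sun's cited passage compares task-tree costs rather than probabilities; the scoring below is illustrative only):

```python
# Hypothetical selection among two or more actions that are all selectable
# in the current context: pick the action with the highest probability.
def select_action(candidates):
    """candidates: list of (action_name, probability) pairs."""
    if not candidates:
        raise ValueError("no selectable actions in the current context")
    return max(candidates, key=lambda pair: pair[1])[0]

print(select_action([("pick tube", 0.7), ("push tube", 0.2), ("wave", 0.1)]))
# -> 'pick tube'
```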
Regarding claim 8, in view of the above combination of Dan and Sun, Dan further discloses:
The non-transitory machine-readable medium of claim 1, wherein the action is a contingent action based on a previously performed action by the robotic device (at least as in paragraph 0018, “The operator may also describe a sequence of robot movements and/or conditions for that robot movement to be executed. For example, the operator may desire the robot to “pick the tube and place it in the red bin” or “after a tube is available in the blue bin, pick it and shake it”).
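For illustration, a hypothetical sketch of a contingent action gated on a previously performed action, patterned on Dan's quoted example; the action and state names are invented:

```python
# Hypothetical contingent action: "pick and shake" becomes available only
# after a prior action has put a tube in the blue bin (cf. Dan para. 0018).
def next_action(history, world_state):
    if "place tube in blue bin" in history and world_state.get("tube in blue bin"):
        return "pick and shake tube"
    return None   # precondition not yet established by a prior action

print(next_action(["pick tube", "place tube in blue bin"],
                  {"tube in blue bin": True}))   # -> 'pick and shake tube'
```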
Regarding claim 9, Dan discloses:
A non-transitory machine-readable medium, including instructions, which when executed by processing circuitry, cause the processing circuitry to perform operations to:
generate a basic skill library of primitives performable by a robotic device (at least as in paragraph 0017, “The set of operator commands that are recognized by the controller module 16 may be called a skill set and may be dynamically extended over the robot's lifetime”; at least as in paragraph 0022, “It may be desirable to store the operator inputs, additional operator inputs and/or translated software commands in a library storage with data links therebetween so that future operator inputs may be able to be autonomously linked to such additional operator inputs and/or software commands (28). The library storage may be included as part of the actuation module 14 and may be a skill set library that recognizes operator inputs”; at least as in paragraph 0020, “The controller module 16 registers the additional operator inputs in an appropriate form for future use... Importantly, the format allows learned skills to be transferrable between robots 18”);
automatically extract, from the basic skill library of primitives, a set of possible actions for at least one object from video data of the at least one object being manipulated (at least as in paragraph 0022, “The operator inputs (and additional operator inputs) are then translated into software commands that will direct the robot 18 to perform the desired robot movement (26), for example, by an actuation module 10”);
identify a task to be completed by the robotic device, including the at least one object involved in the task (at least as in paragraph 0016, “the actuation module 14 translates task requests into software commands that are understandable by the robot 18”; at least as in paragraph 0021, “Information about one or more objects in the workspace of the robot 18, such as identification and/or location information, may also be received from one or more environmental sensors 12, such as a robot vision system 12 (22), or captured by an environmental sensor module 12… The method may then determine whether the operator input is sufficient to translate the operator input into software commands that will direct the robot 10 to perform the desired robot movement (24)”; at least as in paragraph 0015, “the environment sensor module 12 may utilize different sensors including a camera for object identification”; at least as in paragraph 0012, “several choices may be offered (e.g., by showing the objects that are visible to the robot's camera)”)…
But, Dan does not explicitly teach:
generate an acyclic action graph for the task based on the set of possible actions for the at least one object; and
output the acyclic action graph to generate a suggested action for the robotic device in completing the task.
However, Sun, in the same field of endeavor of controlling robots to perform on objects to complete a desired task based on functional relationships between objects, specifically teaches:
generate an acyclic action graph for the task based on the set of possible actions for the at least one object (at least as in col. 13-14, ln. 62-40, wherein after receiving a command to perform a given task, “the robot can search the FOON for the object node associated with the goal of the task and add it to a task tree, as indicated in block 32… Referring next to block 34, the child motion node associated with the goal object node is identified and added to the task tree… Next, as indicated in block 36, the one or more child object nodes associated with the motion node is/are identified and added to the task tree… Flow from this point depends upon whether or not each of the child objects associated with the motion node are available or not, as indicated in decision block 38… flow can continue from decision block 38 to block 40 at which each manipulation motion of the task tree can be performed in reverse order using the various required objects” thus the robot continues to add nodes to the task tree until all objects associated with the node are available in the environment; see also Fig. 10 and 13; at least as in col. 5, ln. 9-14, “A FOON may be a directed semi-acyclic graph as there may be some instances of loops when an object does not change states from taking part in some action”); and
output the acyclic action graph to generate a suggested action for the robotic device in completing the task (at least as in col. 12, ln. 28-31, “Once the task tree has been generated, it can then be used to generate a task sequence that contains a series of motions, objects, and their states, which provides step-by-step commands for a robot to execute”; at least as in col. 12, ln. 3-27, wherein multiple task trees can be generated and the cost of each tree can be compared to determine the most efficient tree).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Dan to include Sun's teaching of a robot system generating a task sequence using a task tree and FOON, since Sun teaches that the system can provide robots with the ability to understand a task in terms of object states and object-related functional motions and to generate motions based on the learned motions to manipulate the objects properly for a desired task, thus improving object and action recognition.
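As a sketch of the quoted step-by-step output (Sun, col. 12, ln. 28-31), an acyclic action graph can be flattened into an executable sequence by topological ordering; the dependency data below is hypothetical, and the use of the standard-library TopologicalSorter is an implementation assumption, not Sun's method.

```python
# Flatten a hypothetical acyclic action graph into a step-by-step task
# sequence using a topological order (Python standard library, 3.9+).
from graphlib import TopologicalSorter

# Each action maps to the set of actions that must complete before it.
dependencies = {
    "pick tube":         set(),
    "place tube in bin": {"pick tube"},
    "shake tube":        {"pick tube"},
}

task_sequence = list(TopologicalSorter(dependencies).static_order())
print(task_sequence)  # e.g., ['pick tube', 'place tube in bin', 'shake tube']
```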
Regarding claim 10, in view of the above combination of Dan and Sun, Dan further discloses:
The non-transitory machine-readable medium of claim 9, wherein the set of possible actions are specific to a position or an orientation of the at least one object (at least as in paragraph 0018, “The operator may also describe a sequence of robot movements and/or conditions for that robot movement to be executed. For example, the operator may desire the robot to “pick the tube and place it in the red bin” or “after a tube is available in the blue bin, pick it and shake it”.”).
Regarding claim 11, in view of the above combination of Dan and Sun, Dan further discloses:
The non-transitory machine-readable medium of claim 9, wherein the acyclic action graph includes at least one action that is only available after at least one other action has been completed (at least as in paragraph 0018, “The operator may also describe a sequence of robot movements and/or conditions for that robot movement to be executed. For example, the operator may desire the robot to “pick the tube and place it in the red bin” or “after a tube is available in the blue bin, pick it and shake it””).
Regarding claim 12, in view of the above combination of Dan and Sun, Sun further discloses:
The non-transitory machine-readable medium of claim 9, wherein the acyclic action graph includes weights assigned to respective possible actions of the acyclic action graph (at least as in col. 12, ln. 3-27, wherein multiple task trees can be generated and the cost of each tree can be compared to determine the most efficient tree).
Regarding claim 13, Dan discloses:
An apparatus comprising:
means for identifying a current context of a robotic device, the current context including identification of at least one object present in a field of view (at least as in paragraph 0021, “Information about one or more objects in the workspace of the robot 18, such as identification and/or location information, may also be received from one or more environmental sensors 12, such as a robot vision system 12 (22), or captured by an environmental sensor module 12… The method may then determine whether the operator input is sufficient to translate the operator input into software commands that will direct the robot 10 to perform the desired robot movement (24)”; at least as in paragraph 0015, “the environment sensor module 12 may utilize different sensors including a camera for object identification”; at least as in paragraph 0012, “several choices may be offered (e.g., by showing the objects that are visible to the robot's camera)”);
means for receiving a user input comprising a user motion corresponding to the robotic device or the current context (at least as in paragraph 0021, “In the method, an operator input which represents a desired robot movement is received (20), for example, from an operator communication module 10”; at least as in paragraph 0015, “It is understood that the list of operator communication channels (i.e., inputs to the operator module 10) is not limited to the examples in FIG. 1, but depending on the preferred means of operator input and feedback, may include a microphone and speakers for audio, cameras (either stand-alone or integrated into augmented-/virtual reality glasses) for gesture recognition or superposition of robotic tasks in space (without the robot being present or having been programmed), haptic gloves for tactile feedback of manipulation of robotic tasks, monitors for the implementation of user interfaces (UI), touch screens or a tablet for touch selection, etc.”);
means for automatically extracting, from a basic skills library, one or more possible actions for the at least one object from video data of the at least one object being manipulated (at least as in paragraph 0022, “The operator inputs (and additional operator inputs) are then translated into software commands that will direct the robot 18 to perform the desired robot movement (26), for example, by an actuation module 10… It may be desirable to store the operator inputs, additional operator inputs and/or translated software commands in a library storage with data links therebetween so that future operator inputs may be able to be autonomously linked to such additional operator inputs and/or software commands (28). The library storage may be included as part of the actuation module 14 and may be a skill set library that recognizes operator inputs”; at least as in paragraph 0017, “The set of operator commands that are recognized by the controller module 16 may be called a skill set and may be dynamically extended over the robot's lifetime”; at least as in paragraph 0020, “The controller module 16 registers the additional operator inputs in an appropriate form for future use... Importantly, the format allows learned skills to be transferrable between robots 18”);
means for determining, using processing circuitry, from the current context and the user input, a set of actions performable by the robotic device corresponding to the at least one object and predicted to correspond to the user input based on a trajectory of the user motion, the set of actions including the one or more possible actions (at least as in paragraph 0018, “If the skill (i.e., pick) and object (i.e., tube) are known to the controller module 16, the controller module 16 may proceed to request a task (or set of tasks) execution(s) to the actuation module 14 which will translate the tasks into low-level robot software commands for the robot 18 controller”; at least as in paragraph 0021, “The method may then determine whether the operator input is sufficient to translate the operator input into software commands that will direct the robot 10 to perform the desired robot movement (24)”);
means for outputting control signals that cause the robotic device to perform the action to interact with the at least one object (at least as in paragraph 0023, “Once the operator inputs have been translated into software commands understandable to the robot 18, the robot 18 may be programmed with such software instructions (30). Subsequent to such programming, the robot 18 may be operated using the software commands in order to perform the desired robot movement (32)”).
But, Dan does not explicitly teach:
means for automatically selecting, using the processing circuitry, an action of the set of actions based on an acyclic graph describing action paths.
However, Sun, in the same field of endeavor of controlling robots to perform on objects to complete a desired task based on functional relationships between objects, specifically teaches:
means for automatically selecting, using the processing circuitry, an action of the set of actions based on an acyclic graph describing action paths (at least as in col. 13-14, ln. 62-40, wherein after receiving a command to perform a given task, “the robot can search the FOON for the object node associated with the goal of the task and add it to a task tree, as indicated in block 32… Referring next to block 34, the child motion node associated with the goal object node is identified and added to the task tree… Next, as indicated in block 36, the one or more child object nodes associated with the motion node is/are identified and added to the task tree… Flow from this point depends upon whether or not each of the child objects associated with the motion node are available or not, as indicated in decision block 38… flow can continue from decision block 38 to block 40 at which each manipulation motion of the task tree can be performed in reverse order using the various required objects” thus the robot continues to add nodes to the task tree until all objects associated with the node are available in the environment; see also Fig. 10 and 13; at least as in col. 5, ln. 9-14, “A FOON may be a directed semi-acyclic graph as there may be some instances of loops when an object does not change states from taking part in some action. A motion does not necessarily cause a change in an object, as certain objects will remain in the same state”; at least as in col. 12, ln. 28-31, “Once the task tree has been generated, it can then be used to generate a task sequence that contains a series of motions, objects, and their states, which provides step-by-step commands for a robot to execute”; at least as in col. 12, ln. 3-27, wherein multiple task trees can be generated and the cost of each tree can be compared to determine the most efficient tree).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Dan to include Sun's teaching of a robot system generating a task sequence using a task tree and FOON, since Sun teaches that the system can provide robots with the ability to understand a task in terms of object states and object-related functional motions and to generate motions based on the learned motions to manipulate the objects properly for a desired task, thus improving object and action recognition.
Regarding claim 14, in view of the above combination of Dan and Sun, Dan further discloses:
The apparatus of claim 13, wherein the basic skills library includes one or more dynamic movement actions associated with at least one of: grasping the at least one object, moving the at least one object, or releasing the at least one object (at least as in paragraph 0018, “the operator may desire the robot to “pick the tube and place it in the red bin” or “after a tube is available in the blue bin, pick it and shake it”… If the skill (i.e., pick) and object (i.e., tube) are known to the controller module 16, the controller module 16 may proceed to request a task (or set of tasks) execution(s) to the actuation module 14 which will translate the tasks into low-level robot software commands for the robot 18 controller”; at least as in paragraph 0012, “A first step of the interaction involves the operator instructing the robot to perform a sequence of operations by providing high level concepts (e.g., pick, move, place)”).
Regarding claim 15, in view of the above combination of Dan and Sun, Dan further discloses:
The apparatus of claim 13, wherein the current context includes information corresponding to an identification of all objects that are in the field of view, a trajectory of the robotic device, and a task to be completed (at least as in paragraph 0021, “operator input which represents a desired robot movement is received (20), for example, from an operator communication module 10. As described above, the operator input may be received as a voice input 10A that is transcribed to text or may be received from a display screen 10B, such as a touchscreen, or any other possible user interface. Information about one or more objects in the workspace of the robot 18, such as identification and/or location information, may also be received from one or more environmental sensors 12, such as a robot vision system 12 (22), or captured by an environmental sensor module 12”; at least as in paragraph 0015, “It is understood that the list of operator communication channels (i.e., inputs to the operator module 10) is not limited to the examples in FIG. 1, but depending on the preferred means of operator input and feedback, may include a microphone and speakers for audio, cameras (either stand-alone or integrated into augmented-/virtual reality glasses) for gesture recognition or superposition of robotic tasks in space (without the robot being present or having been programmed), haptic gloves for tactile feedback of manipulation of robotic tasks, monitors for the implementation of user interfaces (UI), touch screens or a tablet for touch selection, etc. Likewise, the environment sensor module 12 may utilize different sensors including a camera for object identification, robotic skin or force sensors for contact, gripping or collision detection, lidar/radar for distance estimation, thermal sensors to identify possible operating hazards or process steps, etc. Each channel and sensor may be handled by a specific module 10, 12.”).
Regarding claim 16, in view of the above combination of Dan and Sun, Dan further discloses:
The apparatus of claim 13, further comprising means for receiving user confirmation of the action before outputting the control signals (at least as in paragraph 0018, “If a command is insufficient, e.g., the action is not understood or any information is missing, e.g. the object is unknown to the system, the controller module 16 may prompt one or more additional operator inputs to provide complementary input”; at least as in paragraph 0019, “Feedback requests and operator input may be provided by any of the operator communication and sensing modules 10, 12 best suited for a natural and intuitive information flow”).
Regarding claim 17, in view of the above combination of Dan and Sun, Dan further discloses:
The apparatus of claim 13, wherein the set of actions are generated by automatically extracting tasks from video or image data captured of the robotic device (at least as in paragraph 0015, “It is understood that the list of operator communication channels (i.e., inputs to the operator module 10) is not limited to the examples in FIG. 1, but depending on the preferred means of operator input and feedback, may include a microphone and speakers for audio, cameras (either stand-alone or integrated into augmented-/virtual reality glasses) for gesture recognition or superposition of robotic tasks in space (without the robot being present or having been programmed), haptic gloves for tactile feedback of manipulation of robotic tasks, monitors for the implementation of user interfaces (UI), touch screens or a tablet for touch selection, etc.”).
Regarding claim 18, in view of the above combination of Dan and Sun, Dan further discloses:
The apparatus of claim 13, wherein the robotic device is controlled using a gesture via an augmented reality interface, and wherein the current context includes identification of the gesture (at least as in paragraph 0015, “It is understood that the list of operator communication channels (i.e., inputs to the operator module 10) is not limited to the examples in FIG. 1, but depending on the preferred means of operator input and feedback, may include a microphone and speakers for audio, cameras (either stand-alone or integrated into augmented-/virtual reality glasses) for gesture recognition or superposition of robotic tasks in space (without the robot being present or having been programmed), haptic gloves for tactile feedback of manipulation of robotic tasks, monitors for the implementation of user interfaces (UI), touch screens or a tablet for touch selection, etc.”).
Regarding claim 19, in view of the above combination of Dan and Sun, Dan further discloses:
The apparatus of claim 13, wherein automatically selecting the action of the set of actions includes determining that two or more actions of the acyclic graph are selectable based on the current context, and selecting the action based on a probability of each of the two or more actions (at least as in paragraph 0048, wherein “Provided that some measure describing which one of the alternate options is more likely is known, the options may be weighted according to their respective likelihood”; at least as in paragraph 0050, wherein “In case there exists a large number of possible alternate options, a weighting of the alternate options can be included into an optimization problem, providing solutions that are void of the options with the least quality, and maintain the options which are most promising”; at least as in paragraph 0063, wherein “The method for controlling at least one effector trajectory of an embodiment generates the at least one effector trajectory by applying a trajectory generation algorithm on the generated task description, including applying weights to the cost function for the at least one first posture and the at least one second posture”; at least as in paragraph 0236, wherein “In case there exists a large number of possible alternate options, a weighting of the alternate options can be included into an optimization problem, providing solutions that are void of the options with the least quality, and maintain the options which are most promising”).
Regarding claim 20, in view of the above combination of Dan and Sun, Dan further discloses:
The apparatus of claim 13, wherein the action is a contingent action based on a previously performed action by the robotic device (at least as in paragraph 0018, “The operator may also describe a sequence of robot movements and/or conditions for that robot movement to be executed. For example, the operator may desire the robot to “pick the tube and place it in the red bin” or “after a tube is available in the blue bin, pick it and shake it”).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICARDO ICHIKAWA VISCARRA whose telephone number is (571)270-0154. The examiner can normally be reached M-F 9-12 & 2-4 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Adam Mott can be reached on (571) 270-5376. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICARDO I VISCARRA/Examiner, Art Unit 3657
/ADAM R MOTT/Supervisory Patent Examiner, Art Unit 3657