Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “tactile adaptation from visual incentives (TAVI)” in claims 7 and 17. While the examiner recognizes that this is a technique presented in a research paper, the examiner does not believe that it is a term of art, nor that the teachings of the research paper can be read into the term as used in this application. Therefore, the examiner is interpreting this term only as what is provided in the claim language (i.e., a tactile adaptation from visual incentives) and nothing more.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea in the form of a mental process) without significantly more.
Claim 1 recites:
A method for learning finger gaiting skills for multi-fingered robot hands comprising:
decomposing a finger-gaiting task into shorter tasks by contact groups;
augmenting a reference trajectory for each shorter task; and
using representation pretraining and exploration for learning.
Step 2A prong one evaluation: Judicial Exception – Yes – Mental Processes
The Office submits that the foregoing bolded limitations constitute judicial exceptions in the form of “mental processes” because, under their broadest reasonable interpretation, they cover performance in the human mind.
Claim 1 recites decomposing a finger-gaiting task into shorter tasks by contact groups. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for any recitation of processing circuitry programmed to perform the task. That is, other than any recitation of a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider the finger-gaiting task, determine when the fingers would likely contact the object, and mentally divide the task into two or more smaller subtasks. Thus, this step is directed to a mental process.
Claim 1 further recites augmenting a reference trajectory for each shorter task. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for any recitation of processing circuitry programmed to perform the task. That is, other than any recitation of a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider the reference trajectories, evaluate how they could be improved, and augment them accordingly. Thus, this step is directed to a mental process.
Claim 1 further recites using representation pretraining and exploration for learning. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for any recitation of processing circuitry programmed to perform the task. That is, other than any recitation of a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider reference trajectories and their successes and failures, consider how different and new trajectories would perform, and thereby learn which trajectories might be successful. Thus, this step is directed to a mental process.
Step 2A Prong Two evaluations
Each claim is evaluated as to whether, as a whole, it integrates the recited judicial exception into a practical application. As noted in the 2019 PEG, it must be determined whether any additional elements in the claim beyond the abstract idea integrate the exception into a practical application in a manner that imposes a meaningful limit on the judicial exception. The courts have indicated that additional elements that merely use a computer to implement an abstract idea, add insignificant extra-solution activity, or generally link use of a judicial exception to a particular technological environment or field of use do not integrate a judicial exception into a “practical application.”
In the present case, the additional limitations beyond the above-noted abstract idea are as follows (where the underlined portions are the “additional limitations” while the bolded portions continue to represent the “abstract idea”).
To the extent the claims recite decomposing tasks, augmenting trajectories, and using representation pretraining and exploration for learning using a device, a processor, a memory, a computer, processing circuitry, or a non-transitory computer-readable storage medium, these actions are recited at a high level of generality. The computer/circuitry that facilitates the steps is described by the specification at a high level of generality. The generically recited computer merely describes how to generally “apply” the otherwise mental/extra-solution processes using a generic or general-purpose processor. Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim is not patent eligible.
Step 2B Evaluation: Inventive Concept – No
Each claim is evaluated as to whether, as a whole, it amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim.
As discussed with respect to Step 2A Prong Two, the additional elements in the claim amount to no more than mere instructions to apply the exception using a generic computer or possible uses for the output of the abstract idea. The same analysis applies here in Step 2B, i.e., possible uses for information or mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Thus, the claims are not patent eligible.
Claim 12 recites:
A method for learning finger gaiting skills for multi-fingered robot hands, the method implemented using a computer system including a processor communicatively coupled to a memory device, the method comprising:
decomposing long-horizon finger gating tasks into sequences of shorter-horizon tasks by treating each subsequent set of contacting bodies as a separate task;
augmenting a reference trajectory for each shorter task; and
using representation pretraining and exploration for learning.
Step 2A prong one evaluation: Judicial Exception – Yes – Mental Processes
The Office submits that the foregoing bolded limitations constitute judicial exceptions in the form of “mental processes” because, under their broadest reasonable interpretation, they cover performance in the human mind.
Claim 12 recites decomposing long-horizon finger-gaiting tasks into sequences of shorter-horizon tasks by treating each subsequent set of contacting bodies as a separate task. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for the recitation of processing circuitry programmed to perform the task. That is, other than reciting a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider the finger-gaiting task, determine when the fingers would likely contact the object, and mentally divide the task into two or more smaller subtasks. Thus, this step is directed to a mental process.
Claim 12 further recites augmenting a reference trajectory for each shorter task. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for the recitation of processing circuitry programmed to perform the task. That is, other than reciting a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider the reference trajectories, evaluate how they could be improved, and augment them accordingly. Thus, this step is directed to a mental process.
Claim 12 further recites using representation pretraining and exploration for learning. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for the recitation of processing circuitry programmed to perform the task. That is, other than reciting a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider reference trajectories and their successes and failures, consider how different and new trajectories would perform, and thereby learn which trajectories might be successful. Thus, this step is directed to a mental process.
Step 2A Prong Two evaluations
Each claim is evaluated as to whether, as a whole, it integrates the recited judicial exception into a practical application. As noted in the 2019 PEG, it must be determined whether any additional elements in the claim beyond the abstract idea integrate the exception into a practical application in a manner that imposes a meaningful limit on the judicial exception. The courts have indicated that additional elements that merely use a computer to implement an abstract idea, add insignificant extra-solution activity, or generally link use of a judicial exception to a particular technological environment or field of use do not integrate a judicial exception into a “practical application.”
In the present case, the additional limitations beyond the above-noted abstract idea are as follows (where the underlined portions are the “additional limitations” while the bolded portions continue to represent the “abstract idea”).
To the extent the claims recite decomposing tasks, augmenting trajectories, and using representation pretraining and exploration for learning using a device, a processor, a memory, a computer, processing circuitry, or a non-transitory computer-readable storage medium, these actions are recited at a high level of generality. The computer/circuitry that facilitates the steps is described by the specification at a high level of generality. The generically recited computer merely describes how to generally “apply” the otherwise mental/extra-solution processes using a generic or general-purpose processor. Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim is not patent eligible.
Step 2B Evaluation: Inventive Concept – No
Each claim is evaluated as to whether, as a whole, it amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim.
As discussed with respect to Step 2A Prong Two, the additional elements in the claim amount to no more than mere instructions to apply the exception using a generic computer or possible uses for the output of the abstract idea. The same analysis applies here in Step 2B, i.e., possible uses for information or mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Thus, the claims are not patent eligible.
Claim 19 recites:
A non-transitory computer readable medium comprising a plurality of instructions which, when executed by a processor, cause the processor to:
decompose long-horizon finger gating tasks into sequences of shorter-horizon tasks by treating desired movements in each subsequent set of contacting bodies as a separate task;
augment a reference trajectory for each shorter task; and
use representation pretraining and exploration by pretraining on the reference trajectory of each shorter task; and
use exploration for learning by generating exploratory actions based on the reference trajectory of the shorter task.
Step 2A prong one evaluation: Judicial Exception – Yes – Mental Processes
The Office submits that the foregoing bolded limitations constitute judicial exceptions in the form of “mental processes” because, under their broadest reasonable interpretation, they cover performance in the human mind.
Claim 19 recites decomposing long-horizon finger-gaiting tasks into sequences of shorter-horizon tasks by treating desired movements in each subsequent set of contacting bodies as a separate task. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for the recitation of a processor executing instructions to perform the task. That is, other than reciting a “processor” and a computer-readable medium, nothing in the claim precludes the step from being performed in the mind. A person could mentally consider the finger-gaiting task, determine when the fingers would likely contact the object, and mentally divide the task into two or more smaller subtasks. Thus, this step is directed to a mental process.
Claim 19 further recites augmenting a reference trajectory for each shorter task. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for the recitation of a processor executing instructions to perform the task. That is, other than reciting a “processor” and a computer-readable medium, nothing in the claim precludes the step from being performed in the mind. A person could mentally consider the reference trajectories, evaluate how they could be improved, and augment them accordingly. Thus, this step is directed to a mental process.
Claim 19 further recites using representation pretraining, by pretraining on the reference trajectory of each shorter task, and using exploration for learning, by generating exploratory actions based on the reference trajectory of the shorter task. These limitations, as drafted, are simple processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind, but for the recitation of a processor executing instructions to perform the task. That is, other than reciting a “processor” and a computer-readable medium, nothing in the claim precludes these steps from being performed in the mind. A person could mentally consider the reference trajectories and their successes and failures, consider how different and new trajectories based on them would perform, and thereby learn which trajectories might be successful. Thus, these steps are directed to a mental process.
Step 2A Prong Two evaluations
Each claim is evaluated as to whether, as a whole, it integrates the recited judicial exception into a practical application. As noted in the 2019 PEG, it must be determined whether any additional elements in the claim beyond the abstract idea integrate the exception into a practical application in a manner that imposes a meaningful limit on the judicial exception. The courts have indicated that additional elements that merely use a computer to implement an abstract idea, add insignificant extra-solution activity, or generally link use of a judicial exception to a particular technological environment or field of use do not integrate a judicial exception into a “practical application.”
In the present case, the additional limitations beyond the above-noted abstract idea are as follows (where the underlined portions are the “additional limitations” while the bolded portions continue to represent the “abstract idea”).
To the extent the claims recite decomposing tasks, augmenting trajectories, and using representation pretraining and exploration for learning using a device, a processor, a memory, a computer, processing circuitry, or a non-transitory computer-readable storage medium, these actions are recited at a high level of generality. The computer/circuitry that facilitates the steps is described by the specification at a high level of generality. The generically recited computer merely describes how to generally “apply” the otherwise mental/extra-solution processes using a generic or general-purpose processor. Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim is not patent eligible.
Step 2B Evaluation: Inventive Concept – No
Each claim is evaluated as to whether, as a whole, it amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim.
As discussed with respect to Step 2A Prong Two, the additional elements in the claim amount to no more than mere instructions to apply the exception using a generic computer or possible uses for the output of the abstract idea. The same analysis applies here in Step 2B, i.e., possible uses for information or mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Thus, the claims are not patent eligible.
Step 2A prong one evaluation: Judicial Exception – Yes – Mental Processes
The Office submits that the foregoing bolded limitations constitute judicial exceptions in the form of “mental processes” because, under their broadest reasonable interpretation, they cover performance in the human mind.
Claim 2 recites:
The method of Claim 1, wherein decomposing reference finger-gaiting trajectories into shorter tasks by contact group comprises decomposing the finger-gaiting task into sequences of shorter tasks by treating each subsequent set of contacting bodies as a separate task.
Claim 3 recites:
The method of Claim 1, customizing the representation pretraining and exploration processes for learning efficiency based on reference finger-gaiting trajectories.
Claim 4 recites:
The method of Claim 1, comprising learning sub policies to transition between sets of contacting groups formed by decomposing the finger-gaiting task into shorter tasks by contact groups.
Claim 5 recites:
The method of Claim 4, wherein each contact group is a set of bodies in contact such as the multi-fingered robot hand, the object, and the environment.
Claim 6 recites:
The method of Claim 4, wherein for a desired contact group, the object remains manipulatable/controllable during exploration.
Claim 7 recites:
The method of Claim 1, wherein tactile adaptation from visual incentives (TAVI) is used for learning.
Claim 8 recites:
The method of Claim 7, comprising providing an option to be predicated by contact groups.
Claim 9 recites:
The method of Claim 1, comprising adding domain randomization to an initial state of the object and robot joint positions to cover a post image of a previous contact group.
Claim 10 recites:
The method of Claim 1, comprising reversing initial states, goal states, and reference trajectory for each primitive skill during training to expand ways to compose the primitive skill and to interface with user specified commands.
Claim 11 recites:
The method of Claim 1, comprising providing a task graph for user interactions to choose a primitive skill to use and to pause or reverse the skill.
Claim 13 recites:
The method of Claim 12, comprising learning sub policies to transition between sets of contacting groups formed by decomposing the finger-gaiting task into shorter tasks by contact groups, wherein each contact group is a set of bodies in contact such as the multi-fingered robot hand, the object, and the environment.
Claim 14 recites:
The method of Claim 12, wherein for a desired contact group, the object remains manipulatable/controllable during exploration.
Claim 15 recites:
The method of Claim 12, comprising adding domain randomization to an initial state of the object and robot joint positions to cover a post image of a previous contact group.
Claim 16 recites:
The method of Claim 12, comprising reversing initial states, goal states, and reference trajectory for each primitive skill during training to expand ways to compose the primitive skill and to interface with user specified commands.
Claim 17 recites:
The method of Claim 12, wherein tactile adaptation from visual incentives (TAVI) is used for learning.
Claim 18 recites:
The method of Claim 12, comprising providing a task graph for user interactions to choose a primitive skill to use and to pause or reverse the skill.
Claim 20 recites:
The non-transitory computer readable medium according to Claim 19, wherein the instructions, when executed by the processor, causes the processor to provide a task graph for user interactions to choose a primitive skill to use and to pause or reverse the skill.
Claims 2 and 13 recite decomposing the finger-gaiting tasks by treating each set of contacting bodies as a separate task. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for any recitation of processing circuitry programmed to perform the task. That is, other than any recitation of a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider what tasks need to be performed, which objects will be grasped and by which fingers, and decompose the tasks into subtasks. Thus, this step is directed to a mental process.
Claim 3 recites customizing the representation pretraining and exploration processes for learning efficiency based on reference finger-gaiting trajectories. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for any recitation of processing circuitry programmed to perform the task. That is, other than any recitation of a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider existing finger-gaiting trajectories and use them to customize the data and process used for pretraining. Thus, this step is directed to a mental process.
Claim 4 recites learning sub-policies to transition between contact groups. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for any recitation of processing circuitry programmed to perform the task. That is, other than any recitation of a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider what has and has not been working and devise policies that would be more effective for moving from one contact group to another. Thus, this step is directed to a mental process.
Claims 7 and 17 recite that TAVI is used for learning. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for any recitation of processing circuitry programmed to perform the task. That is, other than any recitation of a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider visual data and adapt tactile strategies or forces based on that data while learning effective policies. Thus, this step is directed to a mental process.
Claims 9 and 15 recite adding domain randomization to an initial state of the object and robot joint positions. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for any recitation of processing circuitry programmed to perform the task. That is, other than any recitation of a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider the initial state of an object and randomly vary some of its elements to change them slightly. Thus, this step is directed to a mental process.
Claims 10 and 16 recite reversing initial states, goal states, and reference trajectories for each primitive skill during training. This limitation, as drafted, is a simple process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, but for any recitation of processing circuitry programmed to perform the task. That is, other than any recitation of a “processor” or “memory,” nothing in the claim precludes the step from being performed in the mind. A person could mentally consider the initial states, goal states, and reference trajectories and reverse them if desired, e.g., by swapping the initial and goal states. Thus, this step is directed to a mental process.
Step 2A Prong Two evaluations
Each claim is evaluated as to whether, as a whole, it integrates the recited judicial exception into a practical application. As noted in the 2019 PEG, it must be determined whether any additional elements in the claim beyond the abstract idea integrate the exception into a practical application in a manner that imposes a meaningful limit on the judicial exception. The courts have indicated that additional elements that merely use a computer to implement an abstract idea, add insignificant extra-solution activity, or generally link use of a judicial exception to a particular technological environment or field of use do not integrate a judicial exception into a “practical application.”
In the present case, the additional limitations beyond the above-noted abstract idea are as follows (where the underlined portions are the “additional limitations” while the bolded portions continue to represent the “abstract idea”).
Claims 11, 18, and 20 recite “providing a task graph for user interactions to choose a primitive skill to use and to pause or reverse the skill.” This limitation is recited at a high level of generality, and nothing indicates that it is more than mere data transmission and reception. Therefore, it is insignificant extra-solution activity.
To the extent the claims recite decomposing tasks, customizing pretraining and exploration processes, learning sub-policies, using TAVI for learning, adding domain randomization, and reversing initial states using a device, a processor, a memory, a computer, processing circuitry, or a non-transitory computer-readable storage medium, these actions are recited at a high level of generality. The computer/circuitry that facilitates the steps is described by the specification at a high level of generality. The generically recited computer merely describes how to generally “apply” the otherwise mental/extra-solution processes using a generic or general-purpose processor. Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claims are not patent eligible.
Step 2B Evaluation: Inventive Concept – No
Each claim is evaluated as to whether, as a whole, it amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim.
As discussed with respect to Step 2A Prong Two, the additional elements in the claims amount to no more than mere instructions to apply the exception using a generic computer or possible uses for the output of the abstract idea. The same analysis applies here in Step 2B, i.e., possible uses for information or mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Thus, the claims are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 7-8, 12-13, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Stouraitis et al. (Multi-mode Trajectory Optimization for Impact-aware Manipulation; please refer to attached NPL), hereinafter Stouraitis, in light of Handa et al. (US Pub 2021/0122045 A1), hereinafter Handa.
For Claim 1, Stouraitis teaches A method for learning finger gaiting skills for multi-fingered robot hands comprising:
decomposing a task into shorter tasks by contact groups; (
Page 9426, Column 1, Section II A.
A. Hybrid dynamic systems
As described in Section I, motion planning concepts for manipulation are often based on trajectories that guide an object to its desired state. In this work, we consider a class of systems where the trajectories include discontinuous transitions between different contact states. Similar to [6], [19], we describe systems with hybrid dynamics as $\dot{x}(t) = f_k(x(t), u(t), v(t))$, if $(x(t), u(t)) \in D_k$, (1) where $x(t) \in \mathbb{R}^n$ is the state of the system, $u(t) \in \mathbb{R}^m$ is the control actions of the plant, $v(t) \in \mathbb{R}^\nu$ is the control input applied on the environment, $n, m, \nu \in \mathbb{R}$ define the dimensions of each quantity and $k \in \{0, 1\}$ indexes to the different sets $D_k$. Each $D_k \subset \mathbb{R}^{n \times m}$ defines the domain (relative to $x(t)$ and $u(t)$) of a contact state, i.e. free-motion or in-contact. Note that (1) defines both the plant’s and environment’s dynamics.)
augmenting a reference trajectory for each shorter task; and (
Page 9428, Column 2, Section C.
Multi-mode trajectory optimization for hybrid systems
To solve the continuous optimization problem in (3), we discretize the trajectory according to direct transcription [32]. The transcription of our hybrid parametric optimization problem is an extension of the phase-based parameterization used in our previous work [7] and is similar in spirit to [6]. For each $i$th knot, the decision variables are (i) the pose of the object $y_i$, (ii) the velocity of the object $\dot{y}_i \in \mathbb{R}^\nu$, (iii) action timings $\Delta T_i$, (iv) the end-effector’s position $c_i$, (v) the contact force $f_i$ and the cd-DS parameter $\alpha$. We group these quantities into three vectors $x_i = [y_i\ \dot{y}_i\ c_i\ \dot{c}_i\ \ddot{c}_i]^T$ (15), $u_i = [\alpha_i\ \Delta T_i]^T$ (16), $v_i = [f_i\ \dot{f}_i\ \ddot{f}_i]^T$ (17), where $\forall i \in \mathbb{N}$, the trajectories of $x_i$, $u_i$ and $v_i$ describes a multi-mode motion. In addition to the decision variables, the transcription of the continuous problem can be customized through the mode sequence $z$. This results in a TO problem that is separated into modes with different constraints. Mode-free constraints: Here we introduce all the constraints that are applied independently to the modes of the trajectory, i.e., constraints that are free of parameter set $z$. We note that $\psi_c \in \mathbb{R}^{2\nu}$ defines the reachable area of the agent’s end-effectors, referred to as workspace. • Initial state of the object: $y_0 = y_0^*$ and $\dot{y}_0 = \dot{y}_0^*$. • Desired final state of the object: $y_N = y_N^*$ and/or $\dot{y}_N = \dot{y}_N^*$. • Kinematic limits of the end-effector: $c_i \in \psi_c$, approximated with box bounds. • Lower and upper bound on time between each knot: $\Delta T_l \le \Delta T_i \le \Delta T_u, \forall i \in \{0, \dots, N\}$.)
Stouraitis does not teach that the skills are finger-gaiting skills, or using representation pretraining and exploration for learning.
Handa, however, does teach finger gaiting ([0128] In some examples, for each object, in both simulation and real-world experiments, 2 demonstrations of 2 types of manipulation trajectories may be utilized: 1) pick and place with finger-grasp and in-hand object rotation, and 2) the same but with finger tips breaking and re-establishing contact during the grasp (finger gaiting). This may give a total of 24 trajectories for analysis for both simulation and real-world experiments. In both trajectory types, the object may undergo translational and rotational slippage from both inertial forces and push-contacts with the table. Each trajectory may last about a minute. In various embodiments, the pose estimation algorithm may be run at approximately 30 Hz, which may result in a total of about 2k frames per trajectory.)
using representation pretraining and exploration for learning. ([0153] FIG. 15 illustrates training and deployment of a deep neural network, according to at least one embodiment. In at least one embodiment, untrained neural network 1506 is trained using a training dataset 1502. In at least one embodiment, training framework 1504 is a PyTorch framework, whereas in other embodiments, training framework 1504 is a Tensorflow, Boost, Caffe, Microsoft Cognitive Toolkit/CNTK, MXNet, Chainer, Keras, Deeplearning4j, or other training framework. In at least one embodiment training framework 1504 trains an untrained neural network 1506 and enables it to be trained using processing resources described herein to generate a trained neural network 1508. In at least one embodiment, weights may be chosen randomly or by pre-training using a deep belief network. In at least one embodiment, training may be performed in either a supervised, partially supervised, or unsupervised manner.
[0115] In various embodiments, a population-based optimization (“PBO”) algorithm may be utilized. In at least one embodiment, the PBO algorithm ranks all simulations by their average costs and finds the top $K_{best}$ simulations with the lowest costs. In at least one embodiment, the algorithm exploits by replacing the remaining $K - K_{best}$ simulations with copies of the $K_{best}$ ones, sampled with replacement, and explores by perturbing the $K_{best}$ simulations in the same way as WRS. In at least one embodiment, PBO effectively uses a shaped cost that depends only on the relative ordering of the simulation costs and not their magnitudes, potentially making the optimizer more robust to noisy costs.
[0116] In at least one embodiment, the above described optimizers utilize a distribution-shaping hyperparameter used to balance exploration with exploitation. In at least one embodiment, various embodiments may use combinations of additional hyperparameters such as the following:
[0117] T, which may represent the time steps an algorithm may wait for every update.
[0118] K, which may represent the number of concurrent simulations.
[0119] $\theta_0$, which may represent the initial normal distribution over simulation parameters.
[0120] $\Sigma_p$, which may represent the diagonal covariance matrix for the normal distribution over initial pose perturbation.
[0121] $\Sigma_\theta$ and $\Sigma_v$, which may represent the diagonal covariances of normal distributions of perturbations used for exploration.
[0122] A larger K may be generally better than a smaller K, with the caveat that the resulting simulation may be slower and may not be practical in application. $\Sigma_p$ may be large enough such that the actual initial pose is well represented in the initial pose distribution. However, K may be increased with a larger $\Sigma_p$ and the convariance of $\theta_0$ to ensure that the density of the samples may be high enough to capture a wider distribution.
[0123] In at least one embodiment, there are two additional trade-offs with these hyperparameters. In at least one embodiment, one trade-off is the exploration-exploitation trade-off in the context of optimizing for θ, and the other is the trade-off between optimizing for θ and for $p_t^{(i^*)}$. In at least one embodiment, making $\Sigma_\theta$ or $\Sigma_v$ wider increases the speed at which the set of simulation parameters “move,” and the optimizer may explore more than it exploits. In at least one embodiment, increasing T improves the optimization for θ as the optimizer may have more samples to evaluate each simulation. However, updating the simulation parameters too slowly may lead to drift in pose estimation if the least-cost simulation is sufficiently different from the real world, potentially leading to divergent behavior in some examples. In some examples, divergent behavior may occur when force perturbation or some simulation parameters lead to an irrecoverable configuration, where the object falls out of the hand or brings the object into a pose such that small force perturbations cannot bring it back to the correct pose. In some examples, this may be acceptable if a few samples become divergent. Their costs may be high, so in some embodiments, they may be discarded and replaced by ones that are not divergent during optimizer updates.
)
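For illustration only, the population-based optimization (PBO) update described in Handa [0115] can be sketched in Python. In that update, the simulations are ranked by average cost, the $K_{best}$ lowest-cost simulations are kept, the remaining $K - K_{best}$ are replaced with copies sampled with replacement, and the population is perturbed to explore. This is a hypothetical sketch, not Handa's implementation. The following are all assumptions:
- the function names `pbo_step` and `toy_cost`;
- the quadratic toy cost;
- the choice to perturb only the copies with isotropic Gaussian noise of standard deviation `sigma_theta`, which stands in for the diagonal covariance $\Sigma_\theta$ because Handa refers to WRS for the perturbation scheme.

```python
import random


def pbo_step(params, costs, k_best, sigma_theta, rng):
    """One hypothetical PBO update over K simulations.

    params: list of K parameter vectors (lists of floats).
    costs: list of K average costs (lower is better).
    Returns a new list of K parameter vectors.
    """
    K = len(params)
    # Rank simulations by average cost; keep the k_best lowest-cost ones.
    order = sorted(range(K), key=lambda i: costs[i])
    elite = [list(params[i]) for i in order[:k_best]]
    new_params = [list(p) for p in elite]
    # Exploit: replace the remaining K - k_best simulations with copies of
    # elite simulations, sampled with replacement.
    for _ in range(K - k_best):
        parent = rng.choice(elite)
        # Explore (assumption): perturb each copy with zero-mean Gaussian noise
        # of standard deviation sigma_theta, an isotropic stand-in for Sigma_theta.
        new_params.append([x + rng.gauss(0.0, sigma_theta) for x in parent])
    return new_params


def toy_cost(p):
    # Hypothetical quadratic cost with its minimum at (1, 1, 1).
    return sum((x - 1.0) ** 2 for x in p)


rng = random.Random(0)
params = [[rng.gauss(0.0, 2.0) for _ in range(3)] for _ in range(32)]
initial_best = min(toy_cost(p) for p in params)
for _ in range(50):
    costs = [toy_cost(p) for p in params]
    params = pbo_step(params, costs, k_best=8, sigma_theta=0.1, rng=rng)
final_best = min(toy_cost(p) for p in params)
print(f"best cost: {initial_best:.3f} -> {final_best:.3f}")
```

In this sketch the $K_{best}$ simulations are carried over unperturbed, so the lowest cost in the population cannot increase from one update to the next. Widening `sigma_theta` shifts the balance toward exploration, loosely mirroring the trade-off described in [0123].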
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Handa such that the task is a finger-gaiting task and representation pretraining and exploration are used for learning. Objects are frequently picked up using fingered robots, so one of ordinary skill would have expected the methods of Stouraitis to be useful with fingered robots. Further, representation pretraining and exploration are known tools for teaching robots to perform tasks and would have been expected to succeed at pick-and-place tasks that involve fingers.
For Claim 2, Stouraitis teaches The method of Claim 1, wherein decomposing reference trajectories into shorter tasks by contact group comprises decomposing the task into sequences of shorter tasks by treating each subsequent set of contacting bodies as a separate task. (
Page 9426, Column 1, Section II A.
A. Hybrid dynamic systems As described in Section I, motion planning concepts for manipulation are often based on trajectories that guide an object to its desired state. In this work, we consider a class of systems where the trajectories include discontinuous transitions between different contact states. Similar to [6], [19], we describe systems with hybrid dynamics as $\dot{x}(t) = f_k(x(t), u(t), v(t))$, if $(x(t), u(t)) \in D_k$, (1) where $x(t) \in \mathbb{R}^n$ is the state of the system, $u(t) \in \mathbb{R}^m$ is the control actions of the plant, $v(t) \in \mathbb{R}^\nu$ is the control input applied on the environment, $n, m, \nu \in \mathbb{R}$ define the dimensions of each quantity and $k \in \{0, 1\}$ indexes to the different sets $D_k$. Each $D_k \subset \mathbb{R}^{n \times m}$ defines the domain (relative to $x(t)$ and $u(t)$) of a contact state, i.e. free-motion or in-contact. Note that (1) defines both the plant’s and environment’s dynamics.)
Stouraitis does not teach that the skills are finger-gaiting.
Handa, however, does teach that the skills are finger-gaiting. ([0128] In some examples, for each object, in both simulation and real-world experiments, 2 demonstrations of 2 types of manipulation trajectories may be utilized: 1) pick and place with finger-grasp and in-hand object rotation, and 2) the same but with finger tips breaking and re-establishing contact during the grasp (finger gaiting). This may give a total of 24 trajectories for analysis for both simulation and real-world experiments. In both trajectory types, the object may undergo translational and rotational slippage from both inertial forces and push-contacts with the table. Each trajectory may last about a minute. In various embodiments, the pose estimation algorithm may be run at approximately 30 Hz, which may result in a total of about 2k frames per trajectory.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Handa such that the task is a finger-gaiting task. Objects are frequently picked up using fingered robots, so one of ordinary skill would have expected the methods of Stouraitis to be useful with fingered robots and to be successful at pick-and-place tasks that involve fingers.
For Claim 3, Stouraitis teaches The method of Claim 1,
Stouraitis does not teach customizing the representation pretraining and exploration processes for learning efficiency based on reference finger-gaiting trajectories.
Handa, however, does teach customizing the representation pretraining and exploration processes for learning efficiency based on reference finger-gaiting trajectories. ([0065] In at least one embodiment, to evaluate the proposed algorithm, a total of 24 in-hand manipulation trajectories with three different objects in simulation and in the real world were collected. In at least one embodiment, a Kuka IIWA7 arm with the 4-finger Wonik Robotics Allegro hand as the end-effector was used, with each finger outfitted with a SynTouch BioTac contact sensor. In at least one embodiment, object manipulation trajectories are human demonstrations collected via a hand-tracking teleoperation system. In at least one embodiment, because ground-truth object poses in simulation are available, detailed ablation studies in simulation experiments to study the properties of the proposed algorithm are performed. In at least one embodiment, for real-world experiments, a vision-based algorithm is used to obtain the object pose in the first and last frame of the collected trajectories, where the object is not in occlusion. In at least one embodiment, the pose in the first frame is used to initialize the simulations, and the pose in the last frame is used to evaluate the accuracy of the proposed contact-based algorithm.
[0066] Various examples identify in-hand object-pose with vision only, usually by first segmenting out the robot or human hand in an image before performing pose estimation. However, vision-only approaches may degrade in performance for larger occlusions. Some embodiments use tactile feedback to aid object pose estimation. Tactile perception can identify object properties such as materials and pose, as well as provide feedback during object manipulation.
[0067] In at least one embodiment, experiments with dynamics models and particle filter techniques reveal that adding noise to applied forces instead of the underlying dynamics yield more accurate tracking results. At least one embodiment combines tactile feedback with a vision-based object tracker to track object trajectories during planar pushing tasks, and another applies incremental smoothing and mapping (“iSAM”) to combine global visual pose estimations with local contact pose readings.
)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Handa such that the representation pretraining and exploration are based on existing finger-gaiting trajectories. These techniques frequently use existing data or trajectories to set a starting point or reward function for neural networks or models. Using finger-gaiting trajectories would provide reference data with structure similar to the desired robot behavior and could indicate which actions succeed and which fail.
For Claim 4, Stouraitis teaches The method of Claim 1, comprising a transition between sets of contacting groups formed by decomposing the finger-gaiting task into shorter tasks by contact groups. (
Page 9428, Column 1, Section B
B. Contact force transmission model For a smooth transition from free-motion to contact, the contact duration and contact force profile should obey the impact model shown in Fig. 3. As such (6) becomes $M\Delta\ddot{c} + B\Delta\dot{c} + K\Delta c = f_d$, (9) where $f_d$ is the desired contact force, $\Delta c$, $\Delta\dot{c}$, and $\Delta\ddot{c}$ are the deformed position, velocity and acceleration. In order to plan smooth contact force without oscillations, we model the force transmission as a second-order critically damped dynamical system (cd-DS). A cd-DS [31] was first used to guarantee that the motor position is tractable, and it was further used to provide constraint consistent output for any admissible input. In this paper, we formulate a cd-DS for contact force transmission as $\ddot{f}(t) + 2\alpha\dot{f}(t) + \alpha^2 f(t) = \alpha^2 f_d$, (10) where the contact force $f(t)$ satisfies $f(t) \in [0, f_d]$, while $\dot{f}(t)$ and $\ddot{f}(t)$ are its first and second derivatives. For any $\alpha > 0$, the contact force $f(t)$ is critically damped.)
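For illustration only, equation (10) of Stouraitis can be checked numerically. The characteristic equation $r^2 + 2\alpha r + \alpha^2 = 0$ has the repeated root $r = -\alpha$, which is why the system is critically damped. This sketch assumes zero initial force and force rate ($f(0) = 0$, $\dot{f}(0) = 0$), which the quoted passage does not state. Under that assumption the solution is $f(t) = f_d[1 - (1 + \alpha t)e^{-\alpha t}]$. This rises monotonically from 0 to $f_d$ without overshoot because $\dot{f}(t) = f_d \alpha^2 t e^{-\alpha t} \ge 0$, consistent with $f(t) \in [0, f_d]$. The Python sketch below compares that closed form with a classical fourth-order Runge-Kutta integration. The function names and the values of $\alpha$ and $f_d$ are hypothetical.

```python
import math


def cd_ds_force(t, alpha, f_d):
    """Closed-form force of f'' + 2*alpha*f' + alpha**2 * f = alpha**2 * f_d,
    assuming f(0) = 0 and f'(0) = 0 (assumed initial conditions)."""
    return f_d * (1.0 - (1.0 + alpha * t) * math.exp(-alpha * t))


def simulate(alpha, f_d, t_end, dt):
    """Integrate the same ODE from rest with classical RK4 as a cross-check."""

    def deriv(f, df):
        # Returns (f', f'') with f'' = alpha^2 (f_d - f) - 2 alpha f'.
        return df, alpha**2 * (f_d - f) - 2.0 * alpha * df

    f, df, t = 0.0, 0.0, 0.0
    samples = [(t, f)]
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(f, df)
        k2 = deriv(f + 0.5 * dt * k1[0], df + 0.5 * dt * k1[1])
        k3 = deriv(f + 0.5 * dt * k2[0], df + 0.5 * dt * k2[1])
        k4 = deriv(f + dt * k3[0], df + dt * k3[1])
        f += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        df += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        samples.append((t, f))
    return samples


alpha, f_d = 20.0, 10.0  # hypothetical values: 1/s and N
samples = simulate(alpha, f_d, t_end=0.5, dt=1e-3)
max_err = max(abs(f - cd_ds_force(t, alpha, f_d)) for t, f in samples)
peak = max(f for _, f in samples)
print(f"max |RK4 - closed form| = {max_err:.2e}, peak force = {peak:.4f} N")
```

With these assumed values, the integrated force tracks the closed form closely and stays within $[0, f_d]$, matching the bound stated in the quoted passage.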
Stouraitis does not teach the use of learning sub policies.
Handa, however, does teach the use of policies. ([0063] In at least one embodiment, performing dexterous manipulation policies benefits from a robust estimate of the pose of the object held in-hand. However, in many implementations, in-hand object pose tracking still presents a challenge due to significant occlusions. In such implementations, works that require in-hand object poses may be limited to experiments where the object is mostly visible or rely on multiple cameras, or the hand-object transform is fixed or known. In some examples, the issue of visual occlusions is mitigated by studying object pose estimation via contacts or tactile feedback, often by using particle filters and knowledge of the object geometry and contact locations. In at least one embodiment, these techniques may be applied to a static-grasp setting, where an object is stationary and in-grasp. In at least one embodiment, these techniques are extended to tracking object poses during in-hand manipulation, requiring modeling of complex object-hand contact dynamics.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Handa such that sub-policies are learned. Neural networks can be used to control robotic arms. Using them for the periods between known trajectories could provide an effective trajectory and control scheme at the point where a robotic controller may be most accident-prone (between contact groups).
For Claim 5, Stouraitis teaches The method of Claim 4, wherein each contact group is a set of bodies in contact such as the multi-fingered robot hand, the object, and the environment. (
Page 9425, Column 2, Paragraph 1
In this work, we try to address this problem at the level of ‘impact-aware’ manipulation planning. We ask ourselves, “How could we plan hybrid motions, such that they are easily executable by out-of-the-box controllers?”, which can be re-framed as a problem of planning such that consistent contact can be maintained during and after impact—even for tasks with contacts at speed, i.e. moving objects. As a typical example scenario, consider an agent that attempts to stop an object in motion, as shown in Figs. 1 and 2. In such a case, the agent needs to address the following challenges: • Plan discontinuous motions through contact events and physical impacts, which may result in state triggered velocity jumps described by jump maps [13], i.e., jointly plan continuous motions (flows) and contacts (jumps) to perform a task. • Track discontinuous reference motions, where the actual time of the jumps (impact) may not coincide with the jump time (impact time) of the reference motion.)
For Claim 7, Stouraitis teaches The method of Claim 1,
Stouraitis does not teach wherein tactile adaptation from visual incentives (TAVI) is used for learning.
Haldar, however, does teach wherein tactile adaptation from visual incentives (TAVI) is used for learning. (
Page 3, Section III, Intro and Section A
III. APPROACH Given a few demonstrations for complex, contact-rich manipulation that covers a small subset of possible object configurations, we seek to learn a robot policy that can generalize to a larger set of configurations not seen during the demonstrations. To enable this, we propose Fast Imitation of Skills from Humans (FISH). FISH operates in two phases. In the first phase, a weak base policy is trained on the few demonstrations using supervised learning. This weak policy, while being poor in generalization, serves as a useful prior for subsequent adaptation. In the second phase, a residual policy is trained to adapt the base policy to new object configurations. This is done by RL on the robot with these configurations using visual trajectory matching scores as the reward signal. A. Phase 1: Non-parametric base policy The expert demonstrations are first used to derive an imperfect base policy $\pi_b$. In this work, we stick to non-parametric base policies owing to their proven robustness in the low data regime [43, 6, 5] as compared to parametric alternatives such as Behavior Cloning (BC). We observe that different base policies perform differently across robots and thus, we employ two variants of non-parametric base policies in this work: an open-loop policy and closed-loop Visual Imitation through Nearest Neighbors (VINN) [43]. More details about these base policies have been provided in Section IV-G. Visual representation learning: Since we operate in the visual domain, a BC policy is trained on the expert demonstrations and we use the encoder from the BC policy to encode the visual observations $o$ into lower dimensional representations $z$.
[Fig. 3: A schematic of FISH. The first phase obtains a base policy through offline imitation from demonstrations. The second phase learns a residual model from online interactions.]
The encoded representation $z$ is provided as an input to both the base policy $\pi_b$ and the residual policy $\pi_r$. An ablation study comparing the use of such a BC encoder with other self-supervised learning techniques [24] as well as pretrained encoders [17, 71, 49, 40] is provided in Section IV.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Haldar such that TAVI is used for learning. Visual data is one of the types of data that can be used to provide feedback to a robotic control system, so using visual data to generate feedback for Stouraitis would have been expected to be useful. Adjusting the grasp (tactile adaptation) is one method of adjusting robotic control based on such feedback. It would have been expected to make the control scheme more effective at the task.
For Claim 8, Stouraitis teaches The method of Claim 7, comprising providing an option to be predicated by contact groups. (
Page 9427, Column 1, Section III
III. PROBLEM FORMULATION One can observe from (1) and (2), that the investigated system has a variety of different contact states and controllers that can alter the system’s behaviour along the time axis. We refer to a single combination of a contact state and a controller as a mode of the system. The proposed notion for contact-control modes is similar to the notion of physical interaction modes introduced in [6] (see Section II-A). Here, we only consider a limited number of contact states as physical interaction modes, but we extend the notion of mode by considering a variety of different controllers. The sequential arrangement of these modes $z_j = \{(k_j, l_j)\}$ defines the outline of the trajectory, while for each different sequence of contact-control modes $z : \{z_0, z_1, \dots, z_J\}$ there is a different optimal solution of state $x^*(t)$ and control $u^*(t)$ trajectories. $J \in \mathbb{Z}^+$ describes the total number of modes of the trajectory. Given a mode sequence, the multi-mode trajectories described by (1) and (2) can be explicitly expressed as a function of the initial state and the plant’s action sequence. Inspired by [6], [8], [19], we think of impact-aware manipulation planning as a special form of Parametric Programming (PP) [28], where the sequence of modes $z$ is encoded in the problem as $\min_{x(t), u(t), v(t)} c(x(t), u(t), v(t), z)$ (3a), subject to constraints (3b)-(3d). (3a)-(3d) are piecewise functions from which the appropriate piece (interval) can be selected based on $z$. (3a) defines the objective function, and $g(\cdot)$ in (3d) represents both the equality and the inequality constraints of the system. It is worth pointing out that Optimal Control (OC) problems with hybrid dynamics are usually written as in (3), excluding (3c), while OC problems with hybrid control are usually written as in (3), excluding (3b). The formulation above defines an OC problem where both dynamics and control are hybrid. Further, we enforce (2) as a dynamical system through (3c). The details on this decision are given in the next section. Next, we consider one instantiation of such a problem, characterised as halting a moving object. For this task, the robot has to be initially soft to absorb the impact and then, stiff to accurately manipulate the object.
For a moving object that experiences an impact, the dissipated energy (due to the impulsive force) during the collision is $E_\Lambda = \frac{1}{2}Mv_-^2 - \frac{1}{2}Mv_+^2$. (5) In this paper, we adopt the mass-spring-damper system to model real-world collisions [29]. The equation of motion for a mass-spring-damper system shown in Fig. 3 is written as $M\ddot{x} + B\dot{x} + Kx = -Mg$, (6) where $K$, $B$, $M$ are the stiffness, damping and mass respectively; $g$ is gravity and $x$ is the state of the system. The energy dissipation of such a system is caused by the damper and can be calculated as follows:
[Fig. 3. Correspondence between Newton’s restitution model and the mass-spring-damper system.])
For Claim 12, Stouraitis teaches A method for learning skills for robot hands, the method implemented using a computer system including a processor communicatively coupled to a memory device, the method comprising: (Page 9429 Column 2, Section V Paragraph 2
Implementation setup: We use CasADi [35] and its automatic differentiation capabilities to realize the multi-mode TO method. Motion planning is done in the task space and the motions are projected into the configuration space of the robot with IK. All simulations are conducted on a 64-bit Intel Quad-Core i9 3.60GHz computer with 64GB RAM and are realized with the Bullet physics simulation library)
decomposing long-horizon tasks into sequences of shorter-horizon tasks by treating each subsequent set of contacting bodies as a separate task; (
Page 9426, Column 1, Section II A.
A. Hybrid dynamic systems
As described in Section I, motion planning concepts for manipulation are often based on trajectories that guide an object to its desired state. In this work, we consider a class of systems where the trajectories include discontinuous transitions between different contact states. Similar to [6], [19], we describe systems with hybrid dynamics as $\dot{x}(t) = f_k(x(t), u(t), v(t))$, if $(x(t), u(t)) \in D_k$, (1) where $x(t) \in \mathbb{R}^n$ is the state of the system, $u(t) \in \mathbb{R}^m$ is the control actions of the plant, $v(t) \in \mathbb{R}^\nu$ is the control input applied on the environment, $n, m, \nu \in \mathbb{R}$ define the dimensions of each quantity and $k \in \{0, 1\}$ indexes to the different sets $D_k$. Each $D_k \subset \mathbb{R}^{n \times m}$ defines the domain (relative to $x(t)$ and $u(t)$) of a contact state, i.e. free-motion or in-contact. Note that (1) defines both the plant’s and environment’s dynamics.)
augmenting a reference trajectory for each shorter task; and (
Page 9428, Column 2, Section C.
Multi-mode trajectory optimization for hybrid systems
To solve the continuous optimization problem in (3), we discretize the trajectory according to direct transcription [32]. The transcription of our hybrid parametric optimization problem is an extension of the phase-based parameterization used in our previous work [7] and is similar in spirit to [6]. For each $i$th knot, the decision variables are (i) the pose of the object $y_i$, (ii) the velocity of the object $\dot{y}_i \in \mathbb{R}^\nu$, (iii) action timings $\Delta T_i$, (iv) the end-effector’s position $c_i$, (v) the contact force $f_i$ and the cd-DS parameter $\alpha$. We group these quantities into three vectors $x_i = [y_i\ \dot{y}_i\ c_i\ \dot{c}_i\ \ddot{c}_i]^T$ (15), $u_i = [\alpha_i\ \Delta T_i]^T$ (16), $v_i = [f_i\ \dot{f}_i\ \ddot{f}_i]^T$ (17), where $\forall i \in \mathbb{N}$, the trajectories of $x_i$, $u_i$ and $v_i$ describes a multi-mode motion. In addition to the decision variables, the transcription of the continuous problem can be customized through the mode sequence $z$. This results in a TO problem that is separated into modes with different constraints. Mode-free constraints: Here we introduce all the constraints that are applied independently to the modes of the trajectory, i.e., constraints that are free of parameter set $z$. We note that $\psi_c \in \mathbb{R}^{2\nu}$ defines the reachable area of the agent’s end-effectors, referred to as workspace. • Initial state of the object: $y_0 = y_0^*$ and $\dot{y}_0 = \dot{y}_0^*$. • Desired final state of the object: $y_N = y_N^*$ and/or $\dot{y}_N = \dot{y}_N^*$. • Kinematic limits of the end-effector: $c_i \in \psi_c$, approximated with box bounds. • Lower and upper bound on time between each knot: $\Delta T_l \le \Delta T_i \le \Delta T_u, \forall i \in \{0, \dots, N\}$.)
Stouraitis does not teach that the skills are finger-gaiting skills, or using representation pretraining and exploration for learning.
Handa, however, does teach finger gaiting ([0128] In some examples, for each object, in both simulation and real-world experiments, 2 demonstrations of 2 types of manipulation trajectories may be utilized: 1) pick and place with finger-grasp and in-hand object rotation, and 2) the same but with finger tips breaking and re-establishing contact during the grasp (finger gaiting). This may give a total of 24 trajectories for analysis for both simulation and real-world experiments. In both trajectory types, the object may undergo translational and rotational slippage from both inertial forces and push-contacts with the table. Each trajectory may last about a minute. In various embodiments, the pose estimation algorithm may be run at approximately 30 Hz, which may result in a total of about 2k frames per trajectory.)
using representation pretraining and exploration for learning. ([0153] FIG. 15 illustrates training and deployment of a deep neural network, according to at least one embodiment. In at least one embodiment, untrained neural network 1506 is trained using a training dataset 1502. In at least one embodiment, training framework 1504 is a PyTorch framework, whereas in other embodiments, training framework 1504 is a Tensorflow, Boost, Caffe, Microsoft Cognitive Toolkit/CNTK, MXNet, Chainer, Keras, Deeplearning4j, or other training framework. In at least one embodiment training framework 1504 trains an untrained neural network 1506 and enables it to be trained using processing resources described herein to generate a trained neural network 1508. In at least one embodiment, weights may be chosen randomly or by pre-training using a deep belief network. In at least one embodiment, training may be performed in either a supervised, partially supervised, or unsupervised manner.
[0115] In various embodiments, a population-based optimization (“PBO”) algorithm may be utilized. In at least one embodiment, the PBO algorithm ranks all simulations by their average costs and finds the top K.sub.best simulations with the lowest costs. In at least one embodiment, the algorithm exploits by replacing the remaining K-K.sub.best simulations with copies of the K.sub.best ones, sampled with replacement, and explores by perturbing the K.sub.best simulations in the same way as WRS. In at least one embodiment, PBO effectively uses a shaped cost that depends only on the relative ordering of the simulation costs and not their magnitudes, potentially making the optimizer more robust to noisy costs.
[0116] In at least one embodiment, the above described optimizers utilize a distribution-shaping hyperparameter used to balance exploration with exploitation. In at least one embodiment, various embodiments may use combinations of additional hyperparameters such as the following: [0117] T, which may represent the time steps an algorithm may wait for every update. [0118] K, which may represent the number of concurrent simulations. [0119] θ.sub.0, which may represent the initial normal distribution over simulation parameters. [0120] Σ.sub.p, which may represent the diagonal covariance matrix for the normal distribution over initial pose perturbation. [0121] Σ.sub.θ and Σ.sub.v, which may represent the diagonal covariances of normal distributions of perturbations used for exploration.
[0122] A larger K may be generally better than a smaller K, with the caveat that the resulting simulation may be slower and may not be practical in application. $\Sigma_p$ may be large enough such that the actual initial pose is well represented in the initial pose distribution. However, K may be increased with a larger $\Sigma_p$ and the covariance of $\theta_0$ to ensure that the density of the samples may be high enough to capture a wider distribution.
[0123] In at least one embodiment, there are two additional trade-offs with these hyperparameters. In at least one embodiment, one trade-off is the exploration-exploitation trade-off in the context of optimizing for θ, and the other is the trade-off between optimizing for θ and for $p_t^{(i^*)}$. In at least one embodiment, making $\Sigma_\theta$ or $\Sigma_v$ wider increases the speed at which the set of simulation parameters “move,” and the optimizer may explore more than it exploits. In at least one embodiment, increasing T improves the optimization for θ as the optimizer may have more samples to evaluate each simulation. However, updating the simulation parameters too slowly may lead to drift in pose estimation if the least-cost simulation is sufficiently different from the real world, potentially leading to divergent behavior in some examples. In some examples, divergent behavior may occur when force perturbation or some simulation parameters lead to an irrecoverable configuration, where the object falls out of the hand or brings the object into a pose such that small force perturbations cannot bring it back to the correct pose. In some examples, this may be acceptable if a few samples become divergent. Their costs may be high, so in some embodiments, they may be discarded and replaced by ones that are not divergent during optimizer updates.
)
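For illustration only, the population-based optimization (PBO) update quoted from Handa [0115] might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Handa's implementation: the function name `pbo_step`, the list-of-lists parameter layout, and the choice to explore by adding zero-mean Gaussian noise (per-dimension standard deviations assumed to be the square roots of the diagonal of $\Sigma_\theta$) only to the resampled copies are all assumptions, since Handa states only that the optimizer explores "in the same way as WRS."

```python
import random
from typing import List, Sequence


def pbo_step(params: List[List[float]],
             avg_costs: Sequence[float],
             k_best: int,
             noise_std: Sequence[float],
             rng: random.Random) -> List[List[float]]:
    """One hypothetical PBO update over K simulations.

    params:    K parameter vectors, one per simulation
    avg_costs: average cost of each simulation (lower is better)
    k_best:    number of lowest-cost simulations to keep (K_best)
    noise_std: per-dimension std-devs of the exploration noise (assumption:
               square roots of the diagonal of Sigma_theta)
    """
    K = len(params)
    if not 0 < k_best <= K:
        raise ValueError("k_best must be in 1..K")
    # Rank by average cost; only the ordering is used, never the magnitudes.
    order = sorted(range(K), key=lambda i: avg_costs[i])
    elite_idx = order[:k_best]
    elites = [list(params[i]) for i in elite_idx]
    # Exploit: refill the other K - K_best slots with elites sampled with replacement.
    parents = rng.choices(elite_idx, k=K - k_best)
    # Explore (assumption): add zero-mean Gaussian noise to each copied vector.
    explored = [[x + rng.gauss(0.0, s) for x, s in zip(params[p], noise_std)]
                for p in parents]
    return elites + explored
```

Because the update sorts by cost, replacing the costs with any strictly increasing transform of them (for example, cubing positive costs) yields the same population for the same random seed, which is one way to see Handa's remark that PBO depends only on the relative ordering of the costs.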
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Handa such that the task is finger gaiting and representation pretraining and exploration are used for learning. Objects are frequently picked up using fingered robots, so one of ordinary skill would have expected the methods of Stouraitis to be useful with fingered robots. Representation pretraining and exploration learning are known tools for teaching robots to perform tasks, and their use would have been expected to succeed at pick-and-place tasks that involve fingers.
For Claim 13, Stouraitis teaches The method of Claim 12, comprising learning sub policies to transition between sets of contacting groups formed by decomposing the task into shorter tasks by contact groups, wherein each contact group is a set of bodies in contact such as the robot hand, the object, and the environment. (
Page 9426, Column 1, Section II A.
A. Hybrid dynamic systems
As described in Section I, motion planning concepts for manipulation are often based on trajectories that guide an object to its desired state. In this work, we consider a class of systems where the trajectories include discontinuous transitions between different contact states. Similar to [6], [19], we describe systems with hybrid dynamics as
$\dot{x}(t) = f_k(x(t), u(t), v(t)), \quad \text{if } (x(t), u(t)) \in D_k, \qquad (1)$
where $x(t) \in \mathbb{R}^n$ is the state of the system, $u(t) \in \mathbb{R}^m$ is the control actions of the plant, $v(t) \in \mathbb{R}^\nu$ is the control input applied on the environment, $n, m, \nu \in \mathbb{R}$ define the dimensions of each quantity and $k \in \{0, 1\}$ indexes to the different sets $D_k$. Each $D_k \subset \mathbb{R}^{n \times m}$ defines the domain (relative to $x(t)$ and $u(t)$) of a contact state, i.e. free-motion or in-contact. Note that (1) defines both the plant’s and environment’s dynamics.)
(Page 9425, Column 2, Paragraph 1
In this work, we try to address this problem at the level of ‘impact-aware’ manipulation planning. We ask ourselves, “How could we plan hybrid motions, such that they are easily executable by out-of-the-box controllers?”, which can be re-framed as a problem of planning such that consistent contact can be maintained during and after impact—even for tasks with contacts at speed, i.e. moving objects. As a typical example scenario, consider an agent that attempts to stop an object in motion, as shown in Figs. 1 and 2. In such a case, the agent needs to address the following challenges:
• Plan discontinuous motions through contact events and physical impacts, which may result in state triggered velocity jumps described by jump maps [13], i.e., jointly plan continuous motions (flows) and contacts (jumps) to perform a task.
• Track discontinuous reference motions, where the actual time of the jumps (impact) may not coincide with the jump time (impact time) of the reference motion.)
Stouraitis does not teach that the skills are finger-gaiting.
Handa, however, does teach that the skills are finger-gaiting. ([0128] In some examples, for each object, in both simulation and real-world experiments, 2 demonstrations of 2 types of manipulation trajectories may be utilized: 1) pick and place with finger-grasp and in-hand object rotation, and 2) the same but with finger tips breaking and re-establishing contact during the grasp (finger gaiting). This may give a total of 24 trajectories for analysis for both simulation and real-world experiments. In both trajectory types, the object may undergo translational and rotational slippage from both inertial forces and push-contacts with the table. Each trajectory may last about a minute. In various embodiments, the pose estimation algorithm may be run at approximately 30 Hz, which may result in a total of about 2k frames per trajectory.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Handa such that the task is finger gaiting, because objects are frequently picked up using fingered robots; one of ordinary skill would therefore have expected the methods of Stouraitis to be useful with fingered robots and to succeed at pick-and-place tasks that involve fingers.
For Claim 17, Stouraitis teaches The method of Claim 12,
Stouraitis does not teach wherein tactile adaptation from visual incentives (TAVI) is used for learning.
Haldar, however, does teach wherein tactile adaptation from visual incentives (TAVI) is used for learning. (
Page 3, Section III, Intro and Section A
III. APPROACH
Given a few demonstrations for complex, contact-rich manipulation that covers a small subset of possible object configurations, we seek to learn a robot policy that can generalize to a larger set of configurations not seen during the demonstrations. To enable this, we propose Fast Imitation of Skills from Humans (FISH). FISH operates in two phases. In the first phase, a weak base policy is trained on the few demonstrations using supervised learning. This weak policy, while being poor in generalization, serves as a useful prior for subsequent adaptation. In the second phase, a residual policy is trained to adapt the base policy to new object configurations. This is done by RL on the robot with these configurations using visual trajectory matching scores as the reward signal.
A. Phase 1: Non-parametric base policy
The expert demonstrations are first used to derive an imperfect base policy $\pi_b$. In this work, we stick to non-parametric base policies owing to their proven robustness in the low data regime [43, 6, 5] as compared to parametric alternatives such as Behavior Cloning (BC). We observe that different base policies perform differently across robots and thus, we employ two variants of non-parametric base policies in this work: an open-loop policy and closed-loop Visual Imitation through Nearest Neighbors (VINN) [43]. More details about these base policies have been provided in Section IV-G.
[Fig. 3: A schematic of FISH. The first phase obtains a base policy through offline imitation from demonstrations. The second phase learns a residual model from online interactions.]
Visual representation learning: Since we operate in the visual domain, a BC policy is trained on the expert demonstrations and we use the encoder from the BC policy to encode the visual observations o into lower dimensional representations z.
The encoded representation z is provided as an input to both the base policy $\pi_b$ and the residual policy $\pi_r$. An ablation study comparing the use of such a BC encoder with other self-supervised learning techniques [24] as well as pretrained encoders [17, 71, 49, 40] is provided in Section IV.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Haldar such that TAVI is used for learning, because visual data is one of the types of data that can be used to provide feedback to a robotic control system, and it would therefore have been expected to be useful to use visual data to generate feedback for Stouraitis. Changing the grasp (tactile augmentation) is one method of adjusting robotic control based on such feedback, and would have been expected to make the control scheme more effective at the task.
For Claim 19, Stouraitis teaches A non-transitory computer readable medium comprising a plurality of instructions which, when executed by a processor, cause the processor to: (Page 9429 Column 2, Section V Paragraph 2
Implementation setup: We use CasADi [35] and its automatic differentiation capabilities to realize the multi-mode TO method. Motion planning is done in the task space and the motions are projected into the configuration space of the robot with IK. All simulations are conducted on a 64-bit Intel Quad-Core i9 3.60GHz computer with 64GB RAM and are realized with the Bullet physics simulation library)
decompose long-horizon tasks into sequences of shorter-horizon tasks by treating desired movements in each subsequent set of contacting bodies as a separate task; ((
Page 9426, Column 1, Section II A.
A. Hybrid dynamic systems
As described in Section I, motion planning concepts for manipulation are often based on trajectories that guide an object to its desired state. In this work, we consider a class of systems where the trajectories include discontinuous transitions between different contact states. Similar to [6], [19], we describe systems with hybrid dynamics as
$\dot{x}(t) = f_k(x(t), u(t), v(t)), \quad \text{if } (x(t), u(t)) \in D_k, \qquad (1)$
where $x(t) \in \mathbb{R}^n$ is the state of the system, $u(t) \in \mathbb{R}^m$ is the control actions of the plant, $v(t) \in \mathbb{R}^\nu$ is the control input applied on the environment, $n, m, \nu \in \mathbb{R}$ define the dimensions of each quantity and $k \in \{0, 1\}$ indexes to the different sets $D_k$. Each $D_k \subset \mathbb{R}^{n \times m}$ defines the domain (relative to $x(t)$ and $u(t)$) of a contact state, i.e. free-motion or in-contact. Note that (1) defines both the plant’s and environment’s dynamics.))
augment a reference trajectory for each shorter task; and (
Page 9428, Column 2, Section C.
C. Multi-mode trajectory optimization for hybrid systems
To solve the continuous optimization problem in (3), we discretize the trajectory according to direct transcription [32]. The transcription of our hybrid parametric optimization problem is an extension of the phase-based parameterization used in our previous work [7] and is similar in spirit to [6]. For each $i$-th knot, the decision variables are (i) the pose of the object $y_i$, (ii) the velocity of the object $\dot{y}_i \in \mathbb{R}^\nu$, (iii) action timings $\Delta T_i$, (iv) the end-effector’s position $c_i$, (v) the contact force $f_i$ and the cd-DS parameter $\alpha$. We group these quantities into three vectors
$x_i = [\, y_i \;\; \dot{y}_i \;\; c_i \;\; \dot{c}_i \;\; \ddot{c}_i \,]^T, \qquad (15)$
$u_i = [\, \alpha_i \;\; \Delta T_i \,]^T, \qquad (16)$
$v_i = [\, f_i \;\; \dot{f}_i \;\; \ddot{f}_i \,]^T, \qquad (17)$
where $\forall i \in N$, the trajectories of $x_i$, $u_i$ and $v_i$ describes a multi-mode motion. In addition to the decision variables, the transcription of the continuous problem can be customized through the mode sequence $z$. This results in a TO problem that is separated into modes with different constraints.
Mode-free constraints: Here we introduce all the constraints that are applied independently to the modes of the trajectory, i.e., constraints that are free of parameter set $z$. We note that $\psi_c \in \mathbb{R}^{2\nu}$ defines the reachable area of the agent’s end-effectors, referred to as workspace.
• Initial state of the object: $y_0 = y^*_0$ and $\dot{y}_0 = \dot{y}^*_0$.
• Desired final state of the object: $y_N = y^*_N$ and/or $\dot{y}_N = \dot{y}^*_N$.
• Kinematic limits of the end-effector: $c_i \in \psi_c$, approximated with box bounds.
• Lower and upper bound on time between each knot: $\Delta T_l \le \Delta T_i \le \Delta T_u, \ \forall i \in \{0, \dots, N\}$.)
Stouraitis does not teach that the task is finger gaiting, or that the instructions cause the processor to:
use representation pretraining and exploration by pretraining on the reference trajectory of each shorter task; and
use exploration for learning by generating exploratory actions based on the reference trajectory of the shorter task.
Handa, however, does teach that the task is finger gaiting ([0128] In some examples, for each object, in both simulation and real-world experiments, 2 demonstrations of 2 types of manipulation trajectories may be utilized: 1) pick and place with finger-grasp and in-hand object rotation, and 2) the same but with finger tips breaking and re-establishing contact during the grasp (finger gaiting). This may give a total of 24 trajectories for analysis for both simulation and real-world experiments. In both trajectory types, the object may undergo translational and rotational slippage from both inertial forces and push-contacts with the table. Each trajectory may last about a minute. In various embodiments, the pose estimation algorithm may be run at approximately 30 Hz, which may result in a total of about 2k frames per trajectory.), and further teaches to:
use representation pretraining and exploration by pretraining on the reference trajectory of each task; and
use exploration for learning by generating exploratory actions based on the reference trajectory of the task.
([0153] FIG. 15 illustrates training and deployment of a deep neural network, according to at least one embodiment. In at least one embodiment, untrained neural network 1506 is trained using a training dataset 1502. In at least one embodiment, training framework 1504 is a PyTorch framework, whereas in other embodiments, training framework 1504 is a Tensorflow, Boost, Caffe, Microsoft Cognitive Toolkit/CNTK, MXNet, Chainer, Keras, Deeplearning4j, or other training framework. In at least one embodiment training framework 1504 trains an untrained neural network 1506 and enables it to be trained using processing resources described herein to generate a trained neural network 1508. In at least one embodiment, weights may be chosen randomly or by pre-training using a deep belief network. In at least one embodiment, training may be performed in either a supervised, partially supervised, or unsupervised manner.
[0115] In various embodiments, a population-based optimization (“PBO”) algorithm may be utilized. In at least one embodiment, the PBO algorithm ranks all simulations by their average costs and finds the top $K_{\text{best}}$ simulations with the lowest costs. In at least one embodiment, the algorithm exploits by replacing the remaining $K - K_{\text{best}}$ simulations with copies of the $K_{\text{best}}$ ones, sampled with replacement, and explores by perturbing the $K_{\text{best}}$ simulations in the same way as WRS. In at least one embodiment, PBO effectively uses a shaped cost that depends only on the relative ordering of the simulation costs and not their magnitudes, potentially making the optimizer more robust to noisy costs.
[0116] In at least one embodiment, the above described optimizers utilize a distribution-shaping hyperparameter used to balance exploration with exploitation. In at least one embodiment, various embodiments may use combinations of additional hyperparameters such as the following: [0117] T, which may represent the time steps an algorithm may wait for every update. [0118] K, which may represent the number of concurrent simulations. [0119] $\theta_0$, which may represent the initial normal distribution over simulation parameters. [0120] $\Sigma_p$, which may represent the diagonal covariance matrix for the normal distribution over initial pose perturbation. [0121] $\Sigma_\theta$ and $\Sigma_v$, which may represent the diagonal covariances of normal distributions of perturbations used for exploration.
[0122] A larger K may be generally better than a smaller K, with the caveat that the resulting simulation may be slower and may not be practical in application. $\Sigma_p$ may be large enough such that the actual initial pose is well represented in the initial pose distribution. However, K may be increased with a larger $\Sigma_p$ and the covariance of $\theta_0$ to ensure that the density of the samples may be high enough to capture a wider distribution.
[0123] In at least one embodiment, there are two additional trade-offs with these hyperparameters. In at least one embodiment, one trade-off is the exploration-exploitation trade-off in the context of optimizing for θ, and the other is the trade-off between optimizing for θ and for $p_t^{(i^*)}$. In at least one embodiment, making $\Sigma_\theta$ or $\Sigma_v$ wider increases the speed at which the set of simulation parameters “move,” and the optimizer may explore more than it exploits. In at least one embodiment, increasing T improves the optimization for θ as the optimizer may have more samples to evaluate each simulation. However, updating the simulation parameters too slowly may lead to drift in pose estimation if the least-cost simulation is sufficiently different from the real world, potentially leading to divergent behavior in some examples. In some examples, divergent behavior may occur when force perturbation or some simulation parameters lead to an irrecoverable configuration, where the object falls out of the hand or brings the object into a pose such that small force perturbations cannot bring it back to the correct pose. In some examples, this may be acceptable if a few samples become divergent. Their costs may be high, so in some embodiments, they may be discarded and replaced by ones that are not divergent during optimizer updates.
([0065] In at least one embodiment, to evaluate the proposed algorithm, a total of 24 in-hand manipulation trajectories with three different objects in simulation and in the real world were collected. In at least one embodiment, a Kuka IIWA7 arm with the 4-finger Wonik Robotics Allegro hand as the end-effector was used, with each finger outfitted with a SynTouch BioTac contact sensor. In at least one embodiment, object manipulation trajectories are human demonstrations collected via a hand-tracking teleoperation system. In at least one embodiment, because ground-truth object poses in simulation are available, detailed ablation studies in simulation experiments to study the properties of the proposed algorithm are performed. In at least one embodiment, for real-world experiments, a vision-based algorithm is used to obtain the object pose in the first and last frame of the collected trajectories, where the object is not in occlusion. In at least one embodiment, the pose in the first frame is used to initialize the simulations, and the pose in the last frame is used to evaluate the accuracy of the proposed contact-based algorithm.
[0066] Various examples identify in-hand object-pose with vision only, usually by first segmenting out the robot or human hand in an image before performing pose estimation. However, vision-only approaches may degrade in performance for larger occlusions. Some embodiments use tactile feedback to aid object pose estimation. Tactile perception can identify object properties such as materials and pose, as well as provide feedback during object manipulation.
[0067] In at least one embodiment, experiments with dynamics models and particle filter techniques reveal that adding noise to applied forces instead of the underlying dynamics yield more accurate tracking results. At least one embodiment combines tactile feedback with a vision-based object tracker to track object trajectories during planar pushing tasks, and another applies incremental smoothing and mapping (“iSAM”) to combine global visual pose estimations with local contact pose readings.)
)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Handa such that the task is finger gaiting, and to use representation pretraining and exploration by pretraining on the reference trajectory of each shorter task; and
to use exploration for learning by generating exploratory actions based on the reference trajectory of the shorter task.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Handa in this way because objects are frequently picked up using fingered robots, so the methods of Stouraitis would also have been expected to be useful with fingered robots; representation pretraining and exploration learning are known tools for teaching robots to perform tasks, and their use would have been expected to succeed at pick-and-place tasks that involve fingers. The use of reference trajectories would also have been expected to be useful: these techniques frequently use existing data or trajectories to set a starting point or reward function for neural networks or models, and finger-gaiting trajectories would provide trajectories similar in structure to what the robot is meant to do, indicating what works and what fails.
Claims 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Stouraitis in light of Handa in light of Ciocarlie et al (US Pub 2024/0278423 A1), hereafter known as Ciocarlie.
For Claim 6, Stouraitis teaches The method of Claim 4,
Stouraitis does not teach wherein for a desired contact group, the object remains manipulatable/controllable during exploration.
Ciocarlie, however, does teach wherein for a desired contact group, the object remains manipulatable/controllable during exploration.
([0111] Reinforcement Learning (RL) of robot sensorimotor control policies has seen great advances in recent years, demonstrated for a wide range of motor tasks. In the case of manipulation, this has translated in higher levels of dexterity than previously possible, typically demonstrated by the ability to re-orient a grasped object in-hand using complex finger movements. However, training a sensorimotor policy is still a difficult process, particularly for problems where the underlying state space exhibits complex structure, such as “narrow passages” between parts of the space are accessible or useful. Manipulation is indeed such a problem: even when starting with the object secured between the digits, a random action can easily lead to a drop, and thus to an irrecoverable state. Finger-gaiting further implies transitions between different subsets of fingers used to hold the object, all while maintaining stability. This leads to difficulty in exploration during training, since random perturbations in the policy action space are unlikely to discover narrow passages in state space. Current studies address this difficult through a variety of means: using simple, convex objects to reduce the difficulty of the task, reliance on support surfaces to reduce the chances of a drop, object pose tracking through extrinsic sensing, etc.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Ciocarlie such that, for a contact group, the object remains manipulable or controllable during exploration, because exploration frequently occurs while the robotic arm is carrying out a control plan and data is being gathered by observing it. This would have been expected to be useful because it allows the system to gather real-time, authentic data rather than mere simulation examples.
For Claim 14, Stouraitis teaches The method of Claim 12,
Stouraitis does not teach wherein for a desired contact group, the object remains manipulatable/controllable during exploration.
Ciocarlie, however, does teach wherein for a desired contact group, the object remains manipulatable/controllable during exploration.
([0111] Reinforcement Learning (RL) of robot sensorimotor control policies has seen great advances in recent years, demonstrated for a wide range of motor tasks. In the case of manipulation, this has translated in higher levels of dexterity than previously possible, typically demonstrated by the ability to re-orient a grasped object in-hand using complex finger movements. However, training a sensorimotor policy is still a difficult process, particularly for problems where the underlying state space exhibits complex structure, such as “narrow passages” between parts of the space are accessible or useful. Manipulation is indeed such a problem: even when starting with the object secured between the digits, a random action can easily lead to a drop, and thus to an irrecoverable state. Finger-gaiting further implies transitions between different subsets of fingers used to hold the object, all while maintaining stability. This leads to difficulty in exploration during training, since random perturbations in the policy action space are unlikely to discover narrow passages in state space. Current studies address this difficult through a variety of means: using simple, convex objects to reduce the difficulty of the task, reliance on support surfaces to reduce the chances of a drop, object pose tracking through extrinsic sensing, etc.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Ciocarlie such that, for a contact group, the object remains manipulable or controllable during exploration, because exploration frequently occurs while the robotic arm is carrying out a control plan and data is being gathered by observing it. This would have been expected to be useful because it allows the system to gather real-time, authentic data rather than mere simulation examples.
Claims 9 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Stouraitis in light of Handa in light of Tremblay et al (US Pub 2021/0125052 A1), hereafter known as Tremblay.
For Claim 9, Stouraitis teaches The method of Claim 1,
Stouraitis does not teach comprising adding domain randomization to an initial state of the object and robot joint positions to cover a post image of a previous contact group.
Tremblay, however, does teach comprising adding domain randomization to an initial state of the object and robot joint positions to cover a post image of a previous contact group. ([0097] In at least one embodiment, a proximal policy optimization (PPO) algorithm is used to learn a policy. In at least one embodiment, a policy is represented as a simple multi-layered perceptron (MLP) with 2 hidden layers containing 128 neurons each. In at least one embodiment, during training, at a beginning of each rollout a new cuboid object is generated with dimensions uniformly sampled from a pre-specified range, keypoints of an object are estimated—noise is sampled and added to keypoint locations to simulate sensor noise present in a physical system—and passed as context to a policy. In at least one embodiment, keypoint values then remain same throughout that rollout. In at least one embodiment, to deploy a policy learned in simulation on a real robot, domain randomization is applied to objects to account for a discrepancy between a simulator and physical world. In at least one embodiment, in addition to keypoint location noise, uniform noise is added to object mass, friction coefficients between fingers and object, PD gains of a robot, and damping coefficients of robot joints. In at least one embodiment, range of uniform distribution is manually specified based on initial results on a robot.)
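For illustration only, the per-rollout domain randomization quoted from Tremblay [0097] (uniform noise added to object mass, finger-object friction coefficients, PD gains, and joint damping coefficients, over manually specified ranges) might be sketched in Python as follows. The `SimParams` fields, the `randomize` helper, and the example ranges are hypothetical; Tremblay does not disclose code, and this is not its implementation.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for the physical parameters Tremblay randomizes."""
    object_mass: float
    friction: float       # finger-object friction coefficient
    pd_gain: float        # robot PD controller gain
    joint_damping: float  # robot joint damping coefficient


def randomize(nominal: SimParams,
              ranges: Dict[str, Tuple[float, float]],
              rng: random.Random) -> SimParams:
    """Return a copy of `nominal` with additive uniform noise on selected fields.

    `ranges` maps a field name to a manually specified (low, high) noise range;
    fields not listed are left unchanged. Intended to be called once at the
    start of each rollout, so the sampled values stay fixed for that rollout.
    """
    updates = {name: getattr(nominal, name) + rng.uniform(low, high)
               for name, (low, high) in ranges.items()}
    return replace(nominal, **updates)


if __name__ == "__main__":
    rng = random.Random(0)
    nominal = SimParams(object_mass=0.1, friction=1.0, pd_gain=5.0, joint_damping=0.02)
    ranges = {"object_mass": (-0.02, 0.02), "friction": (-0.3, 0.3)}
    for rollout in range(3):
        print(rollout, randomize(nominal, ranges, rng))
```

The sketch uses additive noise because the quoted passage says noise is "added"; a multiplicative scheme would be an equally plausible design, but it is not what the passage describes.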
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Tremblay such that domain randomization is added to initial states to cover a post image, because this would allow the system to slightly randomize its initialization before it performs the task. Doing so can help the system avoid local maxima/minima and can help identify potential issues that may arise only under certain circumstances, which might not be present without randomization.
For Claim 15, Stouraitis teaches The method of Claim 12,
Stouraitis does not teach comprising adding domain randomization to an initial state of the object and robot joint positions to cover a post image of a previous contact group.
Tremblay, however, does teach comprising adding domain randomization to an initial state of the object and robot joint positions to cover a post image of a previous contact group. ([0097] In at least one embodiment, a proximal policy optimization (PPO) algorithm is used to learn a policy. In at least one embodiment, a policy is represented as a simple multi-layered perceptron (MLP) with 2 hidden layers containing 128 neurons each. In at least one embodiment, during training, at a beginning of each rollout a new cuboid object is generated with dimensions uniformly sampled from a pre-specified range, keypoints of an object are estimated—noise is sampled and added to keypoint locations to simulate sensor noise present in a physical system—and passed as context to a policy. In at least one embodiment, keypoint values then remain same throughout that rollout. In at least one embodiment, to deploy a policy learned in simulation on a real robot, domain randomization is applied to objects to account for a discrepancy between a simulator and physical world. In at least one embodiment, in addition to keypoint location noise, uniform noise is added to object mass, friction coefficients between fingers and object, PD gains of a robot, and damping coefficients of robot joints. In at least one embodiment, range of uniform distribution is manually specified based on initial results on a robot.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Tremblay such that domain randomization is added to initial states to cover a post image, because this would allow the system to slightly randomize its initialization before it performs the task. Doing so can help the system avoid local maxima/minima and can help identify potential issues that may arise only under certain circumstances, which might not be present without randomization.
Claims 10-11, 16, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Stouraitis in light of Handa in light of Thomaz et al (US Pub 2024/0157552 A1), hereafter known as Thomaz.
For Claim 10, Stouraitis teaches The method of Claim 1,
Stouraitis does not teach comprising reversing initial states, goal states, and reference trajectory for each primitive skill during training to expand ways to compose the primitive skill and to interface with user specified commands.
Thomaz, however, does teach comprising reversing initial states, goal states, and reference trajectory for each primitive skill during training to expand ways to compose the primitive skill and to interface with user specified commands. ([0224] At 2420, the user can optionally select collision primitives and/or the guided teach door state machine can optionally prompt the user to teach collision primitives. For example, the guided teach door state machine can prompt the user to place the tip of the end effector at the center of the obstacle. The guided teach door state machine can then prompt the user to enter the obstacle (e.g., by pressing an enter button on a display) when the user is ready to save the obstacle as a collision primitive. In some embodiments, the guided teach door state machine can prompt the user to provide the length, height, and/or other geometric or spatial features of the obstacle. In some embodiments, the width (or other geometric or spatial features) of the obstacle can be inferred from the placement of the end effector.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Thomaz so that the system reverses initial states to expand the ways to compose primitive skills, because doing so would allow more situations to be used to test and improve upon the primitive trajectories. If the system were not able to reset, it would not be able to continue experimentation from a particular point. Users would not be able to set new conditions, because the system would have to continue from wherever it is during learning. Therefore, one of ordinary skill in the art would have expected this modification to be useful.
For Claim 11, Stouraitis teaches The method of Claim 1,
Stouraitis does not teach comprising providing a task graph for user interactions to choose a primitive skill to use and to pause or reverse the skill.
Thomaz, however, does teach comprising providing a task graph for user interactions to choose a primitive skill to use and to pause or reverse the skill. ([0079] Graphics processor 205 can be any suitable processing device configured to run and/or execute one or more display functions, e.g., functions associated with display device 242. In some embodiments, graphics processor 205 can be a low-powered graphics processing unit such as, for example, a dedicated graphics card or an integrated graphics processing unit.
[0224] At 2420, the user can optionally select collision primitives and/or the guided teach door state machine can optionally prompt the user to teach collision primitives. For example, the guided teach door state machine can prompt the user to place the tip of the end effector at the center of the obstacle. The guided teach door state machine can then prompt the user to enter the obstacle (e.g., by pressing an enter button on a display) when the user is ready to save the obstacle as a collision primitive. In some embodiments, the guided teach door state machine can prompt the user to provide the length, height, and/or other geometric or spatial features of the obstacle. In some embodiments, the width (or other geometric or spatial features) of the obstacle can be inferred from the placement of the end effector.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Thomaz such that there is a task graph for a user to select a primitive skill and to pause or reverse the skill. Allowing the user to set the primitive skill would give the user some control over what kinds of skills the system is learning and what kinds of approaches the system should focus on. With a graph through which to interact with the system, the user can input those preferences and, by changing them, can reset strategies that are proving less effective.
For Claim 16, Stouraitis teaches The method of Claim 12,
Stouraitis does not teach comprising reversing initial states, goal states, and reference trajectory for each primitive skill during training to expand ways to compose the primitive skill and to interface with user specified commands.
Thomaz, however, does teach comprising reversing initial states, goal states, and reference trajectory for each primitive skill during training to expand ways to compose the primitive skill and to interface with user specified commands. ([0224] At 2420, the user can optionally select collision primitives and/or the guided teach door state machine can optionally prompt the user to teach collision primitives. For example, the guided teach door state machine can prompt the user to place the tip of the end effector at the center of the obstacle. The guided teach door state machine can then prompt the user to enter the obstacle (e.g., by pressing an enter button on a display) when the user is ready to save the obstacle as a collision primitive. In some embodiments, the guided teach door state machine can prompt the user to provide the length, height, and/or other geometric or spatial features of the obstacle. In some embodiments, the width (or other geometric or spatial features) of the obstacle can be inferred from the placement of the end effector.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Thomaz so that the system reverses initial states to expand the ways to compose primitive skills, because doing so would allow more situations to be used to test and improve upon the primitive trajectories. If the system were not able to reset, it would not be able to continue experimentation from a particular point. Users would not be able to set new conditions, because the system would have to continue from wherever it is during learning. Therefore, one of ordinary skill in the art would have expected this modification to be useful.
For Claim 18, Stouraitis teaches The method of Claim 12,
Stouraitis does not teach comprising providing a task graph for user interactions to choose a primitive skill to use and to pause or reverse the skill.
Thomaz, however, does teach comprising providing a task graph for user interactions to choose a primitive skill to use and to pause or reverse the skill. ([0079] Graphics processor 205 can be any suitable processing device configured to run and/or execute one or more display functions, e.g., functions associated with display device 242. In some embodiments, graphics processor 205 can be a low-powered graphics processing unit such as, for example, a dedicated graphics card or an integrated graphics processing unit.
[0224] At 2420, the user can optionally select collision primitives and/or the guided teach door state machine can optionally prompt the user to teach collision primitives. For example, the guided teach door state machine can prompt the user to place the tip of the end effector at the center of the obstacle. The guided teach door state machine can then prompt the user to enter the obstacle (e.g., by pressing an enter button on a display) when the user is ready to save the obstacle as a collision primitive. In some embodiments, the guided teach door state machine can prompt the user to provide the length, height, and/or other geometric or spatial features of the obstacle. In some embodiments, the width (or other geometric or spatial features) of the obstacle can be inferred from the placement of the end effector.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Thomaz such that there is a task graph for a user to select a primitive skill and to pause or reverse the skill. Allowing the user to set the primitive skill would give the user some control over what kinds of skills the system is learning and what kinds of approaches the system should focus on. With a graph through which to interact with the system, the user can input those preferences and, by changing them, can reset strategies that are proving less effective.
For Claim 20, Stouraitis teaches The non-transitory computer readable medium according to Claim 19,
Stouraitis does not teach wherein the instructions, when executed by the processor, causes the processor to provide a task graph for user interactions to choose a primitive skill to use and to pause or reverse the skill.
Thomaz, however, does teach wherein the instructions, when executed by the processor, causes the processor to provide a task graph for user interactions to choose a primitive skill to use and to pause or reverse the skill. ([0079] Graphics processor 205 can be any suitable processing device configured to run and/or execute one or more display functions, e.g., functions associated with display device 242. In some embodiments, graphics processor 205 can be a low-powered graphics processing unit such as, for example, a dedicated graphics card or an integrated graphics processing unit.
[0224] At 2420, the user can optionally select collision primitives and/or the guided teach door state machine can optionally prompt the user to teach collision primitives. For example, the guided teach door state machine can prompt the user to place the tip of the end effector at the center of the obstacle. The guided teach door state machine can then prompt the user to enter the obstacle (e.g., by pressing an enter button on a display) when the user is ready to save the obstacle as a collision primitive. In some embodiments, the guided teach door state machine can prompt the user to provide the length, height, and/or other geometric or spatial features of the obstacle. In some embodiments, the width (or other geometric or spatial features) of the obstacle can be inferred from the placement of the end effector.)
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date to modify Stouraitis in light of Thomaz such that there is a task graph for a user to select a primitive skill and to pause or reverse the skill. Allowing the user to set the primitive skill would give the user some control over what kinds of skills the system is learning and what kinds of approaches the system should focus on. With a graph through which to interact with the system, the user can input those preferences and, by changing them, can reset strategies that are proving less effective.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Payton et al (US Pub 2015/0336268 A1) relates to learning of tasks for a robot with fingers.
Solowjow et al (US Pub 2021/0107142 A1) relates to reinforcement learning for a robot.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRISTAN J GREINER whose telephone number is (571)272-1382. The examiner can normally be reached Mon - Fri 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Khoi Tran, can be reached at Monday-Thursday. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/T.J.G./Examiner, Art Unit 3656 /KHOI H TRAN/Supervisory Patent Examiner, Art Unit 3656