Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In the response filed 3/7/25, applicant cancelled claims 2 and 4, amended claims 1, 3, and 5-12, and did not add any new claims. Claims 1, 3, and 5-12 are pending.
Response to Arguments
In view of the amendments filed 3/7/25 and the interview conducted 2/21/25, the rejection of claims 1, 3, and 5-12 under 35 U.S.C. 101 has been withdrawn.
Applicant’s arguments with respect to claims 1, 3, and 5-12 have been considered but are moot because they do not apply to the new ground of rejection set forth below. Applicant presents no arguments beyond asserting that the claims as amended overcome the outstanding 101 rejection and that the independent claims as amended are patentable, the remaining claims being patentable at least because they depend from patentable base independent claims. While the amendments are persuasive in overcoming the 101 rejection, the rejection of claims 1, 3, and 5-12 under 35 U.S.C. 102(a)(2) is withdrawn, and a new rejection under 35 U.S.C. 103, necessitated by amendment, is set forth below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3, and 5-12 are rejected under 35 U.S.C. 103 as being unpatentable over Ijiri et al. (US PGPub 2021/0283771) in view of Tan et al. (US PGPub 2020/0039076), and further in view of Hoshino et al. (US Patent 7,848,850).
Regarding Claim 1, Ijiri teaches a control device (see at least Fig. 8, “The control apparatus 80”) comprising:
at least one memory storing instructions; and at least one processor configured to execute the instructions to (see at least paragraph 92: “The control apparatus 80 includes, as functional configurations, a data obtaining unit 81, a success judgment unit 82, a controller 84, and a control command transmission unit 89. The controller 84 has an operation determination unit 87 and a plurality of policies (learning models) 88. The functional configurations are realized by the CPU 31 reading a learning program stored in the ROM 32 or the storage 34, and loading to and executing the learning program in the RAM 33. Note that, at least one or all of the functions may be realized by a dedicated hardware apparatus”):
acquire operation policies relating to an operation of a robot (see at least paragraph 95: “The controller 84 has the operation determination unit 87 and the policies (learning models) 88. The operation determination unit 87 determines the next action to be made by the robot 10, based on state observation data obtained by the data obtaining unit 81 and the policies 88”);
generate a control command of the robot by combining the operation policies (see at least paragraph 97: “The control command transmission unit 89 generates and transmits commands to the robot 10 based on actions output according to the policies 88”).
Although the following limitations are not explicitly taught by Ijiri, Tan teaches:
the operational policies including at least two of attraction, avoidance, and retention (paragraph 32 – The processing unit can generate an environmental model using the environmental information. The environmental information includes information describing, depicting or corresponding to the environment surrounding the arm and/or the target, which may be used to determine or plan a path from the arm to the target that may be followed by the arm. In some embodiments, the desired movement is a movement of the arm…. toward a target; paragraph 34 – The depicted processing unit is also configured to select, from a plurality of planning schemes, at least one planning scheme to translate the arm toward the target. For example, using the relative location of the target and the arm… as well as the location of any identified obstacles between the arm and the target, a path may be selected between the arm contact portion and the target. Depending on the shape of the path and/or complexity (e.g., the number and location of obstacles to be avoided), a planning scheme may be selected. As used herein, a planning scheme is a plan that sets forth a trajectory or path of the arm along a shape (or shapes) of a path as defined by a determined coordinate system. Accordingly, in various embodiments, each planning scheme of the plurality of schemes is defined by path shape or type and a coordinate system. In various embodiments, the at least one planning scheme may be selected to reduce or minimize time of motion and/or computational requirements while providing sufficient complexity to avoid any obstacles between the arm and the target. Generally, a motion planning scheme or algorithm is selected in various embodiments to provide for movement of the arm within a desired time frame or at a desired speed);
respectively acquire, for the operation policies, evaluation indices for evaluation of the operation of the robot based on the generated control command (paragraph 34 – time of motion and/or computational requirements…. desired time frame or… desired speed);
perform the evaluation of the operation based on the evaluation indices; update parameters in the operation policies, based on the evaluation; regenerate the control command by combining the operation policies in which the parameters have been updated; and control the robot in accordance with the regenerated control command (paragraph 42 – The processing unit may then dynamically re-plan movement of the arm (e.g., during movement of the arm) using the additional environmental information. For example, due to motion of the target during movement of the arm, a previously used motion plan and/or planning scheme used to generate the motion plan may no longer be appropriate, or a better planning scheme may be available to address the new position of the target. Accordingly, the processing unit in various embodiments uses the additional environmental information obtained during motion of the arm to re-plan the motion using an initially utilized planning scheme and/or re-plans the motion using a different planning scheme; paragraph 43 – For example, the processing unit may use a first planning scheme for an initial planned movement using the environmental information (e.g., originally or initially obtained environmental information acquired before motion of the arm), and use a different, second planning scheme for revised planned movement using additional environmental information (e.g., environmental information obtained during movement of the arm or after an initial movement of the arm). For example, a first planning scheme may plan a motion to an intermediate point short of the target at which the arm stops, additional environmental information acquired, and the remaining motion toward the target may be planned using a second planning scheme. As another example, a first planning scheme may be used to plan an original motion; however, an obstacle may be discovered during movement, or the target may be determined to move during the motion of the arm, and a second planning scheme used to re-plan the motion. For instance, in one example scenario, an initial motion is planned using a point-to-point in joint space planning scheme. However, an obstacle may be discovered while the arm is in motion, and the motion may be re-planned using linear trajectory planning in Cartesian space to avoid the obstacle. In some embodiments, the re-planned motion in Cartesian space may be displayed to an operator for approval or modification).
Ijiri does not explicitly teach that combining the operation policies includes calculating output of the respective joints in the operation policies for each control period and calculating a linear sum of the calculated output of the respective joints.
However, Hoshino et al. teaches converting sixteen values output from the sensors of a data glove into an X-coordinate, Y-coordinate, Z-coordinate, pitch angle, yaw angle, and roll angle with respect to each of the joints, and using a multiple regression expression to convert data obtained through the data glove into the command values for controlling the robot hand. In order to express the joints of the robot hand using a multiple regression expression with respect to the respective joints of the robot hand, i.e., a weighted linear sum of every output from the data glove, it is necessary to carry out multiple regression analysis by inputting the joint angle command values to the driving device in accordance with predetermined time-series joint angle patterns to operate the robot hand (column 9, lines 4-34, Expression 1).
Ijiri, Tan, and Hoshino are all directed toward controlling the operations of a robot with a gripper or fingers; thus, they are deemed to be analogous references. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the teachings of Ijiri to include attraction, avoidance, and retention as operational policies used to generate and evaluate control commands of the robot, with evaluation parameters used to update the operational policies and control the robot, as taught by Tan, and to calculate linear sums in the evaluation of operational policies, as taught by Hoshino, because doing so allows for optimal decision making in how to control and operate the robot to achieve its goals while minimizing cost.
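For illustration only — not a representation of any cited reference's actual implementation — the per-control-period weighted linear sum of per-joint policy outputs characterized above, together with a toy evaluate-update-regenerate cycle, might be sketched as follows. All function names, weights, joint counts, and the update rule are hypothetical.

```python
import numpy as np

# Hypothetical sketch: each operation policy maps the observed joint state
# to a per-joint output; the control command is the weighted linear sum of
# those outputs, recomputed every control period.

NUM_JOINTS = 6

def attraction_policy(state):
    # Pull each joint toward an assumed goal configuration.
    return np.zeros(NUM_JOINTS) - state

def avoidance_policy(state):
    # Push joints away from an assumed obstacle configuration.
    diff = state - np.ones(NUM_JOINTS)
    return diff / (np.dot(diff, diff) + 1e-6)

def retention_policy(state):
    # Damp motion so the arm retains its current posture.
    return -0.1 * state

POLICIES = (attraction_policy, avoidance_policy, retention_policy)
weights = np.array([1.0, 0.5, 0.2])  # per-policy parameters, updatable

def control_command(state, w):
    # One control period: per-joint output of every policy, then a linear sum.
    outputs = np.stack([p(state) for p in POLICIES])  # shape (3, NUM_JOINTS)
    return w @ outputs                                # shape (NUM_JOINTS,)

def update_weights(w, scores, lr=0.05):
    # Toy update: nudge each policy's weight by its evaluation index,
    # standing in for the claimed evaluate-update-regenerate cycle.
    return w + lr * scores

state = np.full(NUM_JOINTS, 0.5)
cmd = control_command(state, weights)                      # generate
weights = update_weights(weights, np.array([0.2, -0.1, 0.0]))
cmd = control_command(state, weights)                      # regenerate
```

In this sketch the per-policy weights play the role of the updatable parameters; a real system would derive the evaluation indices from observed robot behavior rather than from fixed scores.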
Claim 11 recites substantially similar limitations as those in claim 1 above; thus, the same rejection applies.
Claim 12 recites substantially similar limitations as those in claim 1 above; thus, the same rejection applies.
Further, Ijiri teaches a non-transitory computer readable storage medium storing a program executable by a computer (see at least Fig. 8 and paragraph 133: “A non-transitory computer-readable storage medium may be provided that stores a program, which when read and executed, causes a computer to perform operations according”).
Regarding Claim 3, Ijiri teaches the limitations of claim 1 as described above. Ijiri further teaches
wherein the at least one processor is configured to execute the instructions to acquire, for each of the operation policies, an evaluation index selected based on a user input from plural candidates for the evaluation index (see at least paragraph 54: “a learning program for executing learning processing of a learning model is stored in the ROM 32 or the storage 34. The CPU 31 is a central processing unit and executes various types of programs and controls the constituent elements. That is, the CPU 31 reads a program from the ROM 32 or the storage 34, and executes the program using the RAM 33 as a work area. The CPU 31 controls the constituent elements and performs various types of arithmetic processing according to a program recorded in the ROM 32 or the storage 34. The ROM 32 stores various types of programs and data. The RAM 33 acts as a work area where programs or data is temporarily stored. The storage 34 is configured by an HDD (Hard Disk Drive), an SSD (Solid State Drive), or a flash memory, and stores various types of programs including an operating system, and various types of data. The keyboard 35 and the mouse 36 are examples of an input apparatus, and are used to make various types of input. The monitor 37 is, for example, a liquid crystal display, and displays a user interface. The monitor 37 may be of a touch-panel type and function as an input unit.”).
Regarding Claim 5, Ijiri teaches the limitations of claim 1 as described above. Ijiri further teaches
wherein the at least one processor is configured to execute the instructions to conduct the evaluation for each of the operation policies based on an evaluation index for each of the operation policies, and
wherein the at least one processor is configured to execute the instructions to learn a learning target parameter for each of the operation policies based on the evaluation for each of the operation policies (see at least paragraphs 84-86: “the state transition model updating unit 25 updates the state transition model according to Gaussian process regression, based on the data obtained in step S71. Gaussian process regression is nonparametric regression that determines the form of functions based on data, and realizes non-linear expressions. Gaussian process regression can also express the unreliability (unreliability due to noise or lack of data) of a model through stochastic prediction. In the present method, the input of a model is a state (position, speed, angle, and angular velocity of a gripper) and an action (target speed command to arm leading end) at a timing t, and the output is the state at the following timing t+1. In step S73, the learning apparatus 20 uses a mid-learning state transition model to learn the policy 28. A policy is a map π(u|x) that determines the next action u to be taken in a state x. In one or more embodiments, a definitive policy according to a non-linear function (only a predicted average of the Gaussian process is used) is used. The policy is expressed by policy parameters θ (width and basis of a Gaussian kernel). The policy updating unit 26 determines the policy parameters θ that maximize the cumulative expected reward on a predetermined prediction horizon t=0, . . . , T. The cumulative expected reward is calculated by predicting, based on a model, state transitions from an initial state to a T step. In PILCO, the gradient of the cumulative expected reward can be analytically acquired, and the policy parameters θ can be acquired using a common gradient method (conjugate gradient method or L-BFGS). In step S74, a determination is made as to whether the learning apparatus has completed learning of the policy 28”).
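Restated as a worked formula for clarity (notation assumed for illustration; not quoted from Ijiri): the quoted PILCO procedure selects the policy parameters θ that maximize the cumulative expected reward over the prediction horizon,

```latex
% Notation assumed for illustration; not taken from the cited reference.
% x_t : state predicted by the learned Gaussian-process transition model
% under the policy with parameters \theta;  r : reward function.
\[
  J(\theta) = \sum_{t=0}^{T} \mathbb{E}\!\left[\, r(x_t) \mid \theta \,\right],
  \qquad
  \theta^{\ast} = \operatorname*{arg\,max}_{\theta} \, J(\theta).
\]
```

Because the gradient of J(θ) is analytically available in PILCO, θ can be updated with a common gradient method such as conjugate gradient or L-BFGS, consistent with the quoted passage.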
Regarding Claim 6, Ijiri teaches the limitations of claim 1 as described above. Ijiri further teaches wherein the at least one processor is configured to execute the instructions to acquire a learning target parameter selected based on a user input from candidates for the learning target parameter (see at least paragraph 54: “a learning program for executing learning processing of a learning model is stored in the ROM 32 or the storage 34. The CPU 31 is a central processing unit and executes various types of programs and controls the constituent elements. That is, the CPU 31 reads a program from the ROM 32 or the storage 34, and executes the program using the RAM 33 as a work area. The CPU 31 controls the constituent elements and performs various types of arithmetic processing according to a program recorded in the ROM 32 or the storage 34. The ROM 32 stores various types of programs and data. The RAM 33 acts as a work area where programs or data is temporarily stored. The storage 34 is configured by an HDD (Hard Disk Drive), an SSD (Solid State Drive), or a flash memory, and stores various types of programs including an operating system, and various types of data. The keyboard 35 and the mouse 36 are examples of an input apparatus, and are used to make various types of input. The monitor 37 is, for example, a liquid crystal display, and displays a user interface. The monitor 37 may be of a touch-panel type and function as an input unit.”), and
wherein the at least one processor is configured to execute the instructions to update a value of the learning target parameter (see at least paragraph 85: “[t]he learning apparatus 20 uses a mid-learning state transition model to learn the policy 28. A policy is a map π(u|x) that determines the next action u to be taken in a state x. In one or more embodiments, a definitive policy according to a non-linear function (only a predicted average of the Gaussian process is used) is used. The policy is expressed by policy parameters θ (width and basis of a Gaussian kernel). The policy updating unit 26 determines the policy parameters θ that maximize the cumulative expected reward on a predetermined prediction horizon t=0, . . . , T. The cumulative expected reward is calculated by predicting, based on a model, state transitions from an initial state to a T step. In PILCO, the gradient of the cumulative expected reward can be analytically acquired, and the policy parameters θ can be acquired using a common gradient method (conjugate gradient method or L-BFGS)”).
Regarding Claim 7, Ijiri teaches the limitations of claim 1 as described above. Ijiri further teaches wherein the at least one processor is configured to execute the instructions to acquire the operation policies (see at least paragraph 95: “The controller 84 has the operation determination unit 87 and the policies (learning models) 88. The operation determination unit 87 determines the next action to be made by the robot 10, based on state observation data obtained by the data obtaining unit 81 and the policies 88”) selected based on user input from candidates for the operation policies of the robot (see at least paragraph 54: “a learning program for executing learning processing of a learning model is stored in the ROM 32 or the storage 34. The CPU 31 is a central processing unit and executes various types of programs and controls the constituent elements. That is, the CPU 31 reads a program from the ROM 32 or the storage 34, and executes the program using the RAM 33 as a work area. The CPU 31 controls the constituent elements and performs various types of arithmetic processing according to a program recorded in the ROM 32 or the storage 34. The ROM 32 stores various types of programs and data. The RAM 33 acts as a work area where programs or data is temporarily stored. The storage 34 is configured by an HDD (Hard Disk Drive), an SSD (Solid State Drive), or a flash memory, and stores various types of programs including an operating system, and various types of data. The keyboard 35 and the mouse 36 are examples of an input apparatus, and are used to make various types of input. The monitor 37 is, for example, a liquid crystal display, and displays a user interface. The monitor 37 may be of a touch-panel type and function as an input unit.”).
Regarding Claim 8, Ijiri teaches the limitations of claim 7 as described above. Ijiri further teaches wherein the operation policies are each a control law for controlling a target state of a point of action of the robot in accordance with a state variable, and wherein the at least one processor is configured to execute the instructions to acquire information specifying the point of action and the state variable (see at least paragraph 35: “performing machine learning of a control model requires that a very large amount of data be collected, and the learning takes time. Thus, in the robot system 1, learning in which state space and action space are subjected to dimension reduction is performed in consideration of constraint conditions according to the flexibility of the robot and contact between the robot and its environment. For example, in learning of an operation to fit a peg into a hole, the entire operation is divided into segments (Motion Primitives (MP)), and only state variables of a dimension with degrees of freedom in MPs, in which a state of contact is entered, are focused on. Also, because the robot is provided with the flexible portion, force control is not required for dimensions with reduced degrees of freedom due to contact, and thus it is sufficient to only control position and speed, and learning is performed such that only an action is taken in a dimension reduced action space in which the degrees of freedom have further decreased due to contact. Accordingly, fast learning can be realized by subjecting the state space and action space to dimension reduction”).
Regarding Claim 9, Ijiri teaches the limitations of claim 8 as described above. Ijiri further teaches wherein the at least one processor is configured to execute the instructions to acquire, as a learning target parameter for each of the operation policies, the state variable which is specified as the learning target parameter (see at least paragraph 14: “the predetermined work may include a plurality of motion primitives, and the controller may include a plurality of learning models corresponding to the plurality of motion primitives. The motion primitives are also called operation sections, MPs, or the like. Each of the motion primitives may be an operation with a defined goal, and unique restrictions may be applied to state focused-on variables and actions to be performed”).
Regarding Claim 10, Ijiri teaches the limitations of claim 1 as described above. Ijiri further teaches wherein the at least one processor is configured to execute the instructions to further acquire an operation policy application condition for applying each of the operation policies, and wherein the at least one processor is configured to execute the instructions to generate the control command based on the operation policy application condition (see at least paragraphs 86-87: “In step S74, a determination is made as to whether the learning apparatus has completed learning of the policy 28. A termination condition is, for example, a pre-designated number of repetitions having been completed, and a change in the policy parameters θ being a threshold value or less. If the termination condition is not met (S74: NO), the processing proceeds to step S75. If the termination condition is met (S74: YES), learning is terminated. In step S75, the operation determination unit 27 applies the mid-learning policy 28 and determines the next movement u(t+1), and the data obtaining unit 21 observes the resulting state. The processing returns to step S72, and learning using the thus obtained state observation data (updating of state transition model and updating of policy) is repeated”).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PETER H CHOI whose telephone number is (469) 295-9171. The examiner can normally be reached M-F 9:30am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Debbie Reynolds, can be reached at 571-272-0734. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PETER H CHOI/Examiner, Art Unit 3681