DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 15 is objected to because of the following informalities: Claim 15 appears to have text which was cut off on the last line. Examiner believes the last line of claim 15 should read “estimated from the database based on the recognition information.” Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1 and 6 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites, in relevant part, “A robot control device comprising: a controller configured to be capable of estimating a holding manner of a holding target using each of a database and an interference model …” Neither the specification nor any of the other claims makes mention of an “interference model,” and it is unclear what this term is meant to refer to. For purposes of examination, Examiner proceeds on the assumption that the recited interference model corresponds to the inference model.
Claim 6 recites the limitation “the posture information of the holding target”. There is insufficient antecedent basis for this limitation in the claim. Posture information of the holding target is not introduced earlier in claim 6, nor is it introduced in any claim from which claim 6 depends.
Claims 2-14 depend from claim 1 and are therefore rejected on the same grounds as claim 1. Claims 7-13 also depend from claim 6 and are therefore rejected on the same grounds as claim 6.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-2 and 14-15 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Schafer (US-11331799-B1).
Claim 1
Schafer teaches
A robot control device comprising:
a controller
(Schafer - [col 24, ln 38-42] The robot control system 660 may be implemented in one or more processors, such as a CPU, GPU, and/or other controller(s) of the robot 620. In some implementations, the robot 620 may comprise a “brain box” that may include all or aspects of the control system 660.)
configured to be capable of estimating a holding manner of a holding target using each of
a database
(Schafer - [col 13, ln 13-17] … The pre-stored visual features database 152 can include a plurality of instances (e.g., 20 or more) of pre-stored visual features, and one or more corresponding grasp criteria can be assigned to each of the instances of pre-stored visual features.)
and an interference model
(Schafer - [col 2, ln 25-37] … For example, the final grasp pose can be determined based on selecting instance(s) of pre-stored visual features(s) that satisfy similarity condition(s) relative to current visual features of the instance(s) of end effector vision data, and determining the final grasp pose based on pre-stored grasp criteria stored in association with the selected instance(s) of pre-stored visual feature(s). Also, for example, the final grasp pose can additionally or alternatively be determined based on processing, using a trained machine learning model, an instance of end effector vision data and/or corresponding visual feature(s) thereof to generate output that indicates the final grasp pose and/or a predicted success measure for the final grasp pose.)
and to control a robot based on the estimated holding manner,
(Schafer - [col 6, ln 38-50] After the final grasp pose is determined, a grasp path from a current end effector pose (which can be the actual pose, or a pose nearby) to the final grasp pose can then be generated and checked for kinematic feasibility. … If kinematically feasible, the grasp path can then be implemented by providing corresponding control commands to actuators of the robot, and a grasp attempted once the end effector arrives at the grasp pose …)
the database containing reference information including
object information of multiple objects
(Schafer - [col 3, ln 22-29] The end effector vision data can be captured at the actual pose traversed to by the end effector and/or additional pose(s) near the actual pose. One or more current visual features are then determined based on processing the end effector vision data. The current visual features can include detected edges, detected corners, detected interest points, detected shape(s) (e.g., line(s), ellipsis(es), handle shape(s), and/or arbitrary shape(s)), and/or other visual feature(s). )
EXAMINER NOTE: Visual features correspond to object information.
(Schafer - [col 3, ln 60-65] An instance of current visual features can be compared to a plurality of instances of pre-stored visual features (e.g., 10, 20, 30, or more instances of pre-stored features) to determine one or more pre-stored visual features (if any) that satisfy similarity threshold(s) relative to the instance of current visual features.)
EXAMINER NOTE: The captured visual features are compared to pre-stored visual features (object information in a database).
and holding manner information of the multiple objects
(Schafer - [col 4, ln 46 thru col 5, ln 4] Each instance of pre-stored visual features has one or more corresponding grasp pose criteria associated therewith, such as manually engineered grasp pose criteria. … Notably, each instance of visual features can correspond to a plurality of different objects. )
and the inference model being capable of estimating a holding manner of an object,
(Schafer - [col 2, ln 31-37] … Also, for example, the final grasp pose can additionally or alternatively be determined based on processing, using a trained machine learning model, an instance of end effector vision data and/or corresponding visual feature(s) thereof to generate output that indicates the final grasp pose and/or a predicted success measure for the final grasp pose.)
wherein the controller acquires recognition information of a holding target,
(Schafer - [col 3, ln 22-26] The end effector vision data can be captured at the actual pose traversed to by the end effector and/or additional pose(s) near the actual pose. One or more current visual features are then determined based on processing the end effector vision data.)
and when the controller determines that the holding manner of the holding target cannot be estimated from the database based on the recognition information, the controller estimates the holding manner using the inference model.
(Schafer - [col 7, ln 1-18] In some implementations, end effector vision data is captured initially at the actual pose (the pose arrived at in attempting to traverse to the pre-grasp pose), an instance of current visual features determined based on the end effector vision data, and those features compared to the instances of visual features to determine if one or more of the instances satisfy similarity threshold(s) relative to the instance of current visual features. If so, the corresponding pre-grasp criteria of those instance can be utilized in generating candidate grasp pose(s) and determining a final grasp pose based on the candidate grasp pose(s). If not, the end effector can be moved, an additional instance of end effector vision data captured, additional features determined based on the additional instance of end effector vision data, and those additional features compared to the instances of visual features to determine if one or more of the instances satisfy similarity threshold(s) relative to the additional instance of current visual features.
[col 18, ln 40-48] When the decision at block 270 is no, the system proceeds to block 272 and determines a final grasp pose based on the candidate grasp pose(s) determined in one or more iterations of block 268. When there is only one candidate grasp pose, the system can utilize it as the final grasp pose. When there are multiple candidate grasp poses, the final system can determine the final grasp pose as a function of one or more of the multiple candidate grasp poses, or select one of the multiple candidate grasp poses as the final grasp pose.)
EXAMINER NOTE: See Fig. 2. If a similarity is established between the initially collected visual features and the pre-stored visual features, the pre-stored grasp information is used (estimated from the database). If not, the grasping information is generated as a function of multiple candidate grasp poses. The sections cited above indicate that this is done using machine learning (inference model).
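EXAMINER NOTE (illustrative only): The following Python sketch reflects the Examiner's reading of the database-first, model-fallback flow described above. All names (e.g., estimate_holding_manner, grasp_model), the similarity function, and the threshold value are the Examiner's own placeholders and assumptions, not drawn from Schafer; the sketch assumes feature vectors are numeric arrays and that the trained model exposes a predict() method.

    import numpy as np

    def estimate_holding_manner(current_features, database, grasp_model, sim_threshold=0.8):
        # Try the pre-stored visual-features database first.
        best_sim, best_entry = 0.0, None
        for entry in database:  # entry = {"features": array, "grasp_criteria": ...}
            # Smaller distance indicates greater similarity.
            dist = np.linalg.norm(current_features - entry["features"])
            sim = 1.0 / (1.0 + dist)
            if sim > best_sim:
                best_sim, best_entry = sim, entry
        if best_sim >= sim_threshold:
            # Holding manner "estimated from the database".
            return best_entry["grasp_criteria"]
        # Otherwise, fall back to the trained machine-learning model
        # (the claimed inference model).
        return grasp_model.predict(current_features)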
Claim 2
Schafer teaches the limitations of claim 1 as outlined above. Schafer further teaches
wherein the controller searches the database for object information similar to the recognition information,
and is capable of extracting a holding manner associated with the object information found in the search.
(Schafer - [col 4, ln 21-28] One or more visual comparison techniques can be utilized to determine similarity measure(s) between an instance of current visual features and an instance of pre-stored visual features. As one example, one or more distance measure(s) can be determined between the current and pre-stored visual feature(s), and the similarity measure determined as a function of the distance measure(s) (i.e., with smaller distance measure(s) indicating greater similarity).
[col 4, ln 46-52] Each instance of pre-stored visual features has one or more corresponding grasp pose criteria associated therewith, such as manually engineered grasp pose criteria. Grasp pose criteria for an instance of pre-stored visual features can define at least one or more two-dimensional (2D) or three-dimensional (3D) grasp points/positions relative to the instance of pre-stored visual features.)
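EXAMINER NOTE (illustrative only): The comparison step relied upon above reduces to computing a distance between the current and pre-stored visual features, treating a smaller distance as a greater similarity, and extracting the grasp criteria associated with the matched entry. The minimal sketch below is hypothetical; its names and the particular similarity function are the Examiner's, not Schafer's.

    import numpy as np

    def most_similar_entry(current_features, database):
        # Similarity as a decreasing function of feature distance.
        def similarity(entry):
            return np.exp(-np.linalg.norm(current_features - entry["features"]))
        best = max(database, key=similarity)
        # Extract the holding manner associated with the matched object information.
        return best["grasp_criteria"], similarity(best)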
Claim 14
Schafer teaches the limitations of claim 1 as outlined above. Schafer further teaches
wherein when no object information similar to the recognition information is registered in the database, the controller estimates the holding manner of the holding target using the inference model.
(Schafer - [col 7, ln 1-18] In some implementations, end effector vision data is captured initially at the actual pose (the pose arrived at in attempting to traverse to the pre-grasp pose), an instance of current visual features determined based on the end effector vision data, and those features compared to the instances of visual features to determine if one or more of the instances satisfy similarity threshold(s) relative to the instance of current visual features. If so, the corresponding pre-grasp criteria of those instance can be utilized in generating candidate grasp pose(s) and determining a final grasp pose based on the candidate grasp pose(s). If not, the end effector can be moved, an additional instance of end effector vision data captured, additional features determined based on the additional instance of end effector vision data, and those additional features compared to the instances of visual features to determine if one or more of the instances satisfy similarity threshold(s) relative to the additional instance of current visual features.
[col 18, ln 40-48] When the decision at block 270 is no, the system proceeds to block 272 and determines a final grasp pose based on the candidate grasp pose(s) determined in one or more iterations of block 268. When there is only one candidate grasp pose, the system can utilize it as the final grasp pose. When there are multiple candidate grasp poses, the final system can determine the final grasp pose as a function of one or more of the multiple candidate grasp poses, or select one of the multiple candidate grasp poses as the final grasp pose.)
EXAMINER NOTE: See Fig. 2. If a similarity is established between the initially collected visual features and the pre-stored visual features, the pre-stored grasp information is used (estimated from the database). If not, the grasping information is generated as a function of multiple candidate grasp poses. The sections cited above with respect to claim 1 indicate that this is done using machine learning (inference model).
Claim 15
Schafer teaches
a robot control device
(Schafer - [col 24, ln 38-42] The robot control system 660 may be implemented in one or more processors, such as a CPU, GPU, and/or other controller(s) of the robot 620. In some implementations, the robot 620 may comprise a “brain box” that may include all or aspects of the control system 660.)
acquiring recognition information of a holding target,
(Schafer - [col 3, ln 22-26] The end effector vision data can be captured at the actual pose traversed to by the end effector and/or additional pose(s) near the actual pose. One or more current visual features are then determined based on processing the end effector vision data.)
the robot control device configured to
be capable of estimating the holding manner of the holding target using each of
a database
(Schafer - [col 13, ln 13-17] … The pre-stored visual features database 152 can include a plurality of instances (e.g., 20 or more) of pre-stored visual features, and one or more corresponding grasp criteria can be assigned to each of the instances of pre-stored visual features.)
and an interference model
(Schafer - [col 2, ln 25-37] … For example, the final grasp pose can be determined based on selecting instance(s) of pre-stored visual features(s) that satisfy similarity condition(s) relative to current visual features of the instance(s) of end effector vision data, and determining the final grasp pose based on pre-stored grasp criteria stored in association with the selected instance(s) of pre-stored visual feature(s). Also, for example, the final grasp pose can additionally or alternatively be determined based on processing, using a trained machine learning model, an instance of end effector vision data and/or corresponding visual feature(s) thereof to generate output that indicates the final grasp pose and/or a predicted success measure for the final grasp pose.)
and to control a robot based on the estimated holding manner,
(Schafer - [col 6, ln 38-50] After the final grasp pose is determined, a grasp path from a current end effector pose (which can be the actual pose, or a pose nearby) to the final grasp pose can then be generated and checked for kinematic feasibility. … If kinematically feasible, the grasp path can then be implemented by providing corresponding control commands to actuators of the robot, and a grasp attempted once the end effector arrives at the grasp pose …)
the database containing reference information including
object information of multiple objects
(Schafer - [col 3, ln 22-29] The end effector vision data can be captured at the actual pose traversed to by the end effector and/or additional pose(s) near the actual pose. One or more current visual features are then determined based on processing the end effector vision data. The current visual features can include detected edges, detected corners, detected interest points, detected shape(s) (e.g., line(s), ellipsis(es), handle shape(s), and/or arbitrary shape(s)), and/or other visual feature(s). )
EXAMINER NOTE: Visual features correspond to object information.
(Schafer - [col 3, ln 60-65] An instance of current visual features can be compared to a plurality of instances of pre-stored visual features (e.g., 10, 20, 30, or more instances of pre-stored features) to determine one or more pre-stored visual features (if any) that satisfy similarity threshold(s) relative to the instance of current visual features.)
EXAMINER NOTE: The captured visual features are compared to pre-stored visual features (object information in a database).
and holding manner information of the multiple objects
(Schafer - [col 4, ln 46 thru col 5, ln 4] Each instance of pre-stored visual features has one or more corresponding grasp pose criteria associated therewith, such as manually engineered grasp pose criteria. … Notably, each instance of visual features can correspond to a plurality of different objects. )
and the inference model being capable of estimating a holding manner of an object,
(Schafer - [col 2, ln 31-37] … Also, for example, the final grasp pose can additionally or alternatively be determined based on processing, using a trained machine learning model, an instance of end effector vision data and/or corresponding visual feature(s) thereof to generate output that indicates the final grasp pose and/or a predicted success measure for the final grasp pose.)
and the robot control device estimating the holding manner using the inference model when the robot control device determines that the holding manner of the holding target cannot be estimated from the database based on the recognition information.
(Schafer - [col 7, ln 1-18] In some implementations, end effector vision data is captured initially at the actual pose (the pose arrived at in attempting to traverse to the pre-grasp pose), an instance of current visual features determined based on the end effector vision data, and those features compared to the instances of visual features to determine if one or more of the instances satisfy similarity threshold(s) relative to the instance of current visual features. If so, the corresponding pre-grasp criteria of those instance can be utilized in generating candidate grasp pose(s) and determining a final grasp pose based on the candidate grasp pose(s). If not, the end effector can be moved, an additional instance of end effector vision data captured, additional features determined based on the additional instance of end effector vision data, and those additional features compared to the instances of visual features to determine if one or more of the instances satisfy similarity threshold(s) relative to the additional instance of current visual features.
[col 18, ln 40-48] When the decision at block 270 is no, the system proceeds to block 272 and determines a final grasp pose based on the candidate grasp pose(s) determined in one or more iterations of block 268. When there is only one candidate grasp pose, the system can utilize it as the final grasp pose. When there are multiple candidate grasp poses, the final system can determine the final grasp pose as a function of one or more of the multiple candidate grasp poses, or select one of the multiple candidate grasp poses as the final grasp pose.)
EXAMINER NOTE: See Fig. 2. If a similarity is established between the initially collected visual features and the pre-stored visual features, the pre-stored grasp information is used (estimated from the database). If not, the grasping information is generated as a function of multiple candidate grasp poses. The sections cited above indicate that this is done using machine learning (inference model).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Schafer in view of Ogawa (US-20180272535-A1).
Claim 3
Schafer teaches the limitations of claim 1 as outlined above. Schafer alone may not explicitly teach the following limitations in combination. However, Ogawa teaches
wherein the reference information includes category information indicating a category of each of multiple objects,
(Ogawa - [0147] FIG. 12 is a table illustrating the object information stored in the object database 102a. The object information stored in the object database 102a may include, for example, an object ID number, object name information, item category information, …)
EXAMINER NOTE: The object information includes item category information.
and the controller acquires the category information of the holding target,
(Ogawa -[0150] FIG. 14 is a flowchart illustrating a procedure of acquiring the object information performed by the information acquisition system 300. When the cover is opened, the object P is placed on the rotary stage 302 (S301), and the object ID number or the object name information is input through the user interface and the like, the controller 320 outputs a direction to start execution of processing of acquiring the object information to each component of the information acquisition system 300 (S302).
[0157] Next, the controller 320 automatically generates primary grasp information based on the modified primary object information (S309). The primary grasp information is, for example, the grasping shape category information, the grasping method information, the grasping position information, the grasp score information, the search range information, the pressing amount information, the grasp determination information, the conveying posture information, and the allowable speed information related to grasp of the object P.
[0160] Next, the controller 320 updates and registers the modified primary object information in the object database 102a, and updates and registers the modified primary grasp information in the grasp database 102b …)
and is capable of estimating the holding manner from the database based on the object information containing the category information.
(Ogawa -[0200] The arithmetic processing unit 101 controls the camera 21 and the manipulator 20 so that the camera 21 photographs the surroundings of the component (object P). In this case, images of a plurality of components may be photographed (S601).
[0201] Next, the object recognition unit 101h compares the image of the component photographed by the camera 21 with the object information in the object database 102a to specify the photographed component. The object recognition unit 101h acquires the object ID number of the component from the object database 102a (S602).
[0202] Next, the arithmetic processing unit 101 compares the object ID number of the component as a target object directed by the host with the object ID number of the component specified at S602, and determines whether the target object is included in the recognized component (S603).
[0204] If the recognized component includes the target object (Yes at S603), the object recognition unit 101h matches photographed three-dimensional information with the CAD model information in the object database 102a, and calculates a posture of the recognized article. At this point, a posture of the component cage or a posture of a component other than the target object may be calculated, for example (S605).
[0205] Next, the grasp/operation plan generation unit 101d refers to the grasp information stored in the grasp database 102b, and calculates a grasping method for the recognized component and a candidate for a grasping point. The grasp database 102b includes a plurality of grasping methods, a plurality of grasping points, grasp scores thereof, and the like for the component. )
EXAMINER NOTE: The object is photographed to obtain object information, which includes category information. The object information is then used to select a grasp from the database which is suitable for the object.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include item category information in Schafer's system in order to aid in identifying an object and selecting the appropriate grasp for it, as taught by Ogawa.
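EXAMINER NOTE (illustrative only): The category-aware lookup the Examiner reads from Ogawa can be pictured as a keyed query against an object database whose records include item category information, followed by selection from the grasp records registered for that object. The sketch below is hypothetical; the record layouts, values, and names are the Examiner's, not Ogawa's.

    # Object database: object ID -> object information, including item category.
    object_db = {
        101: {"name": "bolt", "category": "fastener"},
        102: {"name": "mug",  "category": "tableware"},
    }
    # Grasp database: object ID -> registered grasping methods with grasp scores.
    grasp_db = {
        101: [{"method": "pinch", "point": (0.00, 0.00, 0.01), "score": 0.9}],
        102: [{"method": "wrap",  "point": (0.00, 0.04, 0.05), "score": 0.7},
              {"method": "rim",   "point": (0.00, 0.00, 0.10), "score": 0.8}],
    }

    def estimate_grasp_for(recognized_id):
        category = object_db[recognized_id]["category"]   # acquired category information
        grasps = grasp_db[recognized_id]                  # holding manners in the database
        return category, max(grasps, key=lambda g: g["score"])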
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Schafer in view of Aparicio (US-20220410391-A1, claiming priority to WO-2021101561-A1, filed 11/22/2019).
Claim 4
Schafer teaches the limitations of claim 1 as outlined above. Schafer alone may not explicitly teach the following limitations in combination. However, Aparicio teaches
wherein the reference information includes category information indicating a category of each of multiple objects,
and the controller acquires the category information of the holding target, and estimates the holding manner of the holding target using the inference model when the category information is not registered in the database.
(Aparicio - [0019] In some cases, models can be automatically generated using deep learning- based object recognition algorithms. For example, a system can have several machines in a database that can be recognized by type, location, and orientation in a workspace … Such a system can enable fast interaction with known parts. … It is recognized herein, however, that these types of system designs might not account for brownfield devices or other devices with no CAD or kinematic information that is readily accessible for building an operational model. Further, such systems often cannot interact with objects that are not in the database.
[0027] … If a given object is detected that is unknown to the system102, for instance information related to the object is not stored in a database that is accessible to the system102, the unknown object (e.g., a brownfield device) can be represented in the simulation by a collision boundary generated by the camera112. In particular, the camera112 can scan the unknown object to generate an image of the unknown object, and the image can be converted into a mesh representation of the unknown object. The mesh representation can be imported in the simulation environment. Based on the mesh representation in the simulation environment, the system102, in particular the robot device104, can interact with the unknown object. …
[0032] In an example, the robotic device 104 can classify (or determine a classification of) a detected subcomponent of a given machine, for instance the handle 216 or door 214 of the industrial machine 208, based on detecting the handle 216 or door 214, respectively, via the sensor 212. In particular, continuing with the example, the system 202 can include a neural network that can be trained with training data that includes images of various doors. After training, the sensor 212 can capture images of the door 214, and based on the one or more images of the door 214, the neural network, and thus the system 202, can identify that the detected object is a door. Thus, the robotic device 104 can be configured to recognize the door 214 even if the robotic device has not previously encountered the door 214. Such interactions may include various operations that are performed by the robot device104. Operations include, without limitation, picking up the object, painting the object, inserting screws in available threads of the object, or the like. Such operations can be performed without specialized engineering that is specific to the object.
[0033] Based on the classification of the detected subcomponent, the autonomous system can determine a principle of operation associated with the detected subcomponent. For example, the door214, and in particular the handle216, can define features ... and using those features, the subcomponent can be classified. … For example, after determining the principle of operation of the detected subcomponent (e.g., handle216), the autonomous machine104 can perform the principle of operation associated with the detected subcomponent, so as to complete a given task (e.g., loading the machine) that requires that the autonomous machine interacts with the machine 208.
[0037] … In some cases, although the robot device 104 can recognize the handle 216 as a handle, the robot device 104 might not have knowledge related to how the specific handle or how the specific door functions. That is, the robot device 104 might not know the principle of operation of a detected subcomponent. … In an example operation, the robot device 104 determines that the machine 208 is loaded by opening the door 214 that the robot device 104 detects, but the robot device 104 is unaware of the kinematics associated with the door 214. In an example, based on identifying and classifying the subcomponent as a door, the robot device 104 can retrieve policies associated with opening a door. The robot device 104 can implement the policies to explore different operations until the door opens, such that the robot device 104 can determine how to open the door 214.)
EXAMINER NOTE: Aparicio discusses systems which utilize databases and classify objects to be interacted with. Aparicio notes the advantage of this approach with respect to speed and efficiency, but recognizes that relying solely on databases of known objects has inherent limitations when dealing with unknown objects (see [0019]). To address issues when dealing with unknown objects, Aparicio introduces machine learning methods to classify an object, and to implement various policies to explore different operations until a successful interaction is obtained (inference model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Schafer and Aparicio by using the database for fast operations on known objects while introducing classification-based, exploratory machine learning methods for objects not registered in the database. The resulting combination yields a more versatile system that can handle unknown objects with greater ease.
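EXAMINER NOTE (illustrative only): The Examiner's reading of the combination can be sketched as follows. The classifier, policy table, and try_policy routine are hypothetical placeholders standing in for the neural-network classification and policy exploration described in Aparicio; none of these names appear in the reference.

    def handle_target(object_id, database, classify, policy_table, try_policy):
        # Known object: fast interaction using the registered information.
        if object_id in database:
            return database[object_id]
        # Unknown object: classify it (e.g., "door", "handle") and explore
        # the policies associated with that classification until one succeeds.
        category = classify(object_id)
        for policy in policy_table.get(category, []):
            if try_policy(policy):
                return policy
        return None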
Claims 5-10 are rejected under 35 U.S.C. 103 as being unpatentable over Schafer in view of Morales ("Integrated Grasp Planning and Visual Object Localization For a Humanoid Robot with Five-Fingered Hands," 2006).
Claim 5
Schafer teaches the limitations of claim 1 as outlined above. Schafer alone may not explicitly teach the following limitations in combination. However, Morales teaches
wherein the reference information includes posture information for each of the multiple objects and holding manner information associated with the posture information, the controller acquires the posture information of the holding target as the recognition information, and estimates the holding manner based on the posture information of the holding target.
(Morales - [p. 5664, col 1, ln 14 thru col 2, ln 14] A functional description of the grasp planning system described in this paper is depicted in figure 2. It consists of the next parts
The global model database. It is the core of our approach. It contains not only the CAD models of all the objects, but also stores a set of feasible grasps for each object. Moreover, this database is the interface between the different modules of the system
The offline grasp analyzer that uses the model of the objects and of the hand to compute on a simulation environment a set of stable grasps (see Sec. III). The results produced by this analysis are stored in the grasps database to be used by the other modules. …
A online visual procedure to identify objects in stereo images by matching the features of a pair of images with the 3D prebuilt models of such objects. After recognizing the target object it determines its location and pose. This information is necessary to reach the object.
Once an object has been localized in the work-scene, a grasp for that object is then selected from the set of precomputed stable grasps. This is instanced to a particular arm/hand configuration that takes into account the particular pose and reachability conditions of the object.)
In the above passage, Morales discusses the use of posture information in conjunction with pre-computed stable grasps stored in a database. The posture information is utilized to account for the reachability of the object. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Schafer and Morales by incorporating object pose in order to ensure the robot is able to reach the object.
Claim 6
Schafer teaches the limitations of claim 1 as outlined above. Schafer alone may not explicitly teach the following limitations. However, Morales teaches
wherein the controller estimates the holding manner based on at least one piece of holding target information associated with reference posture information similar to the posture information of the holding target.
(Morales - [p. 5664, col 1, ln 14 thru col 2, ln 14] A functional description of the grasp planning system described in this paper is depicted in figure 2. It consists of the next parts
The global model database. It is the core of our approach. It contains not only the CAD models of all the objects, but also stores a set of feasible grasps for each object. Moreover, this database is the interface between the different modules of the system
The offline grasp analyzer that uses the model of the objects and of the hand to compute on a simulation environment a set of stable grasps (see Sec. III). The results produced by this analysis are stored in the grasps database to be used by the other modules. …
A online visual procedure to identify objects in stereo images by matching the features of a pair of images with the 3D prebuilt models of such objects. After recognizing the target object it determines its location and pose. This information is necessary to reach the object.
Once an object has been localized in the work-scene, a grasp for that object is then selected from the set of precomputed stable grasps. This is instanced to a particular arm/hand configuration that takes into account the particular pose and reachability conditions of the object.)
(Morales - [p.5666, col 1, ln 6-] All stable grasps computed for every object are stored in a database in order to be used by execution modules. Every grasp stored includes the grasp type, the grasp starting point, hand orientation, approaching direction and the quality measure obtained from the simulation. This value is used by the other modules to select the best grasp for a given object.)
(Morales - [p.5664, col 2, ln 48 thru p.5665, col 1, ln 5] It is important to note that all directions are given with respect to an object centered coordinate system. The real approach directions result from matching of this relative description with the localized object pose in the workspace of the robot.
A main advantage of this grasp representation is its practical application. A grasp can be easily executed from the information contained in its description, and is better suited for the use with execution modules like arm path planning. Moreover this representation is more robust to inaccuracies since it only describes starting conditions and not final conditions like a description based in contacts points.)
In the above passages, Morales discusses the use of posture information in conjunction with pre-computed stable grasps stored in a database. The posture information is utilized to account for the reachability of the object. On p.5666, Morales indicates that each stored grasp consists of a grasp type, a starting point, an approach direction, and a hand orientation relative to the object coordinate system (reference posture). Morales further indicates that describing grasps relative to an object-centered coordinate system aids in arm path planning and is robust to inaccuracies.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Schafer and Morales by accounting for object pose in order to allow for easy grasp execution and increase robustness to inaccuracies.
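EXAMINER NOTE (illustrative only): Instantiating an object-centered grasp against the localized object pose, as relied upon above, amounts to an ordinary homogeneous-transform computation. The sketch below is the Examiner's own illustration under that assumption and is not drawn from Morales.

    import numpy as np

    def instantiate_grasp(T_world_object, grasp_point_obj, approach_dir_obj):
        # Map a pre-computed grasp, stored relative to the object coordinate
        # system, into the robot workspace using the localized object pose.
        R = T_world_object[:3, :3]
        grasp_point_world = (T_world_object @ np.append(grasp_point_obj, 1.0))[:3]
        approach_dir_world = R @ approach_dir_obj   # directions rotate only
        return grasp_point_world, approach_dir_world

    # Example: object rotated 90 degrees about z and translated to (0.5, 0.2, 0.0).
    T = np.array([[0.0, -1.0, 0.0, 0.5],
                  [1.0,  0.0, 0.0, 0.2],
                  [0.0,  0.0, 1.0, 0.0],
                  [0.0,  0.0, 0.0, 1.0]])
    print(instantiate_grasp(T, np.array([0.0, 0.0, 0.1]), np.array([0.0, 0.0, -1.0])))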
Claim 7
The combination of Schafer and Morales teaches the limitations of claim 6 as outlined above. As shown above, the cited combination also teaches
wherein the controller estimates the holding manner based on multiple pieces of holding manner information, each of which are associated with reference posture information similar to the posture information.
EXAMINER NOTE: See Morales cited in the rejection of claim 6 above. Morales discusses multiple candidate grasps (multiple pieces of holding information) relative to the object coordinate system (reference posture information).
Claim 8
The combination of Schafer and Morales teaches the limitations of claim 7 as outlined above. As shown above with Morales, and shown below with Schafer, the cited combination also teaches
wherein one piece of the reference posture information is associated with multiple pieces of holding manner information,
EXAMINER NOTE: See Morales cited in the rejection of claim 6 above. Morales discusses multiple candidate grasps (multiple pieces of holding information) relative to the object coordinate system.
and the controller selects the holding manner information based on an associated holding performance from among the multiple pieces of holding manner information.
(Morales - [p.5666, col 1, ln 6-] All stable grasps computed for every object are stored in a database in order to be used by execution modules. Every grasp stored includes the grasp type, the grasp starting point, hand orientation, approaching direction and the quality measure obtained from the simulation. This value is used by the other modules to select the best grasp for a given object.)
EXAMINER NOTE: Morales also includes a quality measure (holding performance) as part of the holding information.
(Schafer - [col 15, ln 39-42] The success/data engine 119 can store, in data database 159, the grasp success label in association with other data from the grasp attempt.)
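EXAMINER NOTE (illustrative only): The selection relied upon above (Morales' per-grasp quality measure; Schafer's stored success labels) reduces to choosing the stored grasp with the best associated performance value. The data below are hypothetical placeholders invented by the Examiner.

    # Hypothetical stored grasps for one object, each with a quality measure.
    stored_grasps = [
        {"type": "precision", "quality": 0.62},
        {"type": "power",     "quality": 0.88},
        {"type": "hook",      "quality": 0.41},
    ]
    best_grasp = max(stored_grasps, key=lambda g: g["quality"])  # selects the "power" grasp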
Claim 9
The combination of Schafer and Morales teaches the limitations of claim 7 as outlined above. Schafer further teaches
wherein when the controller estimates the holding manner from the inference model, the controller can register an estimation result in the database.
(Schafer - [col 18, ln 59-62] At optional block 276, the system determines a grasp success measure for the grasp attempt (of block 274), and stores the grasp success measure and other data for the grasp attempt. The data can be stored for using training at least one machine learning model.)
Claim 10
As shown above with respect to claims 8 and 9, Schafer stores grasp success measures for grasp attempts. Additionally, Morales generates, for each object in the database, grasp data and corresponding quality measures. Therefore, as shown above (and elaborated upon below), the previously cited combination of Schafer and Morales teaches
wherein when the holding target is successfully held by controlling the robot based on the estimation result of the holding manner, the controller generates reference information
(Schafer - [col 18, ln 59-62] At optional block 276, the system determines a grasp success measure for the grasp attempt (of block 274), and stores the grasp success measure and other data for the grasp attempt. The data can be stored for using training at least one machine learning model.)
… reference information in which the holding manner information representing the successful holding manner and the object information of the successfully held holding target are associated with each other, and registers the generated reference information in the database.
(Morales - [p.5666, col 1, ln 6-47] Our approach to compute stable grasps on 3D objects is inspired by a previous work by Miller et al. using GraspIt! [15]. The offline analysis follows four steps to find the grasps for a given object:
1) The shape of the object model is approximated by a set of basic shape primitives (boxes, cylinders, spheres and cones). There are many ways to obtain these primitive approach. GraspIt! doesn’t provide any procedure to produce them. We assume that the primitive description of the objects is part of the model of an object.
2) A set of candidate grasps is generated automatically for every primitive shape of the object description. A grasp candidate consists of a hand type, a grasp starting point, an approach direction and a hand orientation. For every primitive there exists a set of predefined grasp types and approaching directions [15].
3) Each grasp candidate is tested within the simulation environment. … If the quality is under certain threshold then the hand opens, backs a step amount and closes again. This sequence is repeated until a maximum stability measurement is reached. … The final position of the hand and the quality obtained is stored.
4) Finally, all final grasps that are over the minimum threshold are sorted and stored.)
(Morales - [p.5666, col 1, ln 6-] All stable grasps computed for every object are stored in a database in order to be used by execution modules. Every grasp stored includes the grasp type, the grasp starting point, hand orientation, approaching direction and the quality measure obtained from the simulation. This value is used by the other modules to select the best grasp for a given object.)
EXAMINER NOTE: Morales associates multiple grasps and their respective quality scores with each object in the database.
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Schafer and Morales as applied to claim 7 above, and further in view of Azad ("Combining Appearance-based and Model-based Methods for Real-Time Object Recognition and 6D Localization," 2006).
Claim 11
The combination of Schafer and Morales teaches the limitations of claim 7 as outlined above. Schafer alone may not explicitly teach the limitations of claim 11. However, as established in the sections cited with reference to claim 6 (and, through its dependency, claim 7), Morales stores posture information of objects for use in identifying an appropriate grasp. Morales elaborates further on the step of identifying and localizing the object in the following section:
(Morales - [p.5667, col 2, ln 4-13] In [18], we present a system which can build object representations for appearance-based recognition and localization automatically, given a 3D model of the object. An initial estimate for the position of the object is determined through stereo vision, while an initial estimate for the orientation is determined by retrieving the rotation the recognized view was produced with. Then, a number of correction calculations are performed for accurate localization, which is explained in detail in [18].)
In the above passage, Morales references Azad (reference [18]), in which the methods utilized by Morales are further explained. Through Azad, Morales indirectly teaches
wherein the controller is capable of performing integration processing of integrating at least two pieces of reference posture information from among the multiple pieces of reference posture information registered in the database.
(Azad - [p.5341, col 1, ln 14-21] Our approach is based on the global appearance-based object recognition system proposed in [8], which is explained briefly in the following. For each object, a set of segmented views is stored, covering the space of possible views of one object. By associating pose information with each view, it is possible to recover the pose through the matched view from the database. For reasons of computational efficiency, PCA [9] is applied for reducing dimensionality.
[p.5343, col 2, para. 3] By using an appearance-based approach for a model-based object representation in the core of the system, it is possible to recognize and localize the objects in a given scene in realtime – which is by far impossible with a purely model-based method, as explained in Section III-A. For our experiments, we picked a rotational space of −45° ≤ α ≤ 0°, 45° ≤ β ≤ 325°, −45° ≤ γ ≤ 45°, with a resolution of 5°, resulting in a search space with 10 · 57 · 19 = 10830 configurations. For objects which have a rotational symmetry axis, β is set to zero, resulting in 10 · 19 = 190 configurations. For efficiency considerations, we use PCA to reduce dimensionality from 64 × 64 = 4096 to 100.)
EXAMINER NOTE: Various views of the objects are collected at multiple orientations about each rotational axis (reference posture information), and these data are reduced in dimensionality (integration processing) using principal component analysis.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adopt Azad's suggestion to utilize dimensionality reduction in order to increase computational efficiency.
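EXAMINER NOTE (illustrative only): The dimensionality reduction the Examiner reads as integration processing can be sketched as follows, using randomly generated placeholder data and scikit-learn's PCA. Only the 64x64-pixel view size, the reduction to 100 components, and the association of pose information with each view are taken from Azad; everything else, including the number of views and the matching step, is the Examiner's own assumption.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    views = rng.random((500, 64 * 64))    # placeholder segmented object views
    poses = rng.random((500, 3)) * 360.0  # placeholder (alpha, beta, gamma) per view

    # Reduce each 4096-dimensional view to 100 principal components.
    pca = PCA(n_components=100).fit(views)
    compressed = pca.transform(views)

    # Recognition/localization: match a query view in the reduced space and
    # recover the pose information stored with the best-matching view.
    query = pca.transform(views[:1])
    best = int(np.argmin(np.linalg.norm(compressed - query, axis=1)))
    print(poses[best])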
Claim 12
The combination of Schafer, Morales, and Azad teaches the limitations of claim 11 as outlined above. As shown above, the cited combination also teaches
wherein the controller performs the integration processing, the integration processing making a difference in density of integrated posture information generated by the integration processing smaller.
EXAMINER NOTE: See p.5341 and p.5343 of Azad, cited above with reference to claim 11. The use of principal component analysis reduces the size (density) of the posture information in the database.
Allowable Subject Matter
Claim 13 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: The prior art of record fails to teach or reasonably suggest at least the aspects of
Wherein the controller performs the integration processing when a number or density of the multiple pieces of reference posture information satisfies an integration condition
While many in the art discuss the use of dimensionality reduction to reduce data density and allow for expedient searching of databases (see Morales, Azad, Tepper), the prior art fails to teach or reasonably suggest doing so when a number or density of the multiple pieces of reference posture information satisfies an integration condition. The prior art of record reduces the dimensionality of the data independently of how dense or numerous the data may be. When considered in light of the claims from which claim 13 depends, there does not appear to be a reasonable motivation to combine teachings in the prior art to arrive at the invention claimed in claim 13.
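EXAMINER NOTE (illustrative only): The distinction identified above can be pictured with the following hypothetical sketch, in which the integration processing (e.g., clustering as in Tepper, or dimensionality reduction as in Morales and Azad) runs only when a number or density condition on the stored reference posture information is satisfied; the prior art of record performs the reduction unconditionally. The function names, the density proxy, and the threshold values are the Examiner's own assumptions and do not appear in the claims or the cited references.

    import numpy as np

    def maybe_integrate(reference_postures, integrate, max_count=1000, min_spacing=0.05):
        # reference_postures: (n, d) array of stored reference posture entries.
        count = len(reference_postures)
        # Crude density proxy: mean nearest-neighbor spacing of the stored entries.
        dists = np.linalg.norm(reference_postures[:, None] - reference_postures[None, :], axis=-1)
        np.fill_diagonal(dists, np.inf)
        mean_spacing = dists.min(axis=1).mean()
        # The claimed integration condition: integrate only when the entries are
        # numerous or dense enough; otherwise leave the database unchanged.
        if count > max_count or mean_spacing < min_spacing:
            return integrate(reference_postures)
        return reference_postures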
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Tepper (US-20200327365-A1) broadly discusses clustering methods (a kind of integration processing) in databases to enable faster searching, and is considered particularly relevant to claims 11-13 under the broadest reasonable interpretation of the claims.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES MILLER WATTS whose telephone number is (703)756-1249. The examiner can normally be reached 7:30-5:30 M-TH.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Adam Mott can be reached at 571-270-5376. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JAMES MILLER WATTS III/Examiner, Art Unit 3657
/ADAM R MOTT/Supervisory Patent Examiner, Art Unit 3657