Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2. This communication is responsive to Application No. 18/568,986 and the preliminary amendments filed on 12/11/2023.
3. Claims 1-12 are presented for examination.
Information Disclosure Statement
4. The information disclosure statements (IDS) submitted on 12/11/2023 and 8/19/2025 have been fully considered by the Examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
5. The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
6. This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
The “acquisition section that acquires” in claims 1, 11, and 12.
The “extraction section that … extracts” in claims 1, 11, and 12.
The “generation section that … generates” in claims 1, 11, and 12.
The “pre-processing section that executes at least one pre-processing” in claim 4.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
In view of this, the Examiner interprets the “acquisition section” under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, as a computer module comprising hardware that stores computer programs executable by a central processing unit (CPU). Support for this interpretation can be found within paragraphs [0024], [0025], and [0029] of the specification and Figures 3 and 4 of the drawings.
Further, the Examiner interprets the “extraction section” under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, as a computer module comprising hardware that stores computer programs executable by a CPU. Support for this interpretation can be found within paragraphs [0024], [0025], and [0029] of the specification and Figures 3 and 4 of the drawings.
Further, the Examiner interprets the “generation section” under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, as a computer module comprising hardware that stores computer programs executable by a CPU. Support for this interpretation can be found within paragraphs [0024], [0025], and [0029] of the specification and Figures 3 and 4 of the drawings.
Further, the Examiner interprets the “pre-processing section” under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, as a computer module comprising hardware that stores computer programs executable by a CPU. Support for this interpretation can be found within paragraphs [0024], [0025], and [0029] of the specification and Figures 3 and 4 of the drawings.
Claim Rejections - 35 USC § 103
7. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
8. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
9. Claim(s) 1, 11, and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 20210166418 A1 hereinafter Zhou) in view of Cansizoglu et al. (US 20190019030 A1 hereinafter Cansizoglu).
Regarding Claim 1, Zhou teaches a grasp information generation device comprising: an acquisition section ([0082] via “FIG. 6 is a schematic structural diagram of hardware of an object posture estimation apparatus provided in embodiments of the present disclosure. The estimation apparatus 2 includes a processor 21, and further includes an input apparatus 22, an output apparatus 23, and a memory 24. The input apparatus 22, the output apparatus 23, the memory 24, and the processor 21 are connected by means of a bus.”) that acquires target object information indicating a three dimensional shape of a target object ([0022] via “In the embodiments of the present disclosure, the point cloud data of the object is processed to obtain the posture of the object. In one possible implementation for obtaining the point cloud data of the object, the object is scanned by means of a three-dimensional laser scanner, and when laser light irradiates the surface of the object, the reflected laser light carries information such as orientation and distance.”), ([0029] via “At block 104, the posture of the object is obtained according to the predicted postures of the objects included in the at least one clustering set.”) for gripping by a gripping section ([0056] via “The object is grabbed and then assembled by the end effector.”), (Note: The Examiner interprets the robot end effector as the gripping section.), and
gripping section information related to a shape of the gripping section ([0031] via “Because the grabbed points of the objects are preset, under the condition that the position of the reference point of the object under a camera coordinate system and the attitude angle of the object are obtained, an adjustment angle of a robot end effector is obtained according to the attitude angle of the object; … and the adjustment angle and the traveling route are taken as a control instruction, to control the robot to grab at least one of the stacked objects.”); and
a generation section ([0082] of Zhou, recited above) that, based on gripping points extracted by the extraction section, generates grasp information indicating a relative positional relationship between the target object and the gripping section for a case of the target object being gripped by the gripping section ([0051] via “In the embodiments of the present disclosure, clustering processing is performed on the point cloud data of the object based on the posture of the object to which at least one point output by the point cloud neural network belongs, so as to obtain the clustering set; and then, the position of the reference point of the object and the attitude angle of the object are obtained according to the average value of the predicted values of the positions of the reference points of the objects to which the points included in the clustering set belong as well as the average value of the predicted values of the attitude angles.”), ([0056] via “The control instruction is sent to the robot, and the robot is controlled to grab and assemble the object. In one possible implementation, the adjustment angle of the robot end effector is obtained according to the attitude angle of the object, and the robot end effector is controlled to be adjusted according to the adjustment angle. The position of the grabbed point is obtained according to the position of the reference point of the object as well as the positional relationship between the grabbed point and the reference point. … The object is grabbed and then assembled by the end effector.”).
Zhou is silent on an extraction section that performs clustering on pairs of two points on the target object that are pairs satisfying a condition based on the target object information and the gripping section information acquired by the acquisition section, and extracts, as gripping points, a pair representative of each cluster.
However, Cansizoglu teaches an extraction section ([0049] via “The image processing system 100 can include a human machine interface (HMI) with input/output (I/O) interface 110 connectable with at least one RGB-D camera 111 … a processor 120, a storage device 130, a memory 140, a network interface controller 150 (NIC) ...”) that performs clustering on pairs of two points on the target object that are pairs satisfying a condition based on the target object information ([0025] via “Some embodiments use pixels of one or several clusters to determine a model of the object, which can facilitate pose estimation. For example, one embodiment determines a model of the object using the pixels of the first cluster and determines the pose of the object using the model of the object. Additionally, or alternatively, the embodiment can fuse pixels of the several clusters to produce the model of the object.”), ([0081] via “The clustering procedure results in sets of points that belong to the same object instance and are matched to another object instance. In other words, each cluster can be seen as two sets of points, where one set can be aligned with the other set using the transformation of the cluster. Some of these sets may have keypoints in common with other sets. Thus, the clustering result can be represented as a graph where nodes correspond to sets of points and edges correspond to the distance between sets based on the transformation of the cluster associating the two sets.”) and the gripping section information acquired by the acquisition section ([0083] via “An image is acquired from an RGBD camera in step S1. The image indicating a scene may be obtained via a network connecting computers or another camera connected to the network. … In step S7, the process 50 defines a transformation for each of the matched triplets and clusters the matched triplets using their associated transformations. In this case, each of transformations associated with the matched triplets represents a pose of an instance of the object, wherein the pose includes a location and an orientation of the object.”), ([0091] via “FIG. 8 is a drawing illustrating an example setup of a robot arm 60 including a vacuum gripper 61 and an ASUS Xtion Pro Live RGB-D camera 65 arranged at the end of the robot arm 60. … Further, the robot arm 60 includes a localization controller (not shown) that localizes the top of the vacuum gripper 61 to a desired position. The localization controller also includes the image processing system 100. The desired position is determined based on the image data processing of an image obtained by the camera 65 for picking up objects. The image data processing is performed for object detection and localization using the image processing system 100 obtaining image data of objects 75 on a table 70 in association with the camera 65.”), and extracts, as gripping points, a pair representative of each cluster ([0025] via “Some embodiments use pixels of one or several clusters to determine a model of the object, which can facilitate pose estimation. For example, one embodiment determines a model of the object using the pixels of the first cluster and determines the pose of the object using the model of the object. Additionally, or alternatively, the embodiment can fuse pixels of the several clusters to produce the model of the object.”), ([0091] via “In order to pick an object 75, the image processing system 100 detects the object 75 and analyze the normal direction of the surface and a central position of the object 75, and inform the detection and analysis data to the motion control circuit 62. The motion control circuit 62 operates the gripper of the robot arm 60 to approach a center of the object 75 so that the gripper 61 sucks the object 75 for picking.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Cansizoglu wherein the grasp information generation device comprises an extraction section that performs clustering on pairs of two points on the target object that are pairs satisfying a condition based on the target object information and the gripping section information acquired by the acquisition section, and extracts, as gripping points, a pair representative of each cluster. Doing so accurately determines the pose of the target object and correctly models the target object for subsequent interaction, as stated by Cansizoglu ([0095] via “Quantitative Results: FIG. 9 indicates an example of results obtained by model creation and detection performance on the generated dataset according to embodiments of the present invention. It should be noted that a number of instances are included in a single scene. For every scene we report the number of instances that are added to the initial model as a result of clustering and the number of additional detected instances using the initial model.”), ([0096] via “Each object has four different scenes where the instances are placed randomly. The third column indicates the ground truth (GT) number of instances in the scene. The fourth column shows the number of instances that are added to the initial object model as a result of clustering. The fifth column reports the number of additional instances detected using the initial model. Finally, at the last column we report whether the algorithm resulted in a correct model of the object. As can be seen in almost all scenes, our method was able to create an initial model and enlarge it by proceeding with additional instance detection. The average accuracy in detecting the number of instances (i.e. the average of the percentages reported in total column) is 82:25%. The model generation success rate is 87:5%.”).
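For orientation only, the claimed extraction step, clustering pairs of two points that satisfy a gripper-dependent condition and keeping one representative pair per cluster, could be sketched as follows. This is not the implementation of Zhou, Cansizoglu, or the present application; extract_gripping_points, max_opening, and cluster_eps are hypothetical names, and the antipodal-normal test and DBSCAN clustering are stand-ins for whatever specific condition and clustering method the claim and specification cover.

```python
# Illustrative sketch only: clustering candidate two-point grasp pairs and
# extracting one representative pair per cluster. All names and thresholds
# are hypothetical and not drawn from the application or the cited references.
import numpy as np
from itertools import combinations
from sklearn.cluster import DBSCAN

def extract_gripping_points(points, normals, max_opening, cluster_eps=0.01):
    """points, normals: (N, 3) arrays; max_opening: gripper opening width."""
    candidates = []
    for i, j in combinations(range(len(points)), 2):
        gap = np.linalg.norm(points[i] - points[j])
        # Condition based on the gripping section information: the pair must fit
        # within the gripper opening and the surface normals must roughly oppose.
        if gap <= max_opening and np.dot(normals[i], normals[j]) < -0.8:
            candidates.append((i, j))
    if not candidates:
        return []
    # Feature per pair: midpoint and (unit) grasp axis direction.
    feats = []
    for i, j in candidates:
        mid = (points[i] + points[j]) / 2.0
        axis = points[j] - points[i]
        axis = axis / np.linalg.norm(axis)
        feats.append(np.hstack([mid, axis]))
    feats = np.array(feats)
    labels = DBSCAN(eps=cluster_eps, min_samples=1).fit_predict(feats)
    # One representative pair per cluster: the member closest to the cluster mean.
    reps = []
    for lbl in set(labels):
        idx = np.where(labels == lbl)[0]
        centroid = feats[idx].mean(axis=0)
        best = idx[np.argmin(np.linalg.norm(feats[idx] - centroid, axis=1))]
        reps.append(candidates[best])
    return reps  # list of (i, j) index pairs serving as gripping points
```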
Regarding Claim 11, Zhou teaches a grasp information generation method comprising: an acquisition section ([0082] via “FIG. 6 is a schematic structural diagram of hardware of an object posture estimation apparatus provided in embodiments of the present disclosure. The estimation apparatus 2 includes a processor 21, and further includes an input apparatus 22, an output apparatus 23, and a memory 24. The input apparatus 22, the output apparatus 23, the memory 24, and the processor 21 are connected by means of a bus.”) acquiring target object information indicating a three dimensional shape of a target object ([0022] via “In the embodiments of the present disclosure, the point cloud data of the object is processed to obtain the posture of the object. In one possible implementation for obtaining the point cloud data of the object, the object is scanned by means of a three-dimensional laser scanner, and when laser light irradiates the surface of the object, the reflected laser light carries information such as orientation and distance.”), ([0029] via “At block 104, the posture of the object is obtained according to the predicted postures of the objects included in the at least one clustering set.”) for gripping by a gripping section ([0056] via “The object is grabbed and then assembled by the end effector.”), (Note: The Examiner interprets the robot end effector as the gripping section.), and
gripping section information related to a shape of the gripping section ([0031] via “Because the grabbed points of the objects are preset, under the condition that the position of the reference point of the object under a camera coordinate system and the attitude angle of the object are obtained, an adjustment angle of a robot end effector is obtained according to the attitude angle of the object; … and the adjustment angle and the traveling route are taken as a control instruction, to control the robot to grab at least one of the stacked objects.”); and
a generation section ([0082] of Zhou, recited above), based on gripping points extracted by the extraction section, generating grasp information indicating a relative positional relationship between the target object and the gripping section for a case of the target object being gripped by the gripping section ([0051] via “In the embodiments of the present disclosure, clustering processing is performed on the point cloud data of the object based on the posture of the object to which at least one point output by the point cloud neural network belongs, so as to obtain the clustering set; and then, the position of the reference point of the object and the attitude angle of the object are obtained according to the average value of the predicted values of the positions of the reference points of the objects to which the points included in the clustering set belong as well as the average value of the predicted values of the attitude angles.”), ([0056] via “The control instruction is sent to the robot, and the robot is controlled to grab and assemble the object. In one possible implementation, the adjustment angle of the robot end effector is obtained according to the attitude angle of the object, and the robot end effector is controlled to be adjusted according to the adjustment angle. The position of the grabbed point is obtained according to the position of the reference point of the object as well as the positional relationship between the grabbed point and the reference point. … The object is grabbed and then assembled by the end effector.”).
Zhou is silent on an extraction section performing clustering on pairs of two points on the target object that are pairs satisfying a condition based on the target object information and the gripping section information acquired by the acquisition section, and extracting, as gripping points, a pair representative of each cluster.
However, Cansizoglu teaches an extraction section ([0049] via “The image processing system 100 can include a human machine interface (HMI) with input/output (I/O) interface 110 connectable with at least one RGB-D camera 111 … a processor 120, a storage device 130, a memory 140, a network interface controller 150 (NIC) ...”) performing clustering on pairs of two points on the target object that are pairs satisfying a condition based on the target object information ([0025] via “Some embodiments use pixels of one or several clusters to determine a model of the object, which can facilitate pose estimation. For example, one embodiment determines a model of the object using the pixels of the first cluster and determines the pose of the object using the model of the object. Additionally, or alternatively, the embodiment can fuse pixels of the several clusters to produce the model of the object.”), ([0081] via “The clustering procedure results in sets of points that belong to the same object instance and are matched to another object instance. In other words, each cluster can be seen as two sets of points, where one set can be aligned with the other set using the transformation of the cluster. Some of these sets may have keypoints in common with other sets. Thus, the clustering result can be represented as a graph where nodes correspond to sets of points and edges correspond to the distance between sets based on the transformation of the cluster associating the two sets.”) and the gripping section information acquired by the acquisition section ([0083] via “An image is acquired from an RGBD camera in step S1. The image indicating a scene may be obtained via a network connecting computers or another camera connected to the network. … In step S7, the process 50 defines a transformation for each of the matched triplets and clusters the matched triplets using their associated transformations. In this case, each of transformations associated with the matched triplets represents a pose of an instance of the object, wherein the pose includes a location and an orientation of the object.”), ([0091] via “FIG. 8 is a drawing illustrating an example setup of a robot arm 60 including a vacuum gripper 61 and an ASUS Xtion Pro Live RGB-D camera 65 arranged at the end of the robot arm 60. … Further, the robot arm 60 includes a localization controller (not shown) that localizes the top of the vacuum gripper 61 to a desired position. The localization controller also includes the image processing system 100. The desired position is determined based on the image data processing of an image obtained by the camera 65 for picking up objects. The image data processing is performed for object detection and localization using the image processing system 100 obtaining image data of objects 75 on a table 70 in association with the camera 65.”), and extracting, as gripping points, a pair representative of each cluster ([0025] via “Some embodiments use pixels of one or several clusters to determine a model of the object, which can facilitate pose estimation. For example, one embodiment determines a model of the object using the pixels of the first cluster and determines the pose of the object using the model of the object. Additionally, or alternatively, the embodiment can fuse pixels of the several clusters to produce the model of the object.”), ([0091] via “In order to pick an object 75, the image processing system 100 detects the object 75 and analyze the normal direction of the surface and a central position of the object 75, and inform the detection and analysis data to the motion control circuit 62. The motion control circuit 62 operates the gripper of the robot arm 60 to approach a center of the object 75 so that the gripper 61 sucks the object 75 for picking.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Cansizoglu wherein the method comprises: an extraction section performing clustering on pairs of two points on the target object that are pairs satisfying a condition based on the target object information and the gripping section information acquired by the acquisition section, and extracting, as gripping points, a pair representative of each cluster. Doing so accurately determines the pose of the target object and correctly models the target object for subsequent interaction, as stated by Cansizoglu ([0095] via “Quantitative Results: FIG. 9 indicates an example of results obtained by model creation and detection performance on the generated dataset according to embodiments of the present invention. It should be noted that a number of instances are included in a single scene. For every scene we report the number of instances that are added to the initial model as a result of clustering and the number of additional detected instances using the initial model.”), ([0096] via “Each object has four different scenes where the instances are placed randomly. The third column indicates the ground truth (GT) number of instances in the scene. The fourth column shows the number of instances that are added to the initial object model as a result of clustering. The fifth column reports the number of additional instances detected using the initial model. Finally, at the last column we report whether the algorithm resulted in a correct model of the object. As can be seen in almost all scenes, our method was able to create an initial model and enlarge it by proceeding with additional instance detection. The average accuracy in detecting the number of instances (i.e. the average of the percentages reported in total column) is 82:25%. The model generation success rate is 87:5%.”).
Regarding Claim 12, Zhou teaches a non-transitory storage medium storing a grasp information generation program that causes a computer to function as ([0089] via “The embodiments of the present disclosure further provide a computer program product, configured to store computer-readable instructions, where when the instructions are executed, a computer performs the operations of the object posture estimation method according to any one of the foregoing embodiments.”), ([0090] via “The computer program product may be specifically implemented by means of hardware, software, or a combination thereof. In one optional embodiment, the computer program product is specifically reflected as a computer storage medium (including volatile and non-volatile storage media).”):
an acquisition section ([0082] via “FIG. 6 is a schematic structural diagram of hardware of an object posture estimation apparatus provided in embodiments of the present disclosure. The estimation apparatus 2 includes a processor 21, and further includes an input apparatus 22, an output apparatus 23, and a memory 24. The input apparatus 22, the output apparatus 23, the memory 24, and the processor 21 are connected by means of a bus.”) that acquires target object information indicating a three dimensional shape of a target object ([0022] via “In the embodiments of the present disclosure, the point cloud data of the object is processed to obtain the posture of the object. In one possible implementation for obtaining the point cloud data of the object, the object is scanned by means of a three-dimensional laser scanner, and when laser light irradiates the surface of the object, the reflected laser light carries information such as orientation and distance.”), ([0029] via “At block 104, the posture of the object is obtained according to the predicted postures of the objects included in the at least one clustering set.”) for gripping by a gripping section ([0056] via “The object is grabbed and then assembled by the end effector.”), (Note: The Examiner interprets the robot end effector as the gripping section.), and
gripping section information related to a shape of the gripping section ([0031] via “Because the grabbed points of the objects are preset, under the condition that the position of the reference point of the object under a camera coordinate system and the attitude angle of the object are obtained, an adjustment angle of a robot end effector is obtained according to the attitude angle of the object; … and the adjustment angle and the traveling route are taken as a control instruction, to control the robot to grab at least one of the stacked objects.”); and
a generation section ([0082] of Zhou, recited above) that, based on the gripping points extracted by the extraction section, generates grasp information indicating a relative positional relationship between the target object and the gripping section for a case of the target object being gripped by the gripping section ([0051] via “In the embodiments of the present disclosure, clustering processing is performed on the point cloud data of the object based on the posture of the object to which at least one point output by the point cloud neural network belongs, so as to obtain the clustering set; and then, the position of the reference point of the object and the attitude angle of the object are obtained according to the average value of the predicted values of the positions of the reference points of the objects to which the points included in the clustering set belong as well as the average value of the predicted values of the attitude angles.”), ([0056] via “The control instruction is sent to the robot, and the robot is controlled to grab and assemble the object. In one possible implementation, the adjustment angle of the robot end effector is obtained according to the attitude angle of the object, and the robot end effector is controlled to be adjusted according to the adjustment angle. The position of the grabbed point is obtained according to the position of the reference point of the object as well as the positional relationship between the grabbed point and the reference point. … The object is grabbed and then assembled by the end effector.”).
Zhou is silent on an extraction section that performs clustering on pairs of two points on the target object that are pairs satisfying a condition based on the target object information and the gripping section information acquired by the acquisition section, and extracts, as gripping points, a pair representative of each cluster.
However, Cansizoglu teaches an extraction section ([0049] via “The image processing system 100 can include a human machine interface (HMI) with input/output (I/O) interface 110 connectable with at least one RGB-D camera 111 … a processor 120, a storage device 130, a memory 140, a network interface controller 150 (NIC) ...”) that performs clustering on pairs of two points on the target object that are pairs satisfying a condition based on the target object information ([0025] via “Some embodiments use pixels of one or several clusters to determine a model of the object, which can facilitate pose estimation. For example, one embodiment determines a model of the object using the pixels of the first cluster and determines the pose of the object using the model of the object. Additionally, or alternatively, the embodiment can fuse pixels of the several clusters to produce the model of the object.”), ([0081] via “The clustering procedure results in sets of points that belong to the same object instance and are matched to another object instance. In other words, each cluster can be seen as two sets of points, where one set can be aligned with the other set using the transformation of the cluster. Some of these sets may have keypoints in common with other sets. Thus, the clustering result can be represented as a graph where nodes correspond to sets of points and edges correspond to the distance between sets based on the transformation of the cluster associating the two sets.”) and the gripping section information acquired by the acquisition section ([0083] via “An image is acquired from an RGBD camera in step S1. The image indicating a scene may be obtained via a network connecting computers or another camera connected to the network. … In step S7, the process 50 defines a transformation for each of the matched triplets and clusters the matched triplets using their associated transformations. In this case, each of transformations associated with the matched triplets represents a pose of an instance of the object, wherein the pose includes a location and an orientation of the object.”), ([0091] via “FIG. 8 is a drawing illustrating an example setup of a robot arm 60 including a vacuum gripper 61 and an ASUS Xtion Pro Live RGB-D camera 65 arranged at the end of the robot arm 60. … Further, the robot arm 60 includes a localization controller (not shown) that localizes the top of the vacuum gripper 61 to a desired position. The localization controller also includes the image processing system 100. The desired position is determined based on the image data processing of an image obtained by the camera 65 for picking up objects. The image data processing is performed for object detection and localization using the image processing system 100 obtaining image data of objects 75 on a table 70 in association with the camera 65.”), and extracts, as gripping points, a pair representative of each cluster ([0025] via “Some embodiments use pixels of one or several clusters to determine a model of the object, which can facilitate pose estimation. For example, one embodiment determines a model of the object using the pixels of the first cluster and determines the pose of the object using the model of the object. Additionally, or alternatively, the embodiment can fuse pixels of the several clusters to produce the model of the object.”), ([0091] via “In order to pick an object 75, the image processing system 100 detects the object 75 and analyze the normal direction of the surface and a central position of the object 75, and inform the detection and analysis data to the motion control circuit 62. The motion control circuit 62 operates the gripper of the robot arm 60 to approach a center of the object 75 so that the gripper 61 sucks the object 75 for picking.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Cansizoglu wherein the computer functions as: an extraction section that performs clustering on pairs of two points on the target object that are pairs satisfying a condition based on the target object information and the gripping section information acquired by the acquisition section, and extracts, as gripping points, a pair representative of each cluster. Doing so accurately determines the pose of the target object and correctly models the target object for subsequent interaction, as stated by Cansizoglu ([0095] via “Quantitative Results: FIG. 9 indicates an example of results obtained by model creation and detection performance on the generated dataset according to embodiments of the present invention. It should be noted that a number of instances are included in a single scene. For every scene we report the number of instances that are added to the initial model as a result of clustering and the number of additional detected instances using the initial model.”), ([0096] via “Each object has four different scenes where the instances are placed randomly. The third column indicates the ground truth (GT) number of instances in the scene. The fourth column shows the number of instances that are added to the initial object model as a result of clustering. The fifth column reports the number of additional instances detected using the initial model. Finally, at the last column we report whether the algorithm resulted in a correct model of the object. As can be seen in almost all scenes, our method was able to create an initial model and enlarge it by proceeding with additional instance detection. The average accuracy in detecting the number of instances (i.e. the average of the percentages reported in total column) is 82:25%. The model generation success rate is 87:5%.”).
10. Claim(s) 2, 3, and 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 20210166418 A1 hereinafter Zhou) in view of Cansizoglu et al. (US 20190019030 A1 hereinafter Cansizoglu), and further in view of Kondo et al. (US 20110288683 A1 hereinafter Kondo).
Regarding Claim 2, modified reference Zhou teaches the grasp information generation device of claim 1, but is silent on wherein the target object information is information of a mesh structure resulting from joining a plurality of polygonal shaped planes.
However, Kondo teaches wherein the target object information is information of a mesh structure resulting from joining a plurality of polygonal shaped planes ([0041] via “The workpiece setting section 100 sets a shape and an orientation of a workpiece 16. … The mesh generator 104 generates a predetermined mesh (or a group of gripping candidate points) for the entire surface of the workpiece 16 or a region thereof which can be gripped by the fingers 14.”), ([0046] via “In step S2, at an initial stage, a mesh (see FIGS. 5 and 15) is generated as all candidates of positions where the distal end of the finger 14 can abut against the surface of the workpiece 16. The mesh is shown as being in the form of a grid. However, the mesh may comprise a polygonal mesh (e.g., a triangular mesh) which is used to represent the model of the workpiece 16, for example. The mesh is set up in a mesh size small enough to provide candidates of gripping positions, but large enough not to pose undue calculation loads.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Kondo wherein the target object information is information of a mesh structure resulting from joining a plurality of polygonal shaped planes. Doing so determines the position of the target object and determines gripping points of the target object based on said determined position of the target object, as stated by Kondo ([0048] via “In step S4, as shown in FIG. 5, an initial position (initial value) P0 where the distal end of the finger 14 grips the workpiece 16 with reference to the position determined in step S3 is identified based on the mesh generated in step S2, and the initial position P0 is regarded as a to-be-calculated position P. Specifically, a plurality of initial positions P0 are established in advance, and one of those initial positions P0 which remain to be unprocessed is selected.”).
Regarding Claim 3, modified reference Zhou teaches the grasp information generation device of claim 2, but is silent on wherein as the condition, the extraction section extracts the gripping points with an angle formed between the planes that each of the two points belong to, with an opening of the gripping section, and with an index representing respective stability when the target object is gripped by the gripping section, which are within respective set ranges of values.
However, Kondo teaches wherein as the condition, the extraction section extracts the gripping points with an angle formed between the planes that each of the two points belong to, with an opening of the gripping section ([0041] via “The robot position calculator 102 determines an attitude of the multijoint robot 17 based on the state of the workpiece 16 which is set by the workpiece setting section 100. The mesh generator 104 generates a predetermined mesh (or a group of gripping candidate points) for the entire surface of the workpiece 16 or a region thereof which can be gripped by the fingers 14. The gripping condition calculator 106 calculates a plurality of gripping conditions 108 which are conditions for gripping the workpiece 16 with the four fingers 14. … The plural gripping conditions 108 are determined depending on the mesh generated by the mesh generator 104. The initial value identifier 112 identifies from which point (one or more points) calculations are to be started based on the mesh generated by the mesh generator 104.”), ([0046] via “In step S2, at an initial stage, a mesh (see FIGS. 5 and 15) is generated as all candidates of positions where the distal end of the finger 14 can abut against the surface of the workpiece 16. The mesh is shown as being in the form of a grid. However, the mesh may comprise a polygonal mesh (e.g., a triangular mesh) which is used to represent the model of the workpiece 16, for example. The mesh is set up in a mesh size small enough to provide candidates of gripping positions, but large enough not to pose undue calculation loads.”), and
with an index representing respective stability when the target object is gripped by the gripping section, which are within respective set ranges of values ([0011] via “Since a gripping position is determined based on the allowable external force as an index indicative of the magnitude of an external force which is allowed, the manipulator can grip the workpiece more stably. A more appropriate gripping position can be determined by repeating the fourth step and the fifth step.”), ([0057] via “In step S11, one, where the allowable external force Dc is maximum, of a plurality of gripping candidates obtained based on the established initial positions P0, is selected and established as a final gripping position.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Kondo wherein as the condition, the extraction section extracts the gripping points with an angle formed between the planes that each of the two points belong to, with an opening of the gripping section, and with an index representing respective stability when the target object is gripped by the gripping section, which are within respective set ranges of values. Doing so calculates appropriate gripping positions for the target object, as stated above by Kondo in paragraph [0011].
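As an illustration of how the three conditions recited in claim 3 could be checked for a single candidate pair of points (this is not Kondo's disclosed calculation and not the applicant's method), a minimal sketch follows; the angle, opening, and stability ranges, and the simple opposition-based stability measure, are assumptions introduced only for illustration.

```python
# Illustrative sketch only: testing a candidate pair against the claim 3 conditions
# (inter-plane angle, gripper opening, stability index within set ranges).
# All thresholds and names are hypothetical, not taken from Kondo or the application.
import numpy as np

def pair_satisfies_conditions(p1, n1, p2, n2, gripper_opening,
                              angle_range=(150.0, 180.0),
                              opening_margin=0.9,
                              stability_range=(0.5, 1.0)):
    # Angle formed between the planes (via their unit normals) that the two points belong to.
    cos_angle = np.clip(np.dot(n1, n2), -1.0, 1.0)
    angle_deg = np.degrees(np.arccos(cos_angle))
    # Opening of the gripping section: the pair separation must fit within the jaws.
    separation = np.linalg.norm(p2 - p1)
    fits_opening = separation <= opening_margin * gripper_opening
    # Simple stability index: how well the grasp axis opposes both surface normals
    # (a stand-in for an allowable-external-force style measure).
    axis = (p2 - p1) / separation
    stability = 0.5 * (abs(np.dot(axis, n1)) + abs(np.dot(axis, n2)))
    return (angle_range[0] <= angle_deg <= angle_range[1]
            and fits_opening
            and stability_range[0] <= stability <= stability_range[1])
```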
Regarding Claim 4, modified reference Zhou teaches the grasp information generation device of claim 2, but is silent on wherein: the grasp information generation device further comprises a pre-processing section that executes at least one pre-processing on the target object information acquired by the acquisition section from among simplification processing to reduce a number of the planes, smoothing processing to remove noise, optimization processing to make size variation of the planes not more than a threshold value, or symmetry detection processing to detect a group of planes having a single rotation axis; and the extraction section extracts the gripping points based on the target object information after pre-processing by the pre-processing section.
However, Kondo teaches wherein: the grasp information generation device further comprises a pre-processing section that executes at least one pre-processing on the target object information acquired by the acquisition section from among simplification processing to reduce a number of the planes, smoothing processing to remove noise, optimization processing to make size variation of the planes not more than a threshold value, or symmetry detection processing to detect a group of planes having a single rotation axis ([0046] via “In step S2, at an initial stage, a mesh (see FIGS. 5 and 15) is generated as all candidates of positions where the distal end of the finger 14 can abut against the surface of the workpiece 16. The mesh is shown as being in the form of a grid. However, the mesh may comprise a polygonal mesh (e.g., a triangular mesh) which is used to represent the model of the workpiece 16, for example. The mesh is set up in a mesh size small enough to provide candidates of gripping positions, but large enough not to pose undue calculation loads.”), (Note: The Examiner interprets Kondo to teach at least the pre-processing method of “optimizing processing to make size variation of the planes not more than a threshold value.”); and
the extraction section extracts the gripping points based on the target object information after pre-processing by the pre-processing section ([0055] via “In step S9, the to-be-calculated position P at the time is established as a gripping candidate position.”), (Note: See the flowchart in Figure 3 of Kondo as well.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Kondo wherein: the grasp information generation device further comprises a pre-processing section that executes at least one pre-processing on the target object information acquired by the acquisition section from among simplification processing to reduce a number of the planes, smoothing processing to remove noise, optimization processing to make size variation of the planes not more than a threshold value, or symmetry detection processing to detect a group of planes having a single rotation axis; and the extraction section extracts the gripping points based on the target object information after pre-processing by the pre-processing section. Doing so establishes a mesh that is small enough to provide possible gripping positions of the target object, but large enough to not impose difficult computations, as stated above by Kondo in paragraph [0046].
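For illustration of the pre-processing being mapped in this rejection (and not the method actually disclosed by Kondo or by the applicant), a minimal sketch under assumed thresholds might look like the following; preprocess_mesh, laplacian_smooth, triangle_areas, max_faces, and size_cv_threshold are hypothetical names, and the simplification/optimization step is left as a placeholder.

```python
# Illustrative sketch only: evaluating an acquired triangle mesh and applying the
# kinds of pre-processing recited in claim 4 (simplification, smoothing, size
# optimization) based on that evaluation (cf. claim 5). Hypothetical names/thresholds.
import numpy as np

def triangle_areas(vertices, faces):
    a, b, c = vertices[faces[:, 0]], vertices[faces[:, 1]], vertices[faces[:, 2]]
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)

def laplacian_smooth(vertices, faces, iterations=3):
    """Noise-removal smoothing: pull each vertex toward the mean of its neighbors."""
    vertices = vertices.copy()
    for _ in range(iterations):
        accum = np.zeros_like(vertices)
        counts = np.zeros(len(vertices))
        for tri in faces:
            for i in range(3):
                v, w = tri[i], tri[(i + 1) % 3]
                accum[v] += vertices[w]
                accum[w] += vertices[v]
                counts[v] += 1
                counts[w] += 1
        mask = counts > 0
        vertices[mask] = 0.5 * vertices[mask] + 0.5 * accum[mask] / counts[mask][:, None]
    return vertices

def preprocess_mesh(vertices, faces, max_faces=20000, size_cv_threshold=0.5):
    """Evaluate the mesh, then execute pre-processing based on the results."""
    areas = triangle_areas(vertices, faces)
    needs_simplification = len(faces) > max_faces                          # too many planes
    needs_optimization = areas.std() / areas.mean() > size_cv_threshold    # uneven plane sizes
    if needs_simplification or needs_optimization:
        # Placeholder: mesh decimation / remeshing would run here; omitted for brevity.
        pass
    vertices = laplacian_smooth(vertices, faces)                           # noise-removal smoothing
    return vertices, faces
```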
11. Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 20210166418 A1 hereinafter Zhou) in view of Cansizoglu et al. (US 20190019030 A1 hereinafter Cansizoglu), further in view of Kondo et al. (US 20110288683 A1 hereinafter Kondo), and further in view of Owada et al. (WO 2022137509 A1 hereinafter Owada).
Regarding Claim 5, modified reference Zhou teaches the grasp information generation device of claim 4, but is silent on wherein the pre-processing section evaluates the target object information acquired by the acquisition section by evaluating a number of the planes, whether or not there is noise, and size variation of the planes, and executes the pre-processing based on results of the evaluation.
However, Kondo teaches wherein the pre-processing section evaluates the target object information acquired by the acquisition section by evaluating a number of the planes and size variation of the planes, and executes the pre-processing based on results of the evaluation ([0046] via “In step S2, at an initial stage, a mesh (see FIGS. 5 and 15) is generated as all candidates of positions where the distal end of the finger 14 can abut against the surface of the workpiece 16. The mesh is shown as being in the form of a grid. However, the mesh may comprise a polygonal mesh (e.g., a triangular mesh) which is used to represent the model of the workpiece 16, for example. The mesh is set up in a mesh size small enough to provide candidates of gripping positions, but large enough not to pose undue calculation loads.”).
Further, Owada teaches wherein the pre-processing section evaluates the target object information acquired by the acquisition section by evaluating whether or not there is noise (Page 6 paragraph 6 via “The identification processing execution unit 205 generates a plurality of planes composed of a group of three-dimensional points included in the object region for each of the specified object regions, and generates a normal vector of each of the plurality of planes. do. Specifically, the identification processing execution unit 205 performs a meshing process for connecting points included in the object region to each of the specified object regions to generate a plurality of quadrangular or triangular planes. The identification processing execution unit 205 generates a vector of each side of a plurality of planes generated by the meshing processing.”), (Page 7 paragraph 2 via “Due to the perfo