Prosecution Insights
Last updated: April 19, 2026
Application No. 17/780,759

IMPROVED PHYSICAL OBJECT HANDLING BASED ON DEEP LEARNING

Non-Final OA §103, §112
Filed: May 27, 2022
Examiner: RODGERS, ALEXANDER JOHN
Art Unit: 2661
Tech Center: 2600 — Communications
Assignee: Robovision
OA Round: 3 (Non-Final)
Grant Probability: 70% (Favorable)
OA Rounds: 3-4
To Grant: 3y 2m
With Interview: 77%

Examiner Intelligence

Career Allow Rate: 70% (above average); 23 granted / 33 resolved; +7.7% vs TC avg
Interview Lift: +7.0% (moderate), based on resolved cases with interview
Avg Prosecution: 3y 2m (typical timeline); 12 currently pending
Total Applications: 45, across all art units (career history)

Statute-Specific Performance

§101: 10.1% (-29.9% vs TC avg)
§103: 43.4% (+3.4% vs TC avg)
§102: 26.0% (-14.0% vs TC avg)
§112: 19.8% (-20.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 33 resolved cases.
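These per-statute rates can be reproduced from raw outcome data. A minimal sketch, assuming a hypothetical list of resolved-case records; the field names, sample records, and Tech Center baseline below are illustrative placeholders, not data from this examiner's docket:

```python
from collections import Counter

# Hypothetical resolved-case records: statutes cited in each case and whether it was allowed.
# Illustrative only; not the examiner's actual file history.
cases = [
    {"statutes": {"103", "112"}, "allowed": True},
    {"statutes": {"101", "103"}, "allowed": False},
    {"statutes": {"102"}, "allowed": True},
    # ... one record per resolved case
]

def allow_rate_by_statute(cases):
    """Share of resolved cases citing each statute that ended in allowance."""
    cited, allowed = Counter(), Counter()
    for c in cases:
        for s in c["statutes"]:
            cited[s] += 1
            allowed[s] += c["allowed"]
    return {s: allowed[s] / cited[s] for s in cited}

examiner = allow_rate_by_statute(cases)
tc_average = {"101": 0.40, "102": 0.40, "103": 0.40, "112": 0.40}  # placeholder TC baseline
for statute, rate in sorted(examiner.items()):
    delta = rate - tc_average.get(statute, rate)
    print(f"§{statute}: {rate:.1%} ({delta:+.1%} vs TC avg)")
```

Each reported delta is simply the examiner's rate minus the Tech Center baseline for the same statute.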

Office Action

Grounds of rejection: §103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 13 August 2025 has been entered.

Response to Arguments

Applicant's arguments, filed 13 August 2025, with respect to the §103 rejections of Claims 1 and 16 have been fully considered and are persuasive. The rejection under 35 U.S.C. 103 of Claims 1-19 has been withdrawn. However, a new ground of rejection has been found in view of Robertson et al. (US Publication No. 20190261565 A1).

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claim 17 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Regarding Claim 17, the claim states: "The method of claim 1, wherein the voxel representation comprises a representation of the three-dimensional surface of the object in the form of a plurality of voxels with a non-infinite size in a predefined voxel grid." However, the Specification contains no mention of any predefined voxel grid, which raises the concern that new matter is being claimed or introduced via this amendment. Further, there is no mention of an array, matrix, or similar structure, except perhaps paragraph 0034, which only tangentially describes the small matrix used in a convolution filter as commonly used in CNNs but still fails to define any type of voxel grid. A person of ordinary skill in the art would not be able to conceive what exactly is defined for this grid, from its nature (a width, a height, a depth for voxels, a memory location, a memory size, etc.) to any specific actual definition of the grid. That is, if the claim language recites a predefined voxel grid, then somewhere the specification should define a voxel grid. Therefore, there is insufficient description of the subject matter in newly introduced Claim 17, and this subject matter of Claim 17 constitutes new matter.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, and 7-18 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. ("A Vision-Guided Robotic Grasping Method for Stacking Scenes Based on Deep Learning") in view of Robertson et al. (US Publication No. 20190261565 A1).

Regarding Claim 1, Lin discloses a method for generating a robot command for handling a three-dimensional, 3D, physical object (Reference "robot grasping method", see Section I: Introduction paragraph 1) present within a reference volume (Reference "robot grasp coordinate system", see Figure 7 and note in applicant's specification paragraph 0032 a reference volume is simply a Cartesian coordinate system, which is shown in Figure 7) and comprising a 3D surface (Reference "3D stacking scene", see Figure 4 showing a depiction of objects stacked via freefall), comprising: obtaining at least two images of said physical object from a plurality of cameras positioned at different respective angles with respect to said object (Reference "stereo camera", see Section II: Overall Framework of Grasping Method, paragraph 2, where a stereo camera which takes multiple images from multiple angles relative to an object in focus is used); generating, with respect to the 3D surface of said object, a segmented voxel representation, said segmented voxel representation being a voxel representation segmented based on said at least two images (Reference "point cloud", see Section II: Overall Framework of Grasping Method, paragraph 2 and Figure 3 showing various point clouds created from the stereo images. Note a point cloud, being a set of data points in 3D space, reads as a voxel representation since a voxel is a data or pixel value in 3D space. Also note object segments versus background segments shown in Figure 3); and computing the robot command for said handling of said object based on said segmented voxel representation (Reference "Grasp planning", see Section IV: Grasp Handling, where the data described above for the objects are used to calculate grasp coordinates); wherein the generating of said segmented voxel representation comprises: 2D segmenting said at least two images by at least one trained 2D convolutional neural network, CNN, followed by performing a 3D reconstruction of said 3D surface of said object based at least on said at least two segmented images (Note: due to the use of the and/or disjunctive limitation, a rejection with prior art has been made for the latter 3D limitation cited next); and/or performing reconstruction of said 3D surface of said object based on said at least two images for obtaining a voxel representation followed by 3D segmenting said voxel representation by at least one trained 3D neural network, NN (Reference "PointNet++", see Section III: B. PPRNet framework construction, where segmenting is performed and the process specifically lists PointNet++, a trained 3D neural network designed to work specifically on point clouds, which were generated from the stereo camera images); wherein said robot command is computed based on at least one of: a 3D coordinate within said reference volume (Reference "robot grasp coordinate system", see Figure 7 and note in applicant's specification paragraph 0032 a reference volume is simply a Cartesian coordinate system, which is shown in Figure 7); or a 3D orientation of said object relative to said reference volume (Reference "candidate grasp point" and "grasp pose coordinate system", see Section IV: Grasp Planning Module paragraph 1 and Figure 7); and wherein said robot command is executable by a device comprising a robot element configured for handling said object (Reference "Gripper bounding box" and "Experimental platform", see Figure 8 where the handling device is shown theoretically grasping a cylinder. Also see Figure 9 showing the robot gripping jaws installed on the robot with grasping arm, where the stereo camera is shown with the grasp object and bin being handled), and wherein said at least one trained 2D CNN comprises a semantic segmentation NN and/or wherein said at least one trained 3D NN comprises a semantic segmentation NN (Reference "PointNet++", see Section III: B. PPRNet framework construction, where segmenting is performed and the process specifically lists PointNet++, a trained 3D neural network which segments point clouds, and such a process performed on images or spaces is commonly referred to as semantic segmentation), wherein said robot command is based at least on said 3D orientation of said object (Reference "robot grasp coordinate system", see Figure 7 and Figure 8 where the orientation of the object is shown along with an axis with which it is to be grasped and the subsequent bounding of such an object), wherein said robot command comprises a 3D approaching angle for reaching a handling coordinate on said object (Reference "robot grasp coordinate system", see Figure 7 and Figure 8 where the above orientation is used to determine an approaching or grasp angle).

However, Lin fails to disclose wherein said object corresponds to a first segment class and a second segment class, wherein said object comprises one or more protrusions corresponding to the second segment class, wherein said 3D approaching angle relates to both reaching the handling coordinate of the first segment class and avoiding collision between said robot element and the second segment class, and wherein said object has a non-convex shape. Instead, Robertson discloses wherein said object corresponds to a first segment class and a second segment class (Reference "fruit/stem" and "leaves", see Specification paragraph 0110 where the first segment class is the to-be-handled segment class, a fruit, and the second segment class contains objects such as leaves), wherein said object comprises one or more protrusions corresponding to the second segment class (Reference "obstructions", see Specification paragraph 0110 where obstructions such as leaves are detected, and it is noted leaves protrude from a plant), wherein said 3D approaching angle relates to both reaching the handling coordinate of the first segment class and avoiding collision between said robot element and the second segment class (Reference "picking" and "collision", see Specification paragraph 0158 where the picking of a fruit is described and the robot arm is planned so as not to undergo any collision during the pick), and wherein said object has a non-convex shape (Examiner's Note: It is noted the general definition of a non-convex shape would include any shape such that a line can be drawn from two points of the shape where some segment of the line lies outside the bounds of said shape, and any line drawn from one leaf to another leaf on the plant would generally include segments outside the bounds of the plant. See Figure 6 showing the robot in the field harvesting a plant with an example plant that meets the definition of non-convex: that is, a line drawn from one point on one leaf of a singular plant to another leaf on the same plant would have some segment of said line lie outside the bounds of the plant, as many leaves dangle off the edges or bounds of the plant shown). Robertson also discloses motivation for these specific modifications to plan handling coordinates which cut the stem to harvest the fruit (see Specification paragraph 0114 where the fruit can bruise and therefore the stem is cut instead without handling the fruit). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lin in view of Robertson.

Regarding Claim 2, Lin discloses The method of claim 1, wherein said at least one trained 2D CNN comprising said semantic segmentation NN is a 2D U-net and/or wherein said at least one trained 3D NN comprising said semantic segmentation NN is a 3D PointNet++ (Reference "PointNet++", see Section III: B. PPRNet framework construction, where segmenting is performed and the process specifically lists PointNet++, a trained 3D neural network which segments point clouds, and such a process performed on images or spaces is commonly referred to as semantic segmentation).

Regarding Claim 4, Lin discloses The method of claim 1, wherein said robot command is based at least on said 3D coordinate, and wherein said robot command relates to a handling coordinate on said object relating to said 3D coordinate (Reference "robot grasp coordinate system", see Section IV: Grasping Planning Module paragraph 5 where the robot grasping is created from the Cartesian space).
Regarding Claim 7, Lin discloses The method of claim 1, wherein the generating comprises said 2D segmenting of said at least two images by said at least one trained CNN followed by said performing of said 3D reconstruction of said 3D surface of said object based on said at least two segmented images (Reference "PointNet++", see Figure 10A and Figure 10B where the 2D image is segmented into an object point cloud and further how the 2D image is mapped back onto the point cloud in Figure 10E. Also see Section III: B. PPRNet framework construction, where segmenting is performed and the process specifically lists PointNet++, a trained 3D neural network designed to work specifically on point clouds which were generated from the stereo camera images); wherein said 2D segmenting comprises identifying pixels corresponding to said 3D surface of said object and segmenting the pixels corresponding to said surface of said object with said trained CNN for obtaining at least two segmented images (Reference "PointNet++", see Figure 10A and Figure 10B where the 2D image is segmented into an object point cloud and further how the 2D image is mapped back onto the point cloud in Figure 10E); wherein said performing of said 3D reconstruction of said 3D surface comprises obtaining a voxel representation (Reference "point cloud", see Section II: Overall Framework of Grasping Method, paragraph 2 and Figure 3 showing various point clouds created from the stereo images. Note a point cloud, being a set of data points in 3D space, reads as a voxel representation since a voxel is a data or pixel value in 3D space. Also note object segments versus background segments shown in Figure 3) and letting segmented pixels of the at least two segmented images correspond to segmented voxels in said voxel representation for obtaining said segmented voxel representation (Reference "PointNet++", see Figure 10A and Figure 10B where the 2D image is segmented into an object point cloud and further how the 2D image is mapped back onto the point cloud in Figure 10E).

Regarding Claim 8, Lin discloses The method of claim 1, wherein the generating comprises said performing of said 3D reconstruction of said 3D surface of said object based on said at least two images for obtaining said voxel representation followed by said 3D segmenting of said voxel representation by said at least one trained 3D NN (as noted above), wherein said 3D segmenting comprises identifying voxels belonging to said 3D surface of said object and segmenting the voxels belonging to said 3D surface with the trained 3D NN for obtaining said segmented voxel representation (Reference "PointNet++", see Figure 10A and Figure 10B where the 2D image is segmented into an object point cloud and further how the 2D image is mapped back onto the point cloud in Figure 10E. Also see Section III: B. PPRNet framework construction, where segmenting is performed and the process specifically lists PointNet++, a trained 3D neural network designed to work specifically on point clouds which were generated from the stereo camera images).
Regarding Claim 9, Lin discloses The method of claim 1, comprising the further step of post-processing said segmented voxel representation in view of one or more semantic segmentation rules relating to one or more segment classes with respect to said 3D surface (see Figures 10A-10E, specifically 10B, where a voxel representation is further segmented to find both centers of objects in 10C and entire instances of objects in 10D, which relate to the 3D surface originally imaged in Figure 10A).

Regarding Claim 10, Lin discloses The method of claim 1, wherein said semantic segmentation NN comprises any or any combination of: U-net, Dynamic Graph CNN, DGCNN, PointNet++ (Reference "PointNet++", see Section III: B. PPRNet framework construction, where segmenting is performed and the process specifically lists PointNet++, a trained 3D neural network which segments point clouds, and such a process performed on images or spaces is commonly referred to as semantic segmentation).

Regarding Claim 11, Lin discloses The method of claim 13, wherein the surface of said object comprises a first segment class and a second segment class (see Section IV: Grasp Planning Module B. Pose Estimation Experiment and Figures 10A-10E, which show first a segmenting of centroids and then a segmenting of instances to gather poses of objects), wherein said actuation relates to a handling coordinate (Reference "Grasp planning", see Section IV: Grasp Handling where the data described above for the objects are used to calculate grasp coordinates). However, Lin fails to disclose "said object is a plant comprising a stem corresponding to a first segment class and one or more leaves corresponding to a second segment class". Instead, Robertson discloses said object is a plant comprising a stem corresponding to a first segment class (Reference "fruit/stem", Specification paragraph 0110 where a fruit is being detected to be picked, and further see paragraph 0114 where this fruit has a stem which is being targeted and cut to reduce handling of the fruit itself. Details of the target pose determined for cutting the stalk are further described in paragraph 0148) and one or more leaves corresponding to a second segment class (Reference "leaves", see Specification paragraph 0110 where, as described in the rejection of claim 1, leaves are specifically referenced as the obstructions which correspond to the second segment class). Robertson also discloses motivation for these specific modifications to plan handling coordinates which cut the stem to harvest the fruit (see Specification paragraph 0114 where the fruit can bruise and therefore the stem is cut instead without handling the fruit). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lin in view of Robertson with this segmentation technique.

Regarding Claim 12, Lin discloses The method of claim 13, but fails to disclose wherein said surface of said object is a plant comprising a stem corresponding to a first segment class and one or more leaves corresponding to a second segment class, wherein a 3D approaching angle for reaching a handling coordinate on said object relates to a 3D cutting angle for reaching the handling coordinate in view of a position of said leaves. However, Robertson discloses wherein said surface of said object is a plant comprising a stem corresponding to a first segment class (Reference "plant" and "fruit/stem", see Specification paragraph 0061 describing the types of stem-containing plants such as strawberries and tomatoes harvested with this robot. Also see Specification paragraph 0110 where a fruit is being detected to be picked, and further see paragraph 0114 where this fruit has a stem which is being targeted and cut to reduce handling of the fruit itself. Details of the target pose determined for cutting the stalk are further described in paragraph 0148) and one or more leaves corresponding to a second segment class (Reference "leaves", see Specification paragraph 0110 where, as described in the rejection of claim 1, leaves are specifically referenced as the obstructions which correspond to the second segment class), wherein a 3D approaching angle for reaching a handling coordinate on said object relates to a 3D cutting angle for reaching the handling coordinate in view of a position of said leaves (Reference "collision", see Specification paragraph 0291 where the hook and approach angle of the robot is described, which in turn minimizes the size of the gap needed between leaves). Robertson also discloses motivation for these specific modifications to plan handling coordinates which cut the stem to harvest the fruit (see Specification paragraph 0114 where the fruit can bruise and therefore the stem is cut instead without handling the fruit). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lin in view of Robertson with this approach angle.

Regarding Claim 13, Lin discloses The method of claim 1, comprising the further step of actuating said robot element based on said robot command (Reference "trajectory", see Section IV: C. Robotic Grasping Experiment where the trajectory is determined and executed. Executing the trajectory of a robot arm reads as actuating said robot.) However, it is noted Lin fails to disclose comprising the further step of actuating said robot element based on said robot command, wherein said actuation relates to cutting a stem at the handling coordinate. Instead, Robertson discloses comprising the further step of actuating said robot element based on said robot command, wherein said actuation relates to cutting a stem at the handling coordinate (Reference "robot arm", see Specification paragraph 0109 where the picking head is moved to positions for locating and picking target fruit. Picking the fruit comprises gripping and cutting. The stem of the fruit specifically is targeted to avoid bruising the fruit, as previously mentioned). Robertson also discloses motivation for these specific modifications to plan handling coordinates which cut the stem to harvest the fruit (see Specification paragraph 0114 where the fruit can bruise and therefore the stem is cut instead without handling the fruit). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lin in view of Robertson.
Regarding Claim 14, Lin discloses The method of Claim 1, comprising the further steps of obtaining a training set relating to a plurality of training objects, each of the training objects comprising a 3D surface (Reference "3D stacking scene", see Figure 4 showing a depiction of 3D objects stacked via freefall, generated in a physics simulation program from a synthetic data set used to train the trained neural networks) similar to the 3D surface of said object, the training set comprising at least two images for each training object (Reference "depth image" and "label image", see Figures 5A and 5B, and see Section III: Pose Estimation Module A paragraph 5 where the label image, depth image, and point cloud are all generated as a dataset for the deep learning framework); receiving manual annotations with respect to a plurality of segment classes from a user for each of the training objects via a GUI (Section III. A. Synthetic dataset generation method, where manual labelling of the scene is disclosed with its tradeoffs); and training, based on said manual annotations, at least one NN, for obtaining said at least one trained NN (Reference "train" and "PPRNet", see Section III, where the datasets are used to train the neural network PPRNet).

Regarding Claim 15, Lin discloses A device for handling a three-dimensional, 3D, physical object present within a reference volume (Reference "robot grasp coordinate system", see Figure 7 and note in applicant's specification paragraph 0032 a reference volume is simply a Cartesian coordinate system, which is shown in Figure 7) and comprising a 3D surface (Reference "3D stacking scene", see Figure 4 showing a depiction of objects stacked via freefall), the device comprising a robot element (see Figure 9 and Section V Experimental Results and Analysis, A. Experimental Platform, which describes the robot), a processor and memory comprising instructions which, when executed by said processor (Reference "main computer", "GPU" and "memory", see Section V Experimental Results and Analysis, A. Experiment Platform), cause the device to execute a method according to claim 1 (see rejection of Claim 1).

Regarding Claim 16, Lin discloses A system (Reference "main computer", "GPU" and "memory", see Section V Experimental Results and Analysis, A. Experiment Platform) for handling a three-dimensional, 3D, physical object (Reference "robot grasping method", see Section I: Introduction paragraph 1) present within a reference volume (Reference "robot grasp coordinate system", see Figure 7 and note in applicant's specification paragraph 0032 a reference volume is simply a Cartesian coordinate system, which is shown in Figure 7) and comprising a 3D surface (Reference "3D stacking scene", see Figure 4 showing a depiction of objects stacked via freefall), the system comprising: a device; a plurality of cameras positioned at different respective angles with respect to said object and connected to said device (Reference "stereo camera", see Section II: Overall Framework of Grasping Method, paragraph 2, where a stereo camera which takes multiple images from multiple angles relative to an object in focus is used); a robot element comprising actuation means and connected to said device (see Figure 9 and Section V Experimental Results and Analysis, A. Experimental Platform, which describes the robot); wherein said device is configured for: obtaining, from said plurality of cameras, at least two images of said physical object (Reference "stereo camera", see Section II: Overall Framework of Grasping Method, paragraph 2, where a stereo camera which takes multiple images from multiple angles relative to an object in focus is used); generating, with respect to the 3D surface of said object, a voxel representation segmented based on said at least two images (Reference "point cloud", see Section II: Overall Framework of Grasping Method, paragraph 2 and Figure 3 showing various point clouds created from the stereo images. Note a point cloud, being a set of data points in 3D space, reads as a voxel representation since a voxel is a data or pixel value in 3D space. Also note object segments versus background segments shown in Figure 3); computing a robot command for said handling of said object based on said segmented voxel representation (Reference "Grasp planning", see Section IV: Grasp Handling, where the data described above for the objects are used to calculate grasp coordinates); sending said robot command to said robot element for letting said robot element handle said object (Reference "control" and "grasp task", see Section IV: Grasp Planning Module where the robot is controlled to perform the grasp task); wherein said plurality of cameras is configured for: acquiring at least two images of said physical object (Reference "stereo camera", see Section II: Overall Framework of Grasping Method, paragraph 2, where a stereo camera which takes multiple images from multiple angles relative to an object in focus is used); sending the images to said device (see Figure 2 showing connection of the camera to the system to send images); said robot element is configured for: receiving said robot command from said device (Reference "control" and "grasp task", see Section IV: Grasp Planning Module where the robot is controlled to perform the grasp task); handling said object using said actuation means (Reference "UR3 robot", see Section V Experimental Results and Analysis, A. Experiment Platform, which comprises actuation means); wherein the generating comprises 2D segmenting said at least two images by means of at least one trained 2D convolutional neural network, CNN, followed by performing a 3D reconstruction of said 3D surface of said object based at least on said at least two segmented images (Note: due to the use of the and/or disjunctive limitation, a rejection with prior art has been made for the latter 3D limitation cited next); and/or performing a 3D reconstruction of said 3D surface of said object based on said at least two images for obtaining a voxel representation followed by 3D segmenting said voxel representation by means of at least one trained 3D neural network, NN; wherein said robot command is computed based on at least one of: a 3D coordinate within said reference volume (Reference "robot grasp coordinate system", see Figure 7 and note in applicant's specification paragraph 0032 a reference volume is simply a Cartesian coordinate system, which is shown in Figure 7); and/or a 3D orientation of said object relative to said reference volume (Note: due to the use of the and/or disjunctive limitation, a rejection with prior art has been made for the prior limitation last cited. Further, it is also noted, as in the rejection of Claim 1, the art teaches this limitation as well); wherein said at least one trained 2D CNN comprises a semantic segmentation NN (Note: due to the use of the and/or disjunctive limitation, a rejection with prior art has been made for the latter 3D limitation cited next); and/or wherein said at least one trained 3D NN comprises a semantic segmentation NN (Reference "PointNet++", see Section III: B. PPRNet framework construction, where segmenting is performed and the process specifically lists PointNet++, a trained 3D neural network which segments point clouds, and such a process performed on images or spaces is commonly referred to as semantic segmentation).

However, Lin fails to disclose wherein said object corresponds to a first segment class and a second segment class, wherein said object comprises one or more protrusions corresponding to the second segment class, wherein said 3D approaching angle relates to both reaching the handling coordinate of the first segment class and avoiding collision between said robot element and the second segment class, and wherein said object has a non-convex shape. Instead, Robertson discloses wherein said object corresponds to a first segment class and a second segment class (Reference "fruit/stem" and "leaves", see Specification paragraph 0110 where the first segment class is the to-be-handled segment class, a fruit, and the second segment class contains objects such as leaves), wherein said object comprises one or more protrusions corresponding to the second segment class (Reference "obstructions", see Specification paragraph 0110 where obstructions such as leaves are detected, and it is noted leaves protrude from a plant), wherein said 3D approaching angle relates to both reaching the handling coordinate of the first segment class and avoiding collision between said robot element and the second segment class (Reference "picking" and "collision", see Specification paragraph 0158 where the picking of a fruit is described and the robot arm is planned so as not to undergo any collision during the pick), and wherein said object has a non-convex shape (Examiner's Note: It is noted the general definition of a non-convex shape would include any shape such that a line can be drawn from two points of the shape where some segment of the line lies outside the bounds of said shape, and any line drawn from one leaf to another leaf on the plant would generally include segments outside the bounds of the plant. See Figure 6 showing the robot in the field harvesting a plant with an example plant that meets the definition of non-convex: that is, a line drawn from one point on one leaf of a singular plant to another leaf on the same plant would have some segment of said line lie outside the bounds of the plant, as many leaves dangle off the edges or bounds of the plant shown). Robertson also discloses motivation for these specific modifications to plan handling coordinates which cut the stem to harvest the fruit (see Specification paragraph 0114 where the fruit can bruise and therefore the stem is cut instead without handling the fruit). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lin in view of Robertson to use this approach angle and segmentation.
Regarding Claim 17, Lin discloses The method of claim 1, wherein the voxel representation comprises a representation of the three-dimensional surface of the object in the form of a plurality of voxels (Reference "coordinate system", see Section IV Grasp Planning Module paragraph 3 where the grasp pose coordinate system is 3-dimensional and also is used to determine graspability of the surface of the object, centroid of the object, and physical space of the object in the 3D space, all of which would read as a voxel or plurality of voxels being points or sets of points in said 3D space) with a non-infinite size in a predefined voxel grid (Reference "size", see Section C where the stacking scene described is within a box of real dimensions, or non-infinite size. Also note the gripper bounding box in Section IV Grasp Planning Module paragraph 3, which would read as a predefined voxel grid).

Regarding Claim 18, Lin discloses The method of claim 1, but fails to disclose wherein generating the segmented voxel representation comprises performing the 2D segmenting of the at least two images by the at least one trained 2D convolutional neural network, followed by the 3D reconstruction based at least on the at least two segmented images. Instead, Robertson discloses wherein generating the segmented voxel representation comprises performing the 2D segmenting of the at least two images by the at least one trained 2D convolutional neural network (Reference "convolutional neural network", see Specification paragraph 0146 where the segmentations into classes are performed by a convolutional neural network or decision forest classifier. This segmentation is performed on the pixels of the image, which are 2D, sourced from the stereo camera described in paragraph 0143 or 0145), followed by the 3D reconstruction based at least on the at least two segmented images (Reference "3D points", see Specification paragraph 0148 where the 3D pose estimation reads as a 3D reconstruction of the segmented image. Note the reference to the target fruit, which, as referenced previously in the action or in paragraph 0147, comes specifically from the segmentation performed by the CNN). Robertson also discloses motivation for these specific modifications to plan handling coordinates which cut the stem to harvest the fruit (see Specification paragraph 0114 where the fruit can bruise and therefore the stem is cut instead without handling the fruit). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lin in view of Robertson to utilize this segmentation approach.

Regarding Claim 19, Lin discloses The method of claim 1, wherein the at least two images are obtained from a plurality of cameras positioned at the different respective angles with respect to the object, without latency in acquiring the images (Reference "Ensenso N35 stereo camera", see Section V Experiment Results and Analysis, Part A. Experiment Platform, where the Ensenso N35 stereo camera takes images from a plurality of cameras positioned at different angles. See Figure 9 showing this stereo camera, which contains a plurality of cameras positioned at different angles with respect to the picking scene captured).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Robertson et al. (US Publication No. 20190261565 A1) and further in view of Perez-Sanz et al. ("Plant Phenomics: An Overview of Image Acquisition Technologies and Image Data Analysis Algorithms").
Regarding Claim 3, Lin discloses The method of Claim 1, and wherein said computing of said robot command is further based on values of pixels whereof at least an intensity is determined based on image information (Reference Figures 10A-10E, where the clearest example of intensities in the image information is shown, especially Figure 10C showing the clustering of centroids with various intensities). However, Lin fails to disclose "wherein at least one of said plurality of cameras is a hyperspectral camera, wherein the object comprises a portion of a plant," and that the "values of pixels whereof at least an intensity is based on hyperspectral image information". Instead, Perez-Sanz discloses wherein at least one of said plurality of cameras is a hyperspectral camera, wherein the object comprises a portion of a plant (see Section "Multi- and Hyperspectral Cameras", where this is identified as an already established practice common in the art). Perez-Sanz also discloses values of pixels whereof at least an intensity is based on hyperspectral image information (Reference "segmenting" and "spectrum", see Section "Mono-RGB vision" where a traditional camera such as the one disclosed by Lin can be modified with additional systems which in turn create segmenting from regions of a spectrum. And as noted previously, it is Lin's segmenting which creates intensity maps of centroids, as shown by Lin in Figure 10C). Further, it is noted Perez-Sanz not only describes several motivations, such as identifying various indicators such as pests or pathologies in a noninvasive manner (see Section "Multi- and Hyperspectral Cameras") as well as water and nutrient contents of crops (see Table 4's row "Multi-Hyper Spectral" and column "Machine Learning"), which are the same specific indicators claimed in applicant's specification, but also predicts a future shift toward more hyperspectral imaging specifically for plant imaging (see Section "Multi- and Hyperspectral Cameras"). Therefore, it would have been obvious to one of ordinary skill in the art before the time of filing to modify Lin as predicted and described by Perez-Sanz to include hyperspectral imaging.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER JOHN RODGERS, whose telephone number is (703) 756-1993. The examiner can normally be reached 5:30 AM to 2:30 PM ET. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, John Villecco, can be reached at (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALEXANDER JOHN RODGERS/
Examiner, Art Unit 2661

/JOHN VILLECCO/
Supervisory Patent Examiner, Art Unit 2661
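For context on the disputed claim term, a "predefined voxel grid" is commonly understood as a fixed-extent, fixed-resolution grid into which reconstructed surface points are binned. A minimal sketch of that reading, assuming NumPy, a synthetic point cloud, and illustrative grid extents and voxel size; these values and variable names are placeholders for demonstration only, not definitions from the application or the cited art:

```python
import numpy as np

# Illustrative surface points (e.g., from stereo reconstruction), in metres.
points = np.random.rand(1000, 3) * 0.5

# A "predefined voxel grid": fixed extents and a fixed, finite voxel size,
# chosen before any particular object is observed.
grid_min = np.array([0.0, 0.0, 0.0])   # metres
grid_max = np.array([0.5, 0.5, 0.5])   # metres
voxel_size = 0.01                      # 1 cm voxels -> a 50 x 50 x 50 grid

dims = np.ceil((grid_max - grid_min) / voxel_size).astype(int)
occupancy = np.zeros(dims, dtype=bool)

# Bin each surface point into its voxel; occupied voxels approximate the 3D surface.
idx = np.floor((points - grid_min) / voxel_size).astype(int)
idx = np.clip(idx, 0, dims - 1)
occupancy[idx[:, 0], idx[:, 1], idx[:, 2]] = True

print(f"grid {dims.tolist()}, {occupancy.sum()} occupied voxels")
```

A segmented voxel representation would additionally carry a per-voxel class label, for example by projecting 2D segmentation masks into the same grid.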

Prosecution Timeline

May 27, 2022
Application Filed
May 27, 2022
Response after Non-Final Action
Aug 09, 2024
Non-Final Rejection — §103, §112
Sep 26, 2024
Interview Requested
Oct 10, 2024
Examiner Interview Summary
Nov 13, 2024
Response Filed
Feb 07, 2025
Final Rejection — §103, §112
Aug 13, 2025
Request for Continued Examination
Aug 14, 2025
Response after Non-Final Action
Jan 10, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12548181
INFORMATION PROCESSING APPARATUS, SENSING APPARATUS, MOBILE OBJECT, METHOD FOR PROCESSING INFORMATION, AND INFORMATION PROCESSING SYSTEM
2y 5m to grant. Granted Feb 10, 2026
Patent 12541961
INFORMATION EXTRACTION METHOD OF OFFSHORE RAFT CULTURE BASED ON MULTI-TEMPORAL OPTICAL REMOTE SENSING IMAGES
2y 5m to grant. Granted Feb 03, 2026
Patent 12494058
RELATIONSHIP MODELING AND KEY FEATURE DETECTION BASED ON VIDEO DATA
2y 5m to grant. Granted Dec 09, 2025
Patent 12453511
SYSTEMS AND METHODS FOR CONFIRMATION OF INTOXICATION DETERMINATION
2y 5m to grant. Granted Oct 28, 2025
Patent 12430771
LIGHT FIELD RECONSTRUCTION METHOD AND APPARATUS OF A DYNAMIC SCENE
2y 5m to grant. Granted Sep 30, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 70%
With Interview: 77% (+7.0%)
Median Time to Grant: 3y 2m
PTA Risk: High
Based on 33 resolved cases by this examiner. Grant probability derived from career allow rate.
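As a worked check of that derivation, assuming the page's apparent convention of adding the interview lift directly to the career allow rate (a heuristic, not a fitted statistical model):

```python
granted, resolved = 23, 33
base = granted / resolved            # 0.697 -> reported as 70%
interview_lift = 0.07                # +7.0 percentage points
with_interview = base + interview_lift
print(f"base {base:.0%}, with interview {with_interview:.0%}")  # base 70%, with interview 77%
```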
