Prosecution Insights
Last updated: April 19, 2026
Application No. 18/550,946

IMPROVED ORIENTATION DETECTION BASED ON DEEP LEARNING

Final Rejection — §103, §DP

Filed: Sep 15, 2023
Examiner: RAMIREZ, ELLIS B
Art Unit: 3658
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Robovision
OA Round: 2 (Final)
Grant Probability: 80% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 3m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 80% (156 granted / 194 resolved; +28.4% vs TC avg) — above average
Interview Lift: +18.2% across resolved cases with an interview (strong)
Typical Timeline: 3y 3m avg prosecution; 39 currently pending
Career History: 233 total applications across all art units
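
To see how these figures fit together, here is a minimal arithmetic sketch (assumption mine: the +18.2 pp interview lift is treated as additive in percentage points and capped at 100%, which the dashboard does not state explicitly):

```python
# Sanity-check the dashboard arithmetic from the figures shown above.
# Assumption (not stated on the page): the interview lift is simply
# added to the career allow rate and capped at 100%.

granted, resolved = 156, 194                 # examiner career totals
allow_rate = granted / resolved              # ~0.804, displayed as 80%
interview_lift = 0.182                       # +18.2 pp with an interview

with_interview = min(allow_rate + interview_lift, 1.0)

print(f"Career allow rate:   {allow_rate:.1%}")      # 80.4%
print(f"With interview est.: {with_interview:.1%}")  # 98.6%, shown as 99%
```

Under that reading, 156/194 ≈ 80.4% rounds to the displayed 80%, and 80.4% + 18.2% ≈ 98.6% rounds to the displayed 99%.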

Statute-Specific Performance

Statute   Rate     vs TC Avg
§101       9.1%    -30.9%
§103      62.0%    +22.0%
§102      14.1%    -25.9%
§112       7.4%    -32.6%

Tech Center averages are estimates • Based on career data from 194 resolved cases

Office Action

§103, §DP
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments

The amendment and response filed on October 16, 2025, to the Non-Final Office Action dated July 16, 2025, has been entered. Claims 1-2 were amended for the purpose of correcting typographical errors. Applicant's amendments to the claims have been found sufficient to overcome the previous claim objections. Claims 1-18 are pending in this application.

Response to Arguments

Applicant's arguments and amendments filed on October 16, 2025, with respect to the Double Patenting rejection request that the rejection be held in abeyance pending determination of patentable subject matter in this case. Response at Page 7. A complete response to a non-statutory double patenting (NSDP) rejection is either a reply by applicant showing that the claims subject to the rejection are patentably distinct from the reference claims, or the filing of a terminal disclaimer in accordance with 37 CFR 1.321. Such a response is required even when the non-statutory double patenting rejection is provisional. See MPEP § 804 and § 804.02, subsection VI. Since the response of October 16, 2025, did not include (i) a showing that the claims are patentably distinct or (ii) a filing of a terminal disclaimer, the response is deemed to be non-responsive to the Office action of July 16, 2025.

Applicant's arguments and amendments, see pages 7-13, filed October 16, 2025, with respect to the 35 U.S.C. § 103 rejection based on Hirano et al. (NPL: "Image-based object recognition and dexterous hand/arm motion planning using RRTs for grasping in cluttered scene"), Ku et al. (US-20220016765-A1), and further in view of Robertson et al. (US-20210000013-A1) have been considered but are not persuasive. The 35 U.S.C. § 103 rejection of claims 1-18 is maintained for the reasons explained below.

Applicant's arguments with respect to the § 103 rejections of claims 1 and 14 have been fully considered but are not persuasive. Specifically with regard to independent claims 1 and 14, applicant argued that Hirano in combination with Ku and Robertson did not disclose "generating, with respect to the 3D surface of the physical object, a voxel representation segmented based on the at least two images." Firstly, Hirano discloses the use of two cameras or a stereo camera for acquiring at least two images of a 3D object at Section A of the NPL. Further, the applied prior art, especially Ku, discloses at Para. [0016] acquiring the "principal component of the object based on the 3D points for the object." Further, in Para. [0099], these principal components can be represented as a voxel of the 3D object. Additionally, a point cloud, being a set of data points in 3D space, reads as a voxel representation since a voxel is a data or pixel value in 3D space. Also note that segmented regions in an image, represented as a plurality of depth maps which are linked to represent an object as shown in Figure 2 of Hirano, can be reasonably interpreted as a voxel representation.

Regarding Applicant's argument concerning "determining the main direction [of the object] based on the segmented voxel representation" using a neural network trained with respect to the main direction: Ku at Para. [0049] discloses the use of a neural network implemented in detector 345, scene parser 320 at Fig. 2B, which can function to "detect objects, extract features, attributes, and/or other information from images and/or point clouds." An attribute of an object includes its position and orientations, see Para. [0073], which is tantamount to the claimed object direction.

Regarding Applicant's argument concerning the failure of the applied prior art to disclose "computing the robot command for the handling of the physical object based on the segmented voxel representation and the determined main direction" of the object: Hirano at Figure 8 and Section III generates a command for a robot hand to grasp an object based on the segmentation; Ku at Figure 2A clearly shows a robot being commanded to grasp an object; and Robertson at Figure 1 and Para. [0288] discloses a robotic hand to "select, grasp, and cut produce from the host plant." After consideration of the applied prior art and Applicant's remarks, it has been determined that Hirano, Ku, and Robertson continue to meet the invention as claimed.

Double Patenting

The non-statutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A non-statutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on non-statutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a non-statutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms.
The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1-18 are provisionally rejected on the ground of non-statutory double patenting as being unpatentable over claims 1-20 of copending Application No. 18/550,948 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because they both recite the same limitations except for negligible wording/phrasing differences. This is a provisional non-statutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claims 1-18 are provisionally rejected on the ground of non-statutory double patenting as being unpatentable over claims 1-16 of copending Application No. 17/780,759 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because they both recite the same limitations except for negligible wording/phrasing differences. This is a provisional non-statutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claim Rejections -- 35 U.S.C. § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over Hirano et al. (NPL: "Image-based object recognition and dexterous hand/arm motion planning using RRTs for grasping in cluttered scene") ("Hirano"), Ku et al. (US-20220016765-A1) ("Ku"), and further in view of Robertson et al. (US-20210000013-A1) ("Robertson").

As per claim 1, Hirano discloses a method for generating a robot command for handling a three-dimensional (3D) physical object present within a reference volume (Hirano at Abstract), the physical object comprising a main direction and a 3D surface, the method comprising: obtaining at least two images of the physical object from a plurality of cameras positioned at different respective angles with respect to the object (Hirano at Page 2, Column 1, Paragraphs 1-2, "stereo camera system"); determining the main direction based on the segmented voxel representation (Hirano at Page 3, Column 2, Section III, at Figure 6, first photograph showing the locus of points); and computing the robot command for the handling of the physical object based on the segmented voxel representation and the determined main direction (Hirano at Figure 8, robot hand grasping object, and Page 3, Column 2, Section III, disclosing calculating a grasping strategy: "Fig. 8 shows some variations of grasping attitude of the robot hand. One axis of the hand coordinate system {Xh, Yh, Zh} is settled to be parallel to the main axis of the object as shown in Fig. 8. Once the orientation of the hand coordinate frame in the grasping attitude is decided, then the inverse kinematics problem is solved to decide collision-free arm attitude. This arm attitude corresponds to one goal point in the arm joint configuration space."), wherein the robot command is executable by means of a device comprising a robot element configured for handling the physical object (Hirano at Figure 8, see robot hand).

Hirano does not explicitly disclose generating, with respect to the 3D surface of the physical object, a voxel representation segmented based on the at least two images, said segmenting being performed by means of at least one segmentation neural network (NN) trained with respect to the main direction. Ku, in the same field of endeavor, discloses a method for object grasping by determining a set of grasp locations and deriving a score for success for each determined grasp. In particular, Ku discloses a process for generating, with respect to the 3D surface of the physical object, a voxel representation segmented based on the at least two images, said segmenting being performed by means of at least one segmentation neural network (NN) trained with respect to the main direction (Ku at Figure 2, scene parsing module 320, described in Para. [0030] as using machine learning, and Para. [0099] disclosing a voxel representation of the image to determine a grasp location: "a predetermined grid (e.g., with a predetermined cell size, cell distribution, etc.) can be overlayed on an image and/or point cloud of the scene to subdivide the scene into a set of image segments and/or voxels. In this example, a candidate grasp can be determined for each grid cell (e.g., based on the image segment, using methods discussed above), wherein a final grasp can be selected from the resultant set of candidate grasps.").
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the robot control as taught by Hirano with the scene parser using machine learning taught by Ku, with a reasonable expectation of success, in order for the robot to recognize an object and grasp the object in a reference space or volume. The teaching, suggestion, or motivation to combine is that using a machine learning process to determine a grasping point increases the probability of grasp success, as taught by Ku in Para. [0094].

While Hirano teaches the segmentation of objects to identify a grasp point and Ku teaches that the success of identifying the grasp point can be improved through machine learning and the like, neither Hirano nor Ku explicitly discloses determining a main direction relative to the reference volume. Robertson, in the same field of endeavor, discloses a system for grasping objects by recognizing that certain objects, like fruit and produce, have what is called a dominant axis, and that with this knowledge a neural network can be trained to determine from a set of images the grasping point for the fruit object. See Abstract and Figure 9. Hirano and Ku do not disclose, but Robertson discloses, wherein the robot command is computed based on the determined main direction of the physical object relative to the reference volume (Robertson at Figure 9 and Para. [0153] discloses the use of a neural network to find the centroid point of an object such as a fruit and then follow a path to what can be considered the grasping point, which is the stalk of the fruit: "training images are cropped so that the centroid of the detected fruit appears in the center of the frame and scaled so that the fruit occupies constant size. Then a convolutional neural network or other regression model is trained to predict fruit orientation in previously unseen images. Various image features are informative as to the orientation of the fruit in the camera image frame (and can be exploited automatically by a suitable machine learning approach), e.g. the density and orientation of any seeds on the surface of the fruit, the location of the calyx (the leafy part around the stem), and image location of the stalk.").

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the grasping point method taught in Robertson in the robot controller of Hirano as modified by Ku, with a reasonable expectation of success, because this results in the robots being utilized to determine the grasping point for produce like fruit, thereby minimizing errors and minimizing the likelihood of spoilage by grasping the object/fruit at the wrong location (see Robertson at Para. [0284] disclosing the importance of locating specific parts of a fruit).

As per claim 2, Hirano, Ku, and Robertson disclose a method according to claim 1, wherein the generating comprises: determining one or more protruding portions associated with the main direction (Robertson at Para. [0154] discloses that a protruding portion of an object fruit is its stalk: "knowledge of the orientation of the stalk may be very important for picking some types of fruits (or otherwise informative as to the orientation of the body of the fruit), another useful innovation is a stalk detection algorithm that identifies and delineates stalks in images. A stalk detector can be implemented by training a pixel-wise semantic labelling engine (e.g. a decision forest or CNN) using manually annotated training images to identify pixels that lie on the central axis of a stalk."), wherein the determining of the main direction is based further on the determined one or more protruding portions (Robertson at Para. [0153] discloses tracing from the centroid of the fruit until the stalk (protrusion) is located: "Suitable training images may be obtained using a camera mounted to the end of a robot arm. First, the arm is moved manually until the camera is approximately aligned with a suitable fruit-based coordinate system and a fixed distance away from the fruit's centroid. The arm is aligned so the fruit has canonical orientation in a camera image, i.e. so that the two or three angles used to describe orientation in the camera coordinate frame are 0.").

As per claim 3, Hirano, Ku, and Robertson disclose a method according to claim 1, wherein the generating comprises: wherein the main direction is determined with respect to a geometry of the 3D surface (Hirano at Figure 6, Page 3, Section III).

As per claim 4, Hirano, Ku, and Robertson disclose a method according to claim 1, further comprising: determining a clamping portion for clamping the physical object by means of the robot element, wherein the handling comprises clamping the physical object based on the clamping portion (Hirano at Page 3, Section III, discloses the grasping/clamping portion of the object as secured by the robot: "Fig. 8 shows some variations of grasping attitude of the robot hand. One axis of the hand coordinate system {Xh, Yh, Zh} is settled to be parallel to the main axis of the object as shown in Fig. 8. Once the orientation of the hand coordinate frame in the grasping attitude is decided, then the inverse kinematics problem is solved to decide collision-free arm attitude.").

As per claim 5, Hirano, Ku, and Robertson disclose a method according to claim 1, wherein the handling of the physical object by the robot command is performed with respect to another object being a receiving object for receiving the physical object (Robertson at Para. [0173] discloses the picking and placement of an object fruit: "to facilitate more precise placement of the fruit in the storage container, and therefore to minimize the risk of bruising due to collisions.").

As per claim 6, Hirano, Ku, and Robertson disclose a method according to claim 5, wherein the receiving object comprises a receiving direction for receiving the physical object, wherein the determining of a clamping portion is based on the main direction of the physical object and the receiving direction of the receiving object, wherein the handling comprises orienting the physical object with respect to the main direction of the physical object and the receiving direction of the receiving object (Robertson at Para. [0197] discloses various placement schemes for the fruit at a tray: "each fruit can be placed to minimize total cost considering (i) the known existing placement of strawberries in punnets and (ii) expectation over many samples of future streams of yet-to-be-picked strawberries. A probability distribution (Gaussian, histogram, etc.) describing the size of picked fruits and possibly other measures of quality can be updated dynamically as fruit is picked.").
As per claim 7, Hirano, Ku, and Robertson disclose a method according to claim 1, wherein the physical object relates to a plant, wherein the main direction is a growth direction of the plant, wherein the determining of the main direction is based on an indication of a growth direction provided by the 3D surface (Robertson at Para. [0607] discloses growth trajectory data: "robotic fruit picking system in which the prediction of the flavour or quality of a fruit depends on the analysis of a growth trajectory data measured over time for the fruit.").

As per claim 8, Hirano, Ku, and Robertson disclose a method according to claim 1, wherein the generating comprises: two-dimensional (2D) segmenting of the at least two images by means of at least one trained semantic segmentation NN being a 2D convolutional neural network (Ku at Para. [0043] discloses that a feature can be based on 2D features: "the features are 2D features, the 2D features can be mapped to a point cloud (e.g., predetermined, generated using the method, sampled in S100, etc.) to determine feature locations in 3D"), CNN, for determining one or more segment components corresponding to protruding portions of the physical object in each of the at least two images (Ku at Para. [0049] discloses the use of a CNN on the feature: "detector 345 can function to detect objects, extract features, attributes, and/or other information from images and/or point clouds. The detector is preferably a convolutional neural network (CNN)"); performing a 3D reconstruction of the 3D surface of the physical object based at least on the at least two images for obtaining a voxel representation (Ku at Para. [0033] discloses using multiple images: "sensor suite can include an imaging system which preferably functions to capture images of the inference scene, but can provide any other functionality. An imaging system can include: stereo camera pairs, CCD cameras, CMOS cameras"); and obtaining said segmented voxel representation by projecting said one or more segment components with respect to said voxel representation (Ku at Para. [0099] discloses a voxel representation: "a predetermined grid (e.g., with a predetermined cell size, cell distribution, etc.) can be overlayed on an image and/or point cloud of the scene to subdivide the scene into a set of image segments and/or voxels.").

As per claim 9, Hirano, Ku, and Robertson disclose a method according to claim 1, wherein the generating comprises: performing a 3D reconstruction of the 3D surface of the physical object based on the at least two images for obtaining a voxel representation (Robertson at Figure 9 and Para. [0067] discloses a "computer vision system comprising a 3D stereo camera and image processing software for detecting target fruits, and deciding whether to pick them and how to pick them."); 3D segmenting said voxel representation by means of at least one semantic segmentation NN being a 3D CNN trained with respect to the main direction (Robertson at Paras. [0148]-[0149] discloses segmenting the images using a CNN: "A decision forest classifier or convolutional neural network (CNN) may be trained to perform semantic segmentation, i.e. to label pixels corresponding to ripe fruit, unripe fruit, and other objects. Pixel-wise labelling may be noisy, and evidence may be aggregated across multiple pixels by using a clustering algorithm. [0149] 2. A CNN can be trained to distinguish image patches that contain a target fruit at their center from image patches that do not. A sliding window approach may be used to determine the positions of all image patches likely to contain target fruits."); and obtaining said segmented voxel representation by determining one or more segment components corresponding to protruding portions of the physical object in the voxel representation (Robertson at Para. [0244] discloses a voxel representation of the captured scene: "approximating real and possibly frequently changing scene geometry using voxels, those edges corresponding to paths which would cause the arm to collide with the scene can be quickly eliminated from the graph at runtime. A suitable voxel-based model of approximate scene geometry might be obtained using prior knowledge of the geometry of the growing infrastructure and the pose of the robot relative to it."), wherein said obtaining of said segmented voxel representation comprises determining a first portion of the protruding portions associated with the main direction (Robertson at Para. [0289] discloses where the representation includes a volume that comprises the stem of the fruit/object to be grasped: "the control of the Picking Arm (or otherwise) the hook is mechanically swept through a (in general dynamically) chosen volume of space, and any stems within this swept volume are gathered into the hook. With the stem thus captured, the hook may be used to pull the target fruit away from the plant (and potential sources of occlusion like leaves or other fruits) so that measurements of picking suitability may be made (including visual, olfactory and tactile measurements).").

As per claim 10, Hirano, Ku, and Robertson disclose a method according to claim 9, wherein said performing of said 3D reconstruction comprises determining RGB values associated with each voxel based on said at least two images, wherein said 3D segmenting is performed with respect to said voxel representation comprising said RGB values by means of a NN trained with RGB data (Robertson at Para. [0147] discloses the use of RGB data: "machine learning approach is used to train a detection algorithm to identify fruit in RGB colour images (and/or in depth images obtained by dense stereo or otherwise).").

As per claim 11, Hirano, Ku, and Robertson disclose a method according to claim 8, further comprising: obtaining a training set relating to a plurality of training objects, each of the plurality of training objects comprising a 3D surface similar to the 3D surface of said physical object, the training set comprising at least two images for each of the plurality of training objects (Robertson at least at Para. [0148] discloses training with image data: "decision forest classifier or convolutional neural network (CNN) may be trained to perform semantic segmentation, i.e. to label pixels corresponding to ripe fruit, unripe fruit, and other objects. Pixel-wise labelling may be noisy, and evidence may be aggregated across multiple pixels by using a clustering algorithm."); receiving manual annotations with respect to said main direction from a user for each of the plurality of training objects via a graphic user interface (GUI) (Robertson at Para. [0147] discloses manual annotation of image data: "provide training data, images obtained from representative viewpoints are annotated manually with the position and/or extent of target fruit."); and training, based on said manual annotations, at least one NN, for obtaining said at least one trained NN, wherein, for each training object, said receiving of manual annotations relates to displaying an automatically calculated centroid for each object and receiving a manual annotation being a position for defining said main direction extending between said centroid and said position, said manual annotation being the only annotation to be performed by said user (Robertson at Para. [0149] discloses using a CNN on the image data including the manual annotations: "CNN can be trained to distinguish image patches that contain a target fruit at their center from image patches that do not. A sliding window approach may be used to determine the positions of all image patches likely to contain target fruits. Alternatively, the semantic labelling algorithm 1 may be used to identify the likely image locations of target fruits for subsequent more accurate classification by a (typically more computationally expensive) CNN.").

As per claim 12, Hirano, Ku, and Robertson disclose a method according to claim 1, further comprising: preprocessing the at least two images, wherein the preprocessing comprises at least one of largest component detection, background subtraction, mask refinement, cropping and rescaling (Robertson at Para. [0113] discloses isolating the object from other objects in a field of view: "secondary purpose is to move leaves and other sources of occlusion out of the way so fruit can be detected and localized, and to separate target fruit from the plant (before it is permanently severed) to facilitate determination of picking suitability."); or postprocessing the segmented voxel representation in view of one or more semantic segmentation rules relating to one or more segment classes with respect to the 3D surface (Robertson at Para. [0167] discloses postprocessing to isolate the object within an image frame: "an alternative and innovative approach is to use an implicit 3D model of the scene formed by the range of viewpoints from which the target fruit can be observed without occlusion. The underlying insight is that if the target fruit is wholly visible from a particular viewpoint, then the volume defined by the inverse projection of the 2D image perimeter of the fruit must be empty between the camera and the fruit.").

As per claim 13, Hirano, Ku, and Robertson disclose a device for handling a three-dimensional (3D) physical object present within a reference volume, the physical object comprising a main direction and a 3D surface, the device comprising a robot element, a processor and memory comprising instructions which, when executed by the processor (see at least Ku for the hardware used to implement grasp point identification from images), cause the device to execute a method according to claim 1 (see the above discussion of claim 1).
As per claim 14, Hirano discloses a system for handling a three-dimensional (3D) physical object present within a reference volume, the physical object comprising a main direction and a 3D surface (Figures 8-10), the system comprising: a device (Figure 8, robotic hand); a plurality of cameras positioned at different respective angles with respect to the physical object and connected to the device (Hirano at Page 1, Column 2, discloses a "computer vision technique to reconstruct 3D scene, separate multiple objects by stereo vision and then approximate each objects to simple shaped primitives"); and a robot element comprising actuation means and connected to the device, wherein the device is configured (Hirano at Figure 8, robotic hand) for: obtaining, from the plurality of cameras, at least two images of the physical object (Hirano at Page 2, Column 1, Paragraphs 1-2, "stereo camera system"); determining a main direction based on the segmented voxel representation (Hirano at Page 3, Column 2, Section III, at Figure 6, first photograph showing the locus of points); and sending the robot command to the robot element for letting the robot element handle the physical object (Hirano at Figure 8), wherein the plurality of cameras is configured for: acquiring at least two images of the physical object (Hirano at Page 2, Column 1, Paragraphs 1-2, "stereo camera system"); and sending the at least two images to the device (Hirano at Page 2, Column 2, disclosing "As shown in Fig. 1, at first, a pair of stereo images is taken at a starting position. Then the stereo cameras are moved to get other pair of images from different viewpoints. At the same time, by tracking feature points in the images, camera motion from the previous viewpoint is estimated."), wherein the robot element is configured for: receiving the robot command from the device (Hirano at Page 3, Section III, grasping strategies that are sent to the robot hand for securing the object); and handling the physical object using the actuation means, wherein the robot command is executable by means of a device comprising a robot element configured for handling the physical object.

Hirano does not explicitly disclose generating, with respect to the 3D surface of the physical object, a voxel representation segmented based on the at least two images, said segmenting being performed by means of at least one segmentation neural network (NN) trained with respect to the main direction. Ku, in the same field of endeavor, discloses a method for object grasping by determining a set of grasp locations and deriving a score for success for each determined grasp. In particular, Ku discloses a process for generating, with respect to the 3D surface of the physical object, a voxel representation segmented based on the at least two images, said segmenting being performed by means of at least one segmentation neural network (NN) trained with respect to the main direction (Ku at Figure 2, scene parsing module 320, described in Para. [0030] as using machine learning, and Para. [0099] disclosing a voxel representation of the image to determine a grasp location: "a predetermined grid (e.g., with a predetermined cell size, cell distribution, etc.) can be overlayed on an image and/or point cloud of the scene to subdivide the scene into a set of image segments and/or voxels. In this example, a candidate grasp can be determined for each grid cell (e.g., based on the image segment, using methods discussed above), wherein a final grasp can be selected from the resultant set of candidate grasps.").

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the robot control as taught by Hirano with the scene parser using machine learning taught by Ku, with a reasonable expectation of success, in order for the robot to recognize an object and grasp the object in a reference space or volume. The teaching, suggestion, or motivation to combine is that using a machine learning process to determine a grasping point increases the probability of grasp success, as taught by Ku in Para. [0094].

While Hirano teaches the segmentation of objects to identify a grasp point and Ku teaches that the success of identifying the grasp point can be improved through machine learning and the like, neither Hirano nor Ku explicitly discloses determining a main direction relative to the reference volume. Robertson, in the same field of endeavor, discloses a system for grasping objects by recognizing that certain objects, like fruit and produce, have what is called a dominant axis, and that with this knowledge a neural network can be trained to determine from a set of images the grasping point for the fruit object. See Abstract and Figure 9. Hirano and Ku do not disclose, but Robertson discloses, wherein the robot command is computed based on the determined main direction of the physical object relative to the reference volume (Robertson at Figure 9 and Para. [0153] discloses the use of a neural network to find the centroid point of an object such as a fruit and then follow a path to what can be considered the grasping point, which is the stalk of the fruit: "training images are cropped so that the centroid of the detected fruit appears in the center of the frame and scaled so that the fruit occupies constant size. Then a convolutional neural network or other regression model is trained to predict fruit orientation in previously unseen images. Various image features are informative as to the orientation of the fruit in the camera image frame (and can be exploited automatically by a suitable machine learning approach), e.g. the density and orientation of any seeds on the surface of the fruit, the location of the calyx (the leafy part around the stem), and image location of the stalk.").

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the grasping point method taught in Robertson in the robot controller of Hirano as modified by Ku, with a reasonable expectation of success, because this results in the robots being utilized to determine the grasping point for produce like fruit, thereby minimizing errors and minimizing the likelihood of spoilage by grasping the object/fruit at the wrong location (see Robertson at Para. [0284] disclosing the importance of locating specific parts of a fruit).

As per claim 15, Hirano, Ku, and Robertson disclose a non-transitory computer readable medium containing computer executable software which, when executed on a device, performs the method of claim 1 (see the rejection of claim 1 above).
As per claim 16, Hirano, Ku, and Robertson disclose a method according to claim 2, wherein said obtaining of said segmented voxel representation comprises determining a first portion of the protruding portions associated with the main direction (Robertson at Para. [0154] discloses that a protruding portion of an object fruit is its stalk: "knowledge of the orientation of the stalk may be very important for picking some types of fruits (or otherwise informative as to the orientation of the body of the fruit), another useful innovation is a stalk detection algorithm that identifies and delineates stalks in images. A stalk detector can be implemented by training a pixel-wise semantic labelling engine (e.g. a decision forest or CNN) using manually annotated training images to identify pixels that lie on the central axis of a stalk.").

As per claim 17, Hirano, Ku, and Robertson disclose a method according to claim 8, wherein said 2D segmenting and said projecting relate to confidence values with respect to said segment components being protruding portions, and said determining of the main direction is based on determining a maximum of said confidence values (Ku at Para. [0099] discloses a confidence score for detection: "attribute can be an object occlusion score, a confidence score for a particular detection (e.g., output from the detector, calculated using the confidence scores and/or grasp outcomes over time, etc.), and/or any other suitable attribute.").

As per claim 18, Hirano, Ku, and Robertson disclose a method according to claim 8, wherein the obtaining of said segmented voxel representation comprises performing clustering with respect to said projected one or more segment components (Robertson at Para. [0569] discloses clustering of data: "robotic fruit picking system in which a clustering algorithm aggregates the results of the semantic segmentation.").

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ELLIS B. RAMIREZ, whose telephone number is (571) 272-8920. The examiner can normally be reached 7:30 am to 5:00 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Ramon Mercado, can be reached at 571-270-5744. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ELLIS B. RAMIREZ/
Examiner, Art Unit 3658
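
For readers less familiar with the technology at issue: the examiner's mapping treats Ku's "principal component of the object based on the 3D points" as the claimed main direction, and a grid overlaid on a point cloud as a voxel representation. The sketch below illustrates that reading only; it is hypothetical and is not code from Hirano, Ku, Robertson, or the application.

```python
# Hypothetical illustration of the examiner's claim mapping, not code
# from any cited reference: a "main direction" as the first principal
# component of a 3D point cloud, and a coarse grid-based voxelization.
import numpy as np

def main_direction(points: np.ndarray) -> np.ndarray:
    """Dominant axis of an Nx3 point cloud (first principal component)."""
    centered = points - points.mean(axis=0)
    # The first right singular vector is the direction of greatest
    # variance, i.e. the cloud's "main" axis.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def voxelize(points: np.ndarray, cell: float = 0.01) -> set:
    """Occupied cells of a predetermined grid overlaid on the cloud,
    in the spirit of Ku's Para. [0099]."""
    return set(map(tuple, np.floor(points / cell).astype(int)))

# Synthetic elongated "object": the main axis should come out near x.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(500, 3)) * np.array([0.10, 0.01, 0.01])

print(main_direction(cloud))  # approximately [±1, 0, 0]
print(len(voxelize(cloud)))   # number of occupied voxels
```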

Prosecution Timeline

Sep 15, 2023
Application Filed
Jul 14, 2025
Non-Final Rejection — §103, §DP
Oct 16, 2025
Response Filed
Jan 08, 2026
Final Rejection — §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12600034
Compensation of Positional Tolerances in the Robot-assisted Surface Machining
Granted Apr 14, 2026 • 2y 5m to grant
Patent 12584758
VEHICLE DISPLAY DEVICE, VEHICLE DISPLAY PROCESSING METHOD, AND NON-TRANSITORY STORAGE MEDIUM
Granted Mar 24, 2026 • 2y 5m to grant
Patent 12571639
SYSTEM AND METHOD FOR IDENTIFYING TRIP PAIRS
Granted Mar 10, 2026 • 2y 5m to grant
Patent 12551302
CONTROLLING A SURGICAL INSTRUMENT
Granted Feb 17, 2026 • 2y 5m to grant
Patent 12552018
INTEGRATING ROBOTIC PROCESS AUTOMATIONS INTO OPERATING AND SOFTWARE SYSTEMS
Granted Feb 17, 2026 • 2y 5m to grant
Study what changed to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 80%
With Interview: 99% (+18.2%)
Median Time to Grant: 3y 3m
PTA Risk: Moderate

Based on 194 resolved cases by this examiner. Grant probability derived from career allow rate.
