Prosecution Insights
Last updated: April 19, 2026
Application No. 18/366,079

AUTOMATIC BIN DETECTION FOR ROBOTIC APPLICATIONS

Status: Final Rejection (§103)
Filed: Aug 07, 2023
Examiner: CHEIN, ALLEN C
Art Unit: 3627
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Siemens Aktiengesellschaft
OA Round: 2 (Final)
Grant Probability: 44% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 6m
With Interview: 84%

Examiner Intelligence

Grants 44% of resolved cases.

Career Allow Rate: 44% (189 granted / 429 resolved; -7.9% vs TC avg)
Interview Lift: +40.3% (strong; allow rate among resolved cases with an interview vs. without)
Avg Prosecution: 3y 6m (typical timeline)
Career History: 468 total applications across all art units; 39 currently pending

Statute-Specific Performance

§101: 28.3% (-11.7% vs TC avg)
§103: 47.9% (+7.9% vs TC avg)
§102: 7.8% (-32.2% vs TC avg)
§112: 14.5% (-25.5% vs TC avg)

Tech Center averages are estimates. Based on career data from 429 resolved cases.
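The "vs TC avg" deltas above are plain differences from a Tech Center baseline; working backward from the displayed figures, that baseline comes out to roughly 40% for each statute. A minimal sketch of the arithmetic follows (the 40% baseline is inferred from the deltas, not a value the dashboard states, and the variable names are illustrative):

```python
# Reproduce the "vs TC avg" deltas shown above.
# Assumption: a single Tech Center baseline of ~40% is used for every statute.
tc_average = 0.40
statute_rates = {"§101": 0.283, "§103": 0.479, "§102": 0.078, "§112": 0.145}

for statute, rate in statute_rates.items():
    delta_pts = (rate - tc_average) * 100        # percentage points vs TC avg
    print(f"{statute}: {rate:.1%} ({delta_pts:+.1f}% vs TC avg)")
```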

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Status of the Claims

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Response to Applicant Remarks

Applicant's well-articulated remarks have been considered but are unpersuasive for the reasons below. Regarding the rejection under 35 U.S.C. 103, Applicant argues that the combination of Zadeh and Gajdosech does not disclose the invention. (Applicant's 1/30/26 remarks, p. 3: "Gajdosech suggests a method for determining a pose of a bin using a structured light scanner device, particularly a Photoneo PhoXi (see p. 3, chapter 3, Gajdosech). Furthermore, Gajdosech explicitly states in that chapter that it relies on '2D single-view maps of 3D coordinates.' Still further, Gajdosech teaches that its method is based on 3D point clouds which contain intrinsic parameters 'as opposed to RGBD images', i.e. images from depth cameras (see p. 3, chapter 3, Gajdosech). As a result, Gajdosech teaches a method for determining the pose of the bin explicitly based on data that does not come with a depth image. That means that Gajdosech discloses a self-contained teaching that may not be generalized and may not be easily transferred to different applications. As a result, the combination of Zadeh and Gajdosech does not result in the subject-matter of patent claim 1 as currently on file. The Applicant respectfully takes the position that the Examiner's reasoning for a combination of Zadeh and Gajdosech falls short of the mark. Essentially, the Examiner argues that an ordinarily skilled person would combine these two references to avoid collisions. The Zadeh reference relates to a method for placing an object grasped by a robot into an end location, i.e. a bin (see [0003]-[0005], Zadeh). Gajdosech teaches a method for picking a mechanical part from a bin with a robotic arm (see p. 1, lefthand column, Gajdosech). The task of picking an item from a bin is not merely a reversal of putting an item into a bin. Thus, the Zadeh and Gajdosech references serve for solving different problems, with Gajdosech teaching away from Zadeh. It is not obvious for an ordinarily skilled person to combine two documents that are related to solving different problems.")

The examiner respectfully disagrees. Zadeh does not explicitly require color imagery and is open-ended as to what vision components may be employed. (Zadeh, para 0044: "Vision component 184 generates images related to shape, color, depth, and/or other features of object(s) that are in the line of sight of the sensors. The vision component 184 can be, for example, a monographic camera (e.g., generating 2D RGB images), a stereographic camera (e.g., generating 2.5D RGB images), and/or a laser scanner (e.g., generating a 2.5D "point cloud"). It is understood that in many implementations, when simulator(s) 120 are additionally or alternatively utilized in performing placement attempts, the rendered images of the simulated data will be rendered to be of the same type as the images generated by the vision component 184. For example, both may be 2.5D RGBD images.") Likewise, Gajdosech discloses the use of a laser scanner. (Gajdosech, section 3: "PhoXi scanner provides high-resolution 3D geometry data, but no RGB data, with a rough and noisy gray-scale intensity image being the closest equivalent…")

The examiner further does not concur that Gajdosech teaches away from Zadeh.
Although Gajdosech clearly describes one use case of bin pose detection to be preventing robot collision during an operation of removing an object from a bin, it is silent as to whether this would be useful in the opposite operation of inserting an object into a bin. Common sense would appear to dictate that, as it would be beneficial to prevent a robot grasper from colliding with a bin during a removal operation, likewise it would be beneficial to prevent such a collision during an insertion into a bin. Zadeh clearly discloses that it is necessary for a robot grasper to know the boundaries of a bin during an insertion operation. (Zadeh, para 0060: "[0060] As one particular example, for a positive training example the target placement input can be a semantic identifier of the recycle bin 193, such as "recycle bin" (or "1", "QX23" or other identifier of a recycle bin). As another particular example, the target placement input can be an image of the recycle bin 193 (or of a similar recycle bin). As yet another particular example, the target placement input can be a segmentation mask, bounding box, or other spatial identifier of a location of the target location in an image of the robot's environment (e.g., in an environmental image, as described above, that captures the robot's environment, including the recycle bin). For instance, the segmentation mask can have the same dimensions as the environmental image, but can include only a single channel with first values (e.g., "1s") where the recycle bin (or at least an opening of the recycle bin) is present in the rendered image, and second values (e.g., "0s") at all other locations. Alternative segmentation mask values/techniques can be utilized, such as techniques that have additional value(s) (e.g., value(s) between "0" and "1") near the edges of the recycle bin (or at least an opening of the recycle bin), or techniques that include a first value (e.g., "1") in only some locations where the recycle bin (or at least an opening of the recycle bin) is present in the environmental image (e.g., a "1" or other value in only a subset of (e.g., only one of) multiple pixels corresponding to the recycle bin in the rendered image). Also, for instance, a two-dimensional bounding box (or other shape) can be utilized that indicates the pixels that encompass all or portions of the recycle bin (or at least an opening of the recycle bin). The bounding box (or other shape) can be provided as an input that indicates the dimensions and position of the bounding box (or other shape) relative to the environmental image (e.g., an input that identifies a "center" pixel of the bounding box, and the size of the bounding box).")

In response to applicant's argument that the examiner's conclusion of obviousness is based upon improper hindsight reasoning, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning. But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper. See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).

The examiner suggests that both Gajdosech and Zadeh clearly contemplate that it is necessary in the interaction of a robot with a bin for the robot to know the boundaries of the bin. (See above.)
Obviously, such knowledge is necessary to avoid a collision with the bin and/or to successfully deposit or remove an object to/from the bin. Id.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-10 are rejected under 35 U.S.C. 103 as being unpatentable over Zadeh (US 2020/0122321) in view of Gajdosech, "Towards Deep Learning-based 6D Bin Pose Estimation in 3D Scans," 2022.

Regarding Claim 1: a robot defining an end effector configured to grasp a plurality of objects within a workspace. Zadeh is directed to a system for training a robot to place objects in an environment. (Zadeh, abstract: "Training and/or use of a machine learning model for placement of an object secured by an end effector of a robot. A trained machine learning model can be used to process: (1) a current image, captured by a vision component of a robot, that captures an end effector securing an object; (2) a candidate end effector action that defines a candidate motion of the end effector; and (3) a target placement input that indicates a target placement location for the object.")

a depth camera configured to capture a depth image of the workspace; (Zadeh, para 0044: "[0044] Example vision component 184 is also illustrated and, in FIG. 1, is mounted on a link of the robot 180. The pose of the vision component 184 therefore changes as the pose of that link moves. Further, the vision component 184 can also optionally independently adjust its pose relative to that link (e.g., pan and/or tilt). In other implementations, the vision component 184 may be coupled to another link of the robot and/or provided near the robot (but not coupled to the robot) and/or at a fixed pose relative to the base or other stationary reference point of robot 180.
Vision component 184 generates images related to shape, color, depth, and/or other features of object(s) that are in the line of sight of the sensors.")

one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the autonomous system to, during the runtime: detect a bin within the workspace, the bin capable of containing one or more of the plurality of objects; and based on the depth image, determine a pose of the bin. Zadeh discloses that the system may visually detect bin(s) and determine pose to properly place objects. (Zadeh, paras 0009-0011: "[0009] As mentioned above, the target placement input can include a semantic indication in various implementations. In additional or alternative implementations, the target placement input can additionally or alternatively include: an image that is similar to the placement location (e.g., an image of a recycle bin); a segmentation mask, bounding box, or other spatial identifier of a location of the target location in an image of the robot's environment (e.g., determined based on processing the image of the environment to detect the target location).

[0010] Various implementations can train the machine learning model based on data from real and/or simulated placement attempts where corresponding real or simulated robots move a secured object (e.g., randomly for a fixed time period), then release the object. Those placement attempts that lead to the object being placed in a target location can be used to generate positive training example labels for corresponding training examples having training example input with a target placement input that corresponds to the target location (and can also optionally be used to generate negative training example labels for corresponding training examples having training example input with a different target placement input that does not correspond to the target location). Those placement attempts that lead to the object not being placed in any target location can be used to generate negative training example labels for all corresponding training examples. Each training example can include training example input with: a "current image" from a corresponding instance of time of the training example; a candidate end effector action that defines movement from a "current pose" at the corresponding instance of time to a "final pose" at a final instance of time of the placement attempt; and a corresponding target placement input. Each training example can include a labeled training example output that indicates whether the placement of the object was in the target location indicated by the target placement input of the training example input. Human labeling and/or automated labeling (e.g., for simulated training examples) can be utilized.

[0011] It is noted that the placement attempts can be performed utilizing various target locations, various poses for the target locations, various environments, and various secured objects. For example, a trash bin can be in a first pose in some placement attempts, in a second pose in other placement attempts, in a third pose in others, etc. In these and other manners, the machine learning model can be trained to be robust and enable placement in a target location in a variety of environments.
Moreover, the machine learning model can be trained to enable placement in any of a variety of target locations (e.g., trash bins, recycle bins, compost bins, on a shelf, beside a plate, on the floor).")

Zadeh does not explicitly disclose the pose defining an orientation of the bin within the workspace. Gajdosech is directed to a system for estimating bin pose in 3D scans. (Gajdosech, abstract; section 4.2.1: "The pose of the bin can be parameterized using a rotation matrix R ∈ SO(3) and a translation vector t ∈ R³. We represent the translation vector directly. To represent rotation, we opt to use a strategy similar to (Zhou et al., 2019) and represent the rotation by using two vectors from R³ which can be used to determine the rotation matrix uniquely except for degenerate cases discussed later. The two vectors represent the orientation of the z and y axes of the bin in the camera coordinates. We denote these vectors as v_z and v_y, respectively.") It would have been obvious to one of ordinary skill in the art before the filing date of the invention to combine Zadeh with the orientation determination of Gajdosech with the motivation of collision avoidance. (Gajdosech, introduction: "Capturing a scene with 3D scanners is a standard for automatized systems analyzing a scene. To pick mechanical parts from a bin by a robotic arm equipped with a gripper, the parts need to be localized. First, the localization of bin is essential to restrain the robot from collisions. Then, the kinematics of the robot is optimized for path planning. The problem of bin localization can be defined as a 6 DoF pose estimation of a template 3D model of the bin in the 3D scan"; see also Zadeh, background: "For example, many programmed robots may fail in dynamic environments and/or may fail in varying environments. For instance, in the preceding example where the robot is programmed to place the grasped object in the bin based on the bin being in the preprogrammed fixed location, the robot will fail to place the grasped object in the bin if the bin has been moved to a different location that is not the preprogrammed fixed location. Also, for instance, if the robot is placed in a new environment where the bin is in a different location, the robot will not adapt to the new environment without explicit user programming.")

Regarding Claim 2, Zadeh and Gajdosech disclose the system of claim 1. Zadeh does not explicitly disclose wherein the bin defines a bottom end and a top end opposite the bottom end along a transverse direction, the bottom end positioned farther from the depth camera along the transverse direction as compared to the top end, the top end defining an opening and sides of the bin around the opening such that the depth camera is further configured to capture the depth image of the bin from a perspective along the transverse direction. Gajdosech discloses a top/bottom transverse view (e.g., top-down). (Gajdosech, section 4.1: "An analytical algorithm for pose estimation is composed of a set of steps performed sequentially in the pipeline. This four-step method assumes that the top edges of the bin are closer to the camera than background objects, and at least a part of every top edge can be seen") It would have been obvious to one of ordinary skill in the art before the filing date of the invention to combine Zadeh with the orientation determination of Gajdosech with the motivation of collision avoidance.
(Gajdosech, introduction: "Capturing a scene with 3D scanners is a standard for automatized systems analyzing a scene. To pick mechanical parts from a bin by a robotic arm equipped with a gripper, the parts need to be localized. First, the localization of bin is essential to restrain the robot from collisions. Then, the kinematics of the robot is optimized for path planning. The problem of bin localization can be defined as a 6 DoF pose estimation of a template 3D model of the bin in the 3D scan")

Regarding Claim 3, Zadeh and Gajdosech disclose the system of claim 2. the memory further storing instructions that, when executed by the one or more processors, cause the autonomous system to, during the runtime: based on the depth image, generate a segmentation mask of the bin, the segmentation mask defining pixels representative of the sides of the bin at the top end. (Zadeh, para 0060: "[0060] As one particular example, for a positive training example the target placement input can be a semantic identifier of the recycle bin 193, such as "recycle bin" (or "1", "QX23" or other identifier of a recycle bin). As another particular example, the target placement input can be an image of the recycle bin 193 (or of a similar recycle bin). As yet another particular example, the target placement input can be a segmentation mask, bounding box, or other spatial identifier of a location of the target location in an image of the robot's environment (e.g., in an environmental image, as described above, that captures the robot's environment, including the recycle bin). For instance, the segmentation mask can have the same dimensions as the environmental image, but can include only a single channel with first values (e.g., "1s") where the recycle bin (or at least an opening of the recycle bin) is present in the rendered image, and second values (e.g., "0s") at all other locations.")

Regarding Claim 4, Zadeh and Gajdosech disclose the system of claim 3. the memory further storing instructions that, when executed by the one or more processors, cause the autonomous system to, during the runtime: scan the segmentation mask … so as to identify a plurality of points on outermost edges of the segmentation mask. See prior art rejection of claim 3. Zadeh does not explicitly disclose along a first direction and a second direction substantially perpendicular to the first direction. However, Gajdosech discloses that the bin may be characterized by computing perpendicular edges. (Gajdosech, section 4.1, Figure 2: "(From left to right) the camera space is row-wise and column-wise segmented into similar depth intervals, from which horizontal and vertical bin-cuts are constructed. A plane is fitted into the bin-cuts, and wall-cuts not corresponding to this plane are discarded as outliers. The remaining wall-cuts are assigned to four bin walls according to corners fitted into horizontal and vertical bin-cuts. Finally, the lines are fitted into categorized wall-cuts, which define the bin basis") It would have been obvious to one of ordinary skill in the art before the filing date of the invention to combine Zadeh with the orientation determination of Gajdosech with the motivation of collision avoidance. (Gajdosech, introduction: "Capturing a scene with 3D scanners is a standard for automatized systems analyzing a scene. To pick mechanical parts from a bin by a robotic arm equipped with a gripper, the parts need to be localized. First, the localization of bin is essential to restrain the robot from collisions.
Then, the kinematics of the robot is optimized for path planning. The problem of bin localization can be defined as a 6 DoF pose estimation of a template 3D model of the bin in the 3D scan")

Regarding Claim 5, Zadeh and Gajdosech disclose the system of claim 4. the memory further storing instructions that, when executed by the one or more processors, cause the autonomous system to, during the runtime: fit a plurality of models to a boundary defined by the plurality of points, so as to determine the pose of the bin. (Zadeh, para 0053: "Through varying of placement locations and/or environmental objects, diverse training examples can be generated that enable training of a placement model 150 that can be utilized in any of a variety of environments for successful placing of an object and/or the can be utilized to place an object in any of a variety of placement locations.")

Regarding Claims 6-10, see the prior art rejection of claims 1-5, respectively.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALLEN C CHEIN, whose telephone number is (571) 270-7985. The examiner can normally be reached Monday-Friday, 8am-5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Florian Zeender, can be reached at (571) 272-6790. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALLEN C CHEIN/
Primary Examiner, Art Unit 3627
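For context on the bin-pose parameterization quoted above from Gajdosech section 4.2.1 (two vectors encoding the bin's z and y axes, after Zhou et al., 2019), the following is a minimal sketch of how such a two-vector output can be orthonormalized into a full rotation matrix. It illustrates only the cited technique, not the claimed invention or either reference's actual implementation; NumPy is assumed and all names and values are hypothetical.

```python
import numpy as np

def rotation_from_two_vectors(v_z, v_y):
    """Gram-Schmidt orthonormalization of predicted bin z- and y-axis
    directions into a rotation matrix; degenerate if v_z and v_y are parallel."""
    z = v_z / np.linalg.norm(v_z)        # unit z axis of the bin in camera frame
    y = v_y - np.dot(v_y, z) * z         # remove the component along z
    y = y / np.linalg.norm(y)            # unit y axis, orthogonal to z
    x = np.cross(y, z)                   # right-handed third axis
    return np.column_stack([x, y, z])    # columns are the bin axes in camera coords

# Hypothetical values standing in for a model's regressed outputs.
v_z = np.array([0.05, 0.10, -0.99])
v_y = np.array([0.02, 0.99, 0.10])
t = np.array([0.05, -0.02, 0.80])        # bin origin in camera coordinates (meters)

R = rotation_from_two_vectors(v_z, v_y)
p_cam = R @ np.array([0.30, 0.20, 0.0]) + t   # a bin-frame point mapped into the camera frame
```

Together, R and t give the 6 DoF bin pose that Gajdosech's introduction describes using to keep the robot clear of the bin walls.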

Prosecution Timeline

Aug 07, 2023: Application Filed
Nov 17, 2025: Non-Final Rejection — §103
Jan 30, 2026: Response Filed
Feb 20, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586084: DATA ANALYTICS TOOL (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579512: OPTIMIZATION OF ITEM AVAILABILITY PROMPTS IN THE CONTEXT OF NON-DETERMINISTIC INVENTORY DATA (granted Mar 17, 2026; 2y 5m to grant)
Patent 12579513: DYNAMIC PRODUCTION BILL OF MATERIALS SYSTEM (granted Mar 17, 2026; 2y 5m to grant)
Patent 12572942: Intelligent Management of Authorization Requests (granted Mar 10, 2026; 2y 5m to grant)
Patent 12572918: COMMODITY REGISTRATION SYSTEM (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 44%
With Interview: 84% (+40.3%)
Median Time to Grant: 3y 6m
PTA Risk: Moderate

Based on 429 resolved cases by this examiner. Grant probability is derived from the career allow rate.
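The headline projections follow from the career counts shown in the Examiner Intelligence section. A minimal sketch of the arithmetic, assuming the interview lift is applied as an additive percentage-point adjustment (variable names are illustrative, not the tool's actual schema):

```python
# Reproduce the displayed grant-probability figures from the raw career counts.
granted, resolved = 189, 429
career_allow_rate = granted / resolved                 # 0.4406 -> displayed as 44%

interview_lift = 0.403                                 # +40.3 percentage points
with_interview = career_allow_rate + interview_lift    # 0.8436 -> displayed as 84%

print(f"Grant probability: {career_allow_rate:.0%}")
print(f"With interview:    {with_interview:.0%}")
```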

Free tier: 3 strategy analyses per month