Prosecution Insights
Last updated: April 19, 2026
Application No. 18/368,927

MULTI-TASK GRASPING

Status: Final Rejection under §103
Filed: Sep 15, 2023
Examiner: KASPER, BYRON XAVIER
Art Unit: 3657
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Nvidia Corporation
OA Round: 2 (Final)
Grant Probability: 70% (Favorable)
Expected OA Rounds: 3-4
Estimated Time to Grant: 3y 0m
Grant Probability with Interview: 88%

Examiner Intelligence

Career Allow Rate: 70% (72 granted / 103 resolved), +17.9% vs TC average; grants above average
Interview Lift: +18.4% for resolved cases with an interview (strong)
Typical Timeline: 3y 0m average prosecution; 36 applications currently pending
Career History: 139 total applications across all art units

Statute-Specific Performance

§101: 10.9% (-29.1% vs TC avg)
§102: 11.9% (-28.1% vs TC avg)
§103: 56.3% (+16.3% vs TC avg)
§112: 16.4% (-23.6% vs TC avg)
Tech Center averages are estimates; based on career data from 103 resolved cases.
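The headline figures above follow from simple ratios over the examiner's resolved cases. The sketch below is a minimal reconstruction of that arithmetic; the granted/resolved counts come from this page, but the Tech Center baseline and the with/without-interview split are not shown here and are placeholder assumptions.

```python
# Hypothetical reconstruction of the dashboard metrics above.
# Counts taken from this page; baseline and interview split are assumptions.
granted = 72
resolved = 103
tc_avg_allow_rate = 0.520            # assumed baseline implied by the "+17.9% vs TC avg" delta

allow_rate = granted / resolved                      # 72 / 103 ≈ 0.699 -> "70% Career Allow Rate"
delta_vs_tc = allow_rate - tc_avg_allow_rate         # ≈ +0.179 -> "+17.9% vs TC avg"

# Interview lift: allowance rate with an interview minus the rate without one,
# over resolved cases. The split is not published on this page, so these are placeholders.
allow_with_interview = 0.880
allow_without_interview = 0.696
interview_lift = allow_with_interview - allow_without_interview   # ≈ +0.184 -> "+18.4% Interview Lift"

print(f"allow rate {allow_rate:.1%}, vs TC {delta_vs_tc:+.1%}, interview lift {interview_lift:+.1%}")
```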

Office Action

§103
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2. This communication is responsive to Application No. 18/368,927 and the amendments filed on 11/4/2025.
3. Claims 1-20 are presented for examination.

Information Disclosure Statement
4. The information disclosure statements (IDS) submitted on 9/27/2023, 6/4/2025, and 9/18/2025 have been fully considered by the Examiner.

Response to Arguments
5. Applicant’s arguments, see pages 7-8, filed 11/4/2025, with respect to the rejection of claims 1-6, 9-14, and 17-20 under 35 U.S.C. 101 have been fully considered and are persuasive. The rejection of claims 1-6, 9-14, and 17-20 under 35 U.S.C. 101 of 6/5/2025 has been withdrawn.
6. Applicant’s arguments with respect to the rejection of claim(s) 1-20 under 35 U.S.C. 102 and/or 35 U.S.C. 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Regarding independent claim 1, the Examiner agrees that US 20220203547 A1 to Majumdar fails to teach all of the amendments to the claim. However, in light of the amendments and the Applicant’s remarks, an updated search was conducted, and a new ground of rejection concerning claim 1 has been determined, as will be described later. Regarding independent claims 9 and 17, as these claims contain similar limitations to claim 1, they are still rejected for similar reasons as claim 1, as will be described later. Regarding dependent claims 2-8, 10-16, and 18-20, as all of these claims depend from claim 1, 9, or 17, they are still rejected, as will be described later.

Claim Rejections - 35 USC § 103
7. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
8. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
9. Claim(s) 1, 4, 7, 8, 9, 12, 15, 16, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Majumdar et al. (US 20220203547 A1 hereinafter Majumdar) in view of Noda et al. (US 20250353173 A1 hereinafter Noda) and Chen et al. (US 20230202774 A1 hereinafter Chen).
Regarding Claim 1, Majumdar teaches a computer-implemented method ([0074] via “According to specific embodiments, at least some of the features or functionalities of the various embodiments disclosed herein may be implemented on one or more general-purpose computers associated with one or more networks, such as for example an end-user computer system, a client computer, a network server or other server system, a mobile computing device …, or any combination thereof.”) comprising: accessing information indicating a position of one or more objects in an environment, the information represented using one or more point clouds associated with the one or more objects ([0061] via “A variety of different methodologies may be used to identify the target portion(s) of objects by processing the 3D data, …. In general, this may comprise defining a metric in 3D space, applying the metric to a 3D point cloud, and accepting or rejecting points (and their corresponding objects) based on the metric value. For example, to identify a top layer of objects, a 3D point cloud representing the pick area and a pile of objects may be analyzed to identify, for each of a plurality of 2D locations, the highest point in the 3D cloud at the corresponding 2D location and identify the corresponding object(s) associated with this point.”); determining a grasp of the one or more objects ([0059] via “At step 304, the process comprises computing a pick plan based on at least one of the identifying objects step 302 (e.g. pick shapes) and at least one feature of the determining features step 303. A computed pick plan may comprise at least one of a pick sequence or order in which each object will be picked, instructions or pick coordinates for each pick, and end effector controls associated with each planned pick.”); determining a placement of the one or more objects ([0057] via “For example, pick objects may be further classified based on their determined placement location such as a first group of pick objects to be placed at a first location, a second group of pick objects to be placed at a second location, and so on.”); and causing an autonomous robot to manipulate the one or more objects based on the grasp of the one or more objects and the placement of the one or more objects ([0031] via “The robotic picking unit 114 may pick objects from one portion (e.g. a pallet) of a pick area 102 and place them at another portion (e.g. a conveyor) of the pick area 102. The robotic picking unit 114 may comprise a robotic arm and an end effector 124 attached to the robotic arm.”). Majumdar is silent on determining a grasp of the one or more objects, using one or more neural networks, to generate a contact mask based, at least in part, on the information indicating the position of the one or more objects; and determining a placement of the one or more objects, using the one or more neural networks, to generate a placement mask based, at least in part, on the information indicating the position of the one or more objects. However, Noda teaches determining a grasp of the one or more objects, using one or more neural networks, to generate a contact mask based, at least in part, on the information indicating the position of the one or more objects ([0048] via “The point cloud generation unit 101 generates point cloud data of the target object 2 based on a sensing result from the range sensor 12. 
… Accordingly, by plotting a point corresponding to each pixel of the RGB image on the three-dimensional space, the point cloud generation unit 101 can generate point cloud data of the target object 2 included in the RGB image.”), ([0050] via “The position/attitude estimation unit 102 takes a point included in the point cloud data of the target object 2 as a contact point, and estimates candidates for the position and attitude of the hand 11 that grips the target object 2 for each contact point. Specifically, the position/attitude estimation unit 102 may use a machine learning model such as a deep neural network (DNN) to estimate the candidates for the position and attitude of the hand 11 that grips the target object 2 for each contact point.”), (Note: The Examiner interprets the contact point of Noda as the contact mask, as the contact mask is described within paragraphs [0085], [0086], and [0095] of the specification of the instant application.). Further, Chen teaches determining a placement of the one or more objects, using the one or more neural networks, to generate a placement mask based, at least in part, on the information indicating the position of the one or more objects ([0030] via “In sub-step 322, the processing unit 13 controls the second depth camera 122 to capture an image of the accommodation space 20 to obtain a space 3D point cloud generated by the second depth camera 122. Since the second depth camera 122 is disposed to capture the image of the accommodation space 20 from above the container 2 as shown in FIG. 2, the space 3D point cloud may present the container 2, the accommodation space 20, and any existing object 4 that is placed in the accommodation space 20 (the existing objects 4A, 4B in the case of FIG. 2) in an aerial view, but this disclosure is not limited in this respect.”), ([0036] via “It is noted that, in this embodiment, the processing unit 13 may use a neural network technology to identify each existing object 4 from the space 3D point cloud, so a number of the space occupation data piece(s) generated by the processing unit 13 will be equal to a number of the existing object(s) 4.”), ([0041] via “The flow goes to step S6 when the processing unit 13 determines that the first cross section L1 as indicated by the first cross-section status data piece satisfies the first accommodation condition, and goes to step S7 when otherwise. It is noted that the first cross-section status data piece can be understood as a 2D image. So one may understand step S5 as the processing unit 13 simplifying the space 3D point cloud into a 2D image, so as to use the 2D image, rather than the entire space 3D point cloud that contains a huge amount of data, in determining whether the remaining, unoccupied space in the container 2 is sufficient for accommodating the to-be-packed object 3.”), ([0042] via “In step S6, the processing unit 13 controls the holding unit 11 to place the to-be-packed object 3 into the accommodation space 20 at a position corresponding to the unoccupied area of the first cross section L1.”), (Note: The Examiner interprets the determined placement location of the to-be-packed object into the accommodation space as the placement mask, as the placement mask is described within paragraphs [0071], [0086], and [0095] of the specification of the instant application.). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Noda wherein the computer-implemented method comprises: determining a grasp of the one or more objects, using one or more neural networks, to generate a contact mask based, at least in part, on the information indicating the position of the one or more objects. Doing so increases the stability of estimating the shape of the object to be grasped by a robot, as stated by Noda ([0087] via “According to the configuration described above, the information processing device 100 according to the present embodiment can estimate the shape of the target object 2 based on a distribution of the candidates for the position and attitude of the hand 11 estimated from sensing result for the target object 2 from the range sensor 12. Through this, by treating the estimated candidates for the position and attitude of the hand 11 as a distribution, the information processing device 100 can average out fluctuations or instability arising in the individual estimations, and estimate the shape of the target object 2 in a more stable manner.”). In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Chen wherein the computer-implemented method comprises: determining a placement of the one or more objects, using the one or more neural networks, to generate a placement mask based, at least in part, on the information indicating the position of the one or more objects. Doing so defines and places objects at specific placement locations relative to other objects in the space, such that placed objects are not placed on already-placed objects, as stated above by Chen in paragraph [0042]. Regarding Claim 4, modified reference Majumdar teaches the computer-implemented method of claim 1, wherein the information indicating the position of the one or more objects in the environment comprises one or more portions of image data ([0032] via “By way of example and not limitation, the data acquisition system 112 may include a two dimensional (2D) camera system and/or three dimensional (3D) camera system that is configured to capture data associated with at least one of the pick portion(s), the placement portion(s), and objects in the pick area (including pickable or movable objects and fixed or stationary objects). The data acquisition system 112 may comprise at least one of a three dimensional depth sensor, an RGB-D camera, a time of flight camera, a light detection and ranging sensor, a stereo camera, a structured light camera, and a two dimensional image sensor.”), ([0034] via “The vision system 106 may apply at least one algorithm to the pick area data in order to transform or extract from the pick area data, object data which can be used for computing a pick plan. For example, object data may be determined by applying an object detection algorithm to the pick area data in order to … determine features associated with each object that may aid in performing pick planning. … Object features generally comprise aspects associated with object location, object size or dimensions, and object appearance such as color, patterns, texture, etc.”). 
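For reference, the pipeline recited in claim 1, as this rejection characterizes it (a scene point cloud in; a neural-network contact mask and placement mask out; robot execution from the resulting grasp and placement), could be sketched roughly as follows. This is a minimal illustration only; the class, architecture, and shapes are assumptions and do not represent the applicant's disclosed network or the implementations of Majumdar, Noda, or Chen.

```python
# Minimal, hypothetical sketch of the multi-task structure recited in claim 1:
# one shared point-cloud encoder feeding a grasp (contact-mask) head and a
# placement (placement-mask) head. All names and dimensions are illustrative.
import torch
import torch.nn as nn

class MultiTaskGraspNet(nn.Module):
    def __init__(self, in_dim: int = 3, hidden: int = 128):
        super().__init__()
        # Shared per-point encoder over the scene point cloud (N x 3).
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Task-specific heads: per-point contact-mask and placement-mask scores.
        self.contact_head = nn.Linear(hidden, 1)
        self.placement_head = nn.Linear(hidden, 1)

    def forward(self, points: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        feats = self.encoder(points)                                   # (N, hidden) per-point features
        contact_mask = torch.sigmoid(self.contact_head(feats)).squeeze(-1)
        placement_mask = torch.sigmoid(self.placement_head(feats)).squeeze(-1)
        return contact_mask, placement_mask

# Toy usage: score a random 1,024-point cloud and pick candidate grasp / placement points.
points = torch.rand(1024, 3)
net = MultiTaskGraspNet()
contact, placement = net(points)
grasp_point = points[contact.argmax()]        # candidate contact point for the gripper
place_point = points[placement.argmax()]      # candidate placement location
```

In this sketch a single shared encoder feeds two heads, which is the multi-task structure the independent claims recite; the rejection maps the two heads onto Noda (contact) and Chen (placement) rather than onto a single reference.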
Regarding Claim 7, modified reference Majumdar teaches the computer-implemented method of claim 1, further comprising performing the grasp of the one or more objects in the environment using the one or more autonomous robots ([0031] via “The robotic picking unit 114 may pick objects from one portion (e.g. a pallet) of a pick area 102 and place them at another portion (e.g. a conveyor) of the pick area 102. The robotic picking unit 114 may comprise a robotic arm and an end effector 124 attached to the robotic arm.”). Regarding Claim 8, modified reference Majumdar teaches the computer-implemented method of claim 1, further comprising performing the placement of the one or more objects in the environment using the one or more autonomous robots ([0031] via “The robotic picking unit 114 may pick objects from one portion (e.g. a pallet) of a pick area 102 and place them at another portion (e.g. a conveyor) of the pick area 102. The robotic picking unit 114 may comprise a robotic arm and an end effector 124 attached to the robotic arm.”). Regarding Claim 9, Majumdar teaches a non-transitory computer readable storage medium storing thereon executable instructions that ([0082] via “Because such information and program instructions may be employed to implement one or more systems or methods described herein, at least some network device embodiments may include nontransitory machine-readable storage media, which, for example, may be configured or designed to store program instructions, state information, and the like for performing various operations described herein.”), as a result of being executed by one or more processors of a computer system ([0083] via “Computing device 20 includes processors 21 that may run software that carry out one or more functions or applications of embodiments, such as for example a client application 24.”), cause the computer system to: access information indicating a position of one or more objects in an environment, the information represented using one or more point clouds associated with the one or more objects ([0061] via “A variety of different methodologies may be used to identify the target portion(s) of objects by processing the 3D data, …. In general, this may comprise defining a metric in 3D space, applying the metric to a 3D point cloud, and accepting or rejecting points (and their corresponding objects) based on the metric value. For example, to identify a top layer of objects, a 3D point cloud representing the pick area and a pile of objects may be analyzed to identify, for each of a plurality of 2D locations, the highest point in the 3D cloud at the corresponding 2D location and identify the corresponding object(s) associated with this point.”); determine a grasp of the one or more objects ([0059] via “At step 304, the process comprises computing a pick plan based on at least one of the identifying objects step 302 (e.g. pick shapes) and at least one feature of the determining features step 303. 
A computed pick plan may comprise at least one of a pick sequence or order in which each object will be picked, instructions or pick coordinates for each pick, and end effector controls associated with each planned pick.”); determine a placement of the one or more objects ([0057] via “For example, pick objects may be further classified based on their determined placement location such as a first group of pick objects to be placed at a first location, a second group of pick objects to be placed at a second location, and so on.”); and cause an autonomous machine to manipulate the one or more objects based on the grasp of the one or more objects and the placement of the one or more objects ([0031] via “The robotic picking unit 114 may pick objects from one portion (e.g. a pallet) of a pick area 102 and place them at another portion (e.g. a conveyor) of the pick area 102. The robotic picking unit 114 may comprise a robotic arm and an end effector 124 attached to the robotic arm.”). Majumdar is silent on to determine a grasp of the one or more objects, using one or more neural networks, to generate a contact mask based, at least in part, on the information indicating the position of the one or more objects; and determine a placement of the one or more objects, using the one or more neural networks, to generate a placement mask based, at least in part, on the information indicating the position of the one or more objects. However, Noda teaches to determine a grasp of the one or more objects, using one or more neural networks, to generate a contact mask based, at least in part, on the information indicating the position of the one or more objects ([0048] via “The point cloud generation unit 101 generates point cloud data of the target object 2 based on a sensing result from the range sensor 12. … Accordingly, by plotting a point corresponding to each pixel of the RGB image on the three-dimensional space, the point cloud generation unit 101 can generate point cloud data of the target object 2 included in the RGB image.”), ([0050] via “The position/attitude estimation unit 102 takes a point included in the point cloud data of the target object 2 as a contact point, and estimates candidates for the position and attitude of the hand 11 that grips the target object 2 for each contact point. Specifically, the position/attitude estimation unit 102 may use a machine learning model such as a deep neural network (DNN) to estimate the candidates for the position and attitude of the hand 11 that grips the target object 2 for each contact point.”), (Note: The Examiner interprets the contact point of Noda as the contact mask, as the contact mask is described within paragraphs [0085], [0086], and [0095] of the specification of the instant application.). Further, Chen teaches to determine a placement of the one or more objects, using the one or more neural networks, to generate a placement mask based, at least in part, on the information indicating the position of the one or more objects ([0030] via “In sub-step 322, the processing unit 13 controls the second depth camera 122 to capture an image of the accommodation space 20 to obtain a space 3D point cloud generated by the second depth camera 122. Since the second depth camera 122 is disposed to capture the image of the accommodation space 20 from above the container 2 as shown in FIG. 
2, the space 3D point cloud may present the container 2, the accommodation space 20, and any existing object 4 that is placed in the accommodation space 20 (the existing objects 4A, 4B in the case of FIG. 2) in an aerial view, but this disclosure is not limited in this respect.”), ([0036] via “It is noted that, in this embodiment, the processing unit 13 may use a neural network technology to identify each existing object 4 from the space 3D point cloud, so a number of the space occupation data piece(s) generated by the processing unit 13 will be equal to a number of the existing object(s) 4.”), ([0041] via “The flow goes to step S6 when the processing unit 13 determines that the first cross section L1 as indicated by the first cross-section status data piece satisfies the first accommodation condition, and goes to step S7 when otherwise. It is noted that the first cross-section status data piece can be understood as a 2D image. So one may understand step S5 as the processing unit 13 simplifying the space 3D point cloud into a 2D image, so as to use the 2D image, rather than the entire space 3D point cloud that contains a huge amount of data, in determining whether the remaining, unoccupied space in the container 2 is sufficient for accommodating the to-be-packed object 3.”), ([0042] via “In step S6, the processing unit 13 controls the holding unit 11 to place the to-be-packed object 3 into the accommodation space 20 at a position corresponding to the unoccupied area of the first cross section L1.”), (Note: The Examiner interprets the determined placement location of the to-be-packed object into the accommodation space as the placement mask, as the placement mask is described within paragraphs [0071], [0086], and [0095] of the specification of the instant application.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Noda wherein the computer system is caused to: determine a grasp of the one or more objects, using one or more neural networks, to generate a contact mask based, at least in part, on the information indicating the position of the one or more objects. Doing so increases the stability of estimating the shape of the object to be grasped by a robot, as stated by Noda ([0087] via “According to the configuration described above, the information processing device 100 according to the present embodiment can estimate the shape of the target object 2 based on a distribution of the candidates for the position and attitude of the hand 11 estimated from sensing result for the target object 2 from the range sensor 12. Through this, by treating the estimated candidates for the position and attitude of the hand 11 as a distribution, the information processing device 100 can average out fluctuations or instability arising in the individual estimations, and estimate the shape of the target object 2 in a more stable manner.”). In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Chen wherein the computer system is caused to: determine a placement of the one or more objects, using the one or more neural networks, to generate a placement mask based, at least in part, on the information indicating the position of the one or more objects. 
Doing so defines and places objects at specific placement locations relative to other objects in the space, such that placed objects are not placed on already-placed objects, as stated above by Chen in paragraph [0042]. Regarding Claim 12, modified reference Majumdar teaches the non-transitory computer readable storage medium of claim 9, wherein the information indicating the position of the one or more objects in the environment comprises one or more portions of image data ([0032] via “By way of example and not limitation, the data acquisition system 112 may include a two dimensional (2D) camera system and/or three dimensional (3D) camera system that is configured to capture data associated with at least one of the pick portion(s), the placement portion(s), and objects in the pick area (including pickable or movable objects and fixed or stationary objects). The data acquisition system 112 may comprise at least one of a three dimensional depth sensor, an RGB-D camera, a time of flight camera, a light detection and ranging sensor, a stereo camera, a structured light camera, and a two dimensional image sensor.”), ([0034] via “The vision system 106 may apply at least one algorithm to the pick area data in order to transform or extract from the pick area data, object data which can be used for computing a pick plan. For example, object data may be determined by applying an object detection algorithm to the pick area data in order to … determine features associated with each object that may aid in performing pick planning. … Object features generally comprise aspects associated with object location, object size or dimensions, and object appearance such as color, patterns, texture, etc.”). Regarding Claim 15, modified reference Majumdar teaches the non-transitory computer readable storage medium of claim 9, wherein the computer system is to further perform the grasp of the one or more objects in the environment using the one or more autonomous machines ([0031] via “The robotic picking unit 114 may pick objects from one portion (e.g. a pallet) of a pick area 102 and place them at another portion (e.g. a conveyor) of the pick area 102. The robotic picking unit 114 may comprise a robotic arm and an end effector 124 attached to the robotic arm.”). Regarding Claim 16, modified reference Majumdar teaches the non-transitory computer readable storage medium of claim 9, wherein the computer system is to further perform the placement of the one or more objects in the environment using the one or more autonomous machines ([0031] via “The robotic picking unit 114 may pick objects from one portion (e.g. a pallet) of a pick area 102 and place them at another portion (e.g. a conveyor) of the pick area 102. The robotic picking unit 114 may comprise a robotic arm and an end effector 124 attached to the robotic arm.”). Regarding Claim 17, Majumdar teaches a system comprising: one or more processors ([0083] via “Computing device 20 includes processors 21 that may run software that carry out one or more functions or applications of embodiments, such as for example a client application 24.”) to: access one or more point clouds indicating one or more objects in an environment ([0061] via “A variety of different methodologies may be used to identify the target portion(s) of objects by processing the 3D data, …. In general, this may comprise defining a metric in 3D space, applying the metric to a 3D point cloud, and accepting or rejecting points (and their corresponding objects) based on the metric value. 
For example, to identify a top layer of objects, a 3D point cloud representing the pick area and a pile of objects may be analyzed to identify, for each of a plurality of 2D locations, the highest point in the 3D cloud at the corresponding 2D location and identify the corresponding object(s) associated with this point.”); determine a grasp of the one or more objects ([0059] via “At step 304, the process comprises computing a pick plan based on at least one of the identifying objects step 302 (e.g. pick shapes) and at least one feature of the determining features step 303. A computed pick plan may comprise at least one of a pick sequence or order in which each object will be picked, instructions or pick coordinates for each pick, and end effector controls associated with each planned pick.”); determine a placement of the one or more objects ([0057] via “For example, pick objects may be further classified based on their determined placement location such as a first group of pick objects to be placed at a first location, a second group of pick objects to be placed at a second location, and so on.”); and cause an autonomous robot to manipulate the one or more objects based on the grasp of the one or more objects and the placement of the one or more objects ([0031] via “The robotic picking unit 114 may pick objects from one portion (e.g. a pallet) of a pick area 102 and place them at another portion (e.g. a conveyor) of the pick area 102. The robotic picking unit 114 may comprise a robotic arm and an end effector 124 attached to the robotic arm.”). Majumdar is silent on to determine a grasp of the one or more objects, using one or more neural networks, to generate a contact mask based, at least in part, on the one or more point clouds; and determine a placement of the one or more objects, using the one or more neural networks, to generate a placement mask based, at least in part, on the one or more point clouds. However, Noda teaches to determine a grasp of the one or more objects, using one or more neural networks, to generate a contact mask based, at least in part, on the one or more point clouds ([0048] via “The point cloud generation unit 101 generates point cloud data of the target object 2 based on a sensing result from the range sensor 12. … Accordingly, by plotting a point corresponding to each pixel of the RGB image on the three-dimensional space, the point cloud generation unit 101 can generate point cloud data of the target object 2 included in the RGB image.”), ([0050] via “The position/attitude estimation unit 102 takes a point included in the point cloud data of the target object 2 as a contact point, and estimates candidates for the position and attitude of the hand 11 that grips the target object 2 for each contact point. Specifically, the position/attitude estimation unit 102 may use a machine learning model such as a deep neural network (DNN) to estimate the candidates for the position and attitude of the hand 11 that grips the target object 2 for each contact point.”), (Note: The Examiner interprets the contact point of Noda as the contact mask, as the contact mask is described within paragraphs [0085], [0086], and [0095] of the specification of the instant application.). 
Further, Chen teaches to determine a placement of the one or more objects, using the one or more neural networks, to generate a placement mask based, at least in part, on the one or more point clouds ([0030] via “In sub-step 322, the processing unit 13 controls the second depth camera 122 to capture an image of the accommodation space 20 to obtain a space 3D point cloud generated by the second depth camera 122. Since the second depth camera 122 is disposed to capture the image of the accommodation space 20 from above the container 2 as shown in FIG. 2, the space 3D point cloud may present the container 2, the accommodation space 20, and any existing object 4 that is placed in the accommodation space 20 (the existing objects 4A, 4B in the case of FIG. 2) in an aerial view, but this disclosure is not limited in this respect.”), ([0036] via “It is noted that, in this embodiment, the processing unit 13 may use a neural network technology to identify each existing object 4 from the space 3D point cloud, so a number of the space occupation data piece(s) generated by the processing unit 13 will be equal to a number of the existing object(s) 4.”), ([0041] via “The flow goes to step S6 when the processing unit 13 determines that the first cross section L1 as indicated by the first cross-section status data piece satisfies the first accommodation condition, and goes to step S7 when otherwise. It is noted that the first cross-section status data piece can be understood as a 2D image. So one may understand step S5 as the processing unit 13 simplifying the space 3D point cloud into a 2D image, so as to use the 2D image, rather than the entire space 3D point cloud that contains a huge amount of data, in determining whether the remaining, unoccupied space in the container 2 is sufficient for accommodating the to-be-packed object 3.”), ([0042] via “In step S6, the processing unit 13 controls the holding unit 11 to place the to-be-packed object 3 into the accommodation space 20 at a position corresponding to the unoccupied area of the first cross section L1.”), (Note: The Examiner interprets the determined placement location of the to-be-packed object into the accommodation space as the placement mask, as the placement mask is described within paragraphs [0071], [0086], and [0095] of the specification of the instant application.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Noda wherein the one or more processors determine a grasp of the one or more objects, using one or more neural networks, to generate a contact mask based, at least in part, on the one or more point clouds. Doing so increases the stability of estimating the shape of the object to be grasped by a robot, as stated by Noda ([0087] via “According to the configuration described above, the information processing device 100 according to the present embodiment can estimate the shape of the target object 2 based on a distribution of the candidates for the position and attitude of the hand 11 estimated from sensing result for the target object 2 from the range sensor 12. Through this, by treating the estimated candidates for the position and attitude of the hand 11 as a distribution, the information processing device 100 can average out fluctuations or instability arising in the individual estimations, and estimate the shape of the target object 2 in a more stable manner.”). 
In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Chen wherein the one or more processors determine a placement of the one or more objects, using the one or more neural networks, to generate a placement mask based, at least in part, on the one or more point clouds. Doing so defines and places objects at specific placement locations relative to other objects in the space, such that placed objects are not placed on already-placed objects, as stated above by Chen in paragraph [0042]. 10. Claim(s) 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over Majumdar et al. (US 20220203547 A1 hereinafter Majumdar) in view of Noda et al. (US 20250353173 A1 hereinafter Noda) and Chen et al. (US 20230202774 A1 hereinafter Chen), and further in view of Zizka et al. (US 20230415345 A1 hereinafter Zizka). Regarding Claim 2, modified reference Majumdar teaches the computer-implemented method of claim 1, but is silent on wherein determining the grasp of the one or more objects is based on: one or more feature maps generated using the one or more point clouds; and one or more grasp task embeddings. However, Zizka teaches wherein determining the grasp of the one or more objects is based on: one or more feature maps generated using the one or more point clouds ([0096] via “In another example, the apparatus 100 can receive a command from the master server to move an item from the box 120a to the box 120b, which can be carried out as shown in FIGS. 2A-2D. Upon receipt of the command, the apparatus 100 can acquire texture frame/information and point cloud information from its camera 160, feed the point cloud data (e.g., in a form of a depth map) and texture information to a neural network that can recognize potentially multiple instances of the item to be picked. The location(s) of the found items are then evaluated, for example, by checking whether they are accessible by the gripper 150 and the picker robot 125 without any collisions, …”); and one or more grasp task embeddings ([0140] via “In some cases, the apparatus 500 can include an interface (e.g., a touchscreen) coupled to a processor of the apparatus 500 and/or to the master server to present order information (e.g., number of items to be picked for each order) to the operator to guide the operator when picking items for the boxes 520b and 520c.”), (Note: The Examiner interprets the grasp task embeddings as a maximum number of objects for the robot to grasp, as this term is defined in paragraph [0082] of the specification of the instant application. In the context of Zizka, the Examiner interprets the number of items to be picked as a maximum number of items for the robot to pick for the order, as the robot is not supposed to go over that amount.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Zizka wherein determining the grasp of the one or more objects is based on: one or more feature maps generated using the one or more point clouds; and one or more grasp task embeddings. 
Doing so allows the robot to locate objects and place them into the appropriate placement location based on given order information, as stated by Zizka ([0101] via “In yet another example of a typical operation that can be executed by the apparatus 100, the apparatus 100 can load a source box (e.g., the box 120a) based on an order received from the master server, and pick one or more items from the source box into a destination box (e.g., the box 120b). The apparatus 100 can then place the source box back on a shelf of the fulfillment center and move to the location of another source box that contains items for the order. The foregoing process can then be repeated until all the items are picked for the order.”). 11. Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Majumdar et al. (US 20220203547 A1 hereinafter Majumdar) in view of Noda et al. (US 20250353173 A1 hereinafter Noda) and Chen et al. (US 20230202774 A1 hereinafter Chen), and further in view of Yu et al. (US 20230041343 A1 hereinafter Yu) and Sisbot et al. (US 9469028 B2 hereinafter Sisbot). Regarding Claim 3, modified reference Majumdar teaches the computer-implemented method of claim 1, but is silent on wherein determining the placement of the one or more objects is based on: one or more feature maps generated using the one or more point clouds; and one or more placement task embeddings. However, Yu teaches wherein determining the placement of the one or more objects is based on: one or more feature maps generated using the one or more point clouds ([0032] via “The storage devices 204 can also store object tracking data. In some embodiments, the object tracking data can include a log of scanned, manipulated, and/or transferred objects. In some embodiments, the object tracking data can include image data (e.g., a picture, point cloud, live video feed, etc.) of the objects at one or more locations (e.g., designated pickup or drop locations and/or conveyor belts) and/or placement locations/poses of the objects at the one or more locations.”). Further, Sisbot teaches wherein determining the placement of the one or more objects is based on: one or more placement task embeddings (Col. 19 line 59 – Col. 20 line 10, where “The assessment module 268 may analyze the assessment criteria data 261 and determined if the threshold is exceeded. If the threshold is exceeded, the method 500 may return to step 508 where the assessment module 268 may select a different object transfer pose determined to be less risky. The method may cycle through in this manner a predetermined number of times attempting to identify an acceptable object transfer pose. If an acceptable object transfer pose is not identified, then the method 500 may end or the second communication unit 283 may ask for new instructions from the user 101 or the medical input provider 170. Alternatively, if no acceptable object transfer pose is identified after a predetermined number of attempts the pose system 199 may analyze the sensor data 249 to identify a flat or approximately flat surface in the user environment 198 and then determine actuations and movement vectors for the robot 190 to take to traverse to that surface and place the object 188 on the surface for the user 101 to pick up for themselves.”), (Note: The Examiner interprets the placement task embeddings as a number of discretized orientations of the object for placement, as this term is defined in paragraph [0082] of the specification of the instant application). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Yu wherein determining the placement of the one or more objects is based on: one or more feature maps generated using the one or more point clouds. Doing so incorporates a known method of mapping out a representation of the environment where the object(s) is/are located, as stated by Yu ([0038] via “The imaging devices 222 can generate a representation of the detected environment, such as a digital image, a depth map, and/or a point cloud, used for implementing machine/computer vision (e.g., for automatic inspection, robot guidance, or other robotic applications).”). In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Sisbot wherein determining the placement of the one or more objects is based on: one or more placement task embeddings. Doing so analyzes and determines the most appropriate placement pose for the object, as stated above by Sisbot. 12. Claim(s) 5, 13, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Majumdar et al. (US 20220203547 A1 hereinafter Majumdar) in view of Noda et al. (US 20250353173 A1 hereinafter Noda) and Chen et al. (US 20230202774 A1 hereinafter Chen), and further in view of Sun et al. (US 20240017426 A1 hereinafter Sun). Regarding Claim 5, modified reference Majumdar teaches the computer-implemented method of claim 1, but is silent on further determining one or more grasp parameters to be used to calculate a grasp pose to grasp the one or more objects. However, Sun teaches further determining one or more grasp parameters to be used to calculate a grasp pose to grasp the one or more objects ([0112] via “Process 2000 is shown to include identifying a pre-grasp configuration for a robotic hand based on a target quantity of objects to be grasped (2010). For example, approaches to identifying a pre-grasp configuration such as the best-expectation pre-grasp and the maximum capability pre-grasp as described above can be used.”), ([0114] via “Process 2000 is also shown to include operating fingers of the robotic hand in accordance with the pre-grasp configuration (2030). For example, processor 131 can be programmed to control operation of hand 135 such that the fingers of hand 135 are oriented in accordance with the identified pre-grasp configuration. The pre-grasp configuration is intended for grasping the target quantity of objects with a high probability of success. Once hand 135 is oriented in accordance with the pre-grasp configuration, hand 135 is ready to grasp multiple objects as part of the transfer process.”), (Note: The Examiner interprets the pre-grasp configuration as the grasp parameter.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Sun wherein the computer-implemented method further determines one or more grasp parameters to be used to calculate a grasp pose to grasp the one or more objects. Doing so configures the robot hand to perform a pre-grasp configuration resulting in the most likely configuration for a successful grasp of the object(s), as stated above by Sun in both paragraphs. 
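Claims 2 and 3 (and, similarly, claims 10 and 11) add task-specific conditioning on top of the shared point-cloud feature maps. A minimal sketch of that structure, assuming the Examiner's reading that the grasp task embedding encodes a maximum object count and the placement task embedding encodes a discretized placement orientation, might look like the following; all names, shapes, and the embedding scheme are illustrative assumptions rather than the claimed or cited implementations.

```python
# Hypothetical sketch of claims 2/3 as interpreted above: per-point feature maps
# from the point cloud are combined with a task embedding before a mask is decoded.
# The embedding index stands in for a task parameter (max objects to grasp, or a
# discretized placement orientation bin). Everything here is an illustrative assumption.
import torch
import torch.nn as nn

class TaskConditionedMaskHead(nn.Module):
    def __init__(self, feat_dim: int = 128, embed_dim: int = 16, num_task_values: int = 32):
        super().__init__()
        # Embedding table indexed by the task parameter.
        self.task_embed = nn.Embedding(num_task_values, embed_dim)
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + embed_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, point_feats: torch.Tensor, task_value: int) -> torch.Tensor:
        n = point_feats.shape[0]
        emb = self.task_embed(torch.tensor([task_value])).expand(n, -1)   # broadcast to every point
        logits = self.decoder(torch.cat([point_feats, emb], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)                          # per-point mask scores

# Toy usage: the same per-point feature map feeds a grasp head conditioned on a
# maximum object count and a placement head conditioned on an orientation bin.
point_feats = torch.rand(1024, 128)
grasp_head, place_head = TaskConditionedMaskHead(), TaskConditionedMaskHead()
contact_mask = grasp_head(point_feats, task_value=3)      # e.g., grasp at most 3 objects
placement_mask = place_head(point_feats, task_value=7)    # e.g., 7th discretized orientation
```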
Regarding Claim 13, modified reference Majumdar teaches the non-transitory computer readable storage medium of claim 9, but is silent on wherein the computer system is further caused to determine one or more grasp parameters to be used to calculate a grasp pose to grasp the one or more objects. However, Sun teaches wherein the computer system is further caused to determine one or more grasp parameters to be used to calculate a grasp pose to grasp the one or more objects ([0112] via “Process 2000 is shown to include identifying a pre-grasp configuration for a robotic hand based on a target quantity of objects to be grasped (2010). For example, approaches to identifying a pre-grasp configuration such as the best-expectation pre-grasp and the maximum capability pre-grasp as described above can be used.”), ([0114] via “Process 2000 is also shown to include operating fingers of the robotic hand in accordance with the pre-grasp configuration (2030). For example, processor 131 can be programmed to control operation of hand 135 such that the fingers of hand 135 are oriented in accordance with the identified pre-grasp configuration. The pre-grasp configuration is intended for grasping the target quantity of objects with a high probability of success. Once hand 135 is oriented in accordance with the pre-grasp configuration, hand 135 is ready to grasp multiple objects as part of the transfer process.”), (Note: The Examiner interprets the pre-grasp configuration as the grasp parameter.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Sun wherein the computer system is further caused to determine one or more grasp parameters to be used to calculate a grasp pose to grasp the one or more objects. Doing so configures the robot hand to perform a pre-grasp configuration resulting in the most likely configuration for a successful grasp of the object(s), as stated above by Sun in both paragraphs. Regarding Claim 19, modified reference Majumdar teaches the system of claim 17, but is silent on wherein the one or more processors are to determine one or more grasp parameters to be used to calculate a grasp pose to grasp the one or more objects. However, Sun teaches wherein the one or more processors are to determine one or more grasp parameters to be used to calculate a grasp pose to grasp the one or more objects ([0112] via “Process 2000 is shown to include identifying a pre-grasp configuration for a robotic hand based on a target quantity of objects to be grasped (2010). For example, approaches to identifying a pre-grasp configuration such as the best-expectation pre-grasp and the maximum capability pre-grasp as described above can be used.”), ([0114] via “Process 2000 is also shown to include operating fingers of the robotic hand in accordance with the pre-grasp configuration (2030). For example, processor 131 can be programmed to control operation of hand 135 such that the fingers of hand 135 are oriented in accordance with the identified pre-grasp configuration. The pre-grasp configuration is intended for grasping the target quantity of objects with a high probability of success. Once hand 135 is oriented in accordance with the pre-grasp configuration, hand 135 is ready to grasp multiple objects as part of the transfer process.”), (Note: The Examiner interprets the pre-grasp configuration as the grasp parameter.). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Sun wherein the one or more processors are to determine one or more grasp parameters to be used to calculate a grasp pose to grasp the one or more objects. Doing so configures the robot hand to perform a pre-grasp configuration resulting in the most likely configuration for a successful grasp of the object(s), as stated above by Sun in both paragraphs. 13. Claim(s) 6, 14, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Majumdar et al. (US 20220203547 A1 hereinafter Majumdar) in view of Noda et al. (US 20250353173 A1 hereinafter Noda) and Chen et al. (US 20230202774 A1 hereinafter Chen), and further in view of Boroushaki et al. (US 20220168899 A1 hereinafter Boroushaki). Regarding Claim 6, modified reference Majumdar teaches the computer-implemented method of claim 1, but is silent on wherein the one or more neural networks are trained to determine a grasp of one or more previously unseen objects and determine a placement of the one or more previously unseen objects. However, Boroushaki teaches wherein the one or more neural networks are trained to determine a grasp of one or more previously unseen objects and determine a placement of the one or more previously unseen objects ([0077] via “Referring to FIGS. 5, 6A and 6B, the method includes, at 605, receiving data 510 from a vision sensor 505. As indicated, the data may be RGB-D data output from a camera or other image sensor. The data may be a different type of data in another embodiment provided, for example, that the data provides a video representation of the area of interest and optionally depth information of features in the area of interest including any one or more of those that may be partially or fully occluding the target object.”), ([0083] via “At 635, if operation 630 determines that the target object is within reach of the robot, the system transitions to an RF-visual grasping operation (e.g., RF-visual grasping 805 in FIG. 8).”), ([0153] via “A number of additional operations may be performed after controller 30 determines that a successful grasp of the target object has been executed. One additional operation may include a selective sorting operation 840. After confirming that the target object has been successfully grasped by the robot, the controller 30 determines the location of a sorting bin into which the target object is to be placed. The controller may then control the robot to move and place the target object into the sorting bin.”), ([0156] via “In accordance with one or more of the aforementioned embodiments, a control system and method are provided which locates a partially or fully occluded target object (generally, occluded objects) in an area of interest. A variety of types of obstructions may occlude perception of the target object by a vision sensor. By combining visual information from such a sensor with RF-based location information, the control system may effectively “see through” the obstructions to pinpoint the location of the target object. Model-based and deep-learning techniques may then be employed to move the robot into range relative to the target object and to provide access by the robot to the target object, which access may include, but is not limited to, performing a grasping operation for the target object while, for example, the object is still in the occluded state.”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Boroushaki wherein the one or more neural networks are trained to determine a grasp of one or more previously unseen objects and determine a placement of the one or more previously unseen objects. Doing so allows for the robot to locate and subsequently grasp partially and fully occluded objects from a visual sensor, as stated above by Boroushaki in paragraph [0156].

Regarding Claim 14, modified reference Majumdar teaches the non-transitory computer readable storage medium of claim 9, but is silent on wherein the one or more neural networks are trained to determine a grasp of one or more previously unseen objects and determine a placement of the one or more previously unseen objects.

However, Boroushaki teaches wherein the one or more neural networks are trained to determine a grasp of one or more previously unseen objects and determine a placement of the one or more previously unseen objects ([0077] via “Referring to FIGS. 5, 6A and 6B, the method includes, at 605, receiving data 510 from a vision sensor 505. As indicated, the data may be RGB-D data output from a camera or other image sensor. The data may be a different type of data in another embodiment provided, for example, that the data provides a video representation of the area of interest and optionally depth information of features in the area of interest including any one or more of those that may be partially or fully occluding the target object.”), ([0083] via “At 635, if operation 630 determines that the target object is within reach of the robot, the system transitions to an RF-visual grasping operation (e.g., RF-visual grasping 805 in FIG. 8).”), ([0153] via “A number of additional operations may be performed after controller 30 determines that a successful grasp of the target object has been executed. One additional operation may include a selective sorting operation 840. After confirming that the target object has been successfully grasped by the robot, the controller 30 determines the location of a sorting bin into which the target object is to be placed. The controller may then control the robot to move and place the target object into the sorting bin.”), ([0156] via “In accordance with one or more of the aforementioned embodiments, a control system and method are provided which locates a partially or fully occluded target object (generally, occluded objects) in an area of interest. A variety of types of obstructions may occlude perception of the target object by a vision sensor. By combining visual information from such a sensor with RF-based location information, the control system may effectively “see through” the obstructions to pinpoint the location of the target object. Model-based and deep-learning techniques may then be employed to move the robot into range relative to the target object and to provide access by the robot to the target object, which access may include, but is not limited to, performing a grasping operation for the target object while, for example, the object is still in the occluded state.”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Boroushaki wherein the one or more neural networks are trained to determine a grasp of one or more previously unseen objects and determine a placement of the one or more previously unseen objects. Doing so allows for the robot to locate and subsequently grasp partially and fully occluded objects from a visual sensor, as stated above by Boroushaki in paragraph [0156].

Regarding Claim 20, modified reference Majumdar teaches the system of claim 17, but is silent on wherein the one or more neural networks are trained to determine a grasp of one or more previously unseen objects and determine a placement of the one or more previously unseen objects.

However, Boroushaki teaches wherein the one or more neural networks are trained to determine a grasp of one or more previously unseen objects and determine a placement of the one or more previously unseen objects ([0077] via “Referring to FIGS. 5, 6A and 6B, the method includes, at 605, receiving data 510 from a vision sensor 505. As indicated, the data may be RGB-D data output from a camera or other image sensor. The data may be a different type of data in another embodiment provided, for example, that the data provides a video representation of the area of interest and optionally depth information of features in the area of interest including any one or more of those that may be partially or fully occluding the target object.”), ([0083] via “At 635, if operation 630 determines that the target object is within reach of the robot, the system transitions to an RF-visual grasping operation (e.g., RF-visual grasping 805 in FIG. 8).”), ([0153] via “A number of additional operations may be performed after controller 30 determines that a successful grasp of the target object has been executed. One additional operation may include a selective sorting operation 840. After confirming that the target object has been successfully grasped by the robot, the controller 30 determines the location of a sorting bin into which the target object is to be placed. The controller may then control the robot to move and place the target object into the sorting bin.”), ([0156] via “In accordance with one or more of the aforementioned embodiments, a control system and method are provided which locates a partially or fully occluded target object (generally, occluded objects) in an area of interest. A variety of types of obstructions may occlude perception of the target object by a vision sensor. By combining visual information from such a sensor with RF-based location information, the control system may effectively “see through” the obstructions to pinpoint the location of the target object. Model-based and deep-learning techniques may then be employed to move the robot into range relative to the target object and to provide access by the robot to the target object, which access may include, but is not limited to, performing a grasping operation for the target object while, for example, the object is still in the occluded state.”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Boroushaki wherein the one or more neural networks are trained to determine a grasp of one or more previously unseen objects and determine a placement of the one or more previously unseen objects. Doing so allows for the robot to locate and subsequently grasp partially and fully occluded objects from a visual sensor, as stated above by Boroushaki in paragraph [0156].

14. Claim(s) 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Majumdar et al. (US 20220203547 A1 hereinafter Majumdar) in view of Noda et al. (US 20250353173 A1 hereinafter Noda) and Chen et al. (US 20230202774 A1 hereinafter Chen), and further in view of Wiersma et al. (US 20240408766 A1 hereinafter Wiersma).

Regarding Claim 10, modified reference Majumdar teaches the non-transitory computer readable storage medium of claim 9, but is silent on wherein the grasp of the one or more objects is determined based on: one or more feature maps generated using the one or more point clouds; one or more contact points associated with the one or more objects; and one or more grasp task feature embeddings.

However, Wiersma teaches wherein the grasp of the one or more objects is determined based on: one or more feature maps generated using the one or more point clouds ([0068] via “A depth map may also be included as a learnable property map. It can help determine which object to pick first as it can show which objects lie at or near the top of the bin. During the dataset generation, the depth map can be directly determined from the pixel-aligned point cloud. A depth map may be created by normalizing the values from the point cloud to be between 0 and 1.”); one or more contact points associated with the one or more objects ([0040] via “In yet another aspect, the invention may relate to a method of automatic generation of training data for training a deep neural network comprising: capturing image data associated with one or more objects to be picked up by a robot gripper, the image data including a 2D image and an associated (pixel-aligned) point cloud; determining a plurality of locations on the point cloud and using a patch fitting algorithm to fit points of the point cloud associated with each of the plurality of locations to a curved surface patch, each surface patch being associated with one or more patch parameters defining at least one of an orientation of a surface patch in a reference frame of the object, a curvature of the surface patch, or, dimensions of the surface patch; ...”), (Note: The Examiner interprets the geometry/dimensions of the object as the one or more contact points.); and one or more grasp task feature embeddings ([0063] via “Hence, a patch fitting algorithm is used to fit parts of a point cloud representing an object to be picked up by a gripper to a predetermined patch defining a surface having a predetermined orientation in the reference frame of the object and a predetermined curvature. Information associated with these patches may then be encoded into an object property map. This way, 3D information of an object may be encoded in (2D) object property map.”), (Note: The Examiner interprets the task feature embeddings as using point clouds to generate features of the object, as this term is defined within paragraph [0089] of the specification of the instant application.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wiersma wherein the grasp of the one or more objects is determined based on: one or more feature maps generated using the one or more point clouds; one or more contact points associated with the one or more objects; and one or more grasp task feature embeddings. Doing so determines the three-dimensional information of the object to be grasped, as stated by Wiersma above in all three citations.

15. Claim(s) 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Majumdar et al. (US 20220203547 A1 hereinafter Majumdar) in view of Noda et al. (US 20250353173 A1 hereinafter Noda) and Chen et al. (US 20230202774 A1 hereinafter Chen), and further in view of Yu et al. (US 20230041343 A1 hereinafter Yu).

Regarding Claim 11, modified reference Majumdar teaches the non-transitory computer readable storage medium of claim 9, but is silent on wherein the placement of the one or more objects is determined based on: one or more feature maps generated using the one or more point clouds; one or more contact points associated with the one or more objects; and one or more placement task feature embeddings.

However, Yu teaches wherein the placement of the one or more objects is determined based on: one or more feature maps generated using the one or more point clouds ([0032] via “The storage devices 204 can also store object tracking data. In some embodiments, the object tracking data can include a log of scanned, manipulated, and/or transferred objects. In some embodiments, the object tracking data can include image data (e.g., a picture, point cloud, live video feed, etc.) of the objects at one or more locations (e.g., designated pickup or drop locations and/or conveyor belts) and/or placement locations/poses of the objects at the one or more locations.”); one or more contact points associated with the one or more objects ([0077] via “At block 613, the robotic system 100 can obtain additional data during the transfer of the object. For example, the robotic system 100 can obtain lateral dimensions of the object based on implementing an initial displacement to separate the edges of the grasped object from adjacent objects.”), (Note: The Examiner interprets obtaining the dimensions of the object as the contact points.); and one or more placement task feature embeddings ([0032] via “The storage devices 204 can also store object tracking data. In some embodiments, the object tracking data can include a log of scanned, manipulated, and/or transferred objects. In some embodiments, the object tracking data can include image data (e.g., a picture, point cloud, live video feed, etc.) of the objects at one or more locations (e.g., designated pickup or drop locations and/or conveyor belts) and/or placement locations/poses of the objects at the one or more locations.”), (Note: The Examiner interprets the task feature embeddings as using point clouds to generate features of the object, as this term is defined within paragraph [0089] of the specification of the instant application.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Yu wherein the placement of the one or more objects is determined based on: one or more feature maps generated using the one or more point clouds; one or more contact points associated with the one or more objects; and one or more placement task feature embeddings. Doing so captures multiple data points of the object to be placed, as stated above by Yu in the combination of both citations.

16. Claim(s) 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Majumdar et al. (US 20220203547 A1 hereinafter Majumdar) in view of Noda et al. (US 20250353173 A1 hereinafter Noda) and Chen et al. (US 20230202774 A1 hereinafter Chen), and further in view of Wiersma et al. (US 20240408766 A1 hereinafter Wiersma) and Yu et al. (US 20230041343 A1 hereinafter Yu).
Regarding Claim 18, modified reference Majumdar teaches the system of claim 17, but is silent on wherein the grasp of the one or more objects and the placement of the one or more objects is determined based on: one or more feature maps generated using the one or more point clouds; one or more contact points associated with the one or more objects; and one or more task features.

However, Wiersma teaches wherein the grasp of the one or more objects is determined based on: one or more feature maps generated using the one or more point clouds ([0068] via “A depth map may also be included as a learnable property map. It can help determine which object to pick first as it can show which objects lie at or near the top of the bin. During the dataset generation, the depth map can be directly determined from the pixel-aligned point cloud. A depth map may be created by normalizing the values from the point cloud to be between 0 and 1.”); one or more contact points associated with the one or more objects ([0040] via “In yet another aspect, the invention may relate to a method of automatic generation of training data for training a deep neural network comprising: capturing image data associated with one or more objects to be picked up by a robot gripper, the image data including a 2D image and an associated (pixel-aligned) point cloud; determining a plurality of locations on the point cloud and using a patch fitting algorithm to fit points of the point cloud associated with each of the plurality of locations to a curved surface patch, each surface patch being associated with one or more patch parameters defining at least one of an orientation of a surface patch in a reference frame of the object, a curvature of the surface patch, or, dimensions of the surface patch; ...”), (Note: The Examiner interprets the geometry/dimensions of the object as the one or more contact points.); and one or more task features ([0063] via “Hence, a patch fitting algorithm is used to fit parts of a point cloud representing an object to be picked up by a gripper to a predetermined patch defining a surface having a predetermined orientation in the reference frame of the object and a predetermined curvature. Information associated with these patches may then be encoded into an object property map. This way, 3D information of an object may be encoded in (2D) object property map.”), (Note: The Examiner interprets the task feature embeddings as using point clouds to generate features of the object, as this term is defined within paragraph [0089] of the specification of the instant application.).

Further, Yu teaches wherein the placement of the one or more objects is determined based on: one or more feature maps generated using the one or more point clouds ([0032] via “The storage devices 204 can also store object tracking data. In some embodiments, the object tracking data can include a log of scanned, manipulated, and/or transferred objects. In some embodiments, the object tracking data can include image data (e.g., a picture, point cloud, live video feed, etc.) of the objects at one or more locations (e.g., designated pickup or drop locations and/or conveyor belts) and/or placement locations/poses of the objects at the one or more locations.”); one or more contact points associated with the one or more objects ([0077] via “At block 613, the robotic system 100 can obtain additional data during the transfer of the object. For example, the robotic system 100 can obtain lateral dimensions of the object based on implementing an initial displacement to separate the edges of the grasped object from adjacent objects.”), (Note: The Examiner interprets obtaining the dimensions of the object as the contact points.); and one or more task features ([0032] via “The storage devices 204 can also store object tracking data. In some embodiments, the object tracking data can include a log of scanned, manipulated, and/or transferred objects. In some embodiments, the object tracking data can include image data (e.g., a picture, point cloud, live video feed, etc.) of the objects at one or more locations (e.g., designated pickup or drop locations and/or conveyor belts) and/or placement locations/poses of the objects at the one or more locations.”), (Note: The Examiner interprets the task feature embeddings as using point clouds to generate features of the object, as this term is defined within paragraph [0089] of the specification of the instant application.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wiersma wherein the grasp of the one or more objects is determined based on: one or more feature maps generated using the one or more point clouds; one or more contact points associated with the one or more objects; and one or more task features. Doing so determines the three-dimensional information of the object to be grasped, as stated by Wiersma above in all three citations.

In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Yu wherein the placement of the one or more objects is determined based on: one or more feature maps generated using the one or more point clouds; one or more contact points associated with the one or more objects; and one or more task features. Doing so captures multiple data points of the object to be placed, as stated above by Yu in the combination of both citations.

Examiner’s Note

17. The Examiner has cited particular paragraphs or columns and line numbers in the references applied to the claims above for the convenience of the Applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that the Applicant, in preparing responses, fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner. See MPEP 2141.02 [R-07.2015] VI. A prior art reference must be considered in its entirety, i.e., as a whole, including portions that would lead away from the claimed invention. W.L. Gore & Associates, Inc. v. Garlock, Inc., 721 F.2d 1540, 220 USPQ 303 (Fed. Cir. 1983), cert. denied, 469 U.S. 851 (1984). See also MPEP §2123.

Conclusion

18. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

19. Any inquiry concerning this communication or earlier communications from the examiner should be directed to BYRON X KASPER whose telephone number is (571)272-3895. The examiner can normally be reached Monday - Friday 8 am - 5 pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Adam Mott, can be reached on (571) 270-5376. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BYRON XAVIER KASPER/
Examiner, Art Unit 3657

/ADAM R MOTT/
Supervisory Patent Examiner, Art Unit 3657
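A technical aside on the Wiersma mapping applied to claims 10 and 18: paragraph [0068] describes building the depth map by normalizing point-cloud values to lie between 0 and 1, i.e., plain min-max normalization. A minimal sketch of that step is below; the function name, array shape, and use of NumPy are illustrative assumptions, not code from the cited reference.

```python
import numpy as np

def depth_map_from_point_cloud(points_z: np.ndarray) -> np.ndarray:
    """Min-max normalize pixel-aligned point-cloud depth values to [0, 1].

    Assumptions (not from the cited reference): points_z is an (H, W) array
    of z-values taken from a pixel-aligned point cloud.
    """
    z_min = np.nanmin(points_z)
    z_max = np.nanmax(points_z)
    if z_max == z_min:  # degenerate case: perfectly flat depth
        return np.zeros_like(points_z, dtype=float)
    return (points_z - z_min) / (z_max - z_min)

# Example: a 2x2 depth channel becomes a normalized depth map in [0, 1].
print(depth_map_from_point_cloud(np.array([[0.5, 1.0], [1.5, 2.0]])))
```

In Wiersma this normalized map is only one of several learnable property maps; the sketch covers the normalization step alone.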

Prosecution Timeline

Sep 15, 2023
Application Filed
Jun 02, 2025
Non-Final Rejection — §103
Sep 02, 2025
Applicant Interview (Telephonic)
Sep 02, 2025
Examiner Interview Summary
Nov 04, 2025
Response Filed
Jan 02, 2026
Final Rejection — §103
Apr 08, 2026
Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12594964
METHOD OF AND SYSTEM FOR GENERATING REFERENCE PATH OF SELF DRIVING CAR (SDC)
2y 5m to grant • Granted Apr 07, 2026
Patent 12594137
HARD STOP PROTECTION SYSTEM AND METHOD
2y 5m to grant • Granted Apr 07, 2026
Patent 12583101
METHOD FOR OPERATING A MODULAR ROBOT, MODULAR ROBOT, COLLISION AVOIDANCE SYSTEM, AND COMPUTER PROGRAM PRODUCT
2y 5m to grant • Granted Mar 24, 2026
Patent 12576529
ROBOT SIMULATION DEVICE
2y 5m to grant • Granted Mar 17, 2026
Patent 12564962
ROBOT REMOTE OPERATION CONTROL DEVICE, ROBOT REMOTE OPERATION CONTROL SYSTEM, ROBOT REMOTE OPERATION CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM
2y 5m to grant • Granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
70%
Grant Probability
88%
With Interview (+18.4%)
3y 0m
Median Time to Grant
Moderate
PTA Risk
Based on 103 resolved cases by this examiner. Grant probability derived from career allow rate.
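
How these projections combine: the with-interview figure appears to be the career allow rate plus the interview lift expressed in percentage points (70% plus 18.4 points is roughly the 88% shown). A minimal sketch of that reading is below; the function name and the additive, capped-at-100% formula are inferred assumptions, not a documented method.

```python
def with_interview_probability(base_allow_rate_pct: float, interview_lift_pts: float) -> float:
    """Combine a career allow rate with an interview lift, both in percent.

    Assumption inferred from the displayed figures (70% base, +18.4-point lift,
    88% shown): the lift is additive in percentage points, capped at 100%.
    """
    return min(base_allow_rate_pct + interview_lift_pts, 100.0)

# 70.0 + 18.4 = 88.4, which the page rounds to the displayed 88%.
print(round(with_interview_probability(70.0, 18.4)))  # 88
```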
