DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 30, 2025, has been entered.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s amendments filed on December 30, 2025, have been entered and made of record.
Currently pending claims: 1-4.
Independent claims: 1 and 4.
Amended claims: 1 and 4.
Response to Arguments
This office action is responsive to Applicant’s Arguments/Remarks Made in an Amendment received on December 30, 2025.
In view of the amendments to the claims filed on December 30, 2025, the Applicant has amended claims 1 and 4 in response to the previous 35 U.S.C. § 112(b) rejection, and that rejection is overcome. Additionally, claims 1 and 4 were amended in response to the 35 U.S.C. § 103 rejection, and the amended claims now further explain the process of determining distance information using parallax and searching for workpieces in distance images using feature amounts. However, a new 35 U.S.C. § 103 rejection is applied to each claim using the art of Rublee (US 9,507,995 B2), which more directly teaches the claimed invention than the previously applied reference of Yasushi (JP 2012215394A).
Additionally, upon further review of the claims, it has been determined that a ground of rejection under 35 U.S.C. § 101 and interpretations under 35 U.S.C. § 112(f) were inadvertently omitted from the previous office actions. To ensure a complete and accurate examination of the pending claims, the new rejections and interpretations are included within this office action in the body of rejection below.
In view of Applicant’s Arguments/Remarks filed December 30, 2025, with respect to the claims, the Applicant first argued (Remarks, pages 4-5) that amended claims 1 and 4 now include the limitation “wherein the image processing unit acquires the distance information by searching the second image for an image corresponding to a small zone in the first image, calculating a parallax between the first image and the second image, and converting the parallax to a distance,” and that the prior art of record fails to teach this newly amended limitation. The Examiner agrees that Yasushi teaches determining depth information by determining the parallax between edges rather than by using small zones, and that Mutian also fails to teach this amended limitation. Therefore, the Examiner conducted a new search and found analogous prior art (Rublee, US 9,507,995 B2) which more directly teaches the claimed invention. Accordingly, Yasushi and Mutian are now removed from the 35 U.S.C. § 103 rejections. Although not relied upon in the 35 U.S.C. § 103 rejections, the Examiner notes that Ohara et al. (US 10,614,585 B2), cited in the conclusion of this Office action, directly teaches the method of selecting a small corresponding zone in both images to calculate the parallax.
Additionally, the Applicant argued (Remarks, page 5) that the prior art of record fails to teach selecting one or more feature amounts to detect an object in response to an operation of an operation unit by an instructor, setting a search range for searching for one or more feature amounts, and adjusting the search range for searching for the feature amounts by repeatedly detecting the workpiece. The Examiner agrees that Yasushi and Mutian fail to teach this limitation, but Konolige (Cols. 15-16) teaches determining ranges of features of objects (such as physical dimensions) and using the ranges to detect object edges within images; the ranges can be observed from detected objects or predetermined by an operator. Because the new reference, Rublee, contains disclosure very similar to Konolige’s, Rublee is relied upon for this limitation in the body of the rejection below (although both references teach it).
The Examiner also acknowledges that Rublee and Konolige are not specific about using a range of features for searching, so the Examiner additionally relies on Arpenti (“RGB-D Recognition and Localization of Cases for Robotic Depalletizing in Supermarkets,” IEEE Xplore, https://ieeexplore.ieee.org/document/9158347) in the 35 U.S.C. § 103 rejections. Arpenti shows a practical application of recognizing objects in a depth image based on a range of object features. For example, the robot can search for supermarket items on mixed pallets by recognizing objects that fall within a feature range (objects with dimensions that fall within a threshold of the known dimensions) [Arpenti, Section 2 and Fig. 3].
Thus, the Examiner presents new rejections under 35 U.S.C. § 103 to the claims in the body of rejection below.
Claim Objections
Claims 1 and 4 are objected to because of the following informalities. Claim 1 recites the limitations:
(a) an image capturing unit that captures a two-dimensional image of a workpiece;
(b) an image processing unit that acquires distance information of the workpiece based on the two-dimensional image; and
(c) a distance image generation unit that generates a distance image based on the distance information.
These limitations suggest that distance information is obtained from a single two-dimensional image for generating a distance image. However, the claim later specifies that the capturing step uses two images and calculates the parallax between the two images to determine the distance information and generate the distance image. To promote clarity of the claim, the Examiner recommends specifying that the image capturing unit captures two two-dimensional images of the workpiece in limitations (a) and (b) rather than later in the claim.
Claim 4 is objected to for the same reasons as claim 1.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.
Such claim limitations are:
Image Processing Unit in claim 1;
Distance Image Generation Unit in claim 1;
Image Recognition Unit in claims 1 and 4;
Thickness Calculation Unit in claim 1;
Operation Unit in claim 1.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
As shown in Fig. 1 and paragraphs [0042]-[0044] of the specification, the Image Processing Unit, Distance Image Generation Unit, Image Recognition Unit, and Thickness Calculation Unit are interpreted as one or more computer systems executing instructions. The computer(s) can include at least one processor, memory, and a non-transitory storage device, and the computer(s) is/are capable of receiving images from the image capturing unit for processing and of communicating with the robot controller. Similarly, the Operation Unit is interpreted to be a computer system with I/O devices to receive input from an operator, but the Operation Unit is not shown in Fig. 1 or described in detail.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim Interpretation: Under the broadest reasonable interpretation, the terms of the claim are presumed to have their plain meaning consistent with the specification as it would be interpreted by one of ordinary skill in the art. See MPEP 2111.
Claims 1 and 4 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1, the following steps are recited:
(a) capturing a two-dimensional image of a workpiece;
(b) acquiring distance information of the workpiece based on the two-dimensional image;
(c) generating a distance image based on the distance information;
(d) recognizing the workpiece based on the distance image;
(e) calculating a thickness indicating a vertical dimension of the workpiece recognized by the image recognition unit based on the distance image;
(f) detecting one or more workpieces from among a plurality of workpieces by selecting one or more feature amounts of a workpiece in response to an operation of an operation unit by an instructor;
(g) capturing a first image and a second image;
(h) setting a search range for searching for the one or more feature amounts, and adjusting the search range to search for the feature amounts by repeatedly detecting the plurality of workpieces; and
(i) searching the second image for an image corresponding to a small zone in the first image, calculating a parallax between the first image and the second image, and converting the parallax to a distance.
Step 1: This part of the eligibility analysis evaluates whether the claim falls within any statutory category. See MPEP 2106.03. Claims 1 and 4 recite a robot system and a control method of a robot system, respectively, and thus fall within the statutory categories of invention (machine and process).
Step 2A, Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04, subsection II, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim. The broadest reasonable interpretation of steps (a)-(i) is that the steps fall within the mental process groupings of abstract ideas because they cover concepts performed in the human mind, including observation, evaluation, judgment, and opinion. See MPEP 2106.04(a)(2), subsection III.
Specifically, steps (d), (f), and (h) are mental processes that include observing an image and recognizing certain objects in the image based on observed features; these processes can be performed in the human mind. Steps (e) and (i) are mathematical concepts utilizing known formulas for calculating thickness and for converting parallax to distance, performed on generic hardware. Steps (a)-(c) and (g) are steps of data gathering and data manipulation that recite capturing images for determining a distance image. There is no recitation of a specific improvement to the functioning of a computer or camera beyond conventional stereo processing and depth map generation.
Step 2A, Prong Two: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception or whether the claim is “directed to” the judicial exception. This evaluation is performed by (1) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (2) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. See MPEP 2106.04(d). Claims 1 and 4 recite the additional element of:
a first internal camera that captures a first image and a second internal camera that captures a second image
This limitation refers to using generic cameras to capture images, and the claim amounts to no more than instructions to apply the exception using generic hardware. See MPEP 2106.05(f).
Step 2B: This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. See MPEP 2106.05.
Even when considered in combination, this additional element represents mere instructions to implement an abstract idea or other exception utilizing a generic computer and a camera, which do not provide an inventive concept.
Claim 4 is rejected for the same reasons stated above.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 and 3-4 are rejected under 35 U.S.C. § 103 as being unpatentable over Rublee et al. (US 9,507,995 B2), hereafter Rublee, in view of Arpenti et al. (“RGB-D Recognition and Localization of Cases for Robotic Depalletizing in Supermarkets,” https://ieeexplore.ieee.org/document/9158347), hereafter Arpenti.
Regarding claim 1, Rublee teaches a robot system (Fig. 2 shows the components of the robotic system.) comprising:
an image capturing unit that captures a two-dimensional image of a workpiece (See the cameras 506 and 504 and the light projector 502 of Fig. 5.);
an image processing unit that acquires distance information of the workpiece based on the two-dimensional image (See Figs. 7A-7B. A projector 700 projects a light pattern onto the workpiece 704B, and the reflected light 706A and 706B is observed in the images 702 captured by the two cameras to acquire distance information.);
a distance image generation unit that generates a distance image based on the distance information (Rublee teaches creating a depth map to determine distances to objects. [Col. 5, lines 8-16] “The computing device may then determine a first depth estimate for at least one surface in the environment based on corresponding features between the first image and the second image. By way of example, the computing device may determine mappings of corresponding pixel values within the two images, and based on a physical relationship between the two optical sensors, the computing device can determine a depth map using triangulation.”);
an image recognition unit recognizing the workpiece based on the distance image (Rublee teaches using depth maps for object recognition for robot interaction with the workpieces. [Col. 6, lines 54-62] “The sensors may scan an environment containing one or more objects in order to capture visual data and/or 3D depth information. Data from the scans may then be integrated into a representation of larger areas in order to provide digital environment reconstruction. In additional examples, the reconstructed environment may then be used for identifying objects to pick up, determining pick positions for objects, and/or planning collision-free trajectories for the one or more robotic arms and/or a mobile base.”); and
a thickness calculation unit that calculates a thickness indicating a vertical dimension of the workpiece recognized by the image recognition unit based on the distance image (In Col. 11, lines 3-67, Rublee discusses recognizing facades, such as a stack of boxes, from the depth map information. Rublee teaches recognizing individual boxes within facades and identifying which boxes the robot will pick up based on observed size measurements of identified surfaces of each box. In Col. 13, line 63 – Col. 14, line 19, Rublee teaches computing object bounding volumes so that objects can be recognized based on their dimensions. [Col. 13, line 66 – Col. 14, line 9] “In some examples, object bounding volumes may be computed and/or distinguishing features of objects may be found (such as textures, colors, barcodes or OCR). In some embodiments, objects may be sorted into an assigned destination location by matching against a database of location assignments indexed by object type or object ID. For instance, an object's locations may be derived from reading a barcode, considering the size of the object, and/or by recognizing a particular kind of object.” Additionally, in Col 12, lines 1-40, Rublee teaches detecting workpieces, recognizing geometrical shapes of the workpieces, and representing workpieces as 3D models within a virtual environment to assist the robotic system with collision detection. This process would require calculating 3D data for each entire workpiece.).
wherein the image capturing unit includes a first internal camera that captures a first image and a second internal camera that captures a second image (See the cameras 506 and 504 and the light projector 502 of Fig. 5.),
wherein the image recognition unit detects one or more workpieces from among a plurality of workpieces by selecting one or more feature amounts of a workpiece in response to an operation of an operation unit by an instructor (See Col. 11, lines 3-67, discussing recognizing individual workpieces from a plurality of workpieces. [Col. 11, lines 31-42] “In further examples, a facade may be constructed from boxes, for instance to plan an order for picking up the boxes. For instance, as shown in FIG. 3C, box 322 may be identified by the robotic device as the next box to pick up. Box 322 may be identified within a facade representing a front wall of the stack of boxes 320 constructed based on sensor data collected by one or more sensors, such as sensors 206 and 208. A control system may then determine that box 322 is the next box to pick, possibly based on its shape and size, its position on top of the stack of boxes 320, and/or based on characteristics of a target container or location for the boxes.” Thus, boxes can be detected and/or selected based on feature amounts.).
wherein the image processing unit acquires the distance information by searching the second image for an image corresponding to a small zone in the first image, calculating a parallax between the first image and the second image, and converting the parallax to a distance (Rublee teaches identifying the shift of a small zone of the projected light pattern in each image from its expected location for determining depth, and in further examples, Rublee teaches determining small corresponding zones between the two images for determining depth using triangulation. [Col. 16, lines 8-29] “…the computing device may use a block matching algorithm to determine corresponding features between the first image and the second image. Using the block matching algorithm, a segment of pixels of the first image (e.g., a 5×5, 11×11, or 21×21 window of pixels) may be matched against a range of segments of pixels in the second image to determine the closest matching segment of pixels in the second image. For example, the closest matching segment of pixels may be determined by minimizing a match function. Subsequently, given the pixel positions of a pair of corresponding features, one of various triangulation methods can be used to reconstruct the 3D position of the identified feature. According to the process of triangulation, the 3D position (x, y, z) of a point P can be reconstructed from the perspective projection of P on the image planes of a first sensor and a second sensor, given the relative position and orientation of the two sensors. Therefore, if the pixel positions of a pair of corresponding features are known, and the relative position and orientation of the two sensors is known, the 3D position of the feature can be determined.” Additionally, see the background section, Col. 1, line 24 – Col. 2, line 10, where Rublee discusses that structured light methods for determining stereo depth are well known in the art. [Col. 1, lines 30-36] “…two optical sensors with a known physical relationship to one another are used to capture two images of a scene. By finding mappings of corresponding pixel values within the two images and calculating how far apart these common areas reside in pixel space, a computing device can determine a depth map or image using triangulation.”).
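For clarity of the record, the conventional block-matching technique quoted above may be summarized by the following minimal Python sketch; it is illustrative only and not taken from Rublee, and the window size, search range, focal length, and baseline values are assumed for illustration:

```python
import numpy as np

def distance_from_parallax(first_img, second_img, y, x, win=11, max_disp=64,
                           focal_px=700.0, baseline_m=0.10):
    """Illustrative stereo block matching: a small zone of the first image
    is searched for along the same row of the second image; the horizontal
    shift of the best match is the parallax (disparity), and triangulation
    converts it to a distance: Z = focal_length * baseline / parallax."""
    h = win // 2
    zone = first_img[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(1, max_disp + 1):        # candidate horizontal shifts
        if x - d - h < 0:                   # stay inside the second image
            break
        cand = second_img[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(float)
        cost = np.abs(zone - cand).sum()    # sum-of-absolute-differences match function
        if cost < best_cost:
            best_cost, best_d = cost, d
    if best_d == 0:
        return None                         # no match within the search range
    return focal_px * baseline_m / best_d   # parallax converted to distance (meters)
```

This mirrors the pixel windows (e.g., 5×5, 11×11, or 21×21) and the minimization of a match function that Rublee recites at Col. 16.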
Rublee teaches that plans are made for palletizing/depalletizing (for example, large, heavy boxes will be placed at the bottom of a stack and smaller, lighter boxes will be placed at the top of a stack), so searching for and selecting boxes based on feature amounts is required by Rublee’s invention. However, although these feature amounts could likely be a range, Rublee is not specific about the feature amounts being a range rather than a single value. Thus, Rublee fails to teach setting a search range for searching for the one or more feature amounts and adjusting the search range to search for the feature amounts by repeatedly detecting the plurality of workpieces.
However, Arpenti teaches setting a search range for searching for the one or more feature amounts, and adjusting the search range to search for the feature amounts by repeatedly detecting the plurality of workpieces (Arpenti teaches recognizing boxes on a mixed pallet by calculating the features of the boxes, such as the 3D dimensions, and recognizing the boxes based on a feature range. The feature range is an error threshold surrounding the known dimensions of a certain box. For example, if the system is searching for a cereal box with a face that is 12 × 8 inches and the error threshold is 1 inch, the system will search for a face within the range of 11-13 inches by 7-9 inches. [Section 2C] “For each f stored in F with a barcode bc, the geometrical matching checks if the width and the height characterizing f are compatible with two out of the three dimensions xc, yc, and zc. The candidate face f is recognized, i.e. definitely associated with a case among those listed in the CDB, when the difference between its dimensions and those of the case is less than a given threshold value. Then, the matched case c∗ can be stored in the set of the recognized cases C∗.” The search ranges are adjusted based on which known object is being searched for (e.g., cereal box, shampoo box, etc.).).
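For purposes of illustration only, this kind of dimension-range matching can be sketched as follows; the catalog entries and 1-inch threshold are hypothetical values echoing the example above, not data from Arpenti:

```python
def match_case(face_w, face_h, catalog, tol=1.0):
    """Recognize a detected face when its measured width and height fall
    within the search range [dim - tol, dim + tol] of a known case."""
    for name, (w, h) in catalog.items():
        if abs(face_w - w) <= tol and abs(face_h - h) <= tol:
            return name
    return None

# Hypothetical catalog (inches): a 12 x 8 face with a 1-inch threshold is
# accepted anywhere in the range 11-13 by 7-9.
catalog = {"cereal_box": (12.0, 8.0), "shampoo_box": (6.0, 4.0)}
print(match_case(12.4, 7.6, catalog))   # -> cereal_box
```

Changing `tol` or the catalog entry being searched for corresponds to adjusting the search range for each known object.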
Rublee and Arpenti are analogous art because both teach methods of recognizing individual workpieces from a plurality of workpieces by determining depth information and utilizing object recognition methods for feature detection. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Rublee’s invention by searching for objects within a range of feature amounts rather than by using a single feature amount. This modification would account for error thresholds and allow workpieces with a specific feature to be found in noisy images (For example, Arpenti teaches error thresholds when determining lines representing the dimensions and poses of workpieces. [Arpenti, Section 2B] “To do this, for each sfeat, the corresponding segment sd is analyzed. If the discontinuity area is less than a given percentage threshold value of the total area of the depth segment sd, than the segment sfeat is stored, otherwise, it is rejected. In the following experiments, such threshold has been experimentally set to 10%, taking into account possible noisy measures of the RGB-D sensor.”).
Regarding claim 3, Rublee and Arpenti teach the robot system according to claim 1. Rublee further teaches wherein the thickness calculation unit connects adjacent three-dimensional points in the distance image, characterizes a set of the three-dimensional points, detects the set of three-dimensional points to detect the workpiece (Rublee teaches 3D environment reconstruction where 3D points are characterized into simplified geometric shapes representing recognized workpieces. [Col. 6, lines 54-62] “The sensors may scan an environment containing one or more objects in order to capture visual data and/or 3D depth information. Data from the scans may then be integrated into a representation of larger areas in order to provide digital environment reconstruction. In additional examples, the reconstructed environment may then be used for identifying objects to pick up, determining pick positions for objects, and/or planning collision-free trajectories for the one or more robotic arms and/or a mobile base.” [Col. 10, lines 59-66] “In further examples, wide-angle environment reconstruction may be performed by sensing an environment and extracting that information into a simplified geometric model of simple mathematical 3D geometric forms (e.g., planes, cylinders, cones, hemispheres, etc.). In some instances, such techniques may make motion planning easier and/or may make violation of the models (e.g., collisions) easier to detect.”), and
calculates a thickness of the detected workpiece (In Col. 11, lines 3-67, Rublee discusses recognizing facades, such as a stack of boxes, from the depth map information. Rublee teaches recognizing individual boxes within facades and identifying which boxes the robot will pick up based on observed size measurements of identified surfaces of each box. In Col. 13, line 63 – Col. 14, line 19, Rublee teaches computing object bounding volumes so that objects can be recognized based on their dimensions. Additionally, in Col 12, lines 1-40, Rublee teaches detecting workpieces, recognizing geometrical shapes of the workpieces, and representing workpieces as 3D models within a virtual environment to assist the robotic system with collision detection. This process would require calculating 3D data for each entire workpiece.).
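As an illustration of the conventional technique referenced above (connecting adjacent three-dimensional points into a set and calculating a thickness for the detected set), the following sketch flood-fills neighboring depth pixels within an assumed jump threshold and derives each set's thickness as its offset from a supporting surface. The threshold value and the downward-looking camera assumption are the Examiner's illustrative choices, not Rublee's disclosure:

```python
import numpy as np
from collections import deque

def connect_points(distance_img, max_jump=0.01):
    """Group adjacent pixels of a distance image into connected sets of
    3D points: 4-neighbors whose depths differ by less than max_jump are
    assumed to lie on the same workpiece surface."""
    h, w = distance_img.shape
    labels = np.zeros((h, w), dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx]:
                continue
            next_label += 1
            labels[sy, sx] = next_label
            q = deque([(sy, sx)])
            while q:                        # breadth-first flood fill
                y, x = q.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny, nx]
                            and abs(distance_img[ny, nx] - distance_img[y, x]) < max_jump):
                        labels[ny, nx] = next_label
                        q.append((ny, nx))
    return labels

def thickness_of_set(distance_img, labels, label, support_depth_m):
    """With a downward-looking camera, a workpiece's thickness is the
    supporting surface depth minus the (median) depth of its top surface."""
    top_depth = float(np.median(distance_img[labels == label]))
    return support_depth_m - top_depth
```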
Regarding claim 4, Rublee teaches a control method of a robot system, comprising:
a step of capturing a two-dimensional image of a workpiece (See the cameras 506 and 504 and the light projector 502 of Fig. 5.);
a step of acquiring distance information of the workpiece based on the two-dimensional image (See Figs. 7A-7B. A projector 700 projects a light pattern onto the workpiece 704B, and the reflected light 706A and 706B is observed in the images 702 captured by the two cameras to acquire distance information.);
a step of generating a distance image based on the distance information (Rublee teaches creating a depth map to determine distances to objects. [Col. 5, lines 8-16] “The computing device may then determine a first depth estimate for at least one surface in the environment based on corresponding features between the first image and the second image. By way of example, the computing device may determine mappings of corresponding pixel values within the two images, and based on a physical relationship between the two optical sensors, the computing device can determine a depth map using triangulation.”);
a step of recognizing the workpiece based on the distance image (Rublee teaches using depth maps for object recognition for robot interaction with the workpieces. [Col. 6, lines 54-62] “The sensors may scan an environment containing one or more objects in order to capture visual data and/or 3D depth information. Data from the scans may then be integrated into a representation of larger areas in order to provide digital environment reconstruction. In additional examples, the reconstructed environment may then be used for identifying objects to pick up, determining pick positions for objects, and/or planning collision-free trajectories for the one or more robotic arms and/or a mobile base.”); and
a step of calculating a thickness indicating a vertical dimension of the workpiece based on the distance image (In Col. 11, lines 3-67, Rublee discusses recognizing facades, such as a stack of boxes, from the depth map information. Rublee teaches recognizing individual boxes within facades and identifying which boxes the robot will pick up based on observed size measurements of identified surfaces of each box. In Col. 13, line 63 – Col. 14, line 19, Rublee teaches computing object bounding volumes so that objects can be recognized based on their dimensions. [Col. 13, line 66 – Col. 14, line 9] “In some examples, object bounding volumes may be computed and/or distinguishing features of objects may be found (such as textures, colors, barcodes or OCR). In some embodiments, objects may be sorted into an assigned destination location by matching against a database of location assignments indexed by object type or object ID. For instance, an object's locations may be derived from reading a barcode, considering the size of the object, and/or by recognizing a particular kind of object.” Additionally, in Col 12, lines 1-40, Rublee teaches detecting workpieces, recognizing geometrical shapes of the workpieces, and representing workpieces as 3D models within a virtual environment to assist the robotic system with collision detection. This process would require calculating 3D data for each entire workpiece.), wherein the step of recognizing the workpiece includes:
a step of detecting one or more workpieces from among a plurality of workpieces by selecting one or more feature amounts of a workpiece (In Col. 11, lines 3-67, Rublee discusses recognizing facades, such as a stack of boxes, from the depth map information. Rublee teaches recognizing individual boxes within facades and identifying which boxes the robot will pick up based on observed size measurements of identified surfaces of each box. In Col. 13, line 63 – Col. 14, line 19, Rublee teaches computing object bounding volumes so that objects can be recognized based on their dimensions. [Col. 13, line 66 – Col. 14, line 9] “In some examples, object bounding volumes may be computed and/or distinguishing features of objects may be found (such as textures, colors, barcodes or OCR). In some embodiments, objects may be sorted into an assigned destination location by matching against a database of location assignments indexed by object type or object ID. For instance, an object's locations may be derived from reading a barcode, considering the size of the object, and/or by recognizing a particular kind of object.” Additionally, in Col. 12, lines 1-40, Rublee teaches detecting workpieces, recognizing geometrical shapes of the workpieces, and representing workpieces as 3D models within a virtual environment to assist the robotic system with collision detection. This process would require calculating 3D data for each entire workpiece.);
wherein the step of capturing includes capturing a first image and a second image (See the cameras 506 and 504 and the light projector 502 of Fig. 5. Each camera captures an image during the capturing step.), and
wherein the step of acquiring distance information includes searching the second image for an image corresponding to a small zone in the first image, calculating a parallax between the first image and the second image, and converting the parallax to a distance (Rublee teaches identifying the shift of a small zone of the projected light pattern in each image from its expected location for determining depth, and in further examples, Rublee teaches determining small corresponding zones between the two images for determining depth using triangulation. [Col. 16, lines 8-29] “…the computing device may use a block matching algorithm to determine corresponding features between the first image and the second image. Using the block matching algorithm, a segment of pixels of the first image (e.g., a 5×5, 11×11, or 21×21 window of pixels) may be matched against a range of segments of pixels in the second image to determine the closest matching segment of pixels in the second image. For example, the closest matching segment of pixels may be determined by minimizing a match function. Subsequently, given the pixel positions of a pair of corresponding features, one of various triangulation methods can be used to reconstruct the 3D position of the identified feature. According to the process of triangulation, the 3D position (x, y, z) of a point P can be reconstructed from the perspective projection of P on the image planes of a first sensor and a second sensor, given the relative position and orientation of the two sensors. Therefore, if the pixel positions of a pair of corresponding features are known, and the relative position and orientation of the two sensors is known, the 3D position of the feature can be determined.” Additionally, see the background section, Col. 1, line 24 – Col. 2, line 10, where Rublee discusses that structured light methods for determining stereo depth are well known in the art. [Col. 1, lines 30-36] “…two optical sensors with a known physical relationship to one another are used to capture two images of a scene. By finding mappings of corresponding pixel values within the two images and calculating how far apart these common areas reside in pixel space, a computing device can determine a depth map or image using triangulation.”).
Rublee teaches that plans are made for palletizing/depalletizing (for example, large, heavy boxes will be placed at the bottom of a stack and smaller, lighter boxes will be placed at the top of a stack), so searching for and selecting boxes based on feature amounts is required by Rublee’s invention. However, although these feature amounts could likely be a range, Rublee is not specific about the feature amounts being a range rather than a single value. Thus, Rublee fails to teach a step of setting a search range for searching for the one or more feature amounts and a step of adjusting the search range to search for the feature amounts by repeatedly detecting the plurality of workpieces.
However, Arpenti teaches a step of setting a search range for searching for the one or more feature amounts, and a step of adjusting the search range to search for the feature amounts by repeatedly detecting the plurality of workpieces (Arpenti teaches recognizing boxes on a mixed pallet by calculating the features of the boxes, such as the 3D dimensions, and recognizing the boxes based on a feature range. The feature range is an error threshold surrounding the known dimensions of a certain box. For example, if the system is searching for a cereal box with a face that is 12 × 8 inches and the error threshold is 1 inch, the system will search for a face within the range of 11-13 inches by 7-9 inches. [Section 2C] “For each f stored in F with a barcode bc, the geometrical matching checks if the width and the height characterizing f are compatible with two out of the three dimensions xc, yc, and zc. The candidate face f is recognized, i.e. definitely associated with a case among those listed in the CDB, when the difference between its dimensions and those of the case is less than a given threshold value. Then, the matched case c∗ can be stored in the set of the recognized cases C∗.” The search ranges are adjusted based on which known object is being searched for (e.g., cereal box, shampoo box, etc.).).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Rublee’s invention by searching for objects within a range of feature amounts rather than by using a single feature amount. This modification would account for error thresholds and allow workpieces with a specific feature to be found in noisy images (For example, Arpenti teaches error thresholds when determining lines representing the dimensions and poses of workpieces. [Arpenti, Section 2B] “To do this, for each sfeat, the corresponding segment sd is analyzed. If the discontinuity area is less than a given percentage threshold value of the total area of the depth segment sd, than the segment sfeat is stored, otherwise, it is rejected. In the following experiments, such threshold has been experimentally set to 10%, taking into account possible noisy measures of the RGB-D sensor.”).
Claim 2 is rejected under 35 U.S.C. § 103 as being unpatentable over Rublee (US 9,507,995 B2), in view of Arpenti (“RGB-D Recognition and Localization of Cases for Robotic Depalletizing in Supermarkets,” https://ieeexplore.ieee.org/document/9158347), and further in view of Konolige et al. (US 9,102,055), hereafter Konolige.
Regarding claim 2, Rublee and Arpenti teach the robot system according to claim 1. Rublee further teaches wherein the workpiece comprises a plurality of workpieces (In Col. 11, lines 3-67, Rublee discusses recognizing facades, such as a stack of boxes, from the depth map information. Rublee teaches recognizing individual boxes within facades and identifying which boxes the robot will pick up based on observed size measurements of identified surfaces of each box. In Col. 13, line 63 – Col. 14, line 19, Rublee teaches computing object bounding volumes so that objects can be recognized based on their dimensions. Additionally, in Col 12, lines 1-40, Rublee teaches detecting workpieces, recognizing geometrical shapes of the workpieces, and representing workpieces as 3D models within a virtual environment to assist the robotic system with collision detection. This process would require calculating 3D data for each entire workpiece.).
Although Rublee teaches determining the 3D dimensions of each workpiece within a plurality of workpieces, Rublee does not mention representing the entire plurality as an average thickness of the individual workpieces. Thus, Rublee fails to teach wherein the thickness calculation unit calculates an average value of thicknesses of the plurality of workpieces based on the distance image to calculate the thickness of the workpiece.
However, Konolige teaches the thickness calculation unit calculates an average value of thicknesses of the plurality of workpieces based on the distance image to calculate the thickness of the workpiece ([Col. 15, line 67] “when the robotic manipulator unloads a box from a stacked pallet of boxes, one or more sensors may measure dimensions of that box and the computing device may update a database including average dimensions or other information based on the measured dimensions of the boxes. The computing device may thus learn about the remaining boxes in the environment based on information gathered about the boxes that were previously in the environment and interacted with by the robotic manipulator.” [Col. 5, line 17] “In some example embodiments, boxes or objects may be automatically organized and placed onto pallets.” [Col. 20, line 10] “For instance, if the average length of a stack of boxes is approximately two meters, the computing device may use a basic 3D rectangular solid with a two-meter length and a width/height that is substantially the same as a given portion of the 2D image. Other examples are possible as well.” Konolige’s invention creates virtual representations of workpieces based on physical workpieces and the physical environment, and it uses the observed physical attributes, which includes average thickness of workpieces, to select appropriate 3D models to represent stacks of workpieces on pallets.).
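A minimal sketch of the running-average bookkeeping Konolige describes is given below for illustration only; the class and field names are hypothetical, and Konolige's database also stores dimensions other than thickness:

```python
class DimensionStats:
    """Running average of measured workpiece thicknesses, updated as each
    box is unloaded, per Konolige's database of average dimensions."""
    def __init__(self):
        self.count = 0
        self.avg_thickness = 0.0

    def update(self, measured_thickness):
        self.count += 1
        # incremental mean: avg += (x - avg) / n
        self.avg_thickness += (measured_thickness - self.avg_thickness) / self.count
        return self.avg_thickness

stats = DimensionStats()
for t in (0.29, 0.31, 0.30):     # hypothetical measured thicknesses (meters)
    avg = stats.update(t)
print(round(avg, 3))             # -> 0.3, usable as the stack's representative thickness
```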
Rublee and Konolige are analogous art to the claimed invention because both teach robot systems for loading and unloading boxes utilizing similar machine vision systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Rublee’s invention by using the average thickness of the plurality of workpieces as the thickness of the workpiece (which consists of the plurality of workpieces). This modification would allow for simplification of the 3D virtual environment (taught by both Konolige and Rublee) by using a single model to represent an entire plurality of workpieces ([Konolige, Col. 20, line 10] “For instance, if the average length of a stack of boxes is approximately two meters, the computing device may use a basic 3D rectangular solid with a two-meter length and a width/height that is substantially the same as a given portion of the 2D image.”), which could conserve computing resources.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Gu et al. (US 6,721,444 B1) teaches a system for recognizing objects by capturing stereo images of an object, identifying 2D features of the object, determining the depth to each point on the object by calculating the parallax between features, and directing a robotic arm to interact with the object. This method is also applied to many objects in a stack.
Marapane et al. (“Region-Based Stereo Analysis for Robotic Applications,” IEEE Transactions on Systems, Man, and Cybernetics, Vol. 19, No. 6, 1989) teaches a method of stereo matching where objects are recognized and distance information is determined using stereo imaging. Corresponding regions in both images are compared to determine the distance to each region, and a depth map is created. Objects are recognized from the depth maps.
Ohara et al. (US 10,614,585 B2) teaches an apparatus which receives two images and calculates the parallax between the images by selecting a corresponding block between the images and calculating the shift.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC JAMES SHOEMAKER whose telephone number is (571)272-6605. The examiner can normally be reached Monday through Friday from 8am to 5pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JENNIFER MEHMOOD, can be reached at (571)272-2976. The fax phone number for the organization where this application or proceeding is assigned is (571)273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Eric Shoemaker/
Patent Examiner
/JENNIFER MEHMOOD/Supervisory Patent Examiner, Art Unit 2664