Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-7, 10-17, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chaubard (US 11,481,751 B1).
Regarding Claim 1, Chaubard discloses:
a plurality of cameras positioned at the POS system configured to capture the plurality of images of the plurality of items positioned at the POS system,
wherein each camera captures a corresponding image with a corresponding Field of View (FOV) of the plurality of items thereby capturing different item parameters associated with each item; and
Chaubard is directed to a checkout station that uses sensors and imagery to detect products. (Chaubard, abstract, “A retail store automated checkout system uses images, video, or depth data to recognize products being purchased to expedite the checkout process and improve accuracy. All of a store's products, including those not sold in packages, such as fruits and vegetables, are imaged from a series of different angles and in different lighting conditions, to produce a library of images for each product. This library is used in a checkout system that takes images, video, and depth sensor readings as the products pass through the checkout area and remove bottlenecks later in the checkout process. Recognition of product identifiers or attributes such as barcode, QR code or other symbols, as well as OCR of product names, as well as the size and material of the product, can be additional or supplemental devices for identifying products being purchased.”; col.4 lns. 1-26, “In another implementation of the invention, an existing retail self-checkout station is retrofitted with equipment of the invention. In the upgraded form, the self-checkout station will recognize, by computer vision, produce and other non-coded items. One or more cameras are positioned at the station, at different viewing angles, for imaging each item brought forth by the customer. A scale is already included for produce sold by weight, and preferably an interactive touchscreen is provided for confirmation by the customer in the event, for example, the system does not recognize to a sufficiently high probability that selected produce is oranges or grapefruits, or between two different types of oranges or apples. Again, this further information can be added to the library once the customer has confirmed product identity.”)
at least one processor;
a memory coupled with the at least one processor, the memory including instructions that, when executed by the at least one processor cause the at least one processor to:
identify each corresponding item positioned at the POS system when the plurality of item parameters from the plurality of images of each item captured by the plurality of cameras match the item parameters of a previously identified item thereby resulting in a known item …
(Chaubard,col.15,ln.5-col.16,ln.30, “4. Per Object, Perform the Following Analysis a. Measure Length, Width, and Height of the Object and x, y, z of the centroid in global coordinate system This is very straightforward to do once you have a merged track. b. 3D CNN to attempt to classify the Object as a specific UPC and deliver a probability distribution over the possible UPCs This is very straightforward to do once you have a merged track with template matching or a CNN trained on those UPCs. c. 3D CNN to identify the superclass of the Object as a type of product, the colors of the product, the material it is made of, the shape of it, etc. We would train a 3D CNN to output superclass information like this. d. OCR to read any text on the Object We would use an Optical Character Recognition (OCR) algorithm on each image captured at time t. There are many OCR algorithms to choose from. In one embodiment, we would use a CNN to infer a rotated bounding box [x1, y1, x2, y2, angle] for every word in the image. We would then input each bounding box crop in the image into a Convolutional Recurrent Neural Network (CRNN) to output the character sequence inside the box. In some embodiments, we may use a lexicon of possible words and use a Jaccard Similarity between each word in the lexicon and the outputted word to find the true word. If the detected words are near each other, we may merge them into a phrase. In all embodiments, we would take each detected set of words or phrases and assign it to the Object from which the bounding box has the highest Intersection Over Union with. e. Barcode Detection on all sides of the Object Same as OCR except we look for barcodes, not text. f. 3D CNN and 2D CNN to compute an embedding vector for the Object In some embodiments we would use a 3D CNN to compute an embedding vector per Object. This 3D CNN would be trained by taking a database of known Objects some of the same UPCs. 
At each stage of training we would select two training Objects of the same UPC and one training Object of a different UPC, and use a loss function like triplet embedding loss to give penalty to the 3D CNN if the model does not recognize the same Objects as the same and the different Object as different. In other embodiments we would use a 2D CNN to detect general purpose bounding boxes per camera. Then we would match each bounding box across all cameras to a specific Object. For each bounding box, we would compute an embedding vector that would encode and represent the set of pixels in the bounding box as a vector of n numbers. The 2D CNN would be trained in a similar process as the 3D CNN but instead the training Objects would be training bounding boxes. In some embodiments we will use both approaches”)
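As a hedged illustration of the triplet-embedding loss the quoted passage relies on (penalize the network when two same-UPC embeddings are not closer together than a different-UPC embedding), the sketch below uses an illustrative margin and toy vectors that do not come from the reference:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Penalize the model when the same-UPC pair (anchor, positive)
    is not closer together than the different-UPC pair (anchor, negative)."""
    d_pos = np.linalg.norm(anchor - positive)   # distance between same-UPC embeddings
    d_neg = np.linalg.norm(anchor - negative)   # distance to the different-UPC embedding
    return max(0.0, d_pos - d_neg + margin)     # zero loss once the gap exceeds the margin

# Illustrative embeddings: anchor/positive share a UPC, negative does not.
a = np.array([1.0, 0.0, 0.0])
p = np.array([0.9, 0.1, 0.0])
n = np.array([0.0, 1.0, 0.0])
loss = triplet_loss(a, p, n)  # 0.0 here: the matching pair is already closer by more than the margin
```

Swapping the positive and negative produces a large loss, which is the penalty the quoted passage describes.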
and fail to identify each corresponding item when the plurality of item parameters from the plurality of images of each item fail to match the item parameters of a previously identified item thereby resulting in an unknown item,
(Chaubard, summary, “If an item in the checkout area is recognized to a sufficiently high probability, meeting a preset threshold of, e.g., 90%, that item is added to the checkout list and charged to the customer. If probability is too low, below the preset threshold, the system may prompt the customer or cashier to reposition the item if it appears stacking or occlusion has occurred, or to remove and scan that item using the store's existing scanning system.”)
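The threshold gate in the passage above (add the item to the checkout list when the recognition probability meets a preset threshold such as 90%, otherwise flag it for repositioning or rescanning) can be sketched as follows; only the 90% figure comes from the quoted passage, and the function name is illustrative:

```python
def classify_item(recognition_probability, threshold=0.90):
    """Gate an item on its recognition probability: items meeting the
    preset threshold are 'known' (added to the checkout list); items
    below it are 'unknown' (prompting reposition or manual rescan)."""
    return "known" if recognition_probability >= threshold else "unknown"
```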
map a plurality of image pixels associated with each unknown item as extracted from each image of each unknown item as captured by the plurality of cameras to a plurality of real world coordinates associated with each unknown item as extracted from a position of each unknown item as positioned at the POS system,
(Chaubard, col.14,lns.10-38, “Every camera in the system has its own coordinate system. To fuse this information for each camera in a coherent way, the system will have a fixed and known position and orientation (position=x,y,z, orientation=α, β, γ and scaling=s.sub.x, s.sub.y, s.sub.z) from the global origin to each cameras coordinate system. These 9 numbers will be theoretically static for the life of the system but may require a recalibration from time to time. These parameters can be used to compute an Extrinsic Matrix, H, per camera which includes the following transformations: Rotation, Translation, and Scale. Depending on the type of camera used, each camera may also have a Intrinsic Matrix, K, which defines the deformation from the real world 3D space to the 2D space on the cameras sensor that is not modeled by H. Then H and K together will be used to convert each camera's specific point cloud in its own coordinate system to the global coordinate system, giving us a final fused point cloud across all cameras D.sub.t=[[R, G, B, X, Y, Z] . . . ], which if the cameras are positioned correctly and the products are not stacked, give us a full 360 volumetric coverage of each product.
(56) Once we have the global fused point cloud D.sub.t, we can then run a clustering algorithm on D.sub.t to detect the existence of distinct objects. There are many clustering algorithms that can be used for this task. In one embodiment, we use a 3D CNN or RNN to cluster each specific point p=(x, y, z, R, G, B) to one of an unknown number of centroids. In other embodiments, we may be able to achieve the same thing with a simpler model like Density-based Scanning (DB-Scan) or Mean Shift.”; col.14, lns.42-50, “In another embodiment, we apply an Oriented 3D volume detection from Pixel-wise neural network predictions where we take each cameras pixels and push them through a fully-convolutional neural network as the backbone and merge this data with a Header Network that is tasked with localizing the Objects size (height, length, width), shape (classification, embedding, or otherwise), and heading (angle/orientation in the global coordinate system in α, β, and γ).”)
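The coordinate fusion quoted above (nine fixed parameters per camera combined into an extrinsic matrix H that maps camera-local points into the global coordinate system) can be sketched as below; the homogeneous 4x4 form, rotation convention, and sample values are illustrative assumptions, not specifics from the reference:

```python
import numpy as np

def extrinsic_matrix(alpha, beta, gamma, t, s):
    """Build a 4x4 matrix H from the nine per-camera parameters the
    reference describes: orientation (alpha, beta, gamma), position t,
    and per-axis scale s."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    # Rotation composed as Rz(gamma) @ Ry(beta) @ Rx(alpha), one common convention.
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    H = np.eye(4)
    H[:3, :3] = Rz @ Ry @ Rx @ np.diag(s)  # rotation combined with per-axis scale
    H[:3, 3] = t                           # translation to the global origin
    return H

def to_global(points_cam, H):
    """Map an (N, 3) camera-local point cloud into global coordinates."""
    homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    return (H @ homo.T).T[:, :3]
```

Running every camera's cloud through its own H produces the single fused cloud D_t on which the clustering step (a learned model, DB-Scan, or Mean Shift, per the quoted passage) then operates.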
generate a corresponding bounding polygon for each unknown item that encapsulates each unknown item within the corresponding bounding polygon based on the image pixels with each unknown item as mapped to the real world coordinates associated with each unknown item, and
project the corresponding bounding polygon onto each unknown item positioned at the POS system to encapsulate each unknown item thereby providing visual feedback for each unknown item positioned at the POS system.
(Chaubard, col.5,ln.55-col.6,ln.2, “If there are overlapping items or stacked items that occlude the camera or cameras, or if there are items in the tunnel that the cameras failed to capture or identify to the threshold probability, or if the tunnel otherwise made an error, then the non-identified items will be shown visually on the screen 20. As an example, a red bounding box can be shown around the unidentified item. The shopper or cashier or auditor in the store, can then manually correct the error by taking the item or items that were not identified and moving them (dashed line 25) by the pre-existing barcode scanner 24 or another scanner, as indicated at 26, therefore correcting the list of items and the amount owed. Alternatively, the system can prompt the customer or cashier to select from a list of screen-shown choices to confirm the correct product from a list of system-generated guesses.”)
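The visual feedback in the passage above (a red bounding box shown around the unidentified item on the screen) reduces to a simple drawing operation; a deployed system would use a display toolkit or OpenCV, so the plain-array version below, including the frame size, is an illustrative sketch:

```python
import numpy as np

def draw_red_box(image, x1, y1, x2, y2, thickness=2):
    """Draw a red rectangle outline (RGB) around an unidentified item
    so the shopper or cashier can see which item needs rescanning."""
    red = np.array([255, 0, 0], dtype=image.dtype)
    image[y1:y1 + thickness, x1:x2] = red  # top edge
    image[y2 - thickness:y2, x1:x2] = red  # bottom edge
    image[y1:y2, x1:x1 + thickness] = red  # left edge
    image[y1:y2, x2 - thickness:x2] = red  # right edge
    return image

frame = np.zeros((100, 100, 3), dtype=np.uint8)  # stand-in for a camera frame
frame = draw_red_box(frame, 20, 30, 60, 70)
```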
Regarding Claim 2, Chaubard discloses the system of claim 1.
extract the plurality of item parameters associated with each item positioned at the POS system from the plurality of images of each item captured by the plurality of cameras positioned at the POS system, wherein the item parameters associated with each item when combined are indicative as to an identification of each corresponding item thereby enabling the identification of each corresponding item;
analyze the plurality of item parameters associated with each item positioned at the POS system to determine whether the item parameters when combined matches corresponding item parameters stored in an item parameter identification database,
wherein the item parameter identification database stores different combinations of item parameters with each different combination of item parameters associated with a corresponding item thereby identifying each corresponding item based on different combination of item parameters associated with each corresponding item; and
identify each corresponding item positioned at the POS system when the plurality of item parameters when combined matches corresponding item parameters as stored in the item parameter identification database and fail to identify each corresponding item when the plurality of item parameters when combined fails to match corresponding item parameters.
See prior art rejection of claim 1.
Regarding Claim 3, Chaubard discloses the system of claim 2.
calibrate each camera positioned at the POS system to determine the plurality of real world coordinates of the POS system relative to a corresponding position of each camera, wherein the calibration of each camera at the POS system enables the real world coordinates of the POS system to be mapped to the plurality of image pixels as extracted from each image captured by each camera.
(Chaubard, col.14, lns.10-17, “Every camera in the system has its own coordinate system. To fuse this information for each camera in a coherent way, the system will have a fixed and known position and orientation (position=x,y,z, orientation=α, β, γ and scaling=s.sub.x, s.sub.y, s.sub.z) from the global origin to each cameras coordinate system. These 9 numbers will be theoretically static for the life of the system but may require a recalibration from time to time.”)
Regarding Claim 4, Chaubard discloses the system of claim 1.
extract a plurality of metrology features as included in the plurality of item parameters of each item from each image captured by the plurality of cameras for each unknown item positioned
at the POS system, wherein the plurality of metrology features is indicative as to a physical appearance of each unknown item positioned at the POS system.
See prior art rejection of claim 1.
Regarding Claim 5, Chaubard discloses the system of claim 4.
map the plurality of pixels associated with each unknown item as extracted from each image of each unknown item to the plurality of real world coordinates associated with each unknown item from the position of each unknown item based on the metrology features of each unknown item, wherein the metrology features of each unknown item enable the plurality of pixels to be mapped to the real world coordinates of each unknown item based on the physical appearance of each unknown item at the POS system.
(Chaubard, col.14, lns.42-50, “In another embodiment, we apply an Oriented 3D volume detection from Pixel-wise neural network predictions where we take each cameras pixels and push them through a fully-convolutional neural network as the backbone and merge this data with a Header Network that is tasked with localizing the Objects size (height, length, width), shape (classification, embedding, or otherwise), and heading (angle/orientation in the global coordinate system in α, β, and γ).”)
Regarding Claim 6, Chaubard discloses the system of claim 5.
generate the corresponding bounding polygon for each unknown item that encapsulates each unknown item within the corresponding bounding polygon based on the image pixels of the metrology features associated with each unknown item as mapped to the real world coordinates of the metrology features associated with each unknown item thereby enabling the corresponding bounding polygon to encapsulate the physical appearance of each unknown item at the POS system; and
project the corresponding bounding polygon onto each unknown item positioned at the POS system to encapsulate the physical appearance of each unknown item based on the metrology features of the image pixels associated with each unknown item mapped to the metrology features of the real world coordinates of each unknown item thereby providing visual feedback for each unknown item positioned at the POS system.
(Chaubard, col.8,ln.65-col.9,ln.15, “The preferred embodiment works in the following way. First a Convolutional Neural Network (CNN) is run to infer if there exists a product at each part of the image, and if so impose a bounding shape to fit around each distinct product (perhaps a box or polygon, and if using depth, a bounding cuboid or prism or frustrum. Each bounding shape is processed individually, running optical character recognition (OCR) on it to attempt to read any words on the packaging of the product and using a barcode detection algorithm to try to find and recognize barcodes if in the image, if such exist or are able to be read. In addition, a template matching algorithm is applied, taking a database of labeled “known” products or by using a classifier that was already trained to detect those “known” products and attempting to match each one to the pixels in the bounding shape. This template matching process takes in a known image and the bounding box image and outputs a probability of match.”)
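The template-matching step quoted above takes a known product image and a bounding-shape crop and outputs a probability of match; one hedged way to sketch such a score is normalized cross-correlation rescaled to [0, 1] (the similarity measure itself is an assumption, not specified by the reference):

```python
import numpy as np

def match_probability(template, crop):
    """Compare a known product image against a bounding-shape crop of
    the same size and return a score in [0, 1]; 1.0 is a perfect match."""
    t = template.astype(float).ravel()
    c = crop.astype(float).ravel()
    t = (t - t.mean()) / (t.std() + 1e-9)   # zero-mean, unit-variance
    c = (c - c.mean()) / (c.std() + 1e-9)
    corr = float(np.dot(t, c) / len(t))     # normalized cross-correlation in [-1, 1]
    return (corr + 1.0) / 2.0               # rescale to a [0, 1] "probability of match"

known = np.array([[0, 255], [255, 0]], dtype=np.uint8)  # toy "known product" patch
prob = match_probability(known, known.copy())           # identical patches score 1.0
```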
Regarding Claim 7, Chaubard discloses the system of claim 6.
analyze an accumulation of metrology features and item parameters as associated with each corresponding item in the item parameter identification database to determine the metrology features of each unknown item positioned at the POS system;
map the plurality of pixels associated with each unknown item as extracted from each image of each unknown item to the plurality of real world coordinates associated with each unknown item from the position of each unknown item based on the accumulation of metrology features and item parameters of each unknown item as stored in the item parameter identification database, wherein the accumulation of metrology features and item parameters of each unknown item enable the plurality of pixels to be mapped to the real world coordinates of each unknown item based on the physical appearance of each unknown item at the POS system; and
generate the corresponding bounding polygon for each unknown item that encapsulates each unknown item within the corresponding bounding polygon based on the image pixels of the accumulation of metrology features and item parameters associated with each unknown item as mapped to the real world coordinates of the metrology features associated with each unknown item thereby enabling the corresponding bounding polygon to encapsulate the physical appearance of each unknown item at the POS system.
(Chaubard, col.15,ln.35-col.16,ln.37, “4. Per Object, Perform the Following Analysis a. Measure Length, Width, and Height of the Object and x, y, z of the centroid in global coordinate system This is very straightforward to do once you have a merged track. b. 3D CNN to attempt to classify the Object as a specific UPC and deliver a probability distribution over the possible UPCs This is very straightforward to do once you have a merged track with template matching or a CNN trained on those UPCs. c. 3D CNN to identify the superclass of the Object as a type of product, the colors of the product, the material it is made of, the shape of it, etc. We would train a 3D CNN to output superclass information like this. d. OCR to read any text on the Object We would use an Optical Character Recognition (OCR) algorithm on each image captured at time t. There are many OCR algorithms to choose from. In one embodiment, we would use a CNN to infer a rotated bounding box [x1, y1, x2, y2, angle] for every word in the image. We would then input each bounding box crop in the image into a Convolutional Recurrent Neural Network (CRNN) to output the character sequence inside the box. In some embodiments, we may use a lexicon of possible words and use a Jaccard Similarity between each word in the lexicon and the outputted word to find the true word. If the detected words are near each other, we may merge them into a phrase. In all embodiments, we would take each detected set of words or phrases and assign it to the Object from which the bounding box has the highest Intersection Over Union with. e. Barcode Detection on all sides of the Object Same as OCR except we look for barcodes, not text. f. 3D CNN and 2D CNN to compute an embedding vector for the Object In some embodiments we would use a 3D CNN to compute an embedding vector per Object. This 3D CNN would be trained by taking a database of known Objects some of the same UPCs. 
At each stage of training we would select two training Objects of the same UPC and one training Object of a different UPC, and use a loss function like triplet embedding loss to give penalty to the 3D CNN if the model does not recognize the same Objects as the same and the different Object as different. In other embodiments we would use a 2D CNN to detect general purpose bounding boxes per camera. Then we would match each bounding box across all cameras to a specific Object. For each bounding box, we would compute an embedding vector that would encode and represent the set of pixels in the bounding box as a vector of n numbers. The 2D CNN would be trained in a similar process as the 3D CNN but instead the training Objects would be training bounding boxes. In some embodiments we will use both approaches.
(67) 5. Ensemble of all Features to Classify the Object Ft Features for this Object
(68) The key to detecting and identifying each Object with high certainty is to take a number of separate approaches and ensemble the results.
(69) The output of this process will be a predicted SKU per Object with a corresponding confidence score.”)
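The ensembling step quoted above (combine a number of separate approaches into a predicted SKU with a confidence score) can be sketched as a weighted average over per-approach probability distributions; the weighting scheme and SKU labels are illustrative assumptions:

```python
def ensemble_sku(predictions, weights):
    """Combine per-approach probability distributions over SKUs into a
    single predicted SKU with a confidence score.

    predictions: list of {sku: probability} dicts, one per approach
                 (e.g. UPC classifier, OCR lookup, barcode, embedding match).
    weights: relative trust in each approach (same length)."""
    combined = {}
    total = sum(weights)
    for dist, w in zip(predictions, weights):
        for sku, p in dist.items():
            combined[sku] = combined.get(sku, 0.0) + w * p / total
    best = max(combined, key=combined.get)
    return best, combined[best]

# Illustrative: a 3D-CNN distribution and an OCR-based distribution.
cnn_dist = {"SKU-APPLE": 0.7, "SKU-PEAR": 0.3}
ocr_dist = {"SKU-APPLE": 0.9, "SKU-PEAR": 0.1}
sku, confidence = ensemble_sku([cnn_dist, ocr_dist], weights=[1.0, 1.0])
```

The resulting confidence score is what the known/unknown threshold gate described for claim 1 would then be applied to.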
Regarding Claim 10, Chaubard discloses the system of claim 1.
automatically display via a user interface a notification of each unknown item positioned at the POS system as encapsulated within the corresponding bounding polygon as projected onto each unknown item positioned at the POS system.
See prior art rejection of claim 1.
Allowable Subject Matter
Claims 8,9,18,19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALLEN C CHEIN whose telephone number is (571) 270-7985. The examiner can normally be reached Monday-Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Florian Zeender can be reached at (571) 272-6790. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALLEN C CHEIN/Primary Examiner, Art Unit 3627