DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of the Claims
Claims 1-20, as originally filed, are currently pending and have been considered below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 9-13, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Goncalves, U.S. Publication No. 2011/0215147 (hereinafter “Goncalves”), in view of Falcão, João, et al., “FAIM: Vision and Weight Sensing Fusion Framework for Autonomous Inventory Monitoring in Convenience Stores,” Frontiers in Built Environment 6 (2020): 568372 (hereinafter “Falcao”).
As per claim 1, Goncalves discloses a method, comprising:
by a point of sale (POS) system operationally coupled to a load sensor device and an optical sensor device, with the load sensor device being operable to measure a load of an object while positioned on a load surface of the POS system, the optical sensor device having a field of view associated with the load surface and operable to capture an image (Goncalves, Abstract, A system and method is disclosed for using object recognition/verification and weight information to confirm the accuracy of a UPC scan, or to provide an affirmative recognition where no UPC scan was made. In the preferred embodiment, the checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more image of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known object … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0002, One particular field of the invention relates to systems and methods for using visual appearance and weight information to augment universal product code (UPC) scans in order to insure that items are properly identified and accounted for at ring up),
obtaining an image captured by the optical sensor device that includes a visual representation of at least a portion of a target object and a load measurement associated with the target object that is performed by the load sensor device while the target object is positioned on the load surface to enable object classification or identification of the target object based on both an object recognition of the target object represented in the captured image and a prediction of the target object from the load measurement associated with the target object (Goncalves, ¶0004, a checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more images of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known objects … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0017, The self-checkout stations 100, 200 in these embodiments include a counter top 102 with a UPC scanner 120, a scale 180 for determining the weight of an item … One or more video cameras are trained on the counter and the bagging area for purposes of detecting the presence and/or identity of items of merchandise as they are scanned and bagged; Goncalves, ¶0018, The weight scale 180 is incorporated into the belt conveyor 140 in FIG. 2 so as to determine the weight of an item as it is passed to the bagging area 150. In still other embodiments, the scale is incorporated into the UPC scanner bed 120; Goncalves, ¶0019, As shown in FIG. 2, a camera 160 may be trained to capture images of items of the belt 140; Goncalves ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code. As described above, one or more images are acquired and geometric point features extracted from the images. The extracted features are compared to the visual features of known objects in the image database. The identity of the item as well as its UPC code can then be determined based on the number and quality of matching visual features; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent with the UPC code on the item; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item. If all data is consistent, the item is added to the checkout list),
Goncalves does not explicitly disclose the following limitation as further recited; however, Falcao discloses:
with the object recognition being independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Falcao with Goncalves because both references are directed to the same field of endeavor: identifying retail items by combining weight sensing with computer vision. One of ordinary skill in the art would have been motivated to incorporate the weight-and-vision fusion of Falcao into the checkout system of Goncalves in order to recognize products that might be occluded or otherwise unclear due to camera resolution or lighting conditions (Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities).
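For clarity of the record, the following is a minimal, illustrative sketch of the late fusion Falcao describes in section 4.4: per-class scores from the weight and vision modalities are combined as a weighted linear sum rather than a product, so an item the camera never saw is not forced to a final probability of zero. The class names and modality weights below are hypothetical and appear in neither reference.

```python
# Illustrative sketch (not source code from either reference) of the
# weighted linear late fusion described in Falcao section 4.4. Weight
# sensing is treated as the more robust modality, so it receives the
# larger coefficient; the specific values are hypothetical.
WEIGHT_MODALITY = 0.7
VISION_MODALITY = 0.3

def fuse_predictions(weight_probs: dict[str, float],
                     vision_probs: dict[str, float]) -> tuple[str, float]:
    """Return the product class with the highest fused score."""
    classes = set(weight_probs) | set(vision_probs)
    fused = {
        c: WEIGHT_MODALITY * weight_probs.get(c, 0.0)
           + VISION_MODALITY * vision_probs.get(c, 0.0)   # sum, not product
        for c in classes
    }
    best = max(fused, key=fused.get)
    return best, fused[best]

# Example: the scale strongly suggests "soda_can"; the camera is partly occluded.
item, score = fuse_predictions(
    weight_probs={"soda_can": 0.80, "chips": 0.15, "candy_bar": 0.05},
    vision_probs={"soda_can": 0.40, "chips": 0.35, "candy_bar": 0.25},
)
print(item, round(score, 3))   # soda_can 0.68
```

Summing the modalities, as the quoted passage explains, reflects that an occluded or poorly lit item may yield a near-zero vision score while the weight evidence remains decisive.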
As per claim 2, Goncalves and Falcao disclose the method of claim 1, further comprising: receiving, by a processing circuit of the POS system, from the load sensor device, an indication that includes the load measurement (Goncalves, ¶0018, The weight scale 180 is incorporated into the belt conveyor 140 in FIG. 2 so as to determine the weight of an item).
As per claim 3, Goncalves and Falcao disclose the method of claim 1, further comprising: receiving, by a processing circuit of the POS system, from the optical sensor device, an indication that includes the captured image (Goncalves, ¶0004, a checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more images of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known objects).
As per claim 4, Goncalves and Falcao disclose the method of claim 1. Falcao further discloses:
determining to capture an image of the target object responsive to determining that a weight change event has occurred based on the load measurement (Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework … FAIM’s pipeline is triggered when a change in the total weight of a shelf is detected; Falcao, page 5, 3.4. Customer-Shelf Interaction Detection, The first step in FAIM’s pipeline is to detect when an event took place (i.e., a customer picked up or put back an item on a shelf). In our proposed system architecture, displayed in Figure 2, the processing of every change in the inventory starts with a weight change trigger … the weight difference on the load sensors is generally enough to detect an event); and
sending, by the processing circuit of the POS system, to the optical sensor device, an indication that includes a request to capture the image (Falcao, page 6, 3.5. Vision Event Extraction, uses the Weight Change Event Detection trigger from section 3.4 to start analyzing the images). The motivation would be the same as above in claim 1.
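The trigger-then-capture flow addressed in claim 4 can be illustrated as follows. This is a hedged sketch of the weight change event detection Falcao describes in section 3.4; the load-sensor and camera interfaces are assumptions, not APIs from either reference.

```python
# Hypothetical sketch of a weight-change trigger in the style of Falcao
# section 3.4: poll the load sensor and, when the reading departs from the
# last stable value by more than a noise threshold, request an image from
# the optical sensor. All device interfaces here are assumed.
NOISE_THRESHOLD_G = 5.0   # ignore fluctuations below this many grams

class WeightChangeTrigger:
    def __init__(self, load_sensor, camera):
        self.load_sensor = load_sensor      # assumed: .read_grams() -> float
        self.camera = camera                # assumed: .capture() -> image
        self.last_stable_g = load_sensor.read_grams()

    def poll(self):
        """Return (delta_grams, image) when an event fires, else None."""
        current = self.load_sensor.read_grams()
        delta = current - self.last_stable_g
        if abs(delta) < NOISE_THRESHOLD_G:
            return None                     # within sensor noise: no event
        self.last_stable_g = current
        image = self.camera.capture()       # capture requested only on an event
        return delta, image
```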
As per claim 5, Goncalves and Falcao disclose the method of claim 1, further comprising: performing object recognition of the target object represented in the captured image based on the captured image, with the object recognition being performed independent of the focal distance at which the target object was captured by the optical sensor device (Goncalves, ¶0004, a checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more images of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known objects; generate a geometric transform between the extracted geometric point features and the features of known objects for a subset of known objects corresponding to matches; and identify one of the known objects based on a best match of the geometric transform; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities).
As per claim 9, Goncalves and Falcao disclose the method of claim 1, further comprising:
recognizing the target object based on the captured image to obtain one or more vision-based predicted objects and corresponding vision-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code),
with the recognition being performed independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them).
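As context for the "independent of the focal distance" limitation: the scale-invariant feature transform (SIFT) matching Goncalves describes in ¶0021 is, by construction, insensitive to the scale at which an item appears in the frame, so a match can succeed regardless of how far the item sat from the camera. Below is a minimal sketch assuming OpenCV's SIFT implementation (an assumption; neither reference supplies source code); the file names and match threshold are placeholders.

```python
import cv2

# Sketch of scale-invariant feature matching in the style of Goncalves
# paragraph 0021, using OpenCV's SIFT. Because SIFT descriptors are scale
# invariant, recognition does not depend on the focal distance at which
# the item was imaged.
sift = cv2.SIFT_create()

def match_score(query_path: str, model_path: str) -> int:
    """Count SIFT matches that survive Lowe's ratio test."""
    query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    model = cv2.imread(model_path, cv2.IMREAD_GRAYSCALE)
    _, desc_q = sift.detectAndCompute(query, None)
    _, desc_m = sift.detectAndCompute(model, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(desc_q, desc_m, k=2)
    good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
    return len(good)

# An item is "recognized" when enough descriptors match a stored model,
# e.g. match_score("checkout_frame.png", "known_item.png") > 20.
```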
As per claim 10, Goncalves and Falcao disclose the method of claim 1. Falcao further discloses: predicting the target object based on the weight measurement of the target object to obtain one or more weight-based predicted objects and corresponding weight-based confidence levels (Falcao, page 3, 3.1. System Overview, FAIM’s pipeline is triggered when a change in the total weight of a shelf is detected. From that it extracts two features: the absolute weight difference and the spatial distribution of the weight. The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model ... the probability of Δμm given Δμ can be defined as a normal distribution ... using the Bayes’s rule the probability of the item belonging to each product class is determined). The motivation would be the same as above in claim 1.
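The weight-change-based prediction quoted above from Falcao section 4.2 amounts to a Gaussian likelihood over the observed weight change followed by Bayes' rule. The following is a short sketch under assumed item weight statistics; the product means and standard deviations are hypothetical.

```python
import math

# Sketch of the weight-change-based prediction in the style of Falcao
# section 4.2: model each product's weight as a normal distribution,
# evaluate the likelihood of the observed weight change under each, and
# normalize with Bayes' rule (uniform priors assumed). The weight model
# values below are hypothetical.
ITEM_WEIGHT_MODEL = {            # product: (mean grams, std dev grams)
    "soda_can":  (355.0, 4.0),
    "chips":     (170.0, 6.0),
    "candy_bar": (52.0,  2.0),
}

def normal_pdf(x: float, mean: float, std: float) -> float:
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def weight_based_probs(delta_mu: float) -> dict[str, float]:
    """P(class | observed weight change), via Bayes with uniform priors."""
    likelihoods = {c: normal_pdf(abs(delta_mu), m, s)
                   for c, (m, s) in ITEM_WEIGHT_MODEL.items()}
    total = sum(likelihoods.values()) or 1.0   # guard against underflow
    return {c: l / total for c, l in likelihoods.items()}

print(weight_based_probs(-352.0))   # a ~352 g removal: "soda_can" dominates
```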
As per claim 11, Goncalves and Falcao disclose the method of claim 1, further comprising: performing object classification or identification of the target object based on one or more vision-based predicted objects and corresponding vision-based confidence levels and one or more weight-based predicted objects and corresponding weight-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item).
As per claim 12, Goncalves and Falcao disclose the method of claim 1, further comprising:
object recognizing the target object based on the captured image to obtain one or more vision-based predicted objects and corresponding vision-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code),
with the object recognition being performed independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them);
predicting the target object based on the weight measurement of the target object to obtain one or more weight-based predicted objects and corresponding weight-based confidence levels (Falcao, page 3, 3.1. System Overview, FAIM’s pipeline is triggered when a change in the total weight of a shelf is detected. From that it extracts two features: the absolute weight difference and the spatial distribution of the weight. The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model ... the probability of Δμm given Δμ can be defined as a normal distribution ... using the Bayes’s rule the probability of the item belonging to each product class is determined); and
performing object classification or identification of the target object based on the one or more vision-based predicted objects and the corresponding vision-based confidence levels and the one or more weight-based predicted objects and the corresponding weight-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item).
As per claim 13, Goncalves discloses a point of sale (POS) system, comprising:
with the POS system being operationally coupled to a load sensor device and an optical sensor device, with the load sensor device being operable to measure a load of an object while positioned on a load surface of the POS system, the optical sensor device having a field of view that includes the load surface and being operable to capture an image (Goncalves, Abstract, A system and method is disclosed for using object recognition/verification and weight information to confirm the accuracy of a UPC scan, or to provide an affirmative recognition where no UPC scan was made. In the preferred embodiment, the checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more image of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known object … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0002, One particular field of the invention relates to systems and methods for using visual appearance and weight information to augment universal product code (UPC) scans in order to insure that items are properly identified and accounted for at ring up),
wherein the POS system further includes a memory, the memory containing instructions executable by the processing circuitry whereby the processing circuitry is configured to: obtain an image captured by the optical sensor device that includes a visual representation of at least a portion of a target object and a load measurement associated with the target object that is performed by the load sensor device while the target object is positioned on the load surface to enable object classification or identification of the target object based on both an object recognition of the target object represented in the captured image and a prediction of the target object from the load measurement associated with the target object (Goncalves, ¶0004, a checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more images of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known objects … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0017, The self-checkout stations 100, 200 in these embodiments include a counter top 102 with a UPC scanner 120, a scale 180 for determining the weight of an item … One or more video cameras are trained on the counter and the bagging area for purposes of detecting the presence and/or identity of items of merchandise as they are scanned and bagged; Goncalves, ¶0018, The weight scale 180 is incorporated into the belt conveyor 140 in FIG. 2 so as to determine the weight of an item as it is passed to the bagging area 150. In still other embodiments, the scale is incorporated into the UPC scanner bed 120; Goncalves, ¶0019, As shown in FIG. 2, a camera 160 may be trained to capture images of items of the belt 140; Goncalves ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code. As described above, one or more images are acquired and geometric point features extracted from the images. The extracted features are compared to the visual features of known objects in the image database. The identity of the item as well as its UPC code can then be determined based on the number and quality of matching visual features; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent with the UPC code on the item; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item. If all data is consistent, the item is added to the checkout list).
Goncalves does not explicitly disclose the following limitation as further recited; however, Falcao discloses:
with the object recognition being independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Falcao with Goncalves because both references are directed to the same field of endeavor: identifying retail items by combining weight sensing with computer vision. One of ordinary skill in the art would have been motivated to incorporate the weight-and-vision fusion of Falcao into the checkout system of Goncalves in order to recognize products that might be occluded or otherwise unclear due to camera resolution or lighting conditions (Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities).
As per claim 16, Goncalves and Falcao disclose the POS system of claim 13, wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to:
recognize the target object based on the captured image to obtain one or more vision-based predicted objects and corresponding confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code),
with the recognition being performed independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them).
As per claim 17, Goncalves and Falcao disclose the POS system of claim 13. Falcao further discloses wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to:
predict the target object based on the weight measurement of the target object to obtain one or more weight-based predicted objects and corresponding confidence levels (Falcao, page 3, 3.1. System Overview, FAIM’s pipeline is triggered when a change in the total weight of a shelf is detected. From that it extracts two features: the absolute weight difference and the spatial distribution of the weight. The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model ... the probability of Δμm given Δμ can be defined as a normal distribution ... using the Bayes’s rule the probability of the item belonging to each product class is determined). The motivation would be the same as above in claim 13.
As per claim 18, Goncalves and Falcao disclose the POS system of claim 13, wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to:
perform classification or identification of the target object based on one or more vision-based predicted objects and corresponding vision-based confidence levels and one or more weight-based predicted objects and corresponding weight-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item).
As per claim 19, Goncalves and Falcao disclose the POS system of claim 13, wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to:
object recognize the target object based on the captured image to obtain one or more vision-based predicted objects and corresponding vision-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code),
with the object recognition being performed independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them);
predict the target object based on the weight measurement of the target object to obtain one or more weight-based predicted objects and corresponding weight-based confidence levels (Falcao, page 3, 3.1. System Overview, FAIM’s pipeline is triggered when a change in the total weight of a shelf is detected. From that it extracts two features: the absolute weight difference and the spatial distribution of the weight. The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model ... the probability of Δμm given Δμ can be defined as a normal distribution ... using the Bayes’s rule the probability of the item belonging to each product class is determined); and
perform classification or identification of the target object based on the one or more vision-based predicted objects and the corresponding vision-based confidence levels and the one or more weight-based predicted objects and the corresponding weight-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item).
As per claim 20, Goncalves discloses a point of sale (POS) system, comprising:
a load sensor device operable to measure a load of an object while positioned on a load surface of the POS system; an optical sensor device having a field of view associated with the load surface and operable to capture an image (Goncalves, Abstract, A system and method is disclosed for using object recognition/verification and weight information to confirm the accuracy of a UPC scan, or to provide an affirmative recognition where no UPC scan was made. In the preferred embodiment, the checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more image of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known object … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0002, One particular field of the invention relates to systems and methods for using visual appearance and weight information to augment universal product code (UPC) scans in order to insure that items are properly identified and accounted for at ring up); and
a processing circuitry and a memory containing instructions executable by the processing circuitry whereby the processing circuitry is operative to: obtain an image captured by the optical sensor device that includes a visual representation of at least a portion of a target object and a load measurement associated with the target object that is performed by the load sensor device while the target object is positioned on the load surface to enable object classification or identification of the target object based on both an object recognition of the target object represented in the captured image and a prediction of the target object from the load measurement associated with the target object (Goncalves, ¶0004, a checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more images of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known objects … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0017, The self-checkout stations 100, 200 in these embodiments include a counter top 102 with a UPC scanner 120, a scale 180 for determining the weight of an item … One or more video cameras are trained on the counter and the bagging area for purposes of detecting the presence and/or identity of items of merchandise as they are scanned and bagged; Goncalves, ¶0018, The weight scale 180 is incorporated into the belt conveyor 140 in FIG. 2 so as to determine the weight of an item as it is passed to the bagging area 150. In still other embodiments, the scale is incorporated into the UPC scanner bed 120; Goncalves, ¶0019, As shown in FIG. 2, a camera 160 may be trained to capture images of items of the belt 140; Goncalves ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code. As described above, one or more images are acquired and geometric point features extracted from the images. The extracted features are compared to the visual features of known objects in the image database. The identity of the item as well as its UPC code can then be determined based on the number and quality of matching visual features; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent with the UPC code on the item; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item. If all data is consistent, the item is added to the checkout list).
Goncalves does not explicitly disclose the following limitations as further recited; however, Falcao discloses:
with the object recognition being independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Falcao with Goncalves because they are in the same field of endeavor. One skilled in the art would have been motivated to include the visual appearance model of Falcao in the system of Goncalves in order to provide a means to recognize products that might be occluded or otherwise unclear due to camera resolution (Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities).
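By way of illustration only, and not as part of the grounds of rejection, the weight-change-based prediction and the summed (rather than multiplied) fusion of modalities cited from Falcao, sections 4.2 and 4.4, may be sketched as follows. The product classes, weight statistics, Gaussian scoring, and modality weights are hypothetical stand-ins, not values taken from the reference.

import math

# Hypothetical item weight model: (mean, standard deviation) in grams.
ITEM_WEIGHT_MODEL = {
    "soda_can": (355.0, 4.0),
    "chip_bag": (170.0, 6.0),
    "candy_bar": (52.0, 2.0),
}

def weight_based_prediction(delta_weight_g):
    """Score each product class by how close the event's absolute weight
    change is to that product's weight distribution, then normalize the
    scores into probabilities (cf. Falcao, sec. 4.2)."""
    scores = {
        item: math.exp(-0.5 * ((abs(delta_weight_g) - mean) / std) ** 2)
        for item, (mean, std) in ITEM_WEIGHT_MODEL.items()
    }
    total = sum(scores.values()) or 1.0
    return {item: s / total for item, s in scores.items()}

def fuse(weight_probs, vision_probs, w_weight=0.7, w_vision=0.3):
    """Weighted linear combination of the two modalities. Summing rather
    than multiplying keeps an occluded item (vision probability 0) from
    collapsing to a final probability of 0; the higher relevance assigned
    to the weight modality follows Falcao, sec. 4.4."""
    fused = {
        item: w_weight * weight_probs.get(item, 0.0)
        + w_vision * vision_probs.get(item, 0.0)
        for item in ITEM_WEIGHT_MODEL
    }
    return max(fused, key=fused.get)

# A 354 g pick-up event with weak, partially occluded vision evidence
# still resolves to the soda can on the strength of the weight modality.
weight_probs = weight_based_prediction(-354.0)
print(fuse(weight_probs, {"soda_can": 0.2, "chip_bag": 0.5}))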
Claim(s) 6-8, 14 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Goncalves, U.S. Publication No. 2011/0215147, hereinafter, “Goncalves”, in view of Falcão, João, et al. "Faim: Vision and weight sensing fusion framework for autonomous inventory monitoring in convenience stores." Frontiers in Built Environment 6 (2020): 568372, hereinafter, “Falcao”, as applied to claims 5 and 13 above, and further in view of Rodriguez et al., U.S. Publication No. 2021/0334590, hereinafter, “Rodriguez”.
As per claim 6, Goncalves and Falcao disclose the method of claim 5, wherein the step of performing object recognition further includes:
sending, by a processing circuit of the POS system, to an artificial intelligence circuit, an indication that includes a request to perform the object recognition of the target object represented in the captured image based on the captured image, with the artificial intelligence circuit being trained on a set of training images of a certain object that is configured to enable the classification or identification of the certain object independent of the focal distance at which the certain object was captured by the optical sensor device (Falcao, pages 7-8, 3.6.3. Item Appearance Model, the best visual identification results are obtained through two data-driven approaches: (1) using visual descriptors ... and (2) Convolutional Neural Network (CNN) based models … We trained a CNN and generated visual descriptors for all 33 products. We then used the visual descriptors to validate the training accuracy for similar looking products … We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products); and
receiving, by the processing circuit of the POS system, from the artificial intelligence circuit, an indication that includes one or more visual-based predicted objects (Falcao, pages 7-8, 3.6.3. Item Appearance Model, We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products).
Goncalves and Falcao further disclose (Falcao, page 10, 4.3. Vision-Based Item Identification, to output a single probability value for each product class. Unlike weight and location, which are very hard to occlude, visual classifiers often suffer from temporary occlusions - especially for smaller items ... As a consequence, simply concatenating (i.e., multiplying) the logits (classification score) of all objects would lead to undesired results, since an item not detected in a frame would end with a probability of 0 regardless of how confident all other frames were. We instead propose using a noisy OR model, which in essence computes the probability P_V(I = i) that each product was seen by taking the complement of the probability that the product was never seen) but do not explicitly disclose the following limitation as further recited; however, Rodriguez discloses
corresponding visual-based confidence levels (Rodriguez, ¶0038, if a barcode or other machine-readable label is present, the system identifies and logs the item based on this label (e.g., using an optical scanner, or by identifying the label in the captured Image 225). If no label is found (e.g., if the system searches the Image 225 and cannot locate any machine-readable identifiers), the Machine Learning Component 220 can instead process the image to classify it and return an Identification 230; Rodriguez, ¶0039, in some embodiments, the Machine Learning Component 220 generates a confidence of the Identification 230).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Rodriguez with Goncalves and Falcao because they are in the same field of endeavor. One skilled in the art would have been motivated to substitute the confidence of the identification as taught by Rodriguez for the probability as taught by Goncalves and Falcao as an alternate means to determine the classification of the product (Rodriguez, ¶0039).
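By way of illustration only, and not as part of the grounds of rejection, the noisy OR model cited from Falcao, section 4.3, may be sketched as follows. The per-frame detection probabilities in the example are hypothetical.

def noisy_or(per_frame_probs):
    """Noisy OR over frames (cf. Falcao, sec. 4.3): the probability that
    a product was seen at least once is the complement of the probability
    that it was never seen. A single occluded frame (probability 0) thus
    does not zero out the final score, as multiplying the per-frame
    classification scores would."""
    p_never_seen = 1.0
    for p in per_frame_probs:
        p_never_seen *= 1.0 - p
    return 1.0 - p_never_seen

# The item is occluded in one of three frames, yet the fused
# vision-based confidence remains high: 1 - (0.1 * 1.0 * 0.2) = 0.98.
print(noisy_or([0.9, 0.0, 0.8]))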
As per claim 7, Goncalves, Falcao and Rodriguez disclose the method of claim 6, wherein the set of training images of the certain object are captured by an optical sensor device at a certain distance from the certain object, with the certain distance corresponding to a distance in which the optical sensor device captures an image of the target object while positioned on the load surface (Rodriguez, ¶0021, FIG. 1 illustrates an environment 100 configured to automatically collect and curate training data and deploy machine learning models, according to one embodiment disclosed herein. In the illustrated embodiment, a Camera 115 or other imaging device captures images and/or video of Items 125 in a Sensing Zone 105, and relays the data to a Computer 120. The Sensing Zone 105 generally includes any area or location where data about one or more Items 125 is collected. For example, the Sensing Zone 105 may include an optical scanner (e.g., to identify bar codes), one or more load cells to determine the weight of the Item 125, and the like. Although depicted as a discrete component for conceptual clarity, in some embodiments, the Camera 115 may be integrated into the Sensing Zone 105).
As per claim 8, Goncalves and Falcao disclose the method of claim 1, wherein the step of performing object recognition further includes:
sending, by the POS system, to a network node having an artificial intelligence circuit, an indication that includes a request to perform the object recognition of the target object represented in the captured image based on the captured image, with the artificial intelligence circuit being trained on a set of training images of a certain object, with the set of training images being configured to enable the object classification or identification of the certain object independent of the focal distance at which the certain object was captured by the optical sensor device (Falcao, pages 7-8, 3.6.3. Item Appearance Model, the best visual identification results are obtained through two data-driven approaches: (1) using visual descriptors ... and (2) Convolutional Neural Network (CNN) based models … We trained a CNN and generated visual descriptors for all 33 products. We then used the visual descriptors to validate the training accuracy for similar looking products … We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products); and
receiving, by the POS system, from the network node, an indication that includes one or more visual-based predicted objects (Falcao, pages 7-8, 3.6.3. Item Appearance Model, We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products).
Goncalves and Falcao further disclose (Falcao, page 10, 4.3. Vision-Based Item Identification, to output a single probability value for each product class. Unlike weight and location, which are very hard to occlude, visual classifiers often suffer from temporary occlusions - especially for smaller items ... As a consequence, simply concatenating (i.e., multiplying) the logits (classification score) of all objects would lead to undesired results, since an item not detected in a frame would end with a probability of 0 regardless of how confident all other frames were. We instead propose using a noisy OR model, which in essence computes the probability P_V(I = i) that each product was seen by taking the complement of the probability that the product was never seen) but do not explicitly disclose the following limitation as further recited; however, Rodriguez discloses
corresponding visual-based confidence levels (Rodriguez, ¶0038, if a barcode or other machine-readable label is present, the system identifies and logs the item based on this label (e.g., using an optical scanner, or by identifying the label in the captured Image 225). If no label is found (e.g., if the system searches the Image 225 and cannot locate any machine-readable identifiers), the Machine Learning Component 220 can instead process the image to classify it and return an Identification 230; Rodriguez, ¶0039, in some embodiments, the Machine Learning Component 220 generates a confidence of the Identification 230).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Rodriguez with Goncalves and Falcao because they are in the same field of endeavor. One skilled in the art would have been motivated to substitute the confidence of the identification as taught by Rodriguez for the probability as taught by Goncalves and Falcao as an alternate means to determine the classification of the product (Rodriguez, ¶0039).
As per claim 14, Goncalves and Falcao disclose the POS system of claim 13, wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to:
send, by a processing circuit of the POS system, to an artificial intelligence circuit, an indication that includes a request to perform the object recognition of the target object represented in the captured image based on the captured image, with the artificial intelligence circuit being trained on a set of training images of a certain object that is configured to enable the classification or identification of the certain object independent of the focal distance at which the certain object was captured by the optical sensor device (Falcao, pages 7-8, 3.6.3. Item Appearance Model, the best visual identification results are obtained through two data-driven approaches: (1) using visual descriptors ... and (2) Convolutional Neural Network (CNN) based models … We trained a CNN and generated visual descriptors for all 33 products. We then used the visual descriptors to validate the training accuracy for similar looking products … We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products);
receive, by the processing circuit of the POS system, from the artificial intelligence circuit, an indication that includes one or more vision-based predicted objects (Falcao, pages 7-8, 3.6.3. Item Appearance Model, We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products).
Goncalves and Falcao further disclose (Falcao, page 10, 4.3. Vision-Based Item Identification, to output a single probability value for each product class. Unlike weight and location, which are very hard to occlude, visual classifiers often suffer from temporary occlusions - especially for smaller items ... As a consequence, simply concatenating (i.e., multiplying) the logits (classification score) of all objects would lead to undesired results, since an item not detected in a frame would end with a probability of 0 regardless of how confident all other frames were. We instead propose using a noisy OR model, which in essence computes the probability P_V(I = i) that each product was seen by taking the complement of the probability that the product was never seen) but do not explicitly disclose the following limitation as further recited; however, Rodriguez discloses
corresponding vision-based confidence levels (Rodriguez, ¶0038, if a barcode or other machine-readable label is present, the system identifies and logs the item based on this label (e.g., using an optical scanner, or by identifying the label in the captured Image 225). If no label is found (e.g., if the system searches the Image 225 and cannot locate any machine-readable identifiers), the Machine Learning Component 220 can instead process the image to classify it and return an Identification 230; Rodriguez, ¶0039, in some embodiments, the Machine Learning Component 220 generates a confidence of the Identification 230); and
wherein the set of training images are captured by an optical sensor device at a certain distance from the certain object, with the certain distance corresponding to a distance in which the optical sensor device captures an image of the target object while positioned on the load surface (Rodriguez, ¶0021, FIG. 1 illustrates an environment 100 configured to automatically collect and curate training data and deploy machine learning models, according to one embodiment disclosed herein. In the illustrated embodiment, a Camera 115 or other imaging device captures images and/or video of Items 125 in a Sensing Zone 105, and relays the data to a Computer 120. The Sensing Zone 105 generally includes any area or location where data about one or more Items 125 is collected. For example, the Sensing Zone 105 may include an optical scanner (e.g., to identify bar codes), one or more load cells to determine the weight of the Item 125, and the like. Although depicted as a discrete component for conceptual clarity, in some embodiments, the Camera 115 may be integrated into the Sensing Zone 105).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Rodriguez with Goncalves and Falcao because they are in the same field of endeavor. One skilled in the art would have been motivated to substitute the confidence of the identification as taught by Rodriguez for the probability as taught by Goncalves and Falcao as an alternate means to determine the classification of the product (Rodriguez, ¶0039).
As per claim 15, Goncalves and Falcao disclose the POS system of claim 13, wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to:
send, to a network node having an artificial intelligence circuit, an indication that includes a request to perform the object recognition of the target object represented in the captured image based on the captured image, with the artificial intelligence circuit being trained on a set of training images of a certain object that is configured to enable the classification or identification of the certain object independent of the focal distance at which the certain object was captured by the optical sensor device (Falcao, pages 7-8, 3.6.3. Item Appearance Model, the best visual identification results are obtained through two data-driven approaches: (1) using visual descriptors ... and (2) Convolutional Neural Network (CNN) based models … We trained a CNN and generated visual descriptors for all 33 products. We then used the visual descriptors to validate the training accuracy for similar looking products … We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products);
receive, from the network node, an indication that includes one or more visual-based predicted objects (Falcao, pages 7-8, 3.6.3. Item Appearance Model, We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products).
Goncalves and Falcao further disclose (Falcao, page 10, 4.3. Vision-Based Item Identification, to output a single probability value for each product class. Unlike weight and location, which are very hard to occlude, visual classifiers often suffer from temporary occlusions - especially for smaller items ... As a consequence, simply concatenating (i.e., multiplying) the logits (classification score) of all objects would lead to undesired results, since an item not detected in a frame would end with a probability of 0 regardless of how confident all other frames were. We instead propose using a noisy OR model, which in essence computes the probability P_V(I = i) that each product was seen by taking the complement of the probability that the product was never seen) but do not explicitly disclose the following limitation as further recited; however, Rodriguez discloses
corresponding visual-based confidence levels (Rodriguez, ¶0038, if a barcode or other machine-readable label is present, the system identifies and logs the item based on this label (e.g., using an optical scanner, or by identifying the label in the captured Image 225). If no label is found (e.g., if the system searches the Image 225 and cannot locate any machine-readable identifiers), the Machine Learning Component 220 can instead process the image to classify it and return an Identification 230; Rodriguez, ¶0039, in some embodiments, the Machine Learning Component 220 generates a confidence of the Identification 230).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Rodriguez with Goncalves and Falcao because they are in the same field of endeavor. One skilled in the art would have been motivated to substitute the confidence of the identification as taught by Rodriguez for the probability as taught by Goncalves and Falcao as an alternate means to determine the classification of the product (Rodriguez, ¶0039).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRACY MANGIALASCHI whose telephone number is (571)270-5189. The examiner can normally be reached M-F, 9:30 AM to 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le can be reached at (571) 272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TRACY MANGIALASCHI/Primary Examiner, Art Unit 2668