Prosecution Insights
Last updated: April 19, 2026
Application No. 18/616,843

OBJECT CLASSIFICATION AND IDENTIFICATION AT POINT OF SALE

Status: Non-Final OA (§103)
Filed: Mar 26, 2024
Examiner: MANGIALASCHI, TRACY
Art Unit: 2668
Tech Center: 2600 (Communications)
Assignee: Toshiba Global Commerce Solutions, Inc.
OA Round: 1 (Non-Final)

Grant Probability: 75% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 2m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 75%, above average (435 granted / 582 resolved; +12.7% vs TC avg)
Interview Lift: +28.4%, a strong lift across resolved cases with interview
Typical Timeline: 3y 2m average prosecution; 15 applications currently pending
Career History: 597 total applications across all art units

Statute-Specific Performance

§101: 7.9% (-32.1% vs TC avg)
§103: 53.9% (+13.9% vs TC avg)
§102: 15.7% (-24.3% vs TC avg)
§112: 15.5% (-24.5% vs TC avg)

Tech Center averages are estimates; based on career data from 582 resolved cases.
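The headline examiner figures are simple ratios of the counts reported above; a quick sanity check using only numbers shown in this report (the Tech Center average is backed out from the stated +12.7% delta):

```python
# Counts taken directly from the Examiner Intelligence section of this report.
granted = 435
resolved = 582

career_allow_rate = granted / resolved   # 0.7474... -> displayed as 75%
tc_avg = career_allow_rate - 0.127       # implied Tech Center average (~62%)

print(f"Career allow rate: {career_allow_rate:.1%}")  # 74.7%
print(f"Implied TC average: {tc_avg:.1%}")            # 62.0%
```

The displayed 75% is the rounded career figure; the grant-probability and with-interview estimates are the analytics tool's own model outputs and are not reproducible from the counts alone.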

Office Action (§103)

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims

Claims 1-20, as originally filed, are currently pending and have been considered below.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1-5, 9-13 and 16-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Goncalves, U.S. Publication No. 
2011/0215147, hereinafter, “Goncalves”, and further in view of Falcão, João, et al. "Faim: Vision and weight sensing fusion framework for autonomous inventory monitoring in convenience stores." Frontiers in Built Environment 6 (2020): 568372, hereinafter, “Falcao”.

As per claim 1, Goncalves discloses a method, comprising: by a point of sale (POS) system operationally coupled to a load sensor device and an optical sensor device, with the load sensor device being operable to measure a load of an object while positioned on a load surface of the POS system, the optical sensor device having a field of view associated with the load surface and operable to capture an image (Goncalves, Abstract, A system and method is disclosed for using object recognition/verification and weight information to confirm the accuracy of a UPC scan, or to provide an affirmative recognition where no UPC scan was made. In the preferred embodiment, the checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more image of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known object … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0002, One particular field of the invention relates to systems and methods for using visual appearance and weight information to augment universal product code (UPC) scans in order to insure that items are properly identified and accounted for at ring up), obtaining an image captured by the optical sensor device that includes a visual representation of at least a portion of a target object and a load measurement associated with the target object that is performed by the load sensor device while the target object is 
positioned on the load surface to enable object classification or identification of the target object based on both an object recognition of the target object represented in the captured image and a prediction of the target object from the load measurement associated with the target object (Goncalves, ¶0004, a checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more images of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known objects … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0017, The self-checkout stations 100, 200 in these embodiments include a counter top 102 with a UPC scanner 120, a scale 180 for determining the weight of an item … One or more video cameras are trained on the counter and the bagging area for purposes of detecting the presence and/or identity of items of merchandise as they are scanned and bagged; Goncalves, ¶0018, The weight scale 180 is incorporated into the belt conveyor 140 in FIG. 2 so as to determine the weight of an item as it is passed to the bagging area 150. In still other embodiments, the scale is incorporated into the UPC scanner bed 120; Goncalves, ¶0019, As shown in FIG. 2, a camera 160 may be trained to capture images of items of the belt 140; Goncalves ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code. As described above, one or more images are acquired and geometric point features extracted from the images. The extracted features are compared to the visual features of known objects in the image database. 
The identity of the item as well as its UPC code can then be determined based on the number and quality of matching visual features; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent with the UPC code on the item; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item. If all data is consistent, the item is added to the checkout list).

Goncalves does not explicitly disclose the following limitations as further recited; however, Falcao discloses with the object recognition being independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. 
From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. 
For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them).

It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Falcao with Goncalves because they are in the same field of endeavor. One skilled in the art would have been motivated to include the visual appearance model of Falcao in the system of Goncalves in order to provide a means to recognize products that might be occluded or otherwise unclear due to camera resolution (Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities).

As per claim 2, Goncalves and Falcao disclose the method of claim 1, further comprising: receiving, by a processing circuit of the POS system, from the load sensor device, an indication that includes the load measurement (Goncalves, ¶0018, The weight scale 180 is incorporated into the belt conveyor 140 in FIG. 2 so as to determine the weight of an item).

As per claim 3, Goncalves and Falcao disclose the method of claim 1, further comprising: receiving, by a processing circuit of the POS system, from the optical sensor device, an indication that includes the captured image (Goncalves, ¶0004, a checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more images of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known objects).

As per claim 4, Goncalves and Falcao disclose the method of claim 1. Falcao discloses further comprising: determining to capture an image of the target object responsive to determining that a weight change event has occurred based on the load measurement (Falcao, page 3, 3.1. 
System Overview, Figure 2 shows FAIM’s system framework … FAIM’s pipeline is triggered when a change in the total weight of a shelf is detected; Falcao, page 5, 3.4. Customer-Shelf Interaction Detection, The first step in FAIM’s pipeline is to detect when an event took place (i.e., a customer picked up or put back an item on a shelf). In our proposed system architecture, displayed in Figure 2, the processing of every change in the inventory starts with a weight change trigger … the weight difference on the load sensors is generally enough to detect an event); and sending, by the processing circuit of the POS system, to the optical sensor device, an indication that includes a request to capture the image (Falcao, page 6, 3.5. Vision Event Extraction, uses the Weight Change Event Detection trigger from section 3.4 to start analyzing the images). The motivation would be the same as above in claim 1.

As per claim 5, Goncalves and Falcao disclose the method of claim 1, further comprising: performing object recognition of the target object represented in the captured image based on the captured image, with the object recognition being performed independent of the focal distance at which the target object was captured by the optical sensor device (Goncalves, ¶0004, a checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more images of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known objects; generate a geometric transform between the extracted geometric point features and the features of known objects for a subset of known objects corresponding to matches; and identify one of the known objects based on a best match of the geometric transform; Falcao, pages 10-11, 
4.4. Item Identification Combining All Sensing Modalities).

As per claim 9, Goncalves and Falcao disclose the method of claim 1, further comprising: recognizing the target object based on the captured image to obtain one or more vision-based predicted objects and corresponding vision-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code), with the recognition being performed independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. 
System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. 
Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them).

As per claim 10, Goncalves and Falcao disclose the method of claim 1. Falcao discloses further comprising: predicting the target object based on the weight measurement of the target object to obtain one or more weight-based predicted objects and corresponding weight-based confidence levels (Falcao, page 3, 3.1. System Overview, FAIM’s pipeline is triggered when a change in the total weight of a shelf is detected. From that it extracts two features: the absolute weight difference and the spatial distribution of the weight. The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model ... the probability of Δμm given Δμ can be defined as a normal distribution ... using the Bayes’s rule the probability of the item belonging to each product class is determined). 
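For reference, the Falcao scheme relied on in these citations (a normal likelihood over the weight change Δμ, Bayes' rule with the product weight model, and a weighted linear combination that sums rather than multiplies the weight and vision modalities) can be sketched as follows. The product catalog, its weight statistics, and the 0.7 modality weighting are hypothetical values chosen for illustration, not taken from either reference:

```python
import math

# Hypothetical catalog: product -> (mean weight in grams, std dev).
CATALOG = {"soda_can": (355.0, 8.0), "chips": (150.0, 10.0), "candy_bar": (50.0, 4.0)}

def weight_based_prediction(delta_w):
    """P(product | observed weight change), per Falcao section 4.2: a normal
    likelihood per product, then Bayes' rule (uniform prior -> normalization)."""
    likelihoods = {}
    for name, (mu, sigma) in CATALOG.items():
        z = (delta_w - mu) / sigma
        likelihoods[name] = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    total = sum(likelihoods.values())
    return {name: l / total for name, l in likelihoods.items()}

def fuse(weight_probs, vision_probs, w_weight=0.7):
    """Additive fusion per Falcao section 4.4: summing (not multiplying) keeps
    an occluded item (vision prob ~0) from zeroing a strong weight prediction;
    the weight modality gets the higher relevance."""
    return {name: w_weight * p + (1 - w_weight) * vision_probs.get(name, 0.0)
            for name, p in weight_probs.items()}

weight_probs = weight_based_prediction(352.0)  # a ~352 g pick event
vision_probs = {"soda_can": 0.0, "chips": 0.6, "candy_bar": 0.4}  # camera occluded
final = fuse(weight_probs, vision_probs)
print(max(final, key=final.get))  # soda_can wins despite zero vision evidence
```

The example shows why additive fusion matters for the examiner's occlusion rationale: under multiplicative fusion the occluded soda can would have scored exactly zero.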
The motivation would be the same as above in claim 1.

As per claim 11, Goncalves and Falcao disclose the method of claim 1, further comprising: performing object classification or identification of the target object based on one or more vision-based predicted objects and corresponding vision-based confidence levels and one or more weight-based predicted objects and corresponding weight-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. 
If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item).

As per claim 12, Goncalves and Falcao disclose the method of claim 1, further comprising: object recognizing the target object based on the captured image to obtain one or more vision-based predicted objects and corresponding vision-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. 
The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code), with the object recognition being performed independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. 
Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them); predicting the target object based on the weight measurement of the target object to obtain one or more weight-based predicted objects and corresponding weight-based confidence levels (Falcao, page 3, 3.1. System Overview, FAIM’s pipeline is triggered when a change in the total weight of a shelf is detected. 
From that it extracts two features: the absolute weight difference and the spatial distribution of the weight. The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model ... the probability of Δμm given Δμ can be defined as a normal distribution ... using the Bayes’s rule the probability of the item belonging to each product class is determined); and performing object classification or identification of the target object based on the one or more vision-based predicted objects and the corresponding vision-based confidence levels and the one or more weight-based predicted objects and the corresponding weight-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. 
The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item).

As per claim 13, Goncalves discloses a point of sale (POS) system, comprising: with the POS system being operationally coupled to a load sensor device and an optical sensor device, with the load sensor device being operable to measure a load of an object while positioned on a load surface of the POS system, the optical sensor device having a field of view that includes the load surface and being operable to capture an image (Goncalves, Abstract, A system and method is disclosed for using object recognition/verification and weight information to confirm the accuracy of a UPC scan, or to provide an affirmative recognition where no UPC scan was made. 
In the preferred embodiment, the checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more image of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known object … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0002, One particular field of the invention relates to systems and methods for using visual appearance and weight information to augment universal product code (UPC) scans in order to insure that items are properly identified and accounted for at ring up), wherein the POS system further includes a memory, the memory containing instructions executable by the processing circuitry whereby the processing circuitry is configured to: obtain an image captured by the optical sensor device that includes a visual representation of at least a portion of a target object and a load measurement associated with the target object that is performed by the load sensor device while the target object is positioned on the load surface to enable object classification or identification of the target object based on both an object recognition of the target object represented in the captured image and a prediction of the target object from the load measurement associated with the target object (Goncalves, ¶0004, a checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more images of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of 
known objects … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0017, The self-checkout stations 100, 200 in these embodiments include a counter top 102 with a UPC scanner 120, a scale 180 for determining the weight of an item … One or more video cameras are trained on the counter and the bagging area for purposes of detecting the presence and/or identity of items of merchandise as they are scanned and bagged; Goncalves, ¶0018, The weight scale 180 is incorporated into the belt conveyor 140 in FIG. 2 so as to determine the weight of an item as it is passed to the bagging area 150. In still other embodiments, the scale is incorporated into the UPC scanner bed 120; Goncalves, ¶0019, As shown in FIG. 2, a camera 160 may be trained to capture images of items of the belt 140; Goncalves ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code. As described above, one or more images are acquired and geometric point features extracted from the images. The extracted features are compared to the visual features of known objects in the image database. The identity of the item as well as its UPC code can then be determined based on the number and quality of matching visual features; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent with the UPC code on the item; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item. 
If all data is consistent, the item is added to the checkout list). Goncalves does not explicitly disclose the following limitations as further recited however Falcao discloses with the object recognition being independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. 
The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Falcao with Goncalves because they are in the same field of endeavor. One skilled in the art would have been motivated to include the visual appearance model of Falcao in the system of Goncalves in order to provide a means to recognize products that might be occluded or otherwise unclear due to camera resolution (Falcao, pages 10-11, 4.4. 
Item Identification Combining All Sensing Modalities). As per claim 16, Goncalves and Falcao disclose the POS system of claim 13, wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to: recognize the target object based on the captured image to obtain one or more vision-based predicted objects and corresponding confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code), with the recognition being performed independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. 
Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. 
The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them). As per claim 17, Goncalves and Falcao disclose the POS system of claim 13. Falcao discloses wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to: predict the target object based on the weight measurement of the target object to obtain one or more weight-based predicted objects and corresponding confidence levels (Falcao, page 3, 3.1. System Overview, FAIM’s pipeline is triggered when a change in the total weight of a shelf is detected. From that it extracts two features: the absolute weight difference and the spatial distribution of the weight. The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. 
The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model ... the probability of Δμm given Δμ can be defined as a normal distribution ... using the Bayes’s rule the probability of the item belonging to each product class is determined). The motivation would be the same as above in claim 13. As per claim 18, Goncalves and Falcao disclose the POS system of claim 13, wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to: perform classification or identification of the target object based on one or more vision-based predicted objects and corresponding vision-based confidence levels and one or more weight-based predicted objects and corresponding weight-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. 
The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item). As per claim 19, Goncalves and Falcao disclose the POS system of claim 13, wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to: object recognize the target object based on the captured image to obtain one or more vision-based predicted objects and corresponding vision-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... 
When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code), with the object recognition being performed independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. 
From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. 
For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them); predict the target object based on the weight measurement of the target object to obtain one or more weight-based predicted objects and corresponding weight-based confidence levels (Falcao, page 3, 3.1. System Overview, FAIM’s pipeline is triggered when a change in the total weight of a shelf is detected. From that it extracts two features: the absolute weight difference and the spatial distribution of the weight. The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model ... the probability of Δμm given Δμ can be defined as a normal distribution ... using the Bayes’s rule the probability of the item belonging to each product class is determined); and perform classification or identification of the target object based on the one or more vision-based predicted objects and the corresponding vision-based confidence levels and the one or more weight-based predicted objects and the corresponding weight-based confidence levels (Goncalves, ¶0021, The UPC database, which is also well known in the prior art, includes item name, price, and the weight of the item in pounds for example. The one or more video cameras transmit image data to a feature extractor which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features ... 
When configured to do verification, the image processor confirms the identity of the item … queries the image database using the UPC, retrieves a plurality of associated visual features, and compares the features of the object having that UPC with the features extracted from the one or more images of the item captured at the checkout station. The identity of the item is confirmed if, for example, a predetermined number of feature descriptors are matched with sufficient quality, an accurate geometric transformation exists between the set of matching features, the normalized correlation of the transformed model exceeds a predetermined threshold, or combination thereof; Goncalves, ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item). 
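The weight-based prediction (Falcao section 4.2, normal likelihood plus Bayes' rule) and the additive fusion (Falcao section 4.4, weighted linear combination) relied on in the rejections of claims 17-19 can be sketched compactly. The following Python sketch is illustrative only: the product names, item weight models (mean and standard deviation), vision scores, uniform prior, and modality coefficients (0.7/0.3) are all assumptions for illustration, not values taken from the reference.

```python
import math

def weight_based_probs(delta_w, weight_models):
    # Falcao sect. 4.2 (sketch): score each product class by a normal
    # likelihood of the observed absolute weight change under that
    # product's weight model, then normalize via Bayes' rule with an
    # assumed uniform prior over product classes.
    likelihoods = {
        c: math.exp(-((delta_w - mu) ** 2) / (2 * sigma ** 2))
           / (sigma * math.sqrt(2 * math.pi))
        for c, (mu, sigma) in weight_models.items()
    }
    total = sum(likelihoods.values())
    return {c: lk / total for c, lk in likelihoods.items()}

def fuse(weight_probs, vision_probs, w_weight=0.7, w_vision=0.3):
    # Falcao sect. 4.4 (sketch): weighted linear combination of the two
    # modality predictions, summed rather than multiplied so that an
    # occluded item (vision probability 0) is not forced to a final
    # probability of 0. The weight modality gets the larger coefficient
    # because the reference finds it the more robust predictor.
    classes = set(weight_probs) | set(vision_probs)
    return {c: w_weight * weight_probs.get(c, 0.0)
               + w_vision * vision_probs.get(c, 0.0)
            for c in classes}

# Hypothetical weight models: product -> (mean grams, std dev grams).
models = {"soda": (355.0, 5.0), "chips": (70.0, 4.0), "candy": (45.0, 3.0)}
wp = weight_based_probs(352.0, models)          # observed change near soda's mean
vp = {"soda": 0.3, "chips": 0.1, "candy": 0.6}  # occlusion-degraded vision disagrees
fused = fuse(wp, vp)
best = max(fused, key=fused.get)
```

In this hypothetical, the weight evidence outweighs the occlusion-degraded vision score, which mirrors the rationale the examiner draws from Falcao for combining the references.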
As per claim 20, Goncalves discloses a point of sale (POS) system, comprising: a load sensor device operable to measure a load of an object while positioned on a load surface of the POS system; an optical sensor device having a field of view associated with the load surface and operable to capture an image (Goncalves, Abstract, A system and method is disclosed for using object recognition/verification and weight information to confirm the accuracy of a UPC scan, or to provide an affirmative recognition where no UPC scan was made. In the preferred embodiment, the checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more image of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known object … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0002, One particular field of the invention relates to systems and methods for using visual appearance and weight information to augment universal product code (UPC) scans in order to insure that items are properly identified and accounted for at ring up); and a processing circuitry and a memory containing instructions executable by the processing circuitry whereby the processing circuitry is operative to: obtain an image captured by the optical sensor device that includes a visual representation of at least a portion of a target object and a load measurement associated with the target object that is performed by the load sensor device while the target object is positioned on the load surface to enable object classification or identification of the target object based on both an object recognition of the target object represented in the captured image and a prediction of the
target object from the load measurement associated with the target object (Goncalves, ¶0004, a checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more images of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known objects … and identify one of the known objects based on a best match of the geometric transform; Goncalves, ¶0017, The self-checkout stations 100, 200 in these embodiments include a counter top 102 with a UPC scanner 120, a scale 180 for determining the weight of an item … One or more video cameras are trained on the counter and the bagging area for purposes of detecting the presence and/or identity of items of merchandise as they are scanned and bagged; Goncalves, ¶0018, The weight scale 180 is incorporated into the belt conveyor 140 in FIG. 2 so as to determine the weight of an item as it is passed to the bagging area 150. In still other embodiments, the scale is incorporated into the UPC scanner bed 120; Goncalves, ¶0019, As shown in FIG. 2, a camera 160 may be trained to capture images of items of the belt 140; Goncalves ¶0022, In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code. As described above, one or more images are acquired and geometric point features extracted from the images. The extracted features are compared to the visual features of known objects in the image database. 
The identity of the item as well as its UPC code can then be determined based on the number and quality of matching visual features; Goncalves, ¶0023, The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. If the measured weight and retrieved weight match within a determined threshold, the weight processor transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent with the UPC code on the item; Goncalves, ¶0025, the checkout system can typically confirm both the weight and visual appearance of the scanned item. If all data is consistent, the item is added to the checkout list). Goncalves does not explicitly disclose the following limitations as further recited however Falcao discloses with the object recognition being independent of the focal distance at which the target object was captured by the optical sensor device (Falcao, page 2, 1. Introduction, Using weight sensors on each shelf, our system identifies the item being taken based on the location and absolute weight change of an event, which is fused with visual object identification; Falcao, page 3, 3.1. System Overview, Figure 2 shows FAIM’s system framework, The weight change-based prediction computes the probability of each product class by comparing the absolute weight difference to each product’s average weight ... The vision-based prediction leverages human pose estimation and background subtraction to focus the visual object classifier’s attention to identify the object(s); Falcao, page 5, 3.3.2. Vision Sensing, As vision processing improves, camera specification constraints can be relaxed. 
From our initial experiments, camera resolution doesn’t play a huge role (in fact most deep learning networks downsize the input image to about 300–720 pixels wide for training and computation efficiency purposes) … From our initial experiments we empirically noticed weight sensors to be a much more robust—and cheaper— predictor of what item was picked up or put back on a shelf. Therefore, we do not consider shelf-mounted cameras; Falcao, page 6, 3.5. Vision Event Extraction, The Vision Event Extraction pipeline is divided in two sequential tasks: Vision Event Preprocessing and Product Detections Spatial Selection. The former gathers different sources of visual evidence … The latter then aggregates all the information … As the output of the Vision Event Extraction pipeline, those detections together with their associated product probabilities, are fed into the Vision-based Item Identification module (section 4.3) which tries to determine what product was picked; Falcao, page 9, 4.2. Weight Change-Based Item Identification, Product prediction based on weight change is fairly straightforward. The main idea is to estimate how close the event’s weight change Δμ is to the distribution of each product, given by the item weight model; Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities, FAIM’s last stage of the pipeline fuses all sources of information to emit a final product prediction, and the one with the highest probability score will be selected ... we approximate this likelihood as a weighted linear combination of each individual sensor modality—weight and vision—prediction ... information from weight modality is a more robust product predictor—partially because it is less affected by occlusions, thus we assign it a higher relevance ... It is also worth noting that, as discussed in section 4.3, cameras can be occluded, lighting conditions may change, etc., therefore an object not being seen should not result in a final probability of 0. 
For these reasons, FAIM sums both modalities predictions instead of multiplying to fuse them). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Falcao with Goncalves because they are in the same field of endeavor. One skilled in the art would have been motivated to include the visual appearance model of Falcao in the system of Goncalves in order to provide a means to recognize products that might be occluded or otherwise unclear due to camera resolution (Falcao, pages 10-11, 4.4. Item Identification Combining All Sensing Modalities). Claim(s) 6-8, 14 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Goncalves, U.S. Publication No. 2011/0215147, hereinafter, “Goncalves”, in view of Falcão, João, et al. "Faim: Vision and weight sensing fusion framework for autonomous inventory monitoring in convenience stores." Frontiers in Built Environment 6 (2020): 568372, hereinafter, “Falcao”, as applied to claims 5 and 13 above, and further in view of Rodriguez et al., U.S. Publication No. 2021/0334590, hereinafter, “Rodriguez”. As per claim 6, Goncalves and Falcao disclose the method of claim 5, wherein the step of performing object recognition further includes: sending, by a processing circuit of the POS system, to an artificial intelligence circuit, an indication that includes a request to perform the object recognition of the target object represented in the captured image based on the captured image, with the artificial intelligence circuit being trained on a set of training images of a certain object that is configured to enable the classification or identification of the certain object independent of the focal distance at which the certain object was captured by the optical sensor device (Falcao, pages 7-8, 3.6.3. Item Appearance Model, the best visual identification results are obtained through two data-driven approaches: (1) using visual descriptors ...
and (2) Convolutional Neural Network (CNN) based models … We trained a CNN and generated visual descriptors for all 33 products. We then used the visual descriptors to validate the training accuracy for similar looking products … We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products); and receiving, by the processing circuit of the POS system, from the artificial intelligence circuit, an indication that includes one or more visual-based predicted objects (Falcao, pages 7-8, 3.6.3. Item Appearance Model, We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products). Goncalves and Falcao further disclose (Falcao, page 10, 4.3. Vision-Based Item Identification, to output a single probability value for each product class. Unlike weight and location, which are very hard to occlude, visual classifiers often suffer from temporary occlusions - especially for smaller items ...
As a consequence, simply concatenating (i.e., multiplying) the logits (classification score) of all objects would lead to undesired results, since an item not detected in a frame would end with a probability of 0 regardless of how confident all other frames were. We instead propose using a noisy OR model, which in essence computes the probability PV(I = i) that each product was seen by taking the complement of the probability that the product was never seen) but do not explicitly disclose the following limitation as further recited; however, Rodriguez discloses corresponding visual-based confidence levels (Rodriguez, ¶0038, if a barcode or other machine-readable label is present, the system identifies and logs the item based on this label (e.g., using an optical scanner, or by identifying the label in the captured Image 225). If no label is found (e.g., if the system searches the Image 225 and cannot locate any machine-readable identifiers), the Machine Learning Component 220 can instead process the image to classify it and return an Identification 230; Rodriguez, ¶0039, in some embodiments, the Machine Learning Component 220 generates a confidence of the Identification 230). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Rodriguez with Goncalves and Falcao because they are in the same field of endeavor. One skilled in the art would have been motivated to substitute the confidence of the identification as taught by Rodriguez for the probability as taught by Goncalves and Falcao as an alternate means to determine the classification of the product (Rodriguez, ¶0039).
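The noisy-OR computation quoted from Falcao above, in which the probability that a product was seen is the complement of the probability that it was never seen in any frame, can be sketched as follows. This is an illustrative reconstruction, not code from any cited reference; the per-frame probabilities are hypothetical.

```python
def noisy_or(per_frame_probs):
    """Noisy-OR fusion over per-frame probabilities for one product:
    P(seen) = 1 - prod_t (1 - p_t).

    A frame in which the product goes undetected (p_t = 0) leaves the
    evidence from the other frames intact, instead of zeroing the
    result the way multiplying raw per-frame scores would.
    """
    prob_never_seen = 1.0
    for p in per_frame_probs:
        prob_never_seen *= (1.0 - p)
    return 1.0 - prob_never_seen

# Product detected confidently in two frames, fully occluded in a third.
frames = [0.9, 0.0, 0.8]
fused = noisy_or(frames)  # 1 - (0.1 * 1.0 * 0.2), approximately 0.98
```

Note that a straight product of the per-frame scores over the same input would collapse to 0 because of the occluded middle frame, which is exactly the failure mode the quoted passage describes.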
As per claim 7, Goncalves, Falcao and Rodriguez disclose the method of claim 6, wherein the set of training images of the certain object are captured by an optical sensor device at a certain distance from the certain object, with the certain distance corresponding to a distance in which the optical sensor device captures an image of the target object while positioned on the load surface (Rodriguez, ¶0021, FIG. 1 illustrates an environment 100 configured to automatically collect and curate training data and deploy machine learning models, according to one embodiment disclosed herein. In the illustrated embodiment, a Camera 115 or other imaging device captures images and/or video of Items 125 in a Sensing Zone 105, and relays the data to a Computer 120. The Sensing Zone 105 generally includes any area or location where data about one or more Items 125 is collected. For example, the Sensing Zone 105 may include an optical scanner (e.g., to identify bar codes), one or more load cells to determine the weight of the Item 125, and the like. Although depicted as a discrete component for conceptual clarity, in some embodiments, the Camera 115 may be integrated into the Sensing Zone 105). As per claim 8, Goncalves and Falcao disclose the method of claim 1, wherein the step of performing object recognition further includes: sending, by the POS system, to a network node having an artificial intelligence circuit, an indication that includes a request to perform the object recognition of the target object represented in the captured image based on the captured image, with the artificial intelligence circuit being trained on a set of training images of a certain object, with the set of training images being configured to enable the object classification or identification of the certain object independent of the focal distance at which the certain object was captured by the optical sensor device (Falcao, pages 7-8, 3.6.3. 
Item Appearance Model, the best visual identification results are obtained through two data-driven approaches: (1) using visual descriptors ... and (2) Convolutional Neural Network (CNN) based models … We trained a CNN and generated visual descriptors for all 33 products. We then used the visual descriptors to validate the training accuracy for similar looking products … We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products); and receiving, by the POS system, from the network node, an indication that includes one or more visual-based predicted objects (Falcao, pages 7-8, 3.6.3. Item Appearance Model, We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products). Goncalves and Falcao further disclose (Falcao, page 10, 4.3. Vision-Based Item Identification, to output a single probability value for each product class.
Unlike weight and location, which are very hard to occlude, visual classifiers often suffer from temporary occlusions - especially for smaller items ... As a consequence, simply concatenating (i.e., multiplying) the logits (classification score) of all objects would lead to undesired results, since an item not detected in a frame would end with a probability of 0 regardless of how confident all other frames were. We instead propose using a noisy OR model, which in essence computes the probability PV(I = i) that each product was seen by taking the complement of the probability that the product was never seen) but do not explicitly disclose the following limitation as further recited; however, Rodriguez discloses corresponding visual-based confidence levels (Rodriguez, ¶0038, if a barcode or other machine-readable label is present, the system identifies and logs the item based on this label (e.g., using an optical scanner, or by identifying the label in the captured Image 225). If no label is found (e.g., if the system searches the Image 225 and cannot locate any machine-readable identifiers), the Machine Learning Component 220 can instead process the image to classify it and return an Identification 230; Rodriguez, ¶0039, in some embodiments, the Machine Learning Component 220 generates a confidence of the Identification 230). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Rodriguez with Goncalves and Falcao because they are in the same field of endeavor. One skilled in the art would have been motivated to substitute the confidence of the identification as taught by Rodriguez for the probability as taught by Goncalves and Falcao as an alternate means to determine the classification of the product (Rodriguez, ¶0039).
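The contrast drawn at the outset of this rejection, where FAIM sums the two modalities' predictions rather than multiplying them, can be sketched as below. The score dictionaries and class names are hypothetical illustrations, not data from the cited references.

```python
def fuse_additive(vision_scores, weight_scores):
    """Sum per-class scores from two sensing modalities.

    A class that one modality misses entirely (score 0.0, e.g. because
    the item is occluded from the camera) can still win if the other
    modality is confident about it.
    """
    classes = set(vision_scores) | set(weight_scores)
    return {c: vision_scores.get(c, 0.0) + weight_scores.get(c, 0.0)
            for c in classes}

def fuse_multiplicative(vision_scores, weight_scores):
    """Multiply per-class scores; a zero in either modality zeroes the class."""
    classes = set(vision_scores) | set(weight_scores)
    return {c: vision_scores.get(c, 0.0) * weight_scores.get(c, 0.0)
            for c in classes}

vision = {"soda": 0.0, "chips": 0.3}   # soda occluded from the camera
weight = {"soda": 0.8, "chips": 0.1}   # weight sensing still detects soda

additive = fuse_additive(vision, weight)              # soda: 0.8, chips: 0.4
multiplicative = fuse_multiplicative(vision, weight)  # soda zeroed out
```

Under additive fusion the occluded item still scores highest, while multiplicative fusion discards it entirely, which is the behavior the quoted Falcao passages argue against.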
As per claim 14, Goncalves and Falcao disclose the POS system of claim 13, wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to: send, by a processing circuit of the POS system, to an artificial intelligence circuit, an indication that includes a request to perform the object recognition of the target object represented in the captured image based on the captured image, with the artificial intelligence circuit being trained on a set of training images of a certain object that is configured to enable the classification or identification of the certain object independent of the focal distance at which the certain object was captured by the optical sensor device (Falcao, pages 7-8, 3.6.3. Item Appearance Model, the best visual identification results are obtained through two data-driven approaches: (1) using visual descriptors ... and (2) Convolutional Neural Network (CNN) based models … We trained a CNN and generated visual descriptors for all 33 products. We then used the visual descriptors to validate the training accuracy for similar looking products … We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products); receive, by the processing circuit of the POS system, from the artificial intelligence circuit, an indication that includes one or more vision-based predicted objects (Falcao, pages 7-8, 3.6.3. 
Item Appearance Model, We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products). Goncalves and Falcao further disclose (Falcao, page 10, 4.3. Vision-Based Item Identification, to output a single probability value for each product class. Unlike weight and location, which are very hard to occlude, visual classifiers often suffer from temporary occlusions - especially for smaller items ... As a consequence, simply concatenating (i.e., multiplying) the logits (classification score) of all objects would lead to undesired results, since an item not detected in a frame would end with a probability of 0 regardless of how confident all other frames were. We instead propose using a noisy OR model, which in essence computes the probability PV(I = i) that each product was seen by taking the complement of the probability that the product was never seen) but do not explicitly disclose the following limitation as further recited; however, Rodriguez discloses corresponding vision-based confidence levels (Rodriguez, ¶0038, if a barcode or other machine-readable label is present, the system identifies and logs the item based on this label (e.g., using an optical scanner, or by identifying the label in the captured Image 225).
If no label is found (e.g., if the system searches the Image 225 and cannot locate any machine-readable identifiers), the Machine Learning Component 220 can instead process the image to classify it and return an Identification 230; Rodriguez, ¶0039, in some embodiments, the Machine Learning Component 220 generates a confidence of the Identification 230); and wherein the set of training images are captured by an optical sensor device at a certain distance from the certain object, with the certain distance corresponding to a distance in which the optical sensor device captures an image of the target object while positioned on the load surface (Rodriguez, ¶0021, FIG. 1 illustrates an environment 100 configured to automatically collect and curate training data and deploy machine learning models, according to one embodiment disclosed herein. In the illustrated embodiment, a Camera 115 or other imaging device captures images and/or video of Items 125 in a Sensing Zone 105, and relays the data to a Computer 120. The Sensing Zone 105 generally includes any area or location where data about one or more Items 125 is collected. For example, the Sensing Zone 105 may include an optical scanner (e.g., to identify bar codes), one or more load cells to determine the weight of the Item 125, and the like. Although depicted as a discrete component for conceptual clarity, in some embodiments, the Camera 115 may be integrated into the Sensing Zone 105). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Rodriguez with Goncalves and Falcao because they are in the same field of endeavor. One skilled in the art would have been motivated to substitute the confidence of the identification as taught by Rodriguez for the probability as taught by Goncalves and Falcao as an alternate means to determine the classification of the product (Rodriguez, ¶0039). 
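The label-first identification flow attributed to Rodriguez (¶¶0038-0039), in which a machine-readable label is used when present and a machine-learning classifier with an associated confidence is used otherwise, can be sketched as follows. The function names, threshold, and toy inputs are hypothetical illustrations, not Rodriguez's actual implementation.

```python
def identify_item(image, find_label, classify, min_confidence=0.6):
    """Return (identification, confidence) for an imaged item.

    If a machine-readable label (e.g., a barcode) is located in the
    image, identify the item from the label. Otherwise fall back to a
    classifier that returns a predicted class plus a confidence value;
    low-confidence predictions are rejected (e.g., for manual review).
    """
    label = find_label(image)
    if label is not None:
        return label, 1.0  # label reads are treated as certain here
    prediction, confidence = classify(image)
    if confidence < min_confidence:
        return None, confidence
    return prediction, confidence

# Toy stand-ins: no barcode is found, so the classifier fallback runs.
item, conf = identify_item(
    image=object(),
    find_label=lambda img: None,
    classify=lambda img: ("cereal_box", 0.92),
)
```

The returned confidence is what the rejection maps to the claimed "visual-based confidence levels"; how the confidence is produced and thresholded is left abstract here.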
As per claim 15, Goncalves and Falcao disclose the POS system of claim 13, wherein the memory includes further instructions executable by the processing circuitry whereby the processing circuitry is configured to: send, to a network node having an artificial intelligence circuit, an indication that includes a request to perform the object recognition of the target object represented in the captured image based on the captured image, with the artificial intelligence circuit being trained on a set of training images of a certain object that is configured to enable the classification or identification of the certain object independent of the focal distance at which the certain object was captured by the optical sensor device (Falcao, pages 7-8, 3.6.3. Item Appearance Model, the best visual identification results are obtained through two data-driven approaches: (1) using visual descriptors ... and (2) Convolutional Neural Network (CNN) based models … We trained a CNN and generated visual descriptors for all 33 products. We then used the visual descriptors to validate the training accuracy for similar looking products … We see that the products in our database are visually distinguishable from each other. This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products); receive, from the network node, an indication that includes one or more visual-based predicted objects (Falcao, pages 7-8, 3.6.3. Item Appearance Model, We see that the products in our database are visually distinguishable from each other. 
This is natural as different brands continuously attempt to create their unique visual identity in order for consumers to easily pick out their products from all the similar competitors’ products. Although this technique allows us to distinguish the object based on visual information, it is highly sensitive to occlusions. We therefore use this technique as an indicator of which objects are more similar to each other in order to then test and validate the performance of our trained CNN with those products). Goncalves and Falcao further disclose (Falcao, page 10, 4.3. Vision-Based Item Identification, to output a single probability value for each product class. Unlike weight and location, which are very hard to occlude, visual classifiers often suffer from temporary occlusions - especially for smaller items ... As a consequence, simply concatenating (i.e., multiplying) the logits (classification score) of all objects would lead to undesired results, since an item not detected in a frame would end with a probability of 0 regardless of how confident all other frames were. We instead propose using a noisy OR model, which in essence computes the probability PV(I = i) that each product was seen by taking the complement of the probability that the product was never seen) but do not explicitly disclose the following limitation as further recited; however, Rodriguez discloses corresponding visual-based confidence levels (Rodriguez, ¶0038, if a barcode or other machine-readable label is present, the system identifies and logs the item based on this label (e.g., using an optical scanner, or by identifying the label in the captured Image 225).
If no label is found (e.g., if the system searches the Image 225 and cannot locate any machine-readable identifiers), the Machine Learning Component 220 can instead process the image to classify it and return an Identification 230; Rodriguez, ¶0039, in some embodiments, the Machine Learning Component 220 generates a confidence of the Identification 230). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine the teachings of Rodriguez with Goncalves and Falcao because they are in the same field of endeavor. One skilled in the art would have been motivated to substitute the confidence of the identification as taught by Rodriguez for the probability as taught by Goncalves and Falcao as an alternate means to determine the classification of the product (Rodriguez, ¶0039).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRACY MANGIALASCHI whose telephone number is (571) 270-5189. The examiner can normally be reached M-F, 9:30AM TO 6:00PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at (571) 272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /TRACY MANGIALASCHI/Primary Examiner, Art Unit 2668

Prosecution Timeline

Mar 26, 2024
Application Filed
Mar 07, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602936
LONG-RANGE 3D OBJECT DETECTION USING 2D BOUNDING BOXES
2y 5m to grant • Granted Apr 14, 2026
Patent 12592055
MACHINE-LEARNING MODEL ANNOTATION AND TRAINING TECHNIQUES
2y 5m to grant • Granted Mar 31, 2026
Patent 12586194
Arrangement and Method for the Optical Assessment of Crop in a Harvesting Machine
2y 5m to grant • Granted Mar 24, 2026
Patent 12568876
METHOD FOR CLASSIFYING PLANTS FOR AGRICULTURAL PURPOSES
2y 5m to grant • Granted Mar 10, 2026
Patent 12567246
FAIR NEURAL NETWORKS
2y 5m to grant • Granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2
Expected OA Rounds
75%
Grant Probability
99%
With Interview (+28.4%)
3y 2m
Median Time to Grant
Low
PTA Risk
Based on 582 resolved cases by this examiner. Grant probability derived from career allow rate.
