Last updated: May 29, 2026
Application No. 18/158,950
SYSTEMS AND METHODS FOR PROCESSING IMAGES CAPTURED AT A PRODUCT STORAGE FACILITY

Non-Final OA §103
Filed: Jan 24, 2023
Examiner: VANCHY JR, MICHAEL J
Art Unit: 2666
Tech Center: 2600 (Communications)
Assignee: Walmart Apollo LLC
OA Round: 3 (Non-Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 67% grant rate with a +20.1% interview lift. Since an interview has already been tried, a written response with narrowed claims is recommended, based on precedent claim evolution patterns.
Based on 608 resolved cases, 2023–2026
Examiner Intelligence

VANCHY JR, MICHAEL J
Career Allowance Rate: 67% (406 granted / 608 resolved), above average, +4.8% vs TC avg
Interview Lift: +20.1% (strong; resolved cases with interview vs. without)
Avg Prosecution (typical timeline): 3y 3m; 10 currently pending
Total Applications (career history): 625 across all art units
Statute-Specific Performance

§101: 2.1% (-37.9% vs TC avg)
§103: 92.9% (+52.9% vs TC avg)
§102: 2.6% (-37.4% vs TC avg)
§112: 1.2% (-38.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 608 resolved cases.
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 03/17/2026 has been entered.
 
Response to Arguments
Applicant’s arguments with respect to the claim(s) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Prior art Guo et al., US 10,839,452 B1 (Guo) has been newly added to assist in teaching the newly added claim amendments.
Claims 1-7, 10-17, and 20 are pending; claims 8, 9, 18, and 19 have been canceled; claims 1-3, 10, 12-15, and 20 have been amended.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-7, 10, 12-17, and 20 (rejected as 14-17, 20, 1-7, 10, 12, and 13 for clarity) are rejected under 35 U.S.C. 103 as being unpatentable over Staudinger et al., US 2020/0334856 A1 (Staudinger), Saptharishi et al., US 2022/0027618 A1 (Saptharishi), Wen et al., US 2020/0394441 A1 (Wen), Schwartz, US 2016/0171429 A1 (Schwartz), and further in view of Guo et al., US 10,839,452 B1 (Guo).
Regarding claim 14, Staudinger teaches a method for labeling objects in images (labeling objects depicted in images) (Abstract), the method comprising: 
selecting, by a control circuit (processing circuitry) ([0015]), a set of unprocessed images from a plurality of unprocessed images of objects captured (receiving a plurality of images of a scene) ([0007]); 
receiving, by the control circuit (processing circuitry) ([0015]), a selected configuration based on data resulting from the set of unprocessed images using a machine learning model (machine learning model) (Abstract and [0025]) to select: a pretrained model (applying the images to a pre-trained model) ([0007]), a feature extraction layer of the pretrained model (to extract and embed the images with respective feature vectors) ([0007]), or a type of clustering (performing a cluster analysis of the images with respective feature vectors to group the images in a plurality of clusters) ([0007]); 
clustering, by the control circuit (processing circuitry) ([0015]), each unprocessed image of the plurality of unprocessed images into a corresponding group of a plurality of groups based on the selected configuration (performing a cluster analysis of the images with respective feature vectors to group the images in a plurality of clusters) ([0007]); 
selecting, by the control circuit (processing circuitry) ([0015]), a plurality of clustered images from each of the plurality of groups (selecting at least some but not all images in each of the plurality of clusters) ([0007]); 
for each of the selected plurality of clustered images from each of the plurality of groups (selecting at least some but not all images in each of the plurality of clusters) ([0007])
outputting, by the control circuit (processing circuitry) ([0015]), the plurality of clustered images from each group (outputting the images from the clusters to the label module 110) (Fig. 1; [0007] and [0033]); 
for each of the selected plurality of clustered images from each of the plurality of groups (selecting at least some but not all images in each of the plurality of clusters) ([0007]):
displaying, by a user interface operable on an electronic device (wherein the apparatus 400 has a user interface including a display 410) (Fig. 4; [0048] and [0053]), the clustered image (the images being displayed/depicted to the user for user input for labeling) ([0007] and [0033]); 
receiving, by the user interface (user interface) ([0053]), a user input (wherein the user can label objects in the subset of images) ([0007] and [0033]); and 
training, by the control circuit (processing circuitry) ([0015]), a machine learning model based on a labeled dataset (producing the training set of images for training a machine learning model) ([0033-0034]). 
However, Staudinger does not explicitly teach that the images are “captured at a product storage facility”, “iteratively processing” the set of unprocessed images, “detecting an object from the clustered image and enclosing the detected object in a bounding box”, displaying “the bounding box and a plurality of candidate product identifiers each potentially corresponding to the detected object enclosed by the bounding box”, a user input “comprising a selection of a correct product identifier of the displayed plurality of candidate product identifiers associated with detected object enclosed by the bounding box”, or  “comprising the user input for each of the selected plurality of clustered images from each of the plurality of groups”.
Saptharishi teaches a camera system that comprises an image capturing device, object detection module, object tracking module, and match classifier (Abstract); a method for labeling objects in images (labeling objects within the image data) ([0068]) captured at a product storage facility (wherein the camera system can be used with a facility) ([0117]); and wherein receiving, by a control circuit (hardware circuit) ([0039]), a selected configuration based on data resulting from iteratively processing the set of unprocessed images based on at least one of a pretrained model (iteratively processing the image data based on data from the trained classifier and wherein the performance is acceptable or not) (Fig. 9; [0067-0068]); wherein detecting an object (detecting an object) ([0026] and [0082]) from the clustered image (from the group of multiple images of an object) ([0026]) and enclosing the detected object in a bounding box (enclosing the object within a bounding box) ([0082]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Staudinger to include iteratively processing the images since it allows the system to generate a classifier with acceptable performance (Saptharishi; [0069]), such that the object detection accuracy and classification accuracy are improved by feeding back accurate tracking data (Saptharishi; [0022]).
However, neither explicitly teaches “a first parameter comprising one of a pretrained model, a feature extraction layer of the pretrained model, or a type of clustering; and a second parameter comprising one of the pretrained model, the feature extraction layer, and the type of clustering, wherein the second parameter is different than the first parameter; clustering images into a corresponding group of a plurality of groups based on a combination of the first parameter and the second parameter” or displaying “the bounding box and a plurality of candidate product identifiers each potentially corresponding to the detected object enclosed by the bounding box”, a user input “comprising a selection of a correct product identifier of the displayed plurality of candidate product identifiers associated with detected object enclosed by the bounding box”, or  “comprising the user input for each of the selected plurality of clustered images from each of the plurality of groups”.
Wen teaches an image classification system for determining a likely classification of an image using multiple machine learning models that share a base machine learning model (Abstract); wherein a first parameter comprising one of a pretrained model (a machine learning model) (Fig. 3; [0031-0032]), a feature extraction layer of the pretrained model (using the feature vector of the image as input into the several machine learning models) (Fig. 3; [0027]), or a type (type of image classification) ([0030-0031] and [0040]) of clustering (having a first parameter (i.e., a first machine learning model) that has a different feature extraction layer for grouping for a different subset or superset of classification) (Fig. 3; [0030-0031] and [0040]); and a second parameter comprising one of the pretrained model (another, second, machine learning model) (Fig. 3; [0031-0032]), the feature extraction layer (using the feature vector of the image as input into the several machine learning models) (Fig. 3; [0027]), and the type of clustering (having a second parameter (i.e., a second machine learning model) that has a different feature extraction layer for grouping for a different subset or superset of classification) (Fig. 3; [0030-0031] and [0040]), wherein the second parameter is different than the first parameter (wherein the first machine learning model is different than the second machine learning model) (Fig. 3; [0030-0031] and [0040]); and clustering images into a corresponding group of a plurality of groups (wherein the images are placed in different category groups and/or subcategory groups) ([0012], [0015], and [0048]) based on a combination of the first parameter and the second parameter (wherein the final classification may be based on additional, alternative, and/or combinations of classification criteria) ([0051-0052]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of prior arts, and specifically the machine learning model development of Staudinger that clusters images based on feature vectors (Staudinger; [0007]) to include multiple machine learning models that are each trained to classify an image to a specific category since using multiple machine learning models allows the system to make detailed classification determinations on subcategories while yielding higher accuracy (Wen; [0040]); and maintain highly accurate predictions while advantageously lowering the burden on computing resources by only executing small, well-trained work models as necessary (Wen; [0031]).
However, none of them explicitly teaches displaying “the bounding box and a plurality of candidate product identifiers each potentially corresponding to the detected object enclosed by the bounding box”, a user input “comprising a selection of a correct product identifier of the displayed plurality of candidate product identifiers associated with detected object enclosed by the bounding box”, or  “comprising the user input for each of the selected plurality of clustered images from each of the plurality of groups”.
Schwartz teaches an image recognition system to receive a realogram image including a plurality of organized objects and to detect and identify objects in the realogram image of one or more items on a retail shelf (Abstract); wherein detecting an object from the clustered image (identifying objects in the images) (Fig. 2; [0046]) and enclosing the detected object in a bounding box (wherein the detected objects/products can be enclosed with a bounding box) ([0061]);  the bounding box (wherein the detected objects/products can be enclosed with a bounding box) ([0061]) and a plurality of candidate product identifiers each potentially corresponding to the detected object enclosed by the bounding box (to match detected objects with images of known objects stored in a search database; such as product identifiers/features which can include an object ID) ([0061]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of prior arts to include identifying objects corresponding to products since it increases the modality of the system for identifying certain products efficiently (Schwartz; [0064]).
However, none of them explicitly teaches “displaying the bounding box and a plurality of candidate product identifiers each potentially corresponding to the detected object enclosed by the bounding box”, a user input “comprising a selection of a correct product identifier of the displayed plurality of candidate product identifiers associated with detected object enclosed by the bounding box”, or  “comprising the user input for each of the selected plurality of clustered images from each of the plurality of groups”.
Guo teaches technologies to detect unpackaged, unlabeled, or mislabeled products based on product images (Abstract); wherein displaying the bounding box (wherein the system can identify the unlabeled product in the image and determine a bounding box that encloses the unlabeled product) (col. 16, lines 56-58) and a plurality of candidate product identifiers each potentially corresponding to the detected object (wherein one or more highly ranked product candidates may be displayed via user interface) (col. 6, lines 50-53) enclosed by the bounding box (wherein the system can identify the unlabeled product in the image and determine a bounding box that encloses the unlabeled product) (col. 16, lines 56-58); wherein a user input comprising a selection of a correct product identifier of the displayed plurality of candidate product identifiers associated with detected object (wherein the user can select one or more of the highly ranked product candidates displayed to the user via the user interface) (col. 6, lines 50-56) enclosed by the bounding box (wherein the system can identify the unlabeled product in the image and determine a bounding box that encloses the unlabeled product) (col. 16, lines 56-58); and comprising the user input for each of the selected plurality of clustered images from each of the plurality of groups (wherein when the user selects one of the product candidates for the unlabeled product image; it can then be used for training) (col. 6, lines 42-53).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of prior arts to include displaying the candidate products for the user to select (instead of manually inserting, such as in Saptharishi), since with the candidates displayed a user only needs to confirm one product from a handful of options, compared to the onerous task of seeking a needle in a haystack as in some conventional systems (Guo; col. 6, lines 53-56); therefore, the user will save time for the transaction (Guo; col. 6, lines 56-57); and further, the transaction is likely accurate with the correct product because selecting the correct product from a handful of options is a relatively easy task (Guo; col. 6, lines 57-60).
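
The clustering and sampling flow mapped to claim 14 above (embed images as feature vectors, cluster them, then pick some but not all images from each cluster for user labeling) can be illustrated with a minimal Python sketch. It is not taken from Staudinger or from the application: the toy 2-D vectors stand in for embeddings from a pretrained model, the k-means routine is a swapped-in generic clustering method, and the names `kmeans`, `select_for_labeling`, and `per_cluster` are hypothetical.

```python
from collections import defaultdict

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=10):
    """Plain k-means; returns a cluster index for every vector."""
    # Assumption: the first k vectors seed the centroids (deterministic, not robust).
    centroids = [list(v) for v in vectors[:k]]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments

def select_for_labeling(image_ids, assignments, per_cluster):
    """Pick up to per_cluster images from each cluster for manual labeling."""
    clusters = defaultdict(list)
    for image_id, cluster in zip(image_ids, assignments):
        clusters[cluster].append(image_id)
    return {c: ids[:per_cluster] for c, ids in sorted(clusters.items())}

# Hypothetical feature vectors standing in for embeddings of six unprocessed images.
image_ids = ["img0", "img1", "img2", "img3", "img4", "img5"]
features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]

assignments = kmeans(features, k=2)
to_label = select_for_labeling(image_ids, assignments, per_cluster=2)
```

Here each cluster holds three images and only two per cluster are forwarded for labeling, which matches the "at least some but not all" selection described in the cited passage.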

Regarding claim 15, Saptharishi teaches further comprising: 
selecting, by the control circuit (hardware circuit) ([0039]), a next plurality of clustered images from each of the plurality of groups (selecting a second set of clustered images related to a second object) ([0024-0026] and [0067]); 
displaying, by the user interface (a user interface 104 includes a display 114 and input devices 116) (Fig. 1; [0029]), each of the next plurality of clustered images (presented to the user for manual labeling) ([0068-0069]); 
receiving, by the user interface (a user interface 104 includes a display 114 and input devices 116) (Fig. 1; [0029]), a next user input labeling one or more objects shown in each of the plurality of clustered images resulting in next labeled dataset comprising a next set of labeled images (wherein the user can label the next set of images of a second object or confused object) ([0068]); and
training, by the control circuit (hardware circuit) ([0039]), the machine learning model based on the next labeled dataset (training the match classifier based on the next labeled set of images) ([0068]) until a threshold number of labeled datasets have been used to train the machine learning model (classification threshold and until the performance is above a predefined performance level) ([0033] and [0068-0069]).  
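
As an illustration only, the label-then-train loop recited in claim 15 might be sketched as follows. The sketch follows the claim wording (stop after a threshold number of labeled datasets) rather than Saptharishi's performance-based stopping criterion, and `train_until_threshold`, `label_fn`, and `train_fn` are hypothetical names with stub implementations.

```python
def train_until_threshold(batches, label_fn, train_fn, threshold):
    """Label and train on successive batches until `threshold` labeled datasets are used."""
    datasets_used = 0
    for batch in batches:
        if datasets_used >= threshold:
            break
        # Stand-in for the user input received through the user interface.
        labeled = [(image, label_fn(image)) for image in batch]
        # Stand-in for one training pass on the newly labeled dataset.
        train_fn(labeled)
        datasets_used += 1
    return datasets_used

# Hypothetical stubs: the "user" labels by filename prefix, the "trainer" just records data.
seen = []
batches = [["cola_1", "chips_1"], ["cola_2", "chips_2"], ["cola_3", "chips_3"]]
used = train_until_threshold(
    batches,
    label_fn=lambda image: image.split("_")[0],
    train_fn=seen.append,
    threshold=2,
)
```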

Regarding claim 16, Saptharishi teaches further configured to: 
selecting, by the control circuit (hardware circuit) ([0039]), a second plurality of clustered images from each of the plurality of groups (selecting a second set of clustered images related to a second object) ([0024-0026] and [0067]); 
automatically labeling, by the control circuit (hardware circuit) ([0039]) using the machine learning model (wherein the image data is classified using the trained classifier) ([0068-0069]), the one or more objects shown in each of the plurality of clustered images resulting in automatically labeled set of images (the objects in the image data being automatically labeled) ([0068-0069]); 
displaying, by the user interface (a user interface 104 includes a display 114 and input devices 116) (Fig. 1; [0029]), each image of the automatically labeled set of images (wherein the images are displayed to the user) ([0068-0069]); 
receiving, by the user interface (a user interface 104 includes a display 114 and input devices 116) (Fig. 1; [0029]), a second user input relabeling mislabeled objects of the one or more objects shown in each of the second plurality of clustered images resulting in a correctly labeled set of images (wherein the user can label objects with a bad performance, and mislabeled as confusing objects, with new manual labels) ([0069]); and 
training, by the control circuit (hardware circuit) ([0039]), the machine learning model based on the correctly labeled set of images (retraining the match classifier based on the new correct manual labels) ([0069]).  

Regarding claim 17, Staudinger teaches wherein the user interface comprises a graphical user interface (wherein the apparatus 400 has a user interface including a display 410) (Fig. 4; [0048] and [0053]) used by a user to associate each of the objects shown in each of the plurality of clustered images (wherein the user can label objects depicted in the subset of images to produce a training set of images) ([0007] and [0033]). Saptharishi teaches wherein the user interface comprises a graphical user interface (a user interface 104 includes a display 114 and input devices 116) (Fig. 1; [0029]) used by a user to associate each of the objects shown in each of the plurality of clustered images (wherein the user can label the set of images of a second object or confused object) ([0068-0069]). Wen teaches an image classification system for determining a likely classification of an image using multiple machine learning models that share a base machine learning model (Abstract).
However, none of them explicitly states to “associate each of the objects … to a corresponding product”.
Schwartz teaches an image recognition system to receive a realogram image including a plurality of organized objects and to detect and identify objects in the realogram image of one or more items on a retail shelf (Abstract); and wherein the user interface comprises a graphical user interface (display device, for displaying each label region for further analysis and/or display to a user) ([0046] and [0154]) each of the objects shown in each of the plurality of clustered images to a corresponding product (image recognition results for specific products grouped into facings) ([0040], [0054], and [0067]).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of prior arts to include identifying objects corresponding to products since it increases the modality of the system for identifying certain products efficiently (Schwartz; [0064]).

Regarding claim 20, Schwartz teaches further comprising: 
storing, by a database, a plurality of processed images (wherein the data stores image analysis results along with the received images) ([0042]), wherein each processed image shows at least one object inside a bounding box indicating the at least one object has been detected in the processed image (the object recognition module generates an identified region or bounding box for each identified object in the one or more images and outputs a recognition result) ([0061]); 
text associated with the plurality of candidate product identifiers (text associated with prices and labels) ([0116-0120] and [0124]); and 
a plurality of stored product images associated with the plurality of candidate product identifiers (the data storage includes images along with image analysis results; which includes labels) ([0042-0043]); 
comparing, by the control circuit using the trained machine learning model (using the shelf/label detection module; which can be trained to recognize labels) ([0124] and [0151]), detected text in a bounded object (bounding box for each identified object in the one or more images) ([0061]) of a processed image with the text associated with the corresponding product identifiers to determine a first set of matches (once prices and labels have been detected the detection module identifies matching prices and labels detected in the image) ([0118-0119]), wherein each match of the first set of matches is associated with a first corresponding probability value and a first respective product identifier of the match (matching the text and may select the match with the highest score or confidence value) ([0118-0119]); 
comparing, by the control circuit using the trained machine learning model (using the shelf/label detection module; which can be trained to recognize labels) ([0124] and [0151]), one or more detected visual images of the bounded object (bounding box for each identified object in the one or more images) ([0061]) with the plurality of stored product images to determine a second set of matches (the object recognition module can match detected objects with images of known objects stored in a search database on the data storage) ([0052] and [0061]), wherein each match of the second set of matches is associated with a second corresponding probability value and a second respective product identifier of the match (wherein the recognition results for each identified object may also include other information including a confidence of the object in identifying the product) ([0061]); and 
determining, by the control circuit using the trained machine learning model (using the shelf/label detection module; which can be trained to recognize labels) ([0124] and [0151]), a third set of matches, wherein the third set of matches are those matches in the first set of matches and the second set of matches that are associated with probability values that are greater than a threshold value (wherein the detection module chooses the best shelf and label locations using the information from shelf location hypotheses based on appearance, shelf locations hypotheses based on context and label and price detection; greater than a number found and a score) ([0127]).
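
The thresholding step recited in claim 20 (a third set formed from those text matches and image matches whose probability values exceed a threshold) can be sketched in a few lines of Python. This is an assumption-laden illustration, not Schwartz's method: the product identifiers, probability values, threshold, and the function name `matches_above_threshold` are all hypothetical.

```python
def matches_above_threshold(text_matches, image_matches, threshold):
    """Return the matches from either set whose probability exceeds the threshold."""
    combined = list(text_matches) + list(image_matches)
    return [(product_id, prob) for product_id, prob in combined if prob > threshold]

# Hypothetical (product identifier, probability) pairs.
text_matches = [("SKU-A", 0.91), ("SKU-B", 0.40)]   # first set: text comparison
image_matches = [("SKU-A", 0.85), ("SKU-C", 0.72)]  # second set: image comparison

third_set = matches_above_threshold(text_matches, image_matches, threshold=0.7)
```

An empty result would correspond to the situation addressed in claim 11, where no probability value in either set exceeds the threshold.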

Regarding claim 1, see the rejection made to claim 14, as well as prior art Staudinger for a system (Fig. 1; [0026]), for they teach all the limitations within this claim.
Regarding claim 2, see the rejection made to claim 15, as well as prior art Staudinger for a system (Fig. 1; [0026]), for they teach all the limitations within this claim.
Regarding claim 3, see the rejection made to claim 16, as well as prior art Staudinger for a system (Fig. 1; [0026]), for they teach all the limitations within this claim.

Regarding claim 4, Saptharishi teaches further comprising: one or more image capture devices configured to capture the plurality of unprocessed images of objects at the product storage facility (wherein the camera system 100 includes image capturing devices 102 to capture images of objects at the facility) (Fig. 1; [0117] and [0120]); and a database configured to store the plurality of unprocessed images (wherein the images can be stored in the one or more storage systems 110 or from the remote storage unit 106) ([0103] and [0112]).  

Regarding claim 5, Schwartz teaches wherein at least one of the one or more image capture devices is coupled to a motorized robotic unit (wherein the imaging device 115 can be a robot mounted camera) (Fig. 1; [0060]).  

Regarding claim 6, Schwartz teaches wherein the objects comprise items for sale (organized objects, such as a retail display) (Fig. 4; [0059-0060]) and price tags (price tags) ([0056]).  

Regarding claim 7, see the rejection made to claim 17, as well as prior art Staudinger for a system (Fig. 1; [0026]), for they teach all the limitations within this claim.

Regarding claim 10, Schwartz teaches further comprising: 
a database, configured to store: a plurality of processed images (wherein the data stores image analysis results along with the received images) ([0042]), wherein each processed image shows at least one object inside a bounding box indicating the at least one object has been detected in the processed image (the object recognition module generates an identified region or bounding box for each identified object in the one or more images and outputs a recognition result) ([0061]); 
text associated with the plurality of candidate product identifiers (text associated with prices and labels) ([0116-0120] and [0124]); and 
a plurality of stored product images associated with the plurality of candidate product identifiers (the data storage includes images along with image analysis results; which includes labels) ([0042-0043]); 
wherein the control circuit (application-specific integrated circuit (ASIC)) ([0053] and [0057]) in classifying each detected object (detect and identify the object) ([0061]) as potentially associated with the plurality of corresponding candidate product identifiers (to match detected objects with images of known objects stored in a search database; such as product identifiers/features which can include an object ID) ([0061]) is further configured to: compare, using the control circuit using the machine learning model (using the shelf/label detection module; which can be trained to recognize labels) ([0124] and [0151]), detected text in a bounded object (bounding box for each identified object in the one or more images) ([0061]) of a processed image with the text associated with the corresponding product identifiers to determine a first set of matches (once prices and labels have been detected the detection module identifies matching prices and labels detected in the image) ([0118-0119]), wherein each match of the first set of matches is associated with a first corresponding probability value and a first respective product identifier of the match (matching the text and may select the match with the highest score or confidence value) ([0118-0119]); 
compare, using the trained machine learning model (using the shelf/label detection module; which can be trained to recognize labels) ([0124] and [0151]), one or more detected visual images of the bounded object (bounding box for each identified object in the one or more images) ([0061]) with the plurality of stored product images to determine a second set of matches (the object recognition module can match detected objects with images of known objects stored in a search database on the data storage) ([0052] and [0061]), wherein each match of the second set of matches is associated with a second corresponding probability value and a second respective product identifier of the match (wherein the recognition results for each identified object may also include other information including a confidence of the object in identifying the product) ([0061]); and 
determine, using the trained machine learning model (using the shelf/label detection module; which can be trained to recognize labels) ([0124] and [0151]), a third set of matches, wherein the third set of matches are those matches in the first set of matches and the second set of matches that are associated with probability values that are greater than a threshold value (wherein the detection module chooses the best shelf and label locations using the information from shelf location hypotheses based on appearance, shelf locations hypotheses based on context and label and price detection; greater than a number found and a score) ([0127]).

Regarding claim 12, Schwartz teaches wherein the control circuit (application-specific integrated circuit (ASIC)) ([0053] and [0057]) uses another machine learning model (object recognition module) ([0061]) to detect the objects and enclose each detected object inside the bounding box (generating an identified region or bounding box for each identified object in the one or more images) ([0061]), and wherein the other machine learning model is distinct from the machine learning model (wherein the object recognition module is different than the shelf/label detection module; which can be trained to recognize labels) ([0124] and [0151]).  

Regarding claim 13, Saptharishi teaches wherein the plurality of unprocessed images are images that have not gone through object detection or object classification by the control circuit (wherein the image data 901 is, for example, raw video data) ([0067]).

Claim(s) 11 is rejected under 35 U.S.C. 103 as being unpatentable over Staudinger et al., US 2020/0334856 A1 (Staudinger), Saptharishi et al., US 2022/0027618 A1 (Saptharishi), Wen et al., US 2020/0394441 A1 (Wen), Schwartz, US 2016/0171429 A1 (Schwartz), Guo et al., US 10,839,452 B1 (Guo), and further in view of Adato et al., US 2019/0149725 A1 (Adato).
Regarding claim 11, Staudinger teaches wherein the user interface comprises a graphical user interface (wherein the apparatus 400 has a user interface including a display 410) (Fig. 4; [0048] and [0053]) used by a user to associate each of the objects shown in each of the plurality of clustered images (wherein the user can label objects depicted in the subset of images to produce a training set of images) ([0007] and [0033]). Saptharishi teaches wherein the user interface comprises a graphical user interface (a user interface 104 includes a display 114 and input devices 116) (Fig. 1; [0029]) used by a user to associate each of the objects shown in each of the plurality of clustered images (wherein the user can label the set of images of a second object or confused object) ([0068-0069]). Wen teaches an image classification system for determining a likely classification of an image using multiple machine learning models that share a base machine learning model (Abstract). Schwartz teaches a user interface comprising a graphical user interface (display device, for displaying each label region for further analysis and/or display to a user) ([0046] and [0154]); outputting a plurality of corresponding product identifiers (generating a list of inliers and product identifiers/IDs) ([0056] and [0061]); and indicating a correct product identifier selected from the plurality of corresponding candidate product identifiers (based on the detection module a hypothesis can be validated as a correct product identifier) ([0061], [0148-0154]). Guo teaches technologies to detect unpackaged, unlabeled, or mislabeled products based on product images (Abstract).
However, none of them explicitly teaches “wherein the control circuit is further configured to: determine that not a single probability value in the first set of matches and the second set of matches is greater than the threshold value; receive a third user input via the user interface, the third user input comprising one or more words associated with the bounded object of the processed image; search product identifiers associated with the bounded object using the third user input; output the product identifiers associated with the bounded object to the user interface; receive a fourth user input via the user interface associating a correct product identifier selected from the product identifiers associated with the bounded object; and train the trained machine learning model with a processed image including the bounded object associated with the correct product identifier”.
Adato teaches systems, methods, and devices for identifying products in retail stores ([0002]); wherein the control circuit (Application Specific Integrated Circuits (ASICs)) ([0129]) is further configured to: determine that not a single probability value in the first set of matches and the second set of matches is greater than the threshold value (determining whether the confidence in the matches is below a certain threshold; following a second course of action) ([0125]); receive a third user input via the user interface (interface system) ([0268]), the third user input comprising one or more words associated with the bounded object of the processed image (wherein the user can input an indication of a type of product such as in text format) ([0268-0269]); search product identifiers associated with the bounded object (searching for the type of object associated with the bounded object) (Fig. 15; [0335]) using the third user input; output the product identifiers associated with the bounded object to the user interface (outputting to a GUI for the user to see that the system was unable to identify the product) (Fig. 15; [0335]); receive a fourth user input via the user interface associating a correct product identifier selected from the product identifiers associated with the bounded object (wherein the user can input the product type in the GUI, such as in box 1503; the user may also enter the product identification number or code) (Fig. 15; [0335] and [0460-0464]); and train the trained machine learning model with a processed image (utilizing suitably trained machine learning algorithms and models to perform the product identification) ([0342]) including the bounded object associated with the correct product identifier (wherein the user input is used to update the product model subset) ([0335] and [0344]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of prior art references to include using the user interface to correct the product identifier when the output is incorrect, since doing so can increase the efficiency of recognizing new products or new packages while also reducing inaccuracy when recognizing products in the images (Adato; [0267]).
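For illustration only, the fallback sequence recited in claim 11 (no probability value exceeds the threshold, the user enters words, product identifiers are searched and output, the user selects the correct identifier, and the labeled image is kept for training) can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Adato or the application: the function names, the `catalog` dictionary, the image file name, and the 0.8 threshold are hypothetical, and the retraining step is represented only by appending to a list.

```python
def no_match_exceeds_threshold(first_probs, second_probs, threshold):
    """True when not a single probability in either set is greater than the threshold."""
    return not any(p > threshold for p in [*first_probs, *second_probs])


def search_product_identifiers(catalog, words):
    """Return identifiers whose description contains every user-entered word (case-insensitive)."""
    terms = words.lower().split()
    return [
        product_id
        for product_id, description in catalog.items()
        if all(term in description.lower() for term in terms)
    ]


# Assumed sample catalog; identifiers and descriptions are illustrative only.
catalog = {
    "sku-100": "Sparkling water lime 12 pack",
    "sku-200": "Sparkling water lemon 12 pack",
    "sku-300": "Orange juice 1 L",
}
training_examples = []

if no_match_exceeds_threshold([0.35, 0.50], [0.42], threshold=0.8):
    # Third user input: words describing the bounded object.
    candidates = search_product_identifiers(catalog, "sparkling lime")
    # Fourth user input: the first candidate stands in for the user's selection.
    correct_id = candidates[0]
    # Stand-in for retraining: queue the labeled image for a later training run.
    training_examples.append(("processed_image_with_bounded_object.png", correct_id))
```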

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Desai et al., US 2018/0114334 A1 teaches: Given an image matching the target visual domain, an object detection mechanism extracts existing objects, and, for each object, a generic machine learning model stored in each device is used to generate machine learning model features and label recommendations to human users who then select a correct label ([0033]); wherein labels are presented to a user who selects the correct label ([0044]); and wherein users can be asked to select the correct label for each object/box and hence should identify fewer bounding boxes to minimize users' labeling efforts ([0095]). 

Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J VANCHY JR whose telephone number is (571)270-1193. The examiner can normally be reached Monday - Friday 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emily Terrell, can be reached at (571) 270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL J VANCHY JR/
Primary Examiner, Art Unit 2666
Michael.Vanchy@uspto.gov
Prosecution Timeline

Dec 23, 2025: Final Rejection mailed (§103)
Feb 11, 2026: Interview Requested
Mar 02, 2026: Applicant Interview (Telephonic)
Mar 02, 2026: Examiner Interview Summary
Mar 17, 2026: Request for Continued Examination
Mar 19, 2026: Response after Non-Final Action
Apr 01, 2026: Non-Final Rejection mailed (§103)
May 27, 2026: Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

17/654,891 (Patent 12633151): ANOMALOUS EVENT PREDICTION USING CONTRASTIVE LEARNING. 4y 2m to grant; granted May 19, 2026.
18/332,680 (Patent 12633130): METHOD, PROCESSOR CIRCUIT AND COMPUTER-READABLE STORAGE MEDIUM FOR PEDESTRIAN DETECTION BY A PROCESSOR CIRCUIT OF A MOTOR VEHICLE. 2y 11m to grant; granted May 19, 2026.
18/101,071 (Patent 12626531): Systems, Methods and Media for Deep Shape Prediction. 3y 3m to grant; granted May 12, 2026.
18/145,724 (Patent 12614386): METHOD OF PROCESSING VIDEO, METHOD OF QUERING VIDEO, AND METHOD OF TRAINING MODEL. 3y 4m to grant; granted Apr 28, 2026.
18/160,186 (Patent 12602906): IMAGE RECOGNITION APPARATUS. 3y 2m to grant; granted Apr 14, 2026.
Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 67%
With Interview (+20.1%): 87%
Median Time to Grant: 3y 3m (~0m remaining)
PTA Risk: High
Based on 608 resolved cases by this examiner. Grant probability derived from career allowance rate.