Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-7, 10-17, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chaubard (US 11,481,751 B1).
Regarding Claim 1, Chaubard discloses:
a plurality of cameras positioned at the POS system configured to capture the plurality of images of the plurality of items positioned at the POS system,
wherein each camera captures a corresponding image with a corresponding Field of View (FOV) of the plurality of items thereby capturing different item parameters associated with each item; and
Chaubard is directed to a checkout station that uses sensors and imagery to detect products. (Chaubard, abstract, “A retail store automated checkout system uses images, video, or depth data to recognize products being purchased to expedite the checkout process and improve accuracy. All of a store's products, including those not sold in packages, such as fruits and vegetables, are imaged from a series of different angles and in different lighting conditions, to produce a library of images for each product. This library is used in a checkout system that takes images, video, and depth sensor readings as the products pass through the checkout area and remove bottlenecks later in the checkout process. Recognition of product identifiers or attributes such as barcode, QR code or other symbols, as well as OCR of product names, as well as the size and material of the product, can be additional or supplemental devices for identifying products being purchased.”; col.4 lns. 1-26, “In another implementation of the invention, an existing retail self-checkout station is retrofitted with equipment of the invention. In the upgraded form, the self-checkout station will recognize, by computer vision, produce and other non-coded items. One or more cameras are positioned at the station, at different viewing angles, for imaging each item brought forth by the customer. A scale is already included for produce sold by weight, and preferably an interactive touchscreen is provided for confirmation by the customer in the event, for example, the system does not recognize to a sufficiently high probability that selected produce is oranges or grapefruits, or between two different types of oranges or apples. Again, this further information can be added to the library once the customer has confirmed product identity.”)
at least one processor;
a memory coupled with the at least one processor, the memory including instructions that, when executed by the at least one processor cause the at least one processor to:
identify each corresponding item positioned at the POS system when the plurality of item parameters from the plurality of images of each item captured by the plurality of cameras match the item parameters of a previously identified item thereby resulting in a known item …
(Chaubard,col.15,ln.5-col.16,ln.30, “4. Per Object, Perform the Following Analysis a. Measure Length, Width, and Height of the Object and x, y, z of the centroid in global coordinate system This is very straightforward to do once you have a merged track. b. 3D CNN to attempt to classify the Object as a specific UPC and deliver a probability distribution over the possible UPCs This is very straightforward to do once you have a merged track with template matching or a CNN trained on those UPCs. c. 3D CNN to identify the superclass of the Object as a type of product, the colors of the product, the material it is made of, the shape of it, etc. We would train a 3D CNN to output superclass information like this. d. OCR to read any text on the Object We would use an Optical Character Recognition (OCR) algorithm on each image captured at time t. There are many OCR algorithms to choose from. In one embodiment, we would use a CNN to infer a rotated bounding box [x1, y1, x2, y2, angle] for every word in the image. We would then input each bounding box crop in the image into a Convolutional Recurrent Neural Network (CRNN) to output the character sequence inside the box. In some embodiments, we may use a lexicon of possible words and use a Jaccard Similarity between each word in the lexicon and the outputted word to find the true word. If the detected words are near each other, we may merge them into a phrase. In all embodiments, we would take each detected set of words or phrases and assign it to the Object from which the bounding box has the highest Intersection Over Union with. e. Barcode Detection on all sides of the Object Same as OCR except we look for barcodes, not text. f. 3D CNN and 2D CNN to compute an embedding vector for the Object In some embodiments we would use a 3D CNN to compute an embedding vector per Object. This 3D CNN would be trained by taking a database of known Objects some of the same UPCs. 
At each stage of training we would select two training Objects of the same UPC and one training Object of a different UPC, and use a loss function like triplet embedding loss to give penalty to the 3D CNN if the model does not recognize the same Objects as the same and the different Object as different. In other embodiments we would use a 2D CNN to detect general purpose bounding boxes per camera. Then we would match each bounding box across all cameras to a specific Object. For each bounding box, we would compute an embedding vector that would encode and represent the set of pixels in the bounding box as a vector of n numbers. The 2D CNN would be trained in a similar process as the 3D CNN but instead the training Objects would be training bounding boxes. In some embodiments we will use both approaches”)
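As a hedged illustration of the triplet-embedding loss the quoted passage relies on (penalize the network when two same-UPC embeddings are not closer together than a different-UPC embedding), the sketch below uses an illustrative margin and toy vectors that do not come from the reference:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Penalize the model when the same-UPC pair (anchor, positive)
    is not closer together than the different-UPC pair (anchor, negative)."""
    d_pos = np.linalg.norm(anchor - positive)   # distance between same-UPC embeddings
    d_neg = np.linalg.norm(anchor - negative)   # distance to the different-UPC embedding
    return max(0.0, d_pos - d_neg + margin)     # zero loss once the gap exceeds the margin

# Illustrative embeddings: anchor/positive share a UPC, negative does not.
a = np.array([1.0, 0.0, 0.0])
p = np.array([0.9, 0.1, 0.0])
n = np.array([0.0, 1.0, 0.0])
loss = triplet_loss(a, p, n)  # 0.0 here: the matching pair is already closer by more than the margin
```

Swapping the positive and negative produces a large loss, which is the penalty the quoted passage describes.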
and fail to identify each corresponding item when the plurality of item parameters from the plurality of images of each item fail to match the item parameters of a previously identified item thereby resulting in an unknown item,
(Chaubard, summary, “If an item in the checkout area is recognized to a sufficiently high probability, meeting a preset threshold of, e.g., 90%, that item is added to the checkout list and charged to the customer. If probability is too low, below the preset threshold, the system may prompt the customer or cashier to reposition the item if it appears stacking or occlusion has occurred, or to remove and scan that item using the store's existing scanning system.”)
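The threshold gate in the passage above (add the item to the checkout list when the recognition probability meets a preset threshold such as 90%, otherwise flag it for repositioning or rescanning) can be sketched as follows; only the 90% figure comes from the quoted passage, and the function name is illustrative:

```python
def classify_item(recognition_probability, threshold=0.90):
    """Gate an item on its recognition probability: items meeting the
    preset threshold are 'known' (added to the checkout list); items
    below it are 'unknown' (prompting reposition or manual rescan)."""
    return "known" if recognition_probability >= threshold else "unknown"
```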
map a plurality of image pixels associated with each unknown item as extracted from each image of each unknown item as captured by the plurality of cameras to a plurality of real world coordinates associated with each unknown item as extracted from a position of each unknown item as positioned at the POS system,
(Chaubard, col.14,lns.10-38, “Every camera in the system has its own coordinate system. To fuse this information for each camera in a coherent way, the system will have a fixed and known position and orientation (position=x,y,z, orientation=α, β, γ and scaling=s.sub.x, s.sub.y, s.sub.z) from the global origin to each cameras coordinate system. These 9 numbers will be theoretically static for the life of the system but may require a recalibration from time to time. These parameters can be used to compute an Extrinsic Matrix, H, per camera which includes the following transformations: Rotation, Translation, and Scale. Depending on the type of camera used, each camera may also have a Intrinsic Matrix, K, which defines the deformation from the real world 3D space to the 2D space on the cameras sensor that is not modeled by H. Then H and K together will be used to convert each camera's specific point cloud in its own coordinate system to the global coordinate system, giving us a final fused point cloud across all cameras D.sub.t=[[R, G, B, X, Y, Z] . . . ], which if the cameras are positioned correctly and the products are not stacked, give us a full 360 volumetric coverage of each product.
(56) Once we have the global fused point cloud D.sub.t, we can then run a clustering algorithm on D.sub.t to detect the existence of distinct objects. There are many clustering algorithms that can be used for this task. In one embodiment, we use a 3D CNN or RNN to cluster each specific point p=(x, y, z, R, G, B) to one of an unknown number of centroids. In other embodiments, we may be able to achieve the same thing with a simpler model like Density-based Scanning (DB-Scan) or Mean Shift.”; col.14, lns.42-50, “In another embodiment, we apply an Oriented 3D volume detection from Pixel-wise neural network predictions where we take each cameras pixels and push them through a fully-convolutional neural network as the backbone and merge this data with a Header Network that is tasked with localizing the Objects size (height, length, width), shape (classification, embedding, or otherwise), and heading (angle/orientation in the global coordinate system in α, β, and γ).”)
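The coordinate fusion quoted above (nine fixed parameters per camera combined into an extrinsic matrix H that maps camera-local points into the global coordinate system) can be sketched as below; the homogeneous 4x4 form, rotation convention, and sample values are illustrative assumptions, not specifics from the reference:

```python
import numpy as np

def extrinsic_matrix(alpha, beta, gamma, t, s):
    """Build a 4x4 matrix H from the nine per-camera parameters the
    reference describes: orientation (alpha, beta, gamma), position t,
    and per-axis scale s."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    # Rotation composed as Rz(gamma) @ Ry(beta) @ Rx(alpha), one common convention.
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    H = np.eye(4)
    H[:3, :3] = Rz @ Ry @ Rx @ np.diag(s)  # rotation combined with per-axis scale
    H[:3, 3] = t                           # translation to the global origin
    return H

def to_global(points_cam, H):
    """Map an (N, 3) camera-local point cloud into global coordinates."""
    homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    return (H @ homo.T).T[:, :3]
```

Running every camera's cloud through its own H produces the single fused cloud D_t on which the clustering step (a learned model, DB-Scan, or Mean Shift, per the quoted passage) then operates.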
generate a corresponding bounding polygon for each unknown item that encapsulates each unknown item within the corresponding bounding polygon based on the image pixels with each unknown item as mapped to the real world coordinates associated with each unknown item, and
project the corresponding bounding polygon onto each unknown item positioned at the POS system to encapsulate each unknown item thereby providing visual feedback for each unknown item positioned at the POS system.
(Chaubard, col.5,ln.55-col.6,ln.2, “If there are overlapping items or stacked items that occlude the camera or cameras, or if there are items in the tunnel that the cameras failed to capture or identify to the threshold probability, or if the tunnel otherwise made an error, then the non-identified items will be shown visually on the screen 20. As an example, a red bounding box can be shown around the unidentified item. The shopper or cashier or auditor in the store, can then manually correct the error by taking the item or items that were not identified and moving them (dashed line 25) by the pre-existing barcode scanner 24 or another scanner, as indicated at 26, therefore correcting the list of items and the amount owed. Alternatively, the system can prompt the customer or cashier to select from a list of screen-shown choices to confirm the correct product from a list of system-generated guesses.”)
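The visual feedback in the passage above (a red bounding box shown around the unidentified item on the screen) reduces to a simple drawing operation; a deployed system would use a display toolkit or OpenCV, so the plain-array version below, including the frame size, is an illustrative sketch:

```python
import numpy as np

def draw_red_box(image, x1, y1, x2, y2, thickness=2):
    """Draw a red rectangle outline (RGB) around an unidentified item
    so the shopper or cashier can see which item needs rescanning."""
    red = np.array([255, 0, 0], dtype=image.dtype)
    image[y1:y1 + thickness, x1:x2] = red  # top edge
    image[y2 - thickness:y2, x1:x2] = red  # bottom edge
    image[y1:y2, x1:x1 + thickness] = red  # left edge
    image[y1:y2, x2 - thickness:x2] = red  # right edge
    return image

frame = np.zeros((100, 100, 3), dtype=np.uint8)  # stand-in for a camera frame
frame = draw_red_box(frame, 20, 30, 60, 70)
```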
Regarding Claim 2, Chaubard discloses the system of claim 1.
extract the plurality of item parameters associated with each item positioned at the POS system from the plurality of images of each item captured by the plurality of cameras positioned at the POS system, wherein the item parameters associated with each item when combined are indicative as to an identification of each corresponding item thereby enabling the identification of each corresponding item;
analyze the plurality of item parameters associated with each item positioned at the POS system to determine whether the item parameters when combined matches corresponding item parameters stored in an item parameter identification database,
wherein the item parameter identification database stores different combinations of item parameters with each different combination of item parameters associated with a corresponding item thereby identifying each corresponding item based on different combination of item parameters associated with each corresponding item; and
identify each corresponding item positioned at the POS system when the plurality of item parameters when combined matches corresponding item parameters as stored in the item parameter identification database and fail to identify each corresponding item when the plurality of item parameters when combined fails to match corresponding item parameters.
See prior art rejection of claim 1.
Regarding Claim 3, Chaubard discloses the system of claim 2.
calibrate each camera positioned at the POS system to determine the plurality of real world coordinates of the POS system relative to a corresponding position of each camera, wherein the calibration of each camera at the POS system enables the real world coordinates of the POS system to be mapped to the plurality of image pixels as extracted from each image captured by each camera.
(Chaubard, col.14, lns.10-17, “Every camera in the system has its own coordinate system. To fuse this information for each camera in a coherent way, the system will have a fixed and known position and orientation (position=x,y,z, orientation=α, β, γ and scaling=s.sub.x, s.sub.y, s.sub.z) from the global origin to each cameras coordinate system. These 9 numbers will be theoretically static for the life of the system but may require a recalibration from time to time.”)
Regarding Claim 4, Chaubard discloses the system of claim 1.
extract a plurality of metrology features as included in the plurality of item parameters of each item from each image captured by the plurality of cameras for each unknown item positioned
at the POS system, wherein the plurality of metrology features is indicative as to a physical appearance of each unknown item positioned at the POS system.
See prior art rejection of claim 1.
Regarding Claim 5, Chaubard discloses the system of claim 4.
map the plurality of pixels associated with each unknown item as extracted from each image of each unknown item to the plurality of real world coordinates associated with each unknown item from the position of each unknown item based on the metrology features of each unknown item, wherein the metrology features of each unknown item enable the plurality of pixels to be mapped to the real world coordinates of each unknown item based on the physical appearance of each unknown item at the POS system.
(Chaubard, col.14, lns.42-50, “In another embodiment, we apply an Oriented 3D volume detection from Pixel-wise neural network predictions where we take each cameras pixels and push them through a fully-convolutional neural network as the backbone and merge this data with a Header Network that is tasked with localizing the Objects size (height, length, width), shape (classification, embedding, or otherwise), and heading (angle/orientation in the global coordinate system in α, β, and γ).”)
Regarding Claim 6, Chaubard discloses the system of claim 5.
generate the corresponding bounding polygon for each unknown item that encapsulates each unknown item within the corresponding bounding polygon based on the image pixels of the metrology features associated with each unknown item as mapped to the real world coordinates of the metrology features associated with each unknown item thereby enabling the corresponding bounding polygon to encapsulate the physical appearance of each unknown item at the POS system; and
project the corresponding bounding polygon onto each unknown item positioned at the POS system to encapsulate the physical appearance of each unknown item based on the metrology features of the image pixels associated with each unknown item mapped to the metrology features of the real world coordinates of each unknown item thereby providing visual feedback for each unknown item positioned at the POS system.
(Chaubard, col.8,ln.65-col.9,ln.15, “The preferred embodiment works in the following way. First a Convolutional Neural Network (CNN) is run to infer if there exists a product at each part of the image, and if so impose a bounding shape to fit around each distinct product (perhaps a box or polygon, and if using depth, a bounding cuboid or prism or frustrum. Each bounding shape is processed individually, running optical character recognition (OCR) on it to attempt to read any words on the packaging of the product and using a barcode detection algorithm to try to find and recognize barcodes if in the image, if such exist or are able to be read. In addition, a template matching algorithm is applied, taking a database of labeled “known” products or by using a classifier that was already trained to detect those “known” products and attempting to match each one to the pixels in the bounding shape. This template matching process takes in a known image and the bounding box image and outputs a probability of match.”)
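The template-matching step quoted above takes a known product image and a bounding-shape crop and outputs a probability of match; one hedged way to sketch such a score is normalized cross-correlation rescaled to [0, 1] (the similarity measure itself is an assumption, not specified by the reference):

```python
import numpy as np

def match_probability(template, crop):
    """Compare a known product image against a bounding-shape crop of
    the same size and return a score in [0, 1]; 1.0 is a perfect match."""
    t = template.astype(float).ravel()
    c = crop.astype(float).ravel()
    t = (t - t.mean()) / (t.std() + 1e-9)   # zero-mean, unit-variance
    c = (c - c.mean()) / (c.std() + 1e-9)
    corr = float(np.dot(t, c) / len(t))     # normalized cross-correlation in [-1, 1]
    return (corr + 1.0) / 2.0               # rescale to a [0, 1] "probability of match"

known = np.array([[0, 255], [255, 0]], dtype=np.uint8)  # toy "known product" patch
prob = match_probability(known, known.copy())           # identical patches score 1.0
```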
Regarding Claim 7, Chaubard discloses the system of claim 6.
analyze an accumulation of metrology features and item parameters as associated with each corresponding item in the item parameter identification database to determine the metrology features of each unknown item positioned at the POS system;
map the plurality of pixels associated with each unknown item as extracted from each image of each unknown item to the plurality of real world coordinates associated with each unknown item from the position of each unknown item based on the accumulation of metrology features and item parameters of each unknown item as stored in the item parameter identification database, wherein the accumulation of metrology features and item parameters of each unknown item enable the plurality of pixels to be mapped to the real world coordinates of each unknown item based on the physical appearance of each unknown item at the POS system; and
generate the corresponding bounding polygon for each unknown item that encapsulates each unknown item within the corresponding bounding polygon based on the image pixels of the accumulation of metrology features and item parameters associated with each unknown item as mapped to the real world coordinates of the metrology features associated with each unknown item thereby enabling the corresponding bounding polygon to encapsulate the physical appearance of each unknown item at the POS system.
(Chaubard, col.15,ln.35-col.16,ln.37, “4. Per Object, Perform the Following Analysis a. Measure Length, Width, and Height of the Object and x, y, z of the centroid in global coordinate system This is very straightforward to do once you have a merged track. b. 3D CNN to attempt to classify the Object as a specific UPC and deliver a probability distribution over the possible UPCs This is very straightforward to do once you have a merged track with template matching or a CNN trained on those UPCs. c. 3D CNN to identify the superclass of the Object as a type of product, the colors of the product, the material it is made of, the shape of it, etc. We would train a 3D CNN to output superclass information like this. d. OCR to read any text on the Object We would use an Optical Character Recognition (OCR) algorithm on each image captured at time t. There are many OCR algorithms to choose from. In one embodiment, we would use a CNN to infer a rotated bounding box [x1, y1, x2, y2, angle] for every word in the image. We would then input each bounding box crop in the image into a Convolutional Recurrent Neural Network (CRNN) to output the character sequence inside the box. In some embodiments, we may use a lexicon of possible words and use a Jaccard Similarity between each word in the lexicon and the outputted word to find the true word. If the detected words are near each other, we may merge them into a phrase. In all embodiments, we would take each detected set of words or phrases and assign it to the Object from which the bounding box has the highest Intersection Over Union with. e. Barcode Detection on all sides of the Object Same as OCR except we look for barcodes, not text. f. 3D CNN and 2D CNN to compute an embedding vector for the Object In some embodiments we would use a 3D CNN to compute an embedding vector per Object. This 3D CNN would be trained by taking a database of known Objects some of the same UPCs. 
At each stage of training we would select two training Objects of the same UPC and one training Object of a different UPC, and use a loss function like triplet embedding loss to give penalty to the 3D CNN if the model does not recognize the same Objects as the same and the different Object as different. In other embodiments we would use a 2D CNN to detect general purpose bounding boxes per camera. Then we would match each bounding box across all cameras to a specific Object. For each bounding box, we would compute an embedding vector that would encode and represent the set of pixels in the bounding box as a vector of n numbers. The 2D CNN would be trained in a similar process as the 3D CNN but instead the training Objects would be training bounding boxes. In some embodiments we will use both approaches.
(67) 5. Ensemble of all Features to Classify the Object Ft Features for this Object
(68) The key to detecting and identifying each Object with high certainty is to take a number of separate approaches and ensemble the results.
(69) The output of this process will be a predicted SKU per Object with a corresponding confidence score.”)
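The ensembling step quoted above (combine a number of separate approaches into a predicted SKU with a confidence score) can be sketched as a weighted average over per-approach probability distributions; the weighting scheme and SKU labels are illustrative assumptions:

```python
def ensemble_sku(predictions, weights):
    """Combine per-approach probability distributions over SKUs into a
    single predicted SKU with a confidence score.

    predictions: list of {sku: probability} dicts, one per approach
                 (e.g. UPC classifier, OCR lookup, barcode, embedding match).
    weights: relative trust in each approach (same length)."""
    combined = {}
    total = sum(weights)
    for dist, w in zip(predictions, weights):
        for sku, p in dist.items():
            combined[sku] = combined.get(sku, 0.0) + w * p / total
    best = max(combined, key=combined.get)
    return best, combined[best]

# Illustrative: a 3D-CNN distribution and an OCR-based distribution.
cnn_dist = {"SKU-APPLE": 0.7, "SKU-PEAR": 0.3}
ocr_dist = {"SKU-APPLE": 0.9, "SKU-PEAR": 0.1}
sku, confidence = ensemble_sku([cnn_dist, ocr_dist], weights=[1.0, 1.0])
```

The resulting confidence score is what the known/unknown threshold gate described for claim 1 would then be applied to.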
Regarding Claim 10, Chaubard discloses the system of claim 1.
automatically display via a user interface a notification of each unknown item positioned at the POS system as encapsulated within the corresponding bounding polygon as projected onto each unknown item positioned at the POS system.
See prior art rejection of claim 1.
Allowable Subject Matter
Claims 8,9,18,19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALLEN C CHEIN whose telephone number is (571) 270-7985. The examiner can normally be reached Monday-Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Florian Zeender can be reached at (571) 272-6790. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALLEN C CHEIN/Primary Examiner, Art Unit 3627