DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
For the purpose of examination with regard to prior art, the effective filing date of the instant application is January 4, 2023, based on documents filed in India.
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in India on January 4, 2023. It is noted, however, that applicant has not filed a certified copy of the IN202311000684 application as required by 37 CFR 1.55.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Whether a Claim is to a Statutory Category
In the instant case, claims 1-9 recite a method/process, claims 10-18 recite a system/machine, and claims 19-20 recite a system/machine, each performing a series of functions. Therefore, these claims fall within two of the four statutory categories of invention: a process and a machine. Step 1 is satisfied.
Step 2A – Prong 1: Does the Claim Recite a Judicial Exception
Exemplary claim 1 (and similarly claims 10 and 19) recites the following abstract concepts that are found to include an enumerated “abstract idea”:
A method to localize a product within one or more images, the method comprising:
receiving the one or more images from an imaging device;
identifying one or more background objects and one or more foreground objects within each of the one or more images using a background detection learning model;
identifying one or more body parts within the one or more foreground objects within each of the one or more images using a trained body part segmentation model;
identifying pixels associated with the product within each of the one or more images by segmenting out the one or more background objects and the one or more body parts from each of the one or more images and removing pixels associated with the one or more background objects and the one or more body parts; and
creating a localized product within each of the one or more image by enclosing the pixels associated with the product within each of the one or more images with an outline.
[Emphasis added to show the bolded abstract idea being executed by unbolded additional elements that do not meaningfully limit the abstract idea]
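For clarity of the record, the claimed sequence corresponds to a conventional computer-vision pipeline. The following is a minimal illustrative sketch only, assuming OpenCV's MOG2 background subtractor stands in for the claimed "background detection learning model" and a hypothetical segment_body_parts() helper stands in for the claimed "trained body part segmentation model"; it is not the applicant's disclosed implementation.

```python
import cv2

# Sketch only: MOG2 assumed as the "background detection learning model".
back_sub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

def localize_product(frame, segment_body_parts):
    # Foreground mask: nonzero pixels are moving (foreground) objects;
    # everything else is treated as background objects.
    fg_mask = back_sub.apply(frame)
    fg_mask = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)[1]

    # Body-part mask from the (hypothetical) segmentation model:
    # nonzero pixels belong to body parts within the foreground.
    body_mask = segment_body_parts(frame)

    # Remove pixels associated with background objects and body parts;
    # the remaining pixels are associated with the product.
    product_mask = cv2.bitwise_and(fg_mask, cv2.bitwise_not(body_mask))

    # Enclose the remaining product pixels with an outline (bounding box)
    # to create the localized product.
    contours, _ = cv2.findContours(product_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return (x, y, w, h)
```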
This method claim is grouped within the "mental processes" grouping of abstract ideas in prong one of step 2A of the Alice/Mayo test because the claims involve a series of steps for observation, evaluation and judgment/opinion to localize a product within one or more images, which is a process encompassed by the abstract idea of mental processes. The steps of receiving (observation), identifying (evaluation), removing (judgment/opinion) and creating (judgment/opinion) in the context of this claim encompass a human imagining a picture of objects they have seen, detecting said objects in the background or foreground of said picture, outlining different objects of interest in said picture and focusing on one object of interest while removing others, except for the use of technical components disclosed at a high level of generality. The examiner has reviewed each abstract idea from each step individually and in combination with each other limitation, and still finds that claim 1 recites an abstract idea. See e.g., MPEP 2106.04(a)(2)(III)(D); 2106.05(f) & (h); and July 2024 Subject Matter Eligibility Example 47, claim 2. Accordingly, claim 1 (and similarly claims 10 and 19) recites an abstract idea.
Step 2A – Prong 2: Does the Claim Recite Additional Elements that Integrate the Judicial Exception into a Practical Application
This judicial exception is not integrated into a practical application because, when analyzed under prong two of step 2A of the Alice/Mayo test, the additional elements of the claims, such as the imaging device, background detection learning model and trained body part segmentation model, merely use a computer as a tool to perform an abstract idea and/or generally link the use of a judicial exception to a particular technological environment. Specifically, the imaging device, background detection learning model and trained body part segmentation model perform the steps or functions of observation, evaluation and judgment/opinion to localize a product within one or more images. The use of a processor/computer as a tool to implement the abstract idea and/or generally linking the use of the abstract idea to a particular technological environment does not integrate the abstract idea into a practical application because it requires no more than a computer (or technical elements disclosed at a high level of generality, such as the imaging device, background detection learning model and trained body part segmentation model) performing the functions of receiving, identifying, removing and creating that correspond to acts required to carry out the abstract idea (MPEP 2106.05(f) and (h)). Accordingly, the additional elements do not impose any meaningful limits on practicing the abstract idea, and the claims are directed to an abstract idea.
Step 2B: Does the Claim Amount to Significantly More
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when analyzed under step 2B of the Alice/Mayo test, the additional elements of the imaging device, background detection learning model and trained body part segmentation model being used to perform the steps of receiving, identifying, removing and creating amount to no more than using a computer or processor to automate and/or implement the abstract idea of observation, evaluation and judgment/opinion to localize a product within one or more images. As discussed above, taking the claim elements separately, the imaging device, background detection learning model and trained body part segmentation model perform the steps or functions of the mental processes of observation, evaluation and judgment/opinion to localize a product within one or more images. These functions correspond to the actions required to perform the abstract idea. Viewed as a whole, the combination of elements recited in the claims merely recites the concept of the mental processes of observation, evaluation and judgment/opinion to localize a product within one or more images because said combination of elements remains disclosed at a high level of generality. Therefore, the use of these additional elements does no more than employ the computer as a tool to automate and/or implement the abstract idea. The use of a computer or processor to merely automate and/or implement the abstract idea cannot provide significantly more than the abstract idea itself (MPEP 2106.05(f) & (h)). Therefore, the claims are not patent eligible.
Independent claim 10 describes a system performing the functions of receiving, identifying, removing and creating, also relating to mental processes, without additional elements, beyond technical elements disclosed at a high level of generality such as an imaging device, computing system, processor and memory, that provide significantly more than the abstract idea of the mental processes of observation, evaluation and judgment/opinion to localize a product within one or more images, as noted above regarding claim 1. Therefore, this independent claim is also not patent eligible.
Independent claim 19 describes a system performing the functions of receiving, identifying, removing, creating, determining, estimating and retrieving, also relating to mental processes, without additional elements, beyond technical elements disclosed at a high level of generality such as a self-checkout unit, point-of-sale terminal, imaging device, computing system, processor, memory, background detection learning model and trained body part segmentation model, that provide significantly more than the abstract idea of the mental processes of observation, evaluation and judgment/opinion to localize a product within one or more images, as noted above regarding claim 1. Therefore, this independent claim is also not patent eligible.
Dependent claims 2-7, 11, 13-17 and 20 further describe the abstract idea of the mental processes of observation, evaluation and judgment/opinion. These claims do not include additional elements to perform their respective functions of determining, capturing, estimating, retrieving, identifying, sending, segmenting and selecting, beyond the technical elements disclosed at a high level of generality (such as the self-checkout unit, point-of-sale terminal, imaging device, background detection learning model and object segmentation model) as disclosed in independent claims 1, 10 and 19, respectively, that integrate the abstract idea into a practical application or that provide significantly more than the abstract idea. Therefore, these dependent claims are also not patent eligible. Further, the dependency of these claims on ineligible independent claims 1, 10 and 19 also renders dependent claims 2-7, 11, 13-17 and 20 not patent eligible.
Dependent claims 8 and 18 further describe the abstract idea of the mental processes of observation, evaluation and judgment/opinion. Although these dependent claims add that the background detection learning model is a Mixture of Gaussians 2 (MOG2) model and that the trained body part segmentation model is a video object segmentation (VOS) model, this mere designation of model types does not include additional elements to perform the respective functions executed by said models, beyond the technical elements disclosed at a high level of generality in independent claims 1 and 10, that integrate the abstract idea into a practical application or that provide significantly more than the abstract idea. This is because said claims do not reflect a training process showing how said models execute their respective functions, nor how said trained body part segmentation model is trained. See July 2024 Subject Matter Eligibility Example 47. Therefore, these dependent claims are also not patent eligible. Further, the dependency of these claims on ineligible independent claims 1 and 10 also renders dependent claims 8 and 18 not patent eligible.
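To illustrate why the mere designation of MOG2 adds no implementation detail: in a widely used library such as OpenCV, selecting MOG2 as the background model is a single configuration call. This is a hedged sketch only; the claims recite the designation, not the training or application of the model.

```python
import cv2

# Sketch only: designating MOG2 as the background detection model is a
# one-line library call; claims 8 and 18 recite this designation without
# reciting how the model is trained or applied.
mog2 = cv2.createBackgroundSubtractorMOG2(history=500,
                                          varThreshold=16,
                                          detectShadows=True)
# Each mog2.apply(frame) call updates the per-pixel Gaussian mixture and
# returns a foreground mask for that frame.
```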
Dependent claims 9 and 12 add descriptive material to the steps of their respective independent claims such that said steps remain executed by technical elements disclosed at a high level of generality, in a manner that does not amount to more than mere computer implementation of an abstract idea. Therefore, dependent claims 9 and 12 do not provide significantly more than the abstract idea of the mental processes of observation, evaluation and judgment/opinion, as noted above regarding independent claims 1 and 10, and these dependent claims are also not patent eligible. Further, the dependency of these claims on ineligible independent claims 1 and 10 also renders dependent claims 9 and 12 not patent eligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-7, 9-17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wen et al. (US 2021/0183212 A1) in view of Birnie et al. (US 2020/0410825 A1).
Regarding claim 1, Wen teaches:
A method to localize a product within one or more images (See Wen ¶ [0067-0069] - extract a moving object as foreground from a static background during an item scanning event), the method comprising:
receiving the one or more images from an imaging device (See Wen ¶ [0230] – self-service checkout terminal using a camera as an image capture apparatus);
identifying one or more background objects and one or more foreground objects within each of the one or more images using a background detection learning model (See Wen ¶ [0066-0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background);
identifying one or more body parts within the one or more foreground objects within each of the one or more images using a trained body part segmentation model (See Wen ¶ [0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background, including a plurality of key points of hands of a user may be obtained through algorithms such as OpenPose. Postures of the hands of the user are determined according to the key points);
identifying pixels associated with the product within each of the one or more images by segmenting out the one or more background objects and the one or more body parts from each of the one or more images (See Wen ¶ [0067] – Motion segmentation is to mark pixels associated with each independent motion in a plurality of types of motions of sequence features and cluster the pixels according to a media object to which each of the pixels belongs. The main objective is to extract a moving object as foreground from a static background. In addition, a plurality of key points of hands of a user may be obtained through algorithms such as OpenPose. Postures of the hands of the user are determined according to the key points) …; and
creating a localized product within each of the one or more image by enclosing the pixels associated with the product within each of the one or more images with an outline (See Wen ¶ [0067-0069] – When an object moves, a brightness pattern [outline] of a point corresponding to the object in an image also moves…mark pixels associated with each independent motion in a plurality of types of motions of sequence features and cluster the pixels according to a media object to which each of the pixels belongs, wherein said media objects are items to be scanned or hands of users and object movement.).
While Wen teaches a self-service checkout system using machine learning techniques to segment video data of a user scanning objects into background and foreground image segments by detecting object movement and body parts (Wen ¶ [0066-0067]), Wen does not explicitly teach removing pixels associated with the one or more background objects and the one or more body parts. This is taught by Birnie (See Birnie ¶ [0028] – Item tracker may also be trained on background images for the bagging area and the staging area for purposes of being able to remove pixel data associated with the background and separate foreground item pixel data from background pixel data and [0032] - there is more image pixel noise in the captured image data that must be removed during the security processing (the conventional images include people and other structures unrelated to the items being processed for transaction security checking)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the machine learning based video data segmenting for product recognition self-service checkout system of Wen the use of image background pixel removal as taught by Birnie to make security for the transaction more accurate and reduce false positive security alerts (Birnie ¶ [0032]), thereby increasing the accuracy and efficiency of Wen’s machine learning based self-service checkout system.
Regarding claim 2, modified Wen teaches:
The method of claim 1, wherein all objects within each of the one or more images that are not identified to be the one or more background objects by the background detection learning model are determined to be the one or more foreground objects (See Wen ¶ [0066-0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background).
Regarding claim 3, modified Wen teaches:
The method of claim 1, wherein the one or more images captured by the imaging device are of a user checking out a product at a self-checkout unit within a retail store (See Wen ¶ [0230] – self-service checkout terminal using a camera as an image capture apparatus).
Regarding claim 4, modified Wen teaches:
The method of claim 3, further comprising:
determining whether the user is performing a scanning action (See Wen ¶ [0176] – determining user behavior corresponding to a scanning action);
upon determining that the user is performing the scanning action, estimating a time interval at which the user performs the scanning action (See Wen ¶ [0176] – determining time information corresponding to a scanning action according to the posture data of the user, where the time information includes at least one of the following: a start time, an end time, and a time period in which the action takes place.);
retrieving transaction data for the time interval from a point-of-sale terminal (See Wen ¶ [0177] – receiving a scanning result of an item that is sent by the self-service checkout terminal);
determining whether a checkout transaction was recorded among the transaction data for the time interval (See Wen ¶ [0177] – obtaining a detection result of skip-scanning behavior of the user according to the video and the scanning result); and
upon determining that the checkout transaction was not recorded among the transaction data for the time interval, determining that a miss scan theft occurred (See Wen ¶ [0178] – identifying attribute information of the item according to the video; determining attribute information of the item according to the scanning result of the item; and determining that the user has skip-scanning behavior if the attribute information of the item identified according to the video is inconsistent with the attribute information of the item determined according to the scanning result.).
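For illustration, the final steps of claim 4 reduce to an interval check against point-of-sale records. A minimal sketch under assumed data shapes; the (timestamp, item_id) transaction tuples are hypothetical and not a claim requirement.

```python
def detect_miss_scan(scan_interval, transactions):
    """Sketch of claim 4's final steps under assumed data shapes.

    scan_interval: (start, end) timestamps of the estimated scanning action.
    transactions: iterable of (timestamp, item_id) records retrieved from
    the point-of-sale terminal for the interval (hypothetical format).
    """
    start, end = scan_interval
    # Was any checkout transaction recorded within the time interval?
    recorded = any(start <= t <= end for t, _item in transactions)
    # If not, the claimed method concludes that a miss scan theft occurred.
    return not recorded
```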
Regarding claim 5, modified Wen teaches:
The method of claim 4, wherein determining whether the user is performing the scanning action includes:
determining whether a position of the localized product among the one or more images moves from one side of a flatbed area of the self-checkout unit to another side of the flatbed area of the self-checkout unit (See Wen ¶ [0059-0061] – self-service checkout terminal with stand [flatbed] regions A, B and C for staging an item, scanning an item and placing scanned items, thereby showing movement across said stand, [0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background and Fig. 3 – showing said self-service checkout terminal with a flat stand for scanning items).
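A hedged sketch of the claim 5 determination, assuming the localized product is represented by per-frame bounding boxes and the flatbed area is split at a known midline x-coordinate; both are illustrative assumptions beyond the recited movement.

```python
def is_scanning_action(boxes, flatbed_x_mid):
    # boxes: per-frame localized-product bounding boxes (x, y, w, h), in
    # time order; None where no product was localized in that image.
    # Test for the claimed movement: the product centroid is observed on
    # both sides of the flatbed midline across the image sequence.
    sides = set()
    for box in boxes:
        if box is None:
            continue
        x, y, w, h = box
        cx = x + w / 2.0
        sides.add("left" if cx < flatbed_x_mid else "right")
    return sides == {"left", "right"}
```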
Regarding claim 6, modified Wen teaches:
The method of claim 4, wherein estimating the time interval at which the user performs the scanning action includes:
identifying a set of images among the one or more images where the localized product is positioned over a flatbed area of the self-checkout unit (See Wen ¶ [0059-0063] – self-service checkout terminal with stand [flatbed] regions A, B and C, wherein video [set of images] of a user scanning items is acquired by an image-capturing apparatus, [0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background, thereby localizing said object and Fig. 3 – showing said self-service checkout terminal with a flat stand for scanning items);
determining timestamps associated with the set of images (See Wen ¶ [0070-0071] – determining timestamps of user actions based on user behavior extracted from the video); and
determining a time interval that spans across the timestamps (See Wen ¶ [0071] – the corresponding time period may be determined based on a timestamp of the first action and a timestamp of the second action.).
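For illustration, the claim 6 steps reduce to collecting timestamps of qualifying images and taking their span. A minimal sketch, assuming a hypothetical is_over_flatbed() predicate that tests the localized product's position against the flatbed area.

```python
def estimate_scan_interval(frames, is_over_flatbed):
    # frames: list of (timestamp, localized_product_box) pairs.
    # is_over_flatbed(box): hypothetical predicate testing whether the
    # box lies over the flatbed area of the self-checkout unit.
    stamps = [t for t, box in frames
              if box is not None and is_over_flatbed(box)]
    if not stamps:
        return None
    # The claimed time interval spans across the earliest and latest
    # timestamps of the identified set of images.
    return (min(stamps), max(stamps))
```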
Regarding claim 7, modified Wen teaches:
The method of claim 4, further comprising:
sending a notification to an employee of the retail store alerting the employee that the miss scan theft occurred (See Wen ¶ [0097] – If the quantity of skipped scans of the user is greater than or equal to the first preset threshold, settlement on the user may be obstructed. The settlement-forbidden interface is displayed, and the warning information may be further sent to the on-site monitoring terminal to alert an on-site monitoring person.).
Regarding claim 9, modified Wen teaches:
The method of claim 1, wherein the one or more background objects include static objects within each of the one or more images that do not change position over a period of time and the one or more foreground objects are all objects within the one or more images that are not the one or more background objects (See Wen ¶ [0066-0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background).
Regarding claim 10, modified Wen teaches:
A system to localize a product within one or more images, the system comprising (See Wen ¶ [0067-0069] - extract a moving object as foreground from a static background during an item scanning event):
an imaging device (See Wen ¶ [0230] – self-service checkout terminal using a camera as an image capture apparatus);
a computing system comprising:
a processor (See Wen ¶ [0230] – self-service checkout terminal using a camera as an image capture apparatus connected to a processor);
a memory communicatively connected to the processor which stores program instructions executable by the processor, wherein, when executed, the program instructions (See Wen ¶ [0239-0240] – program instructions stored in memory, executed by a processor) cause the system to:
receive the one or more images from the imaging device (See Wen ¶ [0230] – self-service checkout terminal using a camera as an image capture apparatus);
identify one or more background objects and one or more foreground objects within each of the one or more images using a background detection learning model (See Wen ¶ [0066-0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background);
identify one or more body parts within the one or more foreground objects within each of the one or more images using a trained body part segmentation model (See Wen ¶ [0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background, including a plurality of key points of hands of a user may be obtained through algorithms such as OpenPose. Postures of the hands of the user are determined according to the key points);
identify pixels associated with the product within each of the one or more images by segmenting out the one or more background objects and the one or more body parts from each of the one or more images (See Wen ¶ [0067] – Motion segmentation is to mark pixels associated with each independent motion in a plurality of types of motions of sequence features and cluster the pixels according to a media object to which each of the pixels belongs. The main objective is to extract a moving object as foreground from a static background. In addition, a plurality of key points of hands of a user may be obtained through algorithms such as OpenPose. Postures of the hands of the user are determined according to the key points)…; and
create a localized product within each of the one or more image by enclosing the pixels associated with the product within each of the one or more images with an outline (See Wen ¶ [0067-0069] – When an object moves, a brightness pattern [outline] of a point corresponding to the object in an image also moves…mark pixels associated with each independent motion in a plurality of types of motions of sequence features and cluster the pixels according to a media object to which each of the pixels belongs, wherein said media objects are items to be scanned or hands of users and object movement.).
While Wen teaches a self-service checkout system using machine learning techniques to segment video data of a user scanning objects into background and foreground image segments by detecting object movement and body parts (Wen ¶ [0066-0067]), Wen does not explicitly teach removing pixels associated with the one or more background objects and the one or more body parts. This is taught by Birnie (See Birnie ¶ [0028] – Item tracker may also be trained on background images for the bagging area and the staging area for purposes of being able to remove pixel data associated with the background and separate foreground item pixel data from background pixel data and [0032] - there is more image pixel noise in the captured image data that must be removed during the security processing (the conventional images include people and other structures unrelated to the items being processed for transaction security checking)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the machine learning based video data segmenting for product recognition self-service checkout system of Wen the use of image background pixel removal as taught by Birnie to make security for the transaction more accurate and reduce false positive security alerts (Birnie ¶ [0032]), thereby increasing the accuracy and efficiency of Wen’s machine learning based self-service checkout system.
Regarding claim 11, modified Wen teaches:
The system of claim 10, wherein segmenting out the one or more background objects and the one or more body parts from an image includes selecting all pixels within the image that are not identified as belonging to one of the one or more background objects or the one or more body parts (See Wen ¶ [0066-0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background).
Regarding claim 12, modified Wen teaches:
The system of claim 10, wherein the one or more background objects include static objects within each of the one or more images that do not change position over a period of time and the one or more foreground objects are all objects within the one or more images that are not the one or more background objects (See Wen ¶ [0066-0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background).
Regarding claim 13, modified Wen teaches:
The system of claim 10, wherein the one or more images captured by the imaging device are of a user checking out a product at a self-checkout unit within a retail store (See Wen ¶ [0230] – self-service checkout terminal using a camera as an image capture apparatus).
Regarding claim 14, modified Wen teaches:
The system of claim 13, wherein when executed, the program instructions further cause the system to:
determine whether the user is performing a scanning action (See Wen ¶ [0176] – determining user behavior corresponding to a scanning action);
upon determining that the user is performing the scanning action, estimate a time interval at which the user performs the scanning action (See Wen ¶ [0176] – determining time information corresponding to a scanning action according to the posture data of the user, where the time information includes at least one of the following: a start time, an end time, and a time period in which the action takes place.);
retrieve transaction data for the time interval from a point-of-sale terminal associated with the self-checkout unit (See Wen ¶ [0177] – receiving a scanning result of an item that is sent by the self-service checkout terminal);
determine whether a checkout transaction was recorded among the transaction data for the time interval (See Wen ¶ [0177] – obtaining a detection result of skip-scanning behavior of the user according to the video and the scanning result); and
upon determining that the checkout transaction was not recorded among the transaction data for the time interval, determine that a miss scan theft occurred (See Wen ¶ [0178] – identifying attribute information of the item according to the video; determining attribute information of the item according to the scanning result of the item; and determining that the user has skip-scanning behavior if the attribute information of the item identified according to the video is inconsistent with the attribute information of the item determined according to the scanning result.).
Regarding claim 15, modified Wen teaches:
The system of claim 14, wherein to determine whether the user is performing the scanning action includes to:
determine whether a position of the localized product among the one or more images moves from one side of a flatbed area of the self-checkout unit to another side of the flatbed area of the self-checkout unit (See Wen ¶ [0059-0061] – self-service checkout terminal with stand [flatbed] regions A, B and C for staging an item, scanning an item and placing scanned items, thereby showing movement across said stand, [0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background and Fig. 3 – showing said self-service checkout terminal with a flat stand for scanning items).
Regarding claim 16, modified Wen teaches:
The system of claim 14, wherein to estimate the time interval at which the user performs the scanning action includes to:
identify a set of images among the one or more images where the localized product is positioned over a flatbed area of the self-checkout unit (See Wen ¶ [0059-0063] – self-service checkout terminal with stand [flatbed] regions A, B and C, wherein video [set of images] of a user scanning items is acquired by an image-capturing apparatus, [0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background, thereby localizing said object and Fig. 3 – showing said self-service checkout terminal with a flat stand for scanning items);
determine timestamps associated with the set of images (See Wen ¶ [0070-0071] – determining timestamps of user actions based on user behavior extracted from the video); and
determine a time interval that spans across the timestamps (See Wen ¶ [0071] – the corresponding time period may be determined based on a timestamp of the first action and a timestamp of the second action.).
Regarding claim 17, modified Wen teaches:
The system of claim 14, wherein when executed, the program instructions further cause the system to:
send a notification to an employee of the retail store alerting the employee that the miss scan theft occurred (See Wen ¶ [0097] – If the quantity of skipped scans of the user is greater than or equal to the first preset threshold, settlement on the user may be obstructed. The settlement-forbidden interface is displayed, and the warning information may be further sent to the on-site monitoring terminal to alert an on-site monitoring person.).
Regarding claim 19, modified Wen teaches:
A system to detect miss scan theft (See Wen ¶ [0059-0061] – self-service checkout terminal to detect skip-scanning events), the system comprising:
a self-checkout unit comprising (See Wen ¶ [0059-0061] – self-service checkout terminal to detect skip-scanning events):
a flatbed area (See Wen ¶ [0059-0061] – self-service checkout terminal with a stand [flatbed]);
a point-of-sale terminal (See Wen ¶ [0059-0061] – self-service checkout terminal with a display screen functioning as a point-of-sale terminal);
an imaging device (See Wen ¶ [0059-0061] – self-service checkout terminal with a camera); and
a computing system comprising:
a processor (See Wen ¶ [0230] – self-service checkout terminal using a camera as an image capture apparatus connected to a processor);
a memory communicatively connected to the processor which stores program instructions executable by the processor, wherein, when executed, the program instructions (See Wen ¶ [0239-0240] – program instructions stored in memory, executed by a processor) cause the system to:
receive one or more images from the imaging device (See Wen ¶ [0230] – self-service checkout terminal using a camera as an image capture apparatus);
identify one or more background objects and one or more foreground objects within each of the one or more images using a background detection learning model (See Wen ¶ [0066-0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background);
identify one or more body parts within the one or more foreground objects within each of the one or more images using a trained body part segmentation model (See Wen ¶ [0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background, including a plurality of key points of hands of a user may be obtained through algorithms such as OpenPose. Postures of the hands of the user are determined according to the key points);
identify pixels associated with a product within each of the one or more images by segmenting out the one or more background objects and the one or more body parts from each of the one or more images (See Wen ¶ [0067] – Motion segmentation is to mark pixels associated with each independent motion in a plurality of types of motions of sequence features and cluster the pixels according to a media object to which each of the pixels belongs. The main objective is to extract a moving object as foreground from a static background. In addition, a plurality of key points of hands of a user may be obtained through algorithms such as OpenPose. Postures of the hands of the user are determined according to the key points)…;
create a localized product within each of the one or more image by enclosing the pixels associated with the product within each of the one or more images with an outline (See Wen ¶ [0067-0069] – When an object moves, a brightness pattern [outline] of a point corresponding to the object in an image also moves…mark pixels associated with each independent motion in a plurality of types of motions of sequence features and cluster the pixels according to a media object to which each of the pixels belongs, wherein said media objects are items to be scanned or hands of users and object movement.);
determine whether a user is performing a scanning action at the self-checkout unit (See Wen ¶ [0176] – determining user behavior corresponding to a scanning action) by determining whether a position of the localized product among the one or more images moves from one side of the flatbed area of the self-checkout unit to another side of the flatbed area of the self-checkout unit (See Wen ¶ [0059-0061] – self-service checkout terminal with stand [flatbed] regions A, B and C for staging an item, scanning an item and placing scanned items, thereby showing movement across said stand, [0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background and Fig. 3 – showing said self-service checkout terminal with a flat stand for scanning items);
upon determining that the user is performing the scanning action, estimate a time interval at which the user performs the scanning action (See Wen ¶ [0176] – determining time information corresponding to a scanning action according to the posture data of the user, where the time information includes at least one of the following: a start time, an end time, and a time period in which the action takes place.);
retrieve transaction data for the time interval from the point-of-sale terminal (See Wen ¶ [0177] – receiving a scanning result of an item that is sent by the self-service checkout terminal);
determine whether a checkout transaction was recorded among the transaction data for the time interval (See Wen ¶ [0177] – obtaining a detection result of skip-scanning behavior of the user according to the video and the scanning result); and
upon determining that the checkout transaction was not recorded among the transaction data for the time interval, determine that a miss scan theft occurred (See Wen ¶ [0178] – identifying attribute information of the item according to the video; determining attribute information of the item according to the scanning result of the item; and determining that the user has skip-scanning behavior if the attribute information of the item identified according to the video is inconsistent with the attribute information of the item determined according to the scanning result.).
While Wen teaches a self-service checkout system using machine learning techniques to segment video data of a user scanning objects into background and foreground image segments by detecting object movement and body parts (Wen ¶ [0066-0067]), Wen does not explicitly teach removing pixels associated with the one or more background objects and the one or more body parts. This is taught by Birnie (See Birnie ¶ [0028] – Item tracker may also be trained on background images for the bagging area and the staging area for purposes of being able to remove pixel data associated with the background and separate foreground item pixel data from background pixel data and [0032] - there is more image pixel noise in the captured image data that must be removed during the security processing (the conventional images include people and other structures unrelated to the items being processed for transaction security checking)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the machine learning based video data segmenting for product recognition self-service checkout system of Wen the use of image background pixel removal as taught by Birnie to make security for the transaction more accurate and reduce false positive security alerts (Birnie ¶ [0032]), thereby increasing the accuracy and efficiency of Wen’s machine learning based self-service checkout system.
Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wen et al. (US 2021/0183212 A1) in view of Birnie et al. (US 2020/0410825 A1) and Ma et al. (US 2022/0245709 A1).
Regarding claim 8, modified Wen teaches:
The method of claim 1, wherein the background detection learning model … and the trained body part segmentation model is a video object segmentation (VOS) model (See Wen ¶ [0066-0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background, including a plurality of key points of hands of a user may be obtained through algorithms such as OpenPose. Postures of the hands of the user are determined according to the key points.).
While Wen teaches a self-service checkout system using machine learning techniques to segment video data of a user scanning objects into background and foreground image segments by detecting object movement (Wen ¶ [0066-0067]), Wen does not explicitly teach that said machine learning techniques include Mixture of Gaussians 2 (MOG2) model. This is taught by Ma (See Ma ¶ [0086] – determining a product type using a Gaussian Mixtures or two). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the machine learning based video data segmenting for product recognition self-service checkout system of Wen the use of a Gaussian Mixtures or two machine learning technique for product determination as taught by Ma to beneficially reduce computing resources and costs (Ma ¶ [0114]), thereby increasing the efficiency of Wen’s machine learning based self-service checkout system.
Regarding claim 18, modified Wen teaches:
The system of claim 10, wherein the background detection learning model … and the trained body part segmentation model is a video object segmentation (VOS) model (See Wen ¶ [0066-0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background, including a plurality of key points of hands of a user may be obtained through algorithms such as OpenPose. Postures of the hands of the user are determined according to the key points.).
While Wen teaches a self-service checkout system using machine learning techniques to segment video data of a user scanning objects into background and foreground image segments by detecting object movement (Wen ¶ [0066-0067]), Wen does not explicitly teach that said machine learning techniques include Mixture of Gaussians 2 (MOG2) model. This is taught by Ma (See Ma ¶ [0086] – determining a product type using a Gaussian Mixtures or two). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the machine learning based video data segmenting for product recognition self-service checkout system of Wen the use of a Gaussian Mixtures or two machine learning technique for product determination as taught by Ma to beneficially reduce computing resources and costs (Ma ¶ [0114]), thereby increasing the efficiency of Wen’s machine learning based self-service checkout system.
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Wen et al. (US 2021/0183212 A1) in view of Birnie et al. (US 2020/0410825 A1) and Obinata et al. (US 2024/0212320 A1).
Regarding claim 20, modified Wen teaches:
The system of claim 19, wherein when executed, the program instructions further cause the system to:
identify one or more common objects within the one or more foreground objects within each of the one or more images using an object segmentation model (See Wen ¶ [0067] – motion segmentation algorithm and other machine learning techniques to extract a moving object as foreground from a static background, including a plurality of key points of hands of a user may be obtained through algorithms such as OpenPose. Postures of the hands of the user are determined according to the key points); and
segment out the one or more common objects when identifying the pixels associated with the product within each of the one or more images (See Wen ¶ [0067-0069] – When an object moves, a brightness pattern [outline] of a point corresponding to the object in an image also moves…mark pixels associated with each independent motion in a plurality of types of motions of sequence features and cluster the pixels according to a media object to which each of the pixels belongs, wherein said media objects are items to be scanned or hands of users and object movement.).
While Wen teaches a self-service checkout system using machine learning techniques to segment video data of a user scanning objects into background and foreground image segments by detecting object movement (Wen ¶ [0066-0067]), Wen does not explicitly teach that said objects include one or more of: at least a portion of a shopping basket, at least a portion of a shopping cart, a cellular telephone, a purse, a wallet, or at least a portion of clothing associated with the user. This is taught by Obinata (See Obinata ¶ [0077] – detecting common objects such as shopping baskets, clothes and accessories). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the machine learning based video data segmenting for object recognition self-service checkout system of Wen the use of common object recognition as taught by Obinata to improve the accuracy of recognizing a specific object from an image (Obinata ¶ [0060]), thereby increasing the accuracy of Wen’s machine learning based self-service checkout system.
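A minimal sketch of the claim 20 combination, assuming the object segmentation model yields (class_name, mask) detections (a hypothetical interface) whose pixels are segmented out before the product pixels are identified; the class labels below are illustrative stand-ins for the recited common objects.

```python
import cv2

# Hypothetical class labels for the recited common objects.
COMMON_CLASSES = {"shopping_basket", "shopping_cart", "cell_phone",
                  "purse", "wallet", "clothing"}

def remove_common_objects(product_mask, detections):
    # detections: assumed (class_name, binary_mask) pairs produced by an
    # object segmentation model run on the foreground of an image.
    for class_name, mask in detections:
        if class_name in COMMON_CLASSES:
            # Segment out common-object pixels so that only pixels
            # associated with the product remain in the mask.
            product_mask = cv2.bitwise_and(product_mask,
                                           cv2.bitwise_not(mask))
    return product_mask
```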
Response to Arguments
Applicant's arguments filed 09/16/2025 have been fully considered but they are not persuasive.
Rejection under 35 U.S.C. § 101:
The amendments to independent claims 1, 10 and 19 do not render the claimed invention of the instant application patent eligible, and the previous rejection under 35 U.S.C. § 101 is maintained.
Contrary to the applicant’s assertion that amended independent claims 1, 10 and 19 improve a multi-stage computer vision process, said amendments leave the methods of said claims executed by technical elements disclosed at a high level of generality, such that said methods amount to no more than merely applying a computer to perform the required functions. This shows neither integration into a practical application nor significantly more than the abstract ideas discussed above in the current rejection under 35 U.S.C. § 101. The amended limitation, “removing pixels associated with the one or more background objects and the one or more body parts,” merely focuses on specific areas [pixels] in a captured image. A technical reason for said “removing” is needed to show integration into a practical application and significantly more than the abstract idea. Any improvement of a claimed invention must be clearly reflected in the claims.
Dependent claims 2-9, 11-18 and 20 also remain rejected as described above in the current rejection under 35 U.S.C. § 101.
Rejection under 35 U.S.C. § 102:
The amendments to independent claims 1, 10 and 19 overcome the Wen prior art reference of record and the previous rejection under 35 U.S.C. § 102 is withdrawn.
However, the claimed invention of the instant application, as shown by independent claims 1, 10 and 19, remains unpatentable because it would have been obvious to one of ordinary skill in the art to combine the features of Birnie with the system of Wen, as detailed in the current rejection under 35 U.S.C. § 103 above, to teach the amended claimed functions of “removing pixels associated with the one or more background objects and the one or more body parts.”
The applicant gives no new arguments for any of dependent claims 2-7, 9 and 11-17, and said dependent claims remain rejected as shown above in the current rejection under 35 U.S.C. § 103.
Rejection under 35 U.S.C. § 103:
The applicant gives no new arguments for any of dependent claims 8, 18 and 20 other than the deficiencies of their respective independent base claims. Therefore, said dependent claims remain rejected as shown above in the current rejection under 35 U.S.C. § 103.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW S WERONSKI whose telephone number is (571)272-5802. The examiner can normally be reached M-F 8 am - 5 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fahd A. Obeid, can be reached at 571-270-3324. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MATTHEW S WERONSKI/Examiner, Art Unit 3627
/MICHAEL JARED WALKER/Primary Examiner, Art Unit 3627