DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This is a nonfinal rejection in response to amendments/remarks filed on 11/24/2025. Claims 1, 3-4, 6, 8-11, 13, 15-16, 18 and 20 have been amended. Claims 2, 5, 7, 12, 14, 17, and 19 remain cancelled. Claims 21 and 22 have been cancelled. Claims 1, 3-4, 6, 8-11, 13, 15-16, 18 and 20 are pending and are examined herein.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/24/2025 has been entered.
Priority
The effective filing date is the filing date of the present application, 02/01/2023.
Claim Rejections – 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 3-4, 6, 8-11, 13, 15-16, 18, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Is the claim to a Process, Machine, Manufacture, or Composition of Matter?
Claim 1 and its dependent claims 3-4, 6, and 8-10 are directed to: A method...
Claim 11 and its dependent claims 13 and 15 are directed to: A system, comprising: a processor; and a memory having instructions stored thereon which, when executed on the processor, performs operations comprising:
Claim 16 and its dependent claims 18 and 20 are directed to: A non-transitory computer-readable medium having instructions stored thereon which, when executed by a processor, performs operations comprising:
Therefore, all of the claims fall within at least one potentially eligible subject matter category, since claims 1, 3-4, 6, and 8-10 are directed to a process, and claims 11, 13, 15, 16, 18, and 20 are directed to a machine or manufacture. Accordingly, the claims are further analyzed under Step 2 of the two-step analysis.
Step 2A Prong 1: Is the claim directed to a Judicial Exception (a Law of Nature, a Natural Phenomenon (Product of Nature), or an Abstract Idea)?
The claims are analyzed herein under the broadest reasonable interpretation in light of the specification. Representative claims 1, 11, and 16 are marked up to isolate the abstract idea from the additional elements, wherein the abstract idea is shown in bold and the additional elements are italicized, as follows:
Claim 1: A method comprising:
Claim 11: A system, comprising: a processor; and a memory having instructions stored thereon which, when executed on the processor, performs operations comprising:
Claim 16: A non-transitory computer-readable medium having instructions stored thereon which, when executed by a processor, performs operations comprising:
Claims 1, 11, and 16:
receiving, at a computing device comprising a sensor layer and a prediction layer, sensor data from a plurality of sensors arranged in a physical environment, the sensor data comprising position and sound data associated with physical movement and emitted sound within a predetermined vicinity of the physical environment;
transferring the sensor data to the sensor layer wherein the sensor layer is configured to execute pre-processing machine learning (ML) analysis on the sensor data;
responsive to executing the pre-processing ML analysis on the sensor data, generating a ML model vector, wherein the ML model vector comprises a sound characteristic, an object characteristic, and a user gesture associated with the sensor data;
transferring the ML model vector to the prediction layer, wherein the prediction layer is configured to train a prediction ML model;
detecting, using the prediction ML model, abnormalities in a fulfillment workflow metric stream; and
responsive to detecting the abnormalities in the fulfillment workflow metric stream, modifying at least one metric within the fulfillment workflow metric stream.
When evaluating the bolded limitations of the claims under the broadest reasonable interpretation in light of the specification, it is clear that representative claims 1, 11, and 16 recite an abstract idea categorized as “certain methods of organizing human activity.” This abstract idea grouping, found in MPEP 2106.04(a)(2)(II), includes concepts related to “fundamental economic principles or practices,” “commercial or legal interactions,” and “managing personal behavior or relationships or interactions between people.” The present invention falls under “managing personal behavior or relationships or interactions between people,” which further includes social activities, teaching, and following rules or instructions. When considering the claims in bold, the claims recite:
receiving, at a sensor layer and a prediction layer, data from a physical environment, the data comprising position and sound data associated with physical movement and emitted sound within a predetermined vicinity of the physical environment;
transferring the data to the sensor layer wherein the sensor layer is configured to execute pre-processing analysis on the data;
responsive to executing the pre-processing analysis on the data, generating a model vector, wherein the model vector comprises a sound characteristic, an object characteristic, and a user gesture associated with the data;
transferring the model vector to the prediction layer, wherein the prediction layer is configured to train a prediction model;
detecting, using the prediction model, abnormalities in a fulfillment workflow metric stream; and
responsive to detecting the abnormalities in the fulfillment workflow metric stream, modifying at least one metric within the fulfillment workflow metric stream.
When considering the limitations above, it is clear that the bolded claims recite “certain methods of organizing human activity” in that they generally recite the process of receiving input data including position and sound data from a physical environment, performing analysis on the data whilst merely reciting the intended outcome of the analysis, generating a model vector with sound, object, and user gesture characteristics which merely analyze human behaviors, training the prediction model to detect workflow abnormalities, and modifying the workflow in response to the detection. This is merely reciting the business practice of receiving data, detecting anomalies, and responding to the anomalies, along with data processing steps towards performing the same. When read in view of the specification, it is clear that this is a recitation of a “commercial or legal interaction” and “managing personal behavior, interactions or relationships between people” in view of paragraph [0029], “For example, the predicted issues 192 can identify issues that are predicted to impact success for order fulfillment for a business (e.g., a restaurant). The improvement recommendation 194 can identify recommended changes to improve order fulfillment. For example, the improvement recommendation 194 can recommend modified staffing levels (e.g., more employees, different employee assignments, etc.), modified order fulfillment flow, modified customer interactions (e.g., messaging to customers), or any other suitable recommendations. In an embodiment, the prediction ML model 174 can generate these improvements based on identifying the features most impactful in the predicted issues 192. Alternatively, or in addition, another ML model or an algorithmic technique can be used to analyze the prediction ML model 174 and identify features impacting the predicted issues 192, and this can be used to generate the improvement recommendation 194.
For example, a feature relating to kitchen staffing could be identified as impactful in the prediction ML model, and the improvement recommendation could recommend changing this staffing.”
The sensor layer, prediction layer, model, and model vector are recited so broadly that they are merely black-box processes whose intended outcome is part of the abstract idea. Thus, these elements, given that they are not tied to any specific structure, are still part of the abstract idea. Furthermore, the amended step of “responsive to detecting the abnormalities in the fulfillment workflow metric stream, modifying at least one metric within the fulfillment workflow metric stream” does not necessarily preclude the step from being performed as instructions to a person to manually manage the workflow stream, and does not describe the limitation in a manner that necessarily invokes the use of computers. Therefore, this step is still part of the abstract idea of “certain methods of organizing human activity.”
Therefore, the claims still recite an abstract idea under “certain methods of organizing human activity,” and are further analyzed under Step 2A Prong 2.
Step 2A Prong 2: Does the claim recite additional elements that integrate the judicial exception into a practical application?
Claims 1, 11, and 16 recite the following additional elements:
(a) a processor, in claims 11 and 16;
(b) a memory having instructions stored thereon which, when executed on the processor, performs operations comprising, in claim 11;
(c) a non-transitory computer-readable medium having instructions stored thereon which, when executed by a processor, performs operations comprising, in claim 16;
(d) a plurality of sensors, in claims 1, 11, and 16;
(e) that the data is specifically sensor data, in claims 1, 11, and 16;
(f) a computing device comprising a sensor layer and a prediction layer, in claims 1, 11, and 16;
(g) that the pre-processing analysis is specifically pre-processing machine learning (ML) analysis, in claims 1, 11, and 16; and
(h) that the model is specifically a machine learning (ML) model, in claims 1, 11, and 16.
The additional elements listed above are no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer in its ordinary capacity. In this case, the abstract idea of “receiving input data including position and sound data from a physical environment, performing analysis on the data whilst merely reciting the intended outcome of the analysis, generating a model vector with sound, object, and user gesture characteristics which merely analyze human behaviors, training the prediction model to detect workflow abnormalities, and modifying the workflow in response to the detection” is merely applied to, and instructed to be performed on, generic computing components (a, b, c, and f). The processor, memory, and non-transitory computer-readable medium are generic computing components as evidenced in paragraph [0032], “Figure 2 is a block diagram illustrating a prediction controller for intelligent order fulfillment, according to one embodiment. The controller 200 includes a processor 202, a memory 210, and network components 220. The memory 210 may take the form of any non-transitory computer-readable medium. The processor 202 generally retrieves and executes programming instructions stored in the memory 210. The processor 202 is representative of a single central processing unit (CPU), multiple CPUs, a single CPU having multiple processing cores, graphics processing units (GPUs) having multiple execution paths, and the like.” Please see MPEP 2106.05(f) for more information on Mere Instructions to Apply an Exception. In this case, the claims are recited with such generality that they are no more than the equivalent of “apply it” or mere instructions to perform the abstract idea on a computer.
Furthermore, the additional elements (d) and (e) are merely an example of “apply it” in that they recite the use of machinery/devices in their ordinary capacity to perform economic tasks related to the abstract idea. In this case, (d) describes a plurality of sensors in a physical environment, and (e) limits the data to be sensor data. However, this recites the use of sensors with such breadth that they are merely devices used in their ordinary capacity to perform economic tasks, i.e., using sensors to collect data.
In addition, the abstract idea is generally being linked to the technological environment of machine learning in a manner that does not meaningfully limit the use of the abstract idea. The claims simply limit the type of model being used to process the data to be a “machine learning” model, along with the intended result of the output, which does not provide enough detail to be considered a technical improvement to the field of machine learning. Even though the claims recite the characteristics of the vector and the training of the model, these features are still inherent to the field of machine learning. Therefore, even with the inclusion of those features, it is still no more than a general link to machine learning or merely “apply it,” as it encapsulates mere instructions to perform the learning algorithms on a machine in its ordinary capacity. Please refer to MPEP 2106.05(h) for technological environment and field of use, and see MPEP 2106.05(a) for what is considered an improvement to a technological environment or field of use. Therefore, even when considering the additional elements individually or as an ordered combination, and considering the claims as a whole, nothing in the claims integrates the abstract idea into a practical application. The combination of the generic computing components with the use of sensors and high-level machine learning algorithms to perform the abstract idea does not integrate the abstract idea into a practical application. Therefore, the claims are directed to an abstract idea.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
The additional elements listed above are repeated as follows:
(a) a processor, in claims 11 and 16;
(b) a memory having instructions stored thereon which, when executed on the processor, performs operations comprising, in claim 11;
(c) a non-transitory computer-readable medium having instructions stored thereon which, when executed by a processor, performs operations comprising, in claim 16;
(d) a plurality of sensors, in claims 1, 11, and 16;
(e) that the data is specifically sensor data, in claims 1, 11, and 16;
(f) a computing device comprising a sensor layer and a prediction layer, in claims 1, 11, and 16;
(g) that the pre-processing analysis is specifically pre-processing machine learning (ML) analysis, in claims 1, 11, and 16; and
(h) that the model is specifically a machine learning (ML) model, in claims 1, 11, and 16.
These additional elements have not been found to amount to significantly more, for the same reasons set forth in the Prong 2 analysis above, specifically because additional elements (a), (b), and (c) are generic computing components instructed to carry out the abstract idea. “A plurality of sensors” are devices used in their ordinary capacity to carry out the abstract idea, i.e., sensors to capture data.
Furthermore, the technological environment and field of use of machine learning is merely generally applied to the abstract idea, as ML is recited with such generality that it is claimed as a black box with the intended outcome of performing the abstract idea. The claims also do not purport to provide any improvements to the field of machine learning; see MPEP 2106.05(a) for information regarding Improvements to the Functioning of a Computer or to Any Other Technology or Technical Field. Even when considering the claims as a whole, nothing in the claims meaningfully limits them such that they amount to significantly more than the abstract idea. Therefore, claims 1, 11, and 16 are patent ineligible for being directed to an abstract idea without significantly more.
Dependent claims 3-4, 6, 8-10, 13, 15, 18, and 20 are also given the full two-part analysis, both individually and in combination with the claims from which they depend, herein:
Claims 3, 13, and 18 further limit the abstract idea by reciting that the prediction model data comprises satisfaction data and historical customer data. This is more of the same abstract idea because merely defining the inputs (wherein the inputs are examples of personal behavior) still falls within the scope of “managing personal behavior, interactions or relationships,” which is still an abstract idea under certain methods of organizing human activity. Other than restating the prediction ML model, there are no further additional elements, and merely reciting the inputs of the ML model does not contribute to any technical improvements; therefore, the additional elements still fall under “apply it” or “generally linking.” Even when viewed as a combination, the additional elements fail to provide a specific arrangement that would integrate the abstract idea into a practical application. Even when viewing the claims as a whole, nothing in the claims meaningfully limits the claim to be significantly more than the abstract idea. Therefore, claims 3, 13, and 18 are also patent ineligible for being directed to an abstract idea without significantly more.
Claims 4, 6, 15, and 20 add the additional step of requiring the sensor data to be image data captured during a transaction at a point of sale system. This merely limits the abstract idea to being performed at a particular time (a transaction), which still falls within the scope of “managing personal behavior, interactions or relationships,” which is still an abstract idea under certain methods of organizing human activity. Furthermore, the additional element of a “point of sale system” merely adds another “apply it” level element because a POS system is merely a generic computer programmed to process transactions. Therefore, sourcing data from image data during POS transactions is merely applying the abstract idea to a generic computer. Even when viewed as a combination, the additional elements fail to provide a specific arrangement that would integrate the abstract idea into a practical application. Even when viewing the claims as a whole, nothing in the claims meaningfully limits the claim to be significantly more than the abstract idea. Therefore, claims 4, 6, 15, and 20 are also patent ineligible for being directed to an abstract idea without significantly more.
Claim 8 adds the additional step of performing the modifying step, but based on predetermined actions associated with user input. However, this still falls within the scope of “certain methods of organizing human activity” because the scope includes any user providing the predetermined actions to the metric stream, to manage personal behavior. There are no further additional elements to consider and even when viewed in combination with the amended limitation, previous additional elements fail to provide a specific arrangement that would integrate the abstract idea into a practical application. Even when viewing the claims as a whole, nothing in the claims meaningfully limits the claim to be significantly more than the abstract idea. Therefore, claim 8 is also patent ineligible for being directed to an abstract idea without significantly more.
Claim 9 further recites that the computing device comprises a plurality of Internet of Things (IoT) devices. It is still the same abstract idea being performed as in claim 1, without further limiting the abstract idea itself. The claim merely adds the additional element of the computing device comprising IoT devices, which is still an example of “apply it” because IoT devices can refer to any set of sensors and computing devices operable in connection with the internet/other devices. Without specifically reciting the particular arrangement of computing devices, the claims broadly recite any generic computing device, and are therefore still performing the abstract idea on a generic computer. Even when viewed as a combination, the additional elements fail to provide a specific arrangement that would integrate the abstract idea into a practical application. Even when viewing the claims as a whole, nothing in the claims meaningfully limits the claim to be significantly more than the abstract idea. Therefore, claim 9 is also patent ineligible for being directed to an abstract idea without significantly more.
Claim 10 further defines the physical movement to be facial or bodily movement of a user in a predetermined vicinity. However, this is more of the same abstract idea because both facial and bodily movement being used as inputs is merely managing personal behavior of individuals, which still falls under “certain methods of organizing human activity.” Merely restricting these inputs to be within a predetermined vicinity does not count as an additional element, and is therefore still part of the abstract idea. There are no further additional elements to consider and even when viewed in combination with the amended limitation, previous additional elements fail to provide a specific arrangement that would integrate the abstract idea into a practical application. Even when viewing the claims as a whole, nothing in the claims meaningfully limits the claim to be significantly more than the abstract idea. Therefore, claim 10 is also patent ineligible for being directed to an abstract idea without significantly more.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-4, 6, 8-11, 13, 15-16, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang; Le (US 20210027485 A1), hereinafter Zhang, in view of Ciprian Petru et al. (US 20220122429 A1), hereinafter Ciprian Petru.
Regarding Claims 1, 11, 16:
Claim 1 – A method comprising: (Zhang [0012] In one general aspect, a method performed by one or more computers includes: ...)
Claim 11 – A system, comprising: a processor; and a memory having instructions stored thereon which, when executed on the processor, performs operations comprising: (Zhang [0165] Embodiments of the invention and all of the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the invention can be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus...The term “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.)
Claim 16 – A non-transitory computer-readable medium having instructions stored thereon which, when executed by a processor, performs operations comprising: (Zhang [0165] The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them.)
Claims 1, 11, and 16:
- receiving, at a computing device comprising a sensor layer and a prediction layer, sensor data from a plurality of sensors arranged in a physical environment, (Zhang [0006] In general, the techniques herein enable a computer system to use a camera or other sensor to monitor an area, detect conditions in the monitored area that satisfy criteria, and notify one or more devices or users of the detected conditions. For example, one or more machine learning models can be used to detect an object or region but also to detect the state or condition of the object or region. In addition, the system can evaluate the detected conditions and determine which, if any, satisfy criteria for needing intervention or attention by a user. [0008] With data from cameras and/or other sensors, the system can identify issues in a space that need to be corrected, for example, to maintain consistency of presentation and usability of the space. This can be done using machine learning models to analyze image data and other sensor data to detect and localize events and conditions that may require correction or attention by a worker. This can include localizing specific portions of room or other monitored space where a condition needing attention exists. [0077] In stage (A), sensors capture information about the current status of an environment. For example, the cameras 110a and 110b capture image data of a location. The cameras 110a and 110b are arranged to capture different views of a public area, such as a restaurant. The camera 110a captures an image 111 of a dining area of a restaurant, and the camera 110b captures an image 112 of a display case showing food available at the restaurant. The cameras 110a, 110b can be fixed in position to allow repeated image capture and video for a consistent field of view of their respective areas of the restaurant. [0079] In stage (B), the computing system 120 processes the sensor data and generates input for one or more machine learning models. 
[0080] In stage (C), one or more machine learning models 123 process the input data representing the sensed parameters of the environment of the restaurant. For example, the models 123 can be neural network models that have been trained to perform various classification tasks.) Zhang teaches the computing device in [0006], the sensor layer at stage (B) in [0079], the prediction layer at stage (C) in [0080], and the plurality of sensors arranged in the environment in [0077].
- the sensor data comprising position and sound data associated with physical movement and emitted sound within a predetermined vicinity of the physical environment; (Zhang [0136] The monitoring system is often used in locations and situations that involve frequent movement and changes, such as during business hours while customers coming and going and making varied and often unpredictable movements. [0156] In step 612, the one or more computers generates and provides data indicating a set of detected conditions. In some implementations, this can include a list of detected objects, with position and status information. [0078] The system 100 can include other sensors used to monitor an environment. For example, the system can include a microphone 115 configured to detect audio and send audio data 116 to the computer system 120. One or more microphones 115 can be located to detect, for example, ambient sound in an area, conversations of employees (e.g., clerks taking orders at the register), or other audio in the restaurant.)
- transferring the sensor data to the sensor layer wherein the sensor layer is configured to execute pre-processing machine learning (ML) analysis on the sensor data; (Zhang [0079] In stage (B), the computing system 120 processes the sensor data and generates input for one or more machine learning models. For example, the computing system 120 receives the image data 114a, 114b and the audio data 116 and can use a data pre-processor 121 to extract feature values to be provided as input. The data preprocessor 121 may perform a variety of other tasks to manipulate the sensor data and prepare input to the neural networks, according to a set of predetermined settings 122. These settings 122 can be customized for the particular restaurant and even for individual sensors (e.g., to use different settings for different cameras 110a, 110b). To facilitate data processing, each set of sensor data is associated with an accompanying set of metadata that indicates, for example, a timestamp indicating a time of capture, a sensor identifier (e.g., indicating which camera, microphone, etc. generated the data), a location identifier (e.g., indicating the particular restaurant, and/or the portion of the restaurant where the sensor is located), and so on.)
- responsive to executing the pre-processing ML analysis on the sensor data, generating a ML model vector, (Zhang [0154] Processing the image data can be performed using one or more optional sub-steps 606, 608, 610, 612. For example, a set of input data for the one or more machine learning models can be generated. [0126] In some implementations, each of the neural network models can be generated by starting with a general object detection model, which may be trained to predict many more objects than are relevant to the intended use of the model after training is completed... In other words, by training model initially to detect a wide variety of objects, the network can better recognize those object or patterns as representing features (e.g., background) different from the limited set of features that are later trained to be most relevant.)
- wherein the ML model vector comprises a sound characteristic, (Zhang [0110] As noted above, the system 100 can use audio data 116 as well as image data 114a, 114b to detect events and conditions at monitored locations... using speech recognition models to obtain transcripts of speech or using keyword spotting models to determine the occurrence of specific keywords or phrases based on the acoustic properties. When analysis of the audio data 116 indicates that one or more of these keywords was spoken or other recognized sound type has occurred, the computer system 120 can detect the condition of a spill and can generate a task for the spill to be cleaned up.) The models based on sound data satisfy the “comprises a sound characteristic” limitation because a characteristic can be broadly interpreted to encompass any processing of sound data within the model. Therefore, a recognized sound type satisfies the limitation.
- an object characteristic, and a user gesture associated with the sensor data;(Zhang [0090] The models 123 may further extend typical region proposal and object detection networks by incorporating the prediction of object status into the overall model 123. For example, rather than simply detecting the presence of a chair in an image and providing a bounding box for the location of the chair, the models 123 may additionally classify the chair according to an occupied status (e.g., whether the chair is currently occupied or unoccupied), a cleanliness status (e.g., whether the chair is clean and free of litter or not), an appropriate positioning status (e.g., whether the chair is appropriately placed and oriented or not), and/or according to other status parameters. This determination of status can be made for any and/or all of the types of objects that the models 123 are trained to detect. For example, for a person, the models 123 may be trained to classify the activity of the person, e.g., to provide outputs indicating likelihoods whether the person is eating, waiting, ordering, passing through, etc. In addition, the models 123 can be configured to distinguish workers from customers, for example, and indicate whether a worker is present and if so, where the worker is located in the monitored area. This can be helpful to track the presence and movement throughout a location (e.g., over various different views of a store or restaurant) over time.) The object status is an example of object characteristic, and the activity of the person (eating, waiting, ordering, passing through) is an example of user gestures.
- transferring the ML model to the prediction layer, wherein the prediction layer is configured to train a prediction ML model;(Zhang [0126] In some implementations, each of the neural network models can be generated by starting with a general object detection model, which may be trained to predict many more objects than are relevant to the intended use of the model after training is completed. From this general model, modifications can be made for example, to replace the output layer with a smaller output layer representing the object classes and status classifications that are relevant. In some implementations, beginning with a general object detection model or including pre-training that is not focused on or even does not include the classes to be predicted by the final model may provide the model with a good sense of background objects that are not detected and can increase the overall robustness of predictions. In other words, by training model initially to detect a wide variety of objects, the network can better recognize those object or patterns as representing features (e.g., background) different from the limited set of features that are later trained to be most relevant. [0127] Each neural network model may be trained using labeled training examples showing the types of views that correspond to the model. [0108] The computer system 120 can periodically provide data 160 to the computer system 130 which can be used to further train the neural network models 123 as well as models used for other locations. The data 160 can include image data 114a, 114b and associated metadata, outputs of the neural network models, and actions taken by users. In some implementations, actions that users take after seeing output indicating detected objects and conditions can provide positive or negative feedback about predictions of the models 123. Some feedback may be explicit, such as indicating that a status or an object classification is correct or incorrect.)
- detecting, using the prediction ML model, abnormalities in a fulfillment workflow metric stream; and (Zhang [0094] One or more of the models 123 may be configured to detect an abnormal condition, e.g., a deviation or departure from the typical range or pattern for the monitored area, even if the model 123 has not been trained to detect what the abnormal condition is. [0100] The computer system 120 can also use other processing of the machine learning outputs to determine when the state of the monitored area needs to be reviewed or corrected. For example, the models 123 may indicate objects detected, and may indicate that 10 tables are present. The computer system 120 can have data indicating the desired or typical number of tables for the monitored area, e.g., from a reference value specified by a user or from a historical reference (e.g., a number of tables present over the last day, week, etc.). The computer system 120 compares the number of detected tables, e.g., 10, to a number of tables indicated by the reference value, e.g., 12, and so can determine that there is a condition needing intervention (e.g., the arrangement of the tables needs to be corrected). In this manner the computer system 120 can use the outputs of machine learning models 123 to determine whether the state of the monitored area is inconsistent with a baseline state. Thus, the models 123, may not be required to distinguish abnormal conditions from normal or expected ones (although they may be trained to do so in some implementations), and may simply characterize or describe the properties or state of the monitored area. Further processing of the computer system 120 to determine when and whether the set of properties indicated by the models 123 rises to the level of a condition for which a user needs to be informed and a task or other corrective action performed.) 
The broadest reasonable interpretation of a fulfillment workflow metric stream is any occurrence of order fulfillment data such as the data generated by the sensors in an establishment.
- responsive to detecting the abnormalities in the fulfillment workflow metric stream, modifying at least one metric within the fulfillment workflow metric stream.(Zhang [0149] The system can evaluate outcome measures associated with images can help the system determine which inconsistencies or differences from baseline characteristics need attention from a worker. The system can also use the outcome metrics to determine the urgency or priority with which conditions should be addressed. For example, some changes from the typical or usual state may be benign, such as adding new furniture. This change may be visually quite different from the typical prior images and would appear to be an inconsistency from the area's desirable baseline state. However, the outcome measures associated with this state may show that metrics such as revenue, customer satisfaction, and so on are stable or even improved with this change, allowing the system to determine that the change does not need correction, or at least does not need urgent attention. On the other hand, other changes detected may occur in small regions of the monitored area but may affect concurrent or subsequent outcome metrics negatively, which the system can interpret as a need for correction or intervention by a user, as well as representing a need to be addressed with higher priority. [0146] As a result, without knowing the specific percentages of product stocked or any other human labelling of the images, the system can use machine learning to identify the state that leads to higher sales of a specific product (e.g., partial stocking of the product in this example). The system can then indicate this state to a user, for example, providing images of the state determined to promote the highest sales rate and recommending that the state of the display case be made to appear in that state.) 
The BRI of modifying at least one metric within the fulfillment workflow metric stream is any modification of any metric with regard to managing orders. Therefore, in this case both Zhang [0149], which shows that metrics are improved with certain changes, and Zhang [0146], which shows a recommendation that would promote a high sales rate, fall within the scope of the limitation.
However, Zhang fails to teach:
- that the ML model is a ML model vector (Zhang teaches generating an ML model, wherein the ML model comprises a sound characteristic, an object characteristic, and a user gesture associated with the sensor data, and transferring the ML model to the prediction layer, but does not specify the use of an ML model vector as the specific format of the model. A secondary reference below remedies this deficiency.)
Alternatively, Ciprian Petru discloses a method and apparatus of using computer vision to perform anomaly detection on self-checkout retail environments. Ciprian Petru teaches:
-generating a ML model vector, (Ciprian Petru [0062] Meta-features are obtained based on a feature variation over a time interval by applying a statistical function to the feature, ...Having Y features, a number of N×Y statistics measures will be computed and encoded into the meta-feature vector. Therefore when a particular activity is occurring or in progress, the corresponding subset of features will be detected as active by the sensor network...Automatic selection can be implemented using neural networks or other machine network techniques. [0088] Based on the meta-feature vector, the decision unit then determines whether or not an alert should be issued. The decision unit compares the determined meta-feature vector with a predefined set of vectors or classification model and detects an anomaly based on the comparison. Initially a classifier classifies the input data into defined output categories based on a mathematical model obtained through a supervised learning process. As part of the learning process the mathematical model is provided with pairs of inputs to obtain corresponding output data. The model will therefore represent a collection composed of a set of meta-features, a set of use cases and a correspondence between the meta-features and the set of use cases. Based on feedback as outlined above, the model is adjusted to maximise the classification accuracy by a minimisation of an error function of the model.) The meta-feature vector is a representation of a supervised learned model; therefore, it satisfies the limitation of "generating a ML model vector."
-wherein the ML model vector comprises a sound characteristic, an object characteristic, and a user gesture associated with the sensor data; (Ciprian Petru [0062] Meta-features are obtained based on a feature variation over a time interval by applying a statistical function to the feature,... In our application, features can be extracted from the video (e.g. skin detection, motion detection, patterns of motion detection, foreground/background detection, body part detection, etc.), others can be generated by sensors (e.g. detection of scan related sounds, till information, etc.). Low level features are represented by features extracted directly from sensors or features that are obtained by simple computer vision algorithms (usually these simple features are obtained at pixel level).) Since Ciprian Petru teaches meta-feature vectors which comprise features such as motion detection (which satisfies the user gesture), body part detection (which satisfies the object characteristic), and scan related sounds (which satisfies the sound characteristic), the limitation is satisfied.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang by specifying that the features of the ML model are stored as an ML model vector (meta-feature vector) as taught by Ciprian Petru. Performing a simple substitution of Zhang's ML model with Ciprian Petru's ML model vector would yield the predictable outcome of generating an ML model vector, wherein the ML model comprises a sound characteristic, an object characteristic, and a user gesture associated with the sensor data; transferring the ML model to the prediction layer. One would have been motivated to perform this combination by the benefit of a vector over its scalar counterparts, in that the vector encapsulates a larger amount of relevant information, including the activity interval (the time in which the particular activity took place), into a compressed size for processing. (Ciprian Petru [0085] For each ACTIVE INTERVAL a meta-feature vector is defined as shown in FIG. 5. The meta-feature vector encapsulates the relevant information about the activity during the ACTIVE INTERVAL into a fixed size vector. This vector is then used to detect or classify the activity. [0087] As shown in FIG. 5, the vector is formed from the mean of each of the features in the interval, the variance of each of the features in the interval and the skew of each of the feature. Additional features may be incorporated such as the kurtosis.)
Regarding Claims 3, 13, and 18:
The combination of Zhang and Ciprian Petru teaches the method of claim 1, the system of claim 11, and the non-transitory computer-readable medium of claim 16:
Furthermore, Zhang teaches:
-wherein the first prediction model further comprises satisfaction data and historical customer data. (Zhang [0142] The one or more machine learning models can be trained to promote conditions that are associated with business performance or other measures of desirable outcomes for a location. For example, models can be learn which conditions or properties of a monitored area are correlated to increased revenue. Models can be trained to detect conditions that result in increased or decreased revenue. As an example, training data can include captured images or other collected data, with the images associated with properties or outcomes that are not visible from the images, such as customer satisfaction, average purchase amount per customer, total revenue, rate or volume of purchase of a product, frequency of repeat customers (e.g., customers returning at later times), and so on. This can tie the state of a store or other monitored area to the measures of desired outcomes. To do this, data sets for these outcomes can be collected, for example, data indicating customer satisfaction survey results, customer checkout totals over time (e.g., throughout the day), timing of sales of different products over time, and so on. By matching the timestamps of monitoring data with the timestamps for outcomes, the system can obtain examples that can show the impact of conditions of the monitored area on the outcomes. For example, each captured image of the area can be associated with the revenue of the restaurant over the next hour. Outcome metrics for different time periods can be used, and even a time series of outcome metrics (e.g., revenue for each of a series of eight consecutive 15-minute periods following capture of a monitoring image).) Zhang's prediction model includes customer satisfaction data and outcome metrics (which satisfy historical customer data).
Regarding Claim 4:
The combination of Zhang and Ciprian Petru teaches the method of claim 3:
Furthermore, Zhang teaches:
- wherein the satisfaction data comprise point of sale (POS) data captured from a POS system during a historical fulfillment workflow metric stream. (Zhang [0113] As another example, data from a cash register or other terminal may indicate the times that transactions end, and these times may be used to divide the audio data 116 into segments representing interactions with different customers. [0114] In general, the results of the audio analysis can be associated with specific employees based on, for example, records of who was logged in to a terminal or register at the time the dialogues occurred, records of who was working during different times or shifts, speaker recognition processing to determine the identity of the speaker (e.g., to match to one of various voice profiles for different workers), and so on. [0142] As an example, training data can include captured images or other collected data, with the images associated with properties or outcomes that are not visible from the images, such as customer satisfaction, average purchase amount per customer, total revenue, rate or volume of purchase of a product, frequency of repeat customers (e.g., customers returning at later times), and so on. This can tie the state of a store or other monitored area to the measures of desired outcomes. To do this, data sets for these outcomes can be collected, for example, data indicating customer satisfaction survey results, customer checkout totals over time (e.g., throughout the day), timing of sales of different products over time, and so on. By matching the timestamps of monitoring data with the timestamps for outcomes, the system can obtain examples that can show the impact of conditions of the monitored area on the outcomes.) Point of sale data is broadly interpreted to encompass any form of data collected through a system that accepts payments, such as a cash register or a terminal.
Zhang does not use the wording "point of sale"; however, "POS" is a broader term which encompasses cash registers and terminals. In these citations, the satisfaction data includes data from terminals, such as the time an order was placed or which worker is logged in. In [0142], Zhang includes more examples of training data that are not visible from images alone, such as customer satisfaction, purchase amount per customer, and checkout totals over time.
Regarding Claims 6, 15 and 20:
The combination of Zhang and Ciprian Petru teaches the method of claim 4, the system of claim 11, and the non-transitory computer-readable medium of claim 16:
Furthermore, Zhang teaches:
- wherein the sensor data comprises image data captured during a transaction at the Point of Sale(POS) system, and (Zhang [0079] In stage (B), the computing system 120 processes the sensor data and generates input for one or more machine learning models. For example, the computing system 120 receives the image data 114a, 144b and the audio data 116 and can use a data pre-processor 121 to extract feature values to be provided as input. The data preprocessor 121 may perform a variety of other tasks to manipulate the sensor data and prepare input to the neural networks, according to a set of predetermined settings 122. These settings 122 can be customized for the particular restaurant and even for individual sensors (e.g., to use different settings for different cameras 110a, 110b). To facilitate data processing, each set of sensor data is associated with an accompanying set of metadata that indicates, for example, a timestamp indicating a time of capture, a sensor identifier (e.g., indicating which camera, microphone, etc. generated the data), a location identifier (e.g., indicating the particular restaurant, and or the portion of the restaurant where the sensor is located), and so on. [0113] As another example, data from a cash register or other terminal may indicate the times that transactions end, and these times may be used to divide the audio data 116 into segments representing interactions with different customers.)
Regarding Claim 8:
The combination of Zhang and Ciprian Petru teaches the method of claim 1, further comprising:
Furthermore, Zhang teaches:
-modifying the fulfillment workflow metric stream based on predetermined actions associated with user input. (Zhang [0024] In some implementations, using the audio data to determine an event or condition at the monitored area includes: determining whether a sound level at the monitored area exceeds a threshold; or determining whether one or more workers spoke a predetermined word or phrase in a conversation with a visitor to the monitored area. [0116] The computer system 120 uses a set of predetermined criteria to determine that this represents a condition that warrants action by the computer system 120. For example, the computer system 120 can use mapping data 191 that specifies conditions (e.g., detected objects and status classifications, alone or in combination) and corresponding actions for the computer system 120 to perform. [0099] This can be done by comparing scores to corresponding threshold or corresponding baseline values typical for the monitored area. For example, if a cleanliness score is below a predetermined level or if the level of litter present is above a predetermined level, and potentially has been at that level for at least a threshold minimum amount of time, the computer system 120 detects an issue to be addressed.) The computer system performing the corresponding actions in response to these conditions is an example of "modifying the fulfillment workflow metric stream."
Regarding Claim 9:
The combination of Zhang and Ciprian Petru teaches the method of claim 1, further comprising:
Furthermore, Zhang teaches:
-wherein the computing device comprises a plurality of internet of things (IOT) devices. (Zhang [0075] The system 100 includes cameras 110a, 110b, a local computer system 120, a remote computer system 130, and various client devices 140a-140c. The various devices communicate over a network 150, which can include public and/or private networks and can include the Internet. [0140] In step 520, one or more trained machine learning models are deployed, e.g., provided, installed, or made active. In some implementations, the trained models are delivered to one or more devices (e.g., a mobile device, server system, on-site computer system, etc.) over a communication network.) The scope of internet of things (IOT) devices encompasses physical devices embedded with sensors, software, and network connectivity; therefore, the limitation has been satisfied.
Regarding Claim 10:
The combination of Zhang and Ciprian Petru teaches the method of claim 1, further comprising:
Furthermore, Zhang teaches:
-wherein the physical movement comprises at least one of: (i) facial movement of a user identified in the predetermined vicinity of the physical environment or (ii) body movement of the user identified in the predetermined vicinity of the physical environment. (Zhang [0090] For example, for a person, the models 123 may be trained to classify the activity of the person, e.g., to provide outputs indicating likelihoods whether the person is eating, waiting, ordering, passing through, etc. In addition, the models 123 can be configured to distinguish workers from customers, for example, and indicate whether a worker is present and if so, where the worker is located in the monitored area. This can be helpful to track the presence and movement throughout a location (e.g., over various different views of a store or restaurant) over time. As another example, the models 123 may classify a passage way as open, in use, or blocked.) The limitation allows either condition (i) or (ii) to be met. Though Zhang does not teach facial movement, Zhang does teach body movement of a user in the predetermined vicinity of the physical environment (where the worker is located in the monitored area... tracking presence and movement).
Response to Arguments
Applicant's arguments filed 11/24/2025 have been fully considered but they are not persuasive.
Regarding claim objections, the amendments have addressed the examiner's concerns and the objection has been withdrawn by the examiner.
Regarding applicant's remarks over claim rejections under 35 U.S.C. 101, the applicant asserts that amended claim 1 specifies the steps for how the sensor data is processed into a vector by one ML model, and the vector is then passed to another ML model to predict anomalies in the environment from the sensor data, which the applicant believes specifies how the processing is performed. However, the examiner respectfully disagrees. Even when considering the amended limitations, the claims still fail to provide steps at a sufficient degree of specificity to be considered a technical improvement to machine learning systems. Even as amended, the claims effectively remain black-box models, with only the intended inputs and outputs recited. The process to arrive at the outputs is still not recited. Splitting the model into different layers, such as the sensor layer and the prediction layer, does not change the fact that the claims recite nothing more than the intended outcome of the models while generally linking them to machine learning. Reciting that the ML model vector comprises sound characteristics, object characteristics, and user gestures is still an example of naming the inputs of the model without defining how it arrives at predicting abnormalities in a fulfillment workflow metric stream.
Furthermore, on page 10, the applicant argues that generating a ML model vector with several different characteristics to train the prediction ML model for detecting abnormalities, and making changes to the fulfillment workflow, integrate the abstract idea into a practical application. However, the examiner respectfully disagrees. In addition to the examiner's arguments above, which already address the ML model vector, the examiner still does not find it clear how the claims arrive at training the prediction ML model for detecting abnormalities from the ML model vector. As such, the claims broadly encompass any method in which ML is applied to a vector containing such data to determine abnormalities. Therefore, the steps cannot provide a technical improvement to a particular field when they are recited to include any use of machine learning without meaningfully limiting its application to the abstract idea. Furthermore, as stated previously, the step of making changes to the fulfillment workflow is not tied to a specific technological implementation, and therefore encompasses any changes made (even by a human user) to the workflow. This is still an example of "certain methods of organizing human activity" and is therefore still part of the abstract idea. Therefore, the claims are still directed to an abstract idea without significantly more, and the rejection stands.
Regarding applicant's remarks over the prior art claim rejections, the applicant's arguments have been fully considered, but arguments based on Zhang alone are moot in view of the updated rejection, which is now based on the combination of Zhang and Ciprian Petru. The applicant argues that Zhang fails to teach "responsive to executing the pre-processing ML analysis...generating a ML model vector..."; however, the rejection does not depend on Zhang alone to teach the ML model vector. Ciprian Petru, which teaches the meta-feature vector, remedies the deficiencies noted above, and the combination yields the predictable result of the claim limitations. The applicant's argument that Zhang does not mention pre-processing the sensor data into vectors is therefore not persuasive: Zhang teaches pre-processing the sensor data but does not specify that it is processed in vector format, and Ciprian Petru remedies this deficiency by teaching processing the sensor data into meta-feature vectors. Therefore, the combination teaches each and every limitation of the pending claims, and the claims are now rejected under 35 U.S.C. 103.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
- Balakrishnan et al. (US 10360526 B2) discloses analytics for determining customer satisfaction by continuously capturing images of individuals and their orders and analyzing features such as facial expressions and interactions between customers and servers.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICO LAUREN PADUA whose telephone number is (703)756-1978. The examiner can normally be reached Mon to Fri: 8:30 to 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jessica Lemieux can be reached at (571) 270-3445. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NICO L PADUA/ Junior Patent Examiner, Art Unit 3626
/SANGEETA BAHL/ Primary Examiner, Art Unit 3626