Prosecution Insights
Last updated: April 19, 2026
Application No. 17/334,162

IMAGE-BASED ANOMALY DETECTION BASED ON A MACHINE LEARNING ANALYSIS OF AN OBJECT

Final Rejection — §101, §103
Filed
May 28, 2021
Examiner
KWON, JUN
Art Unit
2127
Tech Center
2100 — Computer Architecture & Software
Assignee
Zebra Technologies Corporation
OA Round
4 (Final)
Grant Probability
38% — At Risk
OA Rounds
5-6
To Grant
4y 3m
With Interview
84%

Examiner Intelligence

Career Allow Rate
38% (26 granted / 68 resolved; -16.8% vs TC avg)
Interview Lift
+46.2% for resolved cases that included an interview
Avg Prosecution
4y 3m typical timeline; 34 applications currently pending
Total Applications
102 across all art units

Statute-Specific Performance

§101: 31.8% (-8.2% vs TC avg)
§103: 41.4% (+1.4% vs TC avg)
§102: 7.6% (-32.4% vs TC avg)
§112: 18.1% (-21.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 68 resolved cases.

Office Action

§101 §103
Detailed Action This Office Action is in response to the remarks entered on 12/01/2025. Claims 4, 16 and 18 have been cancelled. Claims 1-3, 5-15, 17 and 19-20 are currently pending. Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . Claim Rejections - 35 USC § 101 Amended claims were received on 12/01/2025. 35 U.S.C. 101 rejection has been withdrawn. Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-2, 5-7, 15, 17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li (Li & Chang, 2019, “Video anomaly detection and localization via multivariate gaussian fully convolution adversarial autoencoder”, hereinafter ‘Li’) in view of Khazai (Khazai et al, 2011, “Anomaly Detection in Hyperspectral Images Based on an Adaptive Support Vector Method”, hereinafter ‘Khazai’) in view of Kamen (US 20210248736 A1, hereinafter ‘Kamen’) and further in view of Akcay (Akcay et al, 2018, “GANomaly: Semi-Supervised Anomaly Detection via Adversarial Training”, hereinafter ‘Akcay’). Regarding claim 1, Li teaches: A method associated with detecting an anomaly associated with an object, comprising ([Li, page 95, right col, para 3. Our method, line 1-10] Li discloses anomaly detection method for video frames (i.e., image data).): receiving, by a device, an input image that depicts the object ([Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] The UCSD dataset which is used to train and test the machine learning model includes Ped1 and Ped2 dataset. The training set, which is interpreted as the reference image, includes pedestrians walking across the walkway. The testing set, which is interpreted as the input image, contains anomalies such as cars, bikers, wheelchairs, and skaters along with the pedestrians (object).); processing, by the device and using a feature extraction model, the input image to indicate one or more features of the object in a first feature output ([Li, page 98, left col, line 13-19] The encoder maps the x a i to a latent space vector E( x a i ). The mapping process is the feature extraction process and the encoder is the feature extraction model, as the mapping process performed by an encoder extracts relevant features from the input data by reducing the dimension), wherein the feature extraction model is trained based on reference images associated with a type of the object ([Li, page 95, right col, para 3. 
Our method, line 1-10] The two-stream Multivariate Gaussian Fully Convolution Adversaria Autoencoder (MGFC-AAE) is trained based on the normal samples of gradient, which is the reference images, and optical flow patches), wherein the reference images depict one or more non-anomalous objects that are of a same type as the type of the object, and wherein the reference images do not depict anomalous objects ([Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] The UCSD dataset which is used to train and test the machine learning model includes Ped1 and Ped2 dataset. The training set (i.e., reference images), includes pedestrians walking across the walkway. The testing set contains anomalies such as cars, bikers, wheelchairs, and skaters along with the pedestrians, thus the objects in the testing set and the training set are the same type); determining, by the device and based on the one or more features, using a classification model, that an anomaly status of the object indicates that the object includes an anomaly ([Li, page 93, left col, line 32-39] The appearance stream which is further disclosed in [Li, page 95, Fig. 3] is the classification model. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] S_appearance, which are the classification score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status), wherein the classification model is configured to determine the anomaly status based on a classification score associated with the first feature output and a classification threshold of the classification model, [Li, page 93, left col, line 32-39] The appearance stream which is further disclosed in [Li, page 95, Fig. 3] is the classification model, and utilizes gradient map which is the first feature for anomaly detection. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] S_appearance, which are the classification score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status) determining, by the device, a location of the anomaly associated with the anomaly status based on a second feature output of the feature extraction model ([Li, page 93, left col, line 32-39] The motion stream which is further disclosed in [Li, page 95, Fig. 3] is the determining process for the location of the anomaly and utilizes the optical flow which is the second feature for anomaly detection. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] S_motion, which is the score calculated from the motion stream, is used to determine the anomaly status.), wherein the location of the anomaly is determined using an anomaly localization model that is trained based on the reference images ([Li, page 93, left col, line 32-39] The motion stream which is further disclosed in [Li, page 95, Fig. 3] is the anomaly localization model and utilizes the optical flow which is the second feature for anomaly detection. [Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] The UCSD dataset which is used to train and test the machine learning model includes Ped1 and Ped2 dataset. 
The training set, which is interpreted as the reference image, includes pedestrians walking across the walkway.); generating, by the device and based on the anomaly status and the location, anomaly data that is associated with the anomaly ([Li, page 98, right col, line 18-30] The total anomaly score which is the anomaly data that is associated with the anomaly is determined based on the S_appearance (i.e., anomaly status) and S_motion (i.e., anomaly location).); wherein the first feature output is from an output layer of a convolutional neural network encoder of the feature extraction model ([Li, page 95, Fig. 3] and [Li, page 98, right col, 3.4. Anomaly detection on testing samples, line 1-27] collectively discloses using latent space representations z a = E a ( y a ) and z m = E m ( y m ) to compute the motion anomaly score S a p p e a r a n c e and S m o t i o n . E denotes the encoders) Li does not specifically disclose: wherein the classification threshold is determined based on a similarity analysis involving the reference images; and providing, by the device and to an object management system, the anomaly data; wherein the second feature output is from an intermediate layer of the convolutional neural network encoder. Khazai teaches: wherein the classification threshold is determined based on a similarity analysis involving the reference images ([Khazai, page 648, left col, line 1-15] The proposed RBFN measure, equation 8 σ , is the classification threshold, and SVM (support vector machine) is a basis function for the measure. The classification threshold is determined based on the distance between x_i and x_j which is the similarity analysis involving the reference image x_i); Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li, and Khazai to use the method of determining classification threshold based on a similarity analysis of Khazai to implement the image anomaly detection method of Li. The suggestion and/or motivation to do so is to improve the performance of the system, as determining the classification threshold based on training images instead of predetermining the threshold increases the flexibility of the system, allowing the system to handle a wider range of types of anomalies. Li in view of Khazai does not specifically disclose: providing, by the device and to an object management system, the anomaly data; wherein the second feature output is from an intermediate layer of the convolutional neural network encoder. Kamen teaches: providing, by the device and to an object management system, the anomaly data ([Kamen, 0083] The localization map output from the decoder disclosed in [Kamen, 0038] can be stored in a memory device, displayed using a computer, and/or transmitted to a remote computer system.). Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li, Khazai, and Kamen to use the method of providing, by the device and to an object management system, the anomaly data of Kamen to implement the image anomaly detection method of Li and Khazai. The suggestion and/or motivation to do so is to increase accessibility of the system by making it easier for users and third party devices to access the output data. 
However, Li in view of Khazai and further in view of Kamen does not specifically disclose: wherein the second feature output is from an intermediate layer of the convolutional neural network encoder. Akcay teaches: wherein the first feature output is from an output layer of a convolutional neural network encoder of the feature extraction model, and ([Akcay, page 5, Fig. 2.] and [Akcay, page 8, line 6-15] The feature that is used to update the generator is bottleneck features of the input data, which is output from the output layer of the encoder. [Akcay, page 7, para 3.2 Model Training, line 15-22] Unlike other features, the feature that is used to update the encoder of the neural network is obtained from the internal representation which is the intermediate layer of the convolutional neural network encoder) wherein the second feature output is from an intermediate layer of the convolutional neural network encoder. ([Akcay, page 5, Fig. 2.] and [Akcay, page 8, line 6-15] The feature that is used to update the generator is bottleneck features of the input data, which is output from the output layer of the encoder. [Akcay, page 7, para 3.2 Model Training, line 15-22] Unlike other features, the feature that is used to update the encoder of the neural network is obtained from the internal representation which is the intermediate layer of the convolutional neural network encoder) Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li, Khazai, Kamen, and Akcay to use the method of wherein the first feature output is from an output layer of a convolutional neural network encoder of the feature extraction model, and wherein the second feature output is from an intermediate layer of the convolutional neural network encoder of Akcay to implement the image anomaly detection method of Li, Khazai, and Kamen. The suggestion and/or motivation to do improve the performance of the system, as encoded data loses information during the encoding process and anomaly localization requires less encoded data than anomaly feature extraction to detect the location of the anomaly more accurately. Regarding claim 2, Li teaches: wherein the classification model [Li, page 93, left col, line 32-39] The appearance stream which is further disclosed in [Li, page 95, Fig. 3] is the classification model. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] S_appearance, which are the classification score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status.); and indicate the anomaly status based on a comparison of the classification score and the classification threshold ([Li, page 93, left col, line 32-39] The appearance stream which is further disclosed in [Li, page 95, Fig. 3] is the classification model. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] S_appearance, which are the classification score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status.), wherein the anomaly status is a binary classification that is determined based on the comparison and is indicative of the object having an anomalous feature or is indicative of the object not having an anomalous feature ([Li, page 93, left col, line 32-39] The appearance stream which is further disclosed in [Li, page 95, Fig. 3] is the classification model. [Li, page 98, right col, para 3.4. 
Anomaly detection on testing samples, line 28-36] S_appearance, which are the classification score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status of whether the object has an anomalous feature or not.). Li does not specifically disclose: wherein the classification model includes a support vector machine; wherein the support vector machine is a single class support vector machine that is specifically trained to analyze the type of the object; Khazai teaches: wherein the classification model includes a support vector machine ([Khazai, page 648, left col, line 1-15] The proposed RBFN measure, equation 8 σ , is the classification threshold, and SVM (support vector machine) is a basis function for the measure. The classification threshold is determined based on the distance between x_i and x_j which is the similarity analysis involving the reference image x_i.); wherein the support vector machine is a single class support vector machine that is specifically trained to analyze the type of the object ([Khazai, page 647, left col, para II. GK-SVDD, line 1-20] The method is implemented using a Support Vector Machines for one-class classification problem (i.e., single class support vector machine), that is trained using the N number of training objects. Acording to [Khazai, page 2011, para I. Introduction, line 5-6] the objects are Earth surface objects.). Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li, and Khazai to use the method of wherein the classification model includes a support vector machine of Khazai to implement the image anomaly detection method of Li. The suggestion and/or motivation to do so is to improve the performance of the system, as Support Vector Machine provides clear decision boundaries for imbalanced datasets such as anomalous dataset which can help in interpreting why a particular data point is classified as an anomaly thus makes it easier for the system to classify anomalous data and non-anomalous data. Regarding claim 5, Li in view of Khazai teaches: The method of claim 1. Li in view of Khazai does not specifically disclose: wherein the anomaly localization model comprises a convolutional neural network decoder that is configured to determine the location of the anomaly. Kamen teaches: wherein the anomaly localization model comprises a convolutional neural network decoder that is configured to determine the location of the anomaly ([Kamen, 0038] The convolutional neural network decoder generates the localization map which indicates the location of the anomaly.). Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li, Khazai, and Kamen to use the method of wherein the anomaly localization model comprises a convolutional neural network decoder that is configured to determine the location of the anomaly of Kamen to implement the image anomaly detection method of Li and Khazai. The suggestion and/or motivation to do so is to improve the performance of the system by locating the location of anomalies more precisely, as decoder enables the system to detect the anomalous data at higher resolution by mapping the output of the encoder to original dimension. 
Regarding claim 6, Li in view of Khazai in view of Kamen and further in view of Akcay teaches: wherein the second feature output is from an intermediate layer of a convolutional neural network encoder of the feature extraction model. ([Akcay, page 5, Fig. 2.] and [Akcay, page 8, line 6-15] The feature that is used to update the generator is bottleneck features of the input data, which is output from the output layer of the encoder. [Akcay, page 7, para 3.2 Model Training, line 15-22] Unlike other features, the feature that is used to update the encoder of the neural network is obtained from the internal representation which is the intermediate layer of the convolutional neural network encoder) Regarding claim 7, Li teaches: wherein generating the anomaly data comprises: generating a location indicator that identifies the location of the anomaly ([Li, page 96, right col, line 1-9] The optical flow patches which are interpreted as the location indicator is generated by the Motion Stream disclosed in [Li, page 95, Fig.3.]); combining the location indicator with the input image ([Li, page 99, left col, para 4.3.1. UCSD dataset, line 20-24] and [Li, page 100, right col, line 4-10; Fig. 8.] The binary red masks are applied as the location indicator.). Regarding claim 15, Li teaches: receive an input image that depicts an object ([Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] The UCSD dataset which is used to train and test the machine learning model includes Ped1 and Ped2 dataset. The training set, which is interpreted as the reference image, includes pedestrians walking across the walkway. The testing set, which is interpreted as the input image, contains anomalies such as cars, bikers, wheelchairs, and skaters along with the pedestrians (object).); determine, using a convolutional neural network encoder and from the input image, a first feature output that is associated with one or more features of the object ([Li, page 93, left col, line 32-39] The appearance stream which includes a convolutional neural network encoder [Li, page 97, Fig. 7] and is further disclosed in [Li, page 95, Fig. 3] is the classification model, and utilizes gradient map which is the first feature for anomaly detection. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] S_appearance, which are the classification score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status.), wherein the convolutional neural network encoder is trained based on reference images that depict reference objects that are a type of the object ([Li, page 93, left col, line 32-39] The appearance stream which includes a convolutional neural network encoder [Li, page 97, Fig. 7] and is further disclosed in [Li, page 95, Fig. 3] is the classification model, and utilizes gradient map which is the first feature for anomaly detection. [Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] The UCSD dataset which is used to train and test the machine learning model includes Ped1 and Ped2 dataset. The training set, which is interpreted as the reference image, includes pedestrians walking across the walkway. The testing set contains anomalies such as cars, bikers, wheelchairs, and skaters along with the pedestrians, thus the testing set and the training set are the same type.); determine, [Li, page 93, left col, line 32-39] The motion stream which is further disclosed in [Li, page 95, Fig. 
3] is the anomaly localization model and utilizes the optical flow which is the second feature for anomaly detection. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] Based on both the S_appearance and S_motion, which are the classification score and the location score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status.), wherein the convolutional neural network [Li, page 93, left col, line 32-39] The motion stream which is further disclosed in [Li, page 95, Fig. 3] is the anomaly localization model and utilizes the optical flow which is the second feature for anomaly detection. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] Based on both the S_appearance and S_motion, which are the classification score and the location score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status.); wherein the reference objects depicted in the reference images are non-anomalous objects, and wherein the reference images do not depict anomalous objects; ([Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] The UCSD dataset which is used to train and test the machine learning model includes Ped1 and Ped2 dataset. The training set (i.e., reference images), includes pedestrians walking across the walkway. The testing set contains anomalies such as cars, bikers, wheelchairs, and skaters along with the pedestrians, thus the objects in the testing set and the training set are the same type) wherein the first feature output is from an output layer of the convolutional neural network encoder ([Li, page 95, Fig. 3] and [Li, page 98, right col, 3.4. Anomaly detection on testing samples, line 1-27] collectively discloses using latent space representations z a = E a ( y a ) and z m = E m ( y m ) to compute the motion anomaly score S a p p e a r a n c e and S m o t i o n . E denotes the encoders) Li does not specifically disclose: a tangible machine-readable medium storing a set of instructions, the set of instructions comprising: one or more instructions that, when executed by one or more processors of a device, cause the device to: determine, using a support vector machine, that an anomaly status of the object is indicative of the object including an anomaly, wherein the support vector machine is trained based on the reference images; determine, using a convolutional neural network decoder, a location of the anomaly wherein the convolutional neural network decoder is configured to determine the location of the anomaly wherein the convolutional neural network decoder is trained based on the reference images; and perform an action associated with the location of the anomaly; wherein the second feature output is from an intermediate layer of the convolutional neural network encoder. Khazai teaches: determine, using a support vector machine, that an anomaly status of the object is indicative of the object including an anomaly, wherein the support vector machine is trained based on the reference images ([Khazai, page 647, left col, para II. GK-SVDD, line 1-20] The method is implemented using a Support Vector Machines for one-class classification problem (i.e., single class support vector machine), that is trained using the N number of training objects. Acording to [Khazai, page 2011, para I. Introduction, line 5-6] the objects are Earth surface objects. 
The classification process is binary as Khazai determines whether the pixel contains anomalous data or not (i.e., 1 or 0).); Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li, and Khazai to use the method of wherein the classification model includes a support vector machine of Khazai to implement the image anomaly detection method of Li. The suggestion and/or motivation to do so is to improve the performance of the system, as Support Vector Machine provides clear decision boundaries for imbalanced datasets such as anomalous dataset which can help in interpreting why a particular data point is classified as an anomaly thus makes it easier for the system to classify anomalous data and non-anomalous data. Li in view of Khazai does not specifically disclose: a tangible machine-readable medium storing a set of instructions, the set of instructions comprising: one or more instructions that, when executed by one or more processors of a device, cause the device to: determine, using a convolutional neural network decoder, a location of the anomaly wherein the convolutional neural network decoder is configured to determine the location of the anomaly wherein the convolutional neural network decoder is trained based on the reference images; and perform an action associated with the location of the anomaly. wherein the second feature output is from an intermediate layer of the convolutional neural network encoder. Kamen teaches: A tangible machine-readable medium storing a set of instructions, the set of instructions comprising: one or more instructions that, when executed by one or more processors of a device, cause the device to ([Kamen, 0028] The method of Kamen is implemented using computing device that contains processors and memories.): determine, using a convolutional neural network decoder, a location of the anomaly ([Kamen, 0038] The convolutional neural network decoder generates the localization map which indicates the location of the anomaly.); wherein the convolutional neural network decoder is configured to determine the location of the anomaly ([Kamen, 0038] The convolutional neural network decoder generates the localization map which indicates the location of the anomaly.); wherein the convolutional neural network decoder is trained based on the reference images ([Kamen, 0081] The encoder and decoder are trained using input medical image 602 and complete image 608. The complete image corresponds to the reference images.); and perform an action associated with the location of the anomaly ([Kamen, 0083] The localization map output from the decoder disclosed in [Kamen, 0038] can be stored in a memory device, displayed using a computer, and/or transmitted to a remote computer system.). Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li, Khazai, and Kamen to use the method of wherein the anomaly localization model comprises a convolutional neural network decoder that is configured to determine the location of the anomaly of Kamen to implement the image anomaly detection method of Li and Khazai. The suggestion and/or motivation to do so is to improve the performance of the system by locating the location of anomalies more precisely, as decoder enables the system to detect the anomalous data at higher resolution by mapping the output of the encoder to original dimension. 
Li in view of Khazai and further in view of Kamen does not specifically disclose: wherein the second feature output is from an intermediate layer of the convolutional neural network encoder. Akcay teaches: wherein the first feature output is from an output layer of the convolutional neural network encoder, and wherein the second feature output is from an intermediate layer of the convolutional neural network encoder. ([Akcay, page 5, Fig. 2.] and [Akcay, page 8, line 6-15] The feature that is used to update the generator is bottleneck features of the input data, which is output from the output layer of the encoder. [Akcay, page 7, para 3.2 Model Training, line 15-22] Unlike other features, the feature that is used to update the encoder of the neural network is obtained from the internal representation which is the intermediate layer of the convolutional neural network encoder) Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li, Khazai, Kamen, and Akcay to use the method of wherein the first feature output is from an output layer of a convolutional neural network encoder of the feature extraction model, and wherein the second feature output is from an intermediate layer of the convolutional neural network encoder of Akcay to implement the image anomaly detection method of Li, Khazai, and Kamen. The suggestion and/or motivation to do improve the performance of the system, as encoded data loses information during the encoding process and anomaly localization requires less encoded data than anomaly feature extraction to detect the location of the anomaly more accurately. Regarding claim 17, Li in view of Khazai teaches: wherein the support vector machine is trained to determine a binary classification that indicates that the object includes an anomalous feature or that indicates that the object does not include an anomalous feature, wherein the support vector machine is trained to determine a classification threshold that is used to determine the binary classification based on a similarity analysis involving the reference images ([Khazai, page 647, left col, para II. GK-SVDD, line 1-20] The method is implemented using a Support Vector Machines for one-class classification problem (i.e., single class support vector machine), that is trained using the N number of training objects. Acording to [Khazai, page 2011, para I. Introduction, line 5-6] the objects are Earth surface objects. The classification process is binary as Khazai determines whether the pixel contains anomalous data or not (i.e., 1 or 0).). Regarding claim 19, Li in view of Khazai and further in view of Kamen teaches: wherein the convolutional neural network encoder and the convolutional neural network decoder are associated with a same convolutional neural network autoencoder that is trained based on the reference images ([Kamen, 0081] The encoder and decoder are trained using input medical image 602 and complete image 608. The complete image corresponds to the reference images.). 
Regarding claim 20, Li teaches: wherein the one or more instructions, that cause the device to perform the action, cause the device to: generate a location indicator that identifies the location of the anomaly ([Li, page 96, right col, line 1-9] The optical flow patches which are interpreted as the location indicator is generated by the Motion Stream disclosed in [Li, page 95, Fig.3.]); combine the location indicator with the input image to form an anomaly indicator ([Li, page 99, left col, para 4.3.1. UCSD dataset, line 20-24] and [Li, page 100, right col, line 4-10; Fig. 8.] The binary red masks are applied as the location indicator.); Li in view of Khazai does not specifically disclose: provide the anomaly indicator to a user device. Kamen teaches: provide the anomaly indicator to a user device ([Kamen, 0083] The localization map output from the decoder disclosed in [Kamen, 0038] can be stored in a memory device, displayed using a computer, and/or transmitted to a remote computer system.). Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Khazai in view of Kamen in view of Akcay and further in view of Côté (US 20210028973 A1, hereinafter ‘Côté’). Regarding claim 3, Li teaches: wherein the classification model comprises: a first [Li, page 93, left col, line 32-39] The appearance stream which is further disclosed in [Li, page 95, Fig. 3] is the classification model. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] S_appearance, which are the classification score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status.), and a second [Li, page 93, left col, line 32-39] The motion stream which is further disclosed in [Li, page 95, Fig. 3] is the anomaly localization model which is the second machine learning model, and utilizes the optical flow which is the second feature for anomaly detection. [Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] The UCSD dataset which is used to train and test the machine learning model includes Ped1 and Ped2 dataset. The training set, which is interpreted as the reference image, includes pedestrians walking across the walkway.), Li does not specifically disclose: the classification model comprises: support vector machine; a second binary classification that indicates that the anomaly is a particular type of anomaly or that the anomaly is not the particular type of anomaly, wherein the anomaly data is generated to include, based on the second binary classification, a label that indicates that the anomaly is the particular type of anomaly or that the anomaly is not the particular type of anomaly. Khazai teaches: the classification model comprises: support vector machine ([Khazai, page 647, left col, para II. GK-SVDD, line 1-20] The method is implemented using a Support Vector Machines for one-class classification problem (i.e., single class support vector machine), that is trained using the N number of training objects. Acording to [Khazai, page 2011, para I. Introduction, line 5-6] the objects are Earth surface objects. 
The classification process is binary as Khazai determines whether the pixel contains anomalous data or not (i.e., 1 or 0).); Li in view of Khazai in view of Kamen and further in view of Akcay does not specifically disclose: a second binary classification that indicates that the anomaly is a particular type of anomaly or that the anomaly is not the particular type of anomaly, wherein the anomaly data is generated to include, based on the second binary classification, a label that indicates that the anomaly is the particular type of anomaly or that the anomaly is not the particular type of anomaly. Côté teaches: a second binary classification that indicates that the anomaly is a particular type of anomaly or that the anomaly is not the particular type of anomaly, wherein the anomaly data is generated to include, based on the second binary classification, a label that indicates that the anomaly is the particular type of anomaly or that the anomaly is not the particular type of anomaly ([Côté, 0092] More than one binary anomaly classifiers (Yes or No) are used to detect different particular types of anomalies. Pattern detection, which is the binary classification model, is trained with historical data and anomalies can be identified and labeled. The data may be labeled as “yes” to indicate the existence of an anomaly while other windows may be labeled as “no” to indicate an absence (or non-existence) of an anomaly.). Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li, Khazai, Kamen, Akcay and Côté to use the method of indicating that the anomaly is a particular type of anomaly or that the anomaly is not the particular type of anomaly of Côté to implement the image anomaly detection method of Li, Khazai, and Kamen. The suggestion and/or motivation to do increase the usage of the system, as utilizing classification processes that are specialized for each anomalies enables the system to detect more diverse types of anomalies by applying a plurality of classification processes. Claims 8-9, and 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Kamen and further in view of Akcay. Regarding claim 8, Li teaches a device, comprising: receive an input image that depicts an object ([Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] The UCSD dataset which is used to train and test the machine learning model includes Ped1 and Ped2 dataset. The training set, which is interpreted as the reference image, includes pedestrians walking across the walkway. The testing set, which is interpreted as the input image, contains anomalies such as cars, bikers, wheelchairs, and skaters along with the pedestrians (object).); process, using a feature extraction model, the input image to generate a first feature output that is associated with one or more features of the object ([Li, page 98, left col, line 13-19] The encoder maps the x a i to a latent space vector E( x a i ). The mapping process is the feature extraction process and the encoder is the feature extraction model, as the mapping process performed by an encoder extracts relevant features from the input data by reducing the dimension.), wherein the feature extraction model is trained based on reference images associated with a type of the object ([Li, page 95, right col, para 3. 
Our method, line 1-10] The two-stream Multivariate Gaussian Fully Convolution Adversaria Autoencoder (MGFC-AAE) is trained based on the normal samples of gradient, which is the reference images, and optical flow patches.); wherein the reference images depict non-anomalous objects that are of a same type as the type of the object, and wherein the references do not depict anomalous objects; ([Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] The UCSD dataset which is used to train and test the machine learning model includes Ped1 and Ped2 dataset. The training set (i.e., reference images), includes pedestrians walking across the walkway. The testing set contains anomalies such as cars, bikers, wheelchairs, and skaters along with the pedestrians, thus the objects in the testing set and the training set are the same type) determine, using a classification model, an anomaly status of the object based on the first feature output ([Li, page 93, left col, line 32-39] The appearance stream which is further disclosed in [Li, page 95, Fig. 3] is the classification model, and utilizes gradient map which is the first feature for anomaly detection. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] S_appearance, which are the classification score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status.), wherein the classification model is trained to determine the anomaly status based on a similarity analysis involving non-anomalous objects depicted in the reference images ([Li, page 97, right col, para 3.3. Adversarial training between E and D, line 8-14] The training is performed based on the reconstruction error of the training image. [Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] The training set, which is interpreted as the reference image, includes pedestrians walking across the walkway, and it is normal image which does not contain anomalies.); determine, based on the anomaly status indicating that the input image depicts the object having an anomaly, a location of the anomaly in the input image based on a second feature output of the feature extraction model ([Li, page 93, left col, line 32-39] The motion stream which is further disclosed in [Li, page 95, Fig. 3] is the anomaly localization model and utilizes the optical flow which is the second feature for anomaly detection. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] Based on both the S_appearance and S_motion, which are the classification score and the location score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status.), wherein the location of the anomaly is determined using an anomaly localization model that is trained based on the reference images ([Li, page 93, left col, line 32-39] The motion stream which is further disclosed in [Li, page 95, Fig. 3] is the anomaly localization model and utilizes the optical flow which is the second feature for anomaly detection. [Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] The UCSD dataset which is used to train and test the machine learning model includes Ped1 and Ped2 dataset. 
The training set Ped1 and Ped2, which is interpreted as the reference image, only contains normal video clips, which depicts pedestrians walking across the walkway); generate, based on the anomaly status and the location, anomaly data that is associated with the anomaly ([Li, page 98, right col, line 18-30] The total anomaly score which is the anomaly data that is associated with the anomaly is determined based on the S_appearance (i.e., anomaly status) and S_motion (i.e., anomaly location).); wherein the first feature output is from an output layer of the convolutional neural network encoder ([Li, page 93, left col, line 32-39] The appearance stream which is further disclosed in [Li, page 95, Fig. 3] is the classification model. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] S_appearance, which are the classification score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status) Li does not specifically disclose: one or more memories; and one or more processors, coupled to the one or more memories, configured to: perform an action associated with the anomaly data; wherein the second feature output is from an intermediate layer of the convolutional neural network encoder. Kamen teaches: one or more memories; and one or more processors, coupled to the one or more memories, configured to ([Kamen, 0028] The method of Kamen is implemented using computing device that contains processors and memories.): perform an action associated with the anomaly data ([Kamen, 0083] The localization map output from the decoder disclosed in [Kamen, 0038] can be stored in a memory device, displayed using a computer, and/or transmitted to a remote computer system.). Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li and Kamen to use the method of providing, by the device and to an object management system, the anomaly data of Kamen to implement the image anomaly detection method of Li. The suggestion and/or motivation to do so is to increase accessibility of the system by making it easier for users and third party devices to access the output data. However, Li in view of Kamen does not specifically disclose: wherein the second feature output is from an intermediate layer of the convolutional neural network encoder. Akcay teaches: wherein the first feature output is from an output layer of the convolutional neural network encoder, and wherein the second feature output is from an intermediate layer of the convolutional neural network encoder. ([Akcay, page 5, Fig. 2.] and [Akcay, page 8, line 6-15] The feature that is used to update the generator is bottleneck features of the input data, which is output from the output layer of the encoder. 
[Akcay, page 7, para 3.2 Model Training, line 15-22] Unlike other features, the feature that is used to update the encoder of the neural network is obtained from the internal representation which is the intermediate layer of the convolutional neural network encoder) Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li, Kamen, and Akcay to use the method of wherein the first feature output is from an output layer of a convolutional neural network encoder of the feature extraction model, and wherein the second feature output is from an intermediate layer of the convolutional neural network encoder of Akcay to implement the image anomaly detection method of Li. The suggestion and/or motivation to do improve the performance of the system, as encoded data loses information during the encoding process and anomaly localization requires less encoded data than anomaly feature extraction to detect the location of the anomaly more accurately. Regarding claim 9, Li teaches: wherein the feature extraction model comprises a convolutional neural network encoder. ([Li, page 98, left col, line 13-19] The encoder maps the x a i to a latent space vector E( x a i ). The mapping process is the feature extraction process and the encoder is the feature extraction model, as the mapping process performed by an encoder extracts relevant features from the input data by reducing the dimension) Regarding claim 12, Li in view of Kamen teaches: The device of claim 8. Li in view of Kamen does not specifically disclose: wherein the first feature output and the second feature output are from different layers of a convolutional neural network of the feature extraction model. Akcay teaches: wherein the first feature output and the second feature output are from different layers of a convolutional neural network of the feature extraction model ([Akcay, page 5, Fig. 2.] and [Akcay, page 8, line 6-15] The feature that is used to update the generator is bottleneck features of the input data, which is output from the output layer of the encoder. [Akcay, page 7, para 3.2 Model Training, line 15-22] Unlike other features, the feature that is used to update the encoder of the neural network is obtained from the internal representation which is the intermediate layer of the convolutional neural network encoder. ). Regarding claim 13, Li teaches wherein the one or more processors, to generate the anomaly data, are configured to: generate a location indicator that identifies the location of the anomaly ([Li, page 96, right col, line 1-9] The optical flow patches which are interpreted as the location indicator is generated by the Motion Stream disclosed in [Li, page 95, Fig.3.]); combine the location indicator with the input image ([Li, page 99, left col, para 4.3.1. UCSD dataset, line 20-24] and [Li, page 100, right col, line 4-10; Fig. 8.] The binary red masks are applied as the location indicator.). Regarding claim 14, Li in view of Kamen teaches wherein the one or more processors, to perform the action, are configured to at least one of: transmit, to a user device, the anomaly data, or control, according to the anomaly data, an object management system to perform an operation associated with the object ([Kamen, 0083] The localization map output from the decoder disclosed in [Kamen, 0038] can be stored in a memory device, displayed using a computer, and/or transmitted to a remote computer system.). Claims 10-11 are rejected under 35 U.S.C. 
103 as being unpatentable over Li in view of Kamen in view of Akcay and further in view of Khazai. Regarding claim 10, Li teaches: wherein the classification model comprises [Li, page 93, left col, line 32-39] The appearance stream which is further disclosed in [Li, page 95, Fig. 3] is the classification model. [Li, page 98, right col, para 3.4. Anomaly detection on testing samples, line 28-36] S_appearance, which are the classification score, are used to determine the anomaly status. The pre-defined threshold is used to determine the anomaly status.) Li in view of Kamen and further in view of Akcay does not specifically disclose: wherein the classification model comprises a support vector machine that is configured to provide a score; Khazai teaches: wherein the classification model comprises a support vector machine that is configured to provide a score; ([Khazai, page 647, left col, para II. GK-SVDD, line 1-20] The method is implemented using a Support Vector Machines for one-class classification problem (i.e., single class support vector machine), that is trained using the N number of training objects. Acording to [Khazai, page 2011, para I. Introduction, line 5-6] the objects are Earth surface objects. The classification process is binary as Khazai determines whether the pixel contains anomalous data or not (i.e., 1 or 0) ) Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Li, Kamen, Akcay and Khazai to use the method of wherein the classification model includes a support vector machine of Khazai to implement the image anomaly detection method of Li. The suggestion and/or motivation to do so is to improve the performance of the system, as Support Vector Machine provides clear decision boundaries for imbalanced datasets such as anomalous dataset which can help in interpreting why a particular data point is classified as an anomaly thus makes it easier for the system to classify anomalous data and non-anomalous data. Regarding claim 11, Li in view of Kamen in view of Akcay and further in view of Khazai teaches: wherein the similarity analysis is configured to determine a classification threshold of the support vector machine that is compared with the classification score to determine a binary classification of the anomaly status that is associated with the object including an anomalous feature or not including an anomalous feature ([Khazai, page 647, left col, para II. GK-SVDD, line 1-20] The method is implemented using a Support Vector Machines for one-class classification problem (i.e., single class support vector machine), that is trained using the N number of training objects. Acording to [Khazai, page 2011, para I. Introduction, line 5-6] the objects are Earth surface objects. The classification process is binary as Khazai determines whether the pixel contains anomalous data or not (i.e., 1 or 0).). Response to Arguments Response to Argument under 35 U.S.C. 101 Applicant’s arguments, see [Remarks, page 1-3], filed 12/01/2025, with respect to Claims 1-3, 5-15, 17 and 19-20 have been fully considered and are persuasive. The 35 U.S.C. 101 rejection of Claims 1-3, 5-15, 17 and 19-20 has been withdrawn. Response to Argument under 35 U.S.C. 103 Arguments: [Remarks, page 5] Applicant asserts that the Li’s approach differs fundamentally from the claimed invention, which relies on feature extraction and comparison with reference images to determine the anomaly status of an object. 
Examiner’s Response: Examiner respectfully disagrees. The paragraph [Li, page 98, left col, line 13-19] discloses the encoder mapping the x a i to a latent space vector E( x a i ). The mapping process is the feature extraction process and the encoder is the feature extraction model, as the mapping process performed by an encoder extracts relevant features from the input data by reducing the dimension. Accordingly, the arguments to claims 1, 8 and 15 are not persuasive. Therefore, the arguments to dependent claims 2-3, 5-7, 9-14, 17 and 19-20 are not persuasive. Arguments: [Remarks, page 6] Applicant asserts that the Office Action appears to concede that the training sets of Li include both non-anomalous and anomalous objects, which is distinct from the language of the claims, which requires that the reference images include only non-anomalous objects and DO NOT include anomalous objects. Examiner’s Response: Examiner respectfully disagrees. Examiner did not concede that the training sets of Li include both non-anomalous and anomalous objects. [Li, page 99, left col, para 4.3.1. UCSD dataset, line 1-19] clearly states “In Ped1, there are 34 normal video clips in training set and 36 abnormal video clips in testing set” and “In Ped2, the training set and testing set contains 16 normal video clips and 12 abnormal video clips with size 320 ×240, respectively” which indicates that the training set (reference data) of Ped1 and the training set (reference data) of Ped2 ONLY contains NORMAL video clips, and the 36 abnormal video clips and 12 abnormal video clips are ONLY included in the TESTING SET, which are not the reference data. Accordingly, the arguments to claims 1, 8 and 15 are not persuasive. Therefore, the arguments to dependent claims 2-3, 5-7, 9-14, 17 and 19-20 are not persuasive. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Bergmann et al. “The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection”, 2021 (This prior art teaches utilizing feature descriptors (feature extractor) to perform anomaly detection of images) THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached Monday – Friday 7:30AM – 4:30PM ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar can be reached at (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /JUN KWON/Examiner, Art Unit 2127 /ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127
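For readers who want the architecture behind the §103 dispute in concrete terms, here is a minimal, hypothetical sketch of the claimed pipeline shape: a CNN encoder whose output-layer (bottleneck) features drive a one-class classifier trained only on non-anomalous reference images, and whose intermediate-layer features drive a decoder that localizes the anomaly. Everything below (module sizes, the OneClassSVM standing in for the claimed classification model, the random stand-in images) is an illustrative assumption, not the applicant's implementation and not the method of Li, Khazai, Kamen, or Akcay.

```python
# Hypothetical sketch only: a two-output CNN encoder, a one-class SVM classifier
# trained on reference (non-anomalous) images, and a decoder for localization.
import torch
import torch.nn as nn
from sklearn.svm import OneClassSVM

class Encoder(nn.Module):
    """Toy encoder: block1's activation stands in for the 'intermediate layer'
    feature, block2's activation for the 'output layer' (bottleneck) feature."""
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        intermediate = self.block1(x)           # second feature output (intermediate layer)
        bottleneck = self.block2(intermediate)  # first feature output (output layer)
        return bottleneck, intermediate

class Decoder(nn.Module):
    """Toy decoder mapping intermediate features back to image resolution,
    yielding a coarse anomaly-localization map."""
    def __init__(self):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(8, 4, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(4, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, feat):
        return self.up(feat)

encoder, decoder = Encoder(), Decoder()

# Reference images: non-anomalous samples only (random tensors as stand-ins).
reference = torch.rand(32, 1, 64, 64)
with torch.no_grad():
    ref_bottleneck, _ = encoder(reference)

# One-class SVM fit on reference bottleneck features; its learned decision
# boundary acts as a classification threshold derived from the reference set.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(ref_bottleneck.flatten(1).numpy())

# Inference on a new input image.
image = torch.rand(1, 1, 64, 64)
with torch.no_grad():
    bottleneck, intermediate = encoder(image)
    score = clf.decision_function(bottleneck.flatten(1).numpy())[0]
    if score < 0:  # negative scores fall outside the learned boundary -> anomalous
        localization_map = decoder(intermediate)  # coarse location of the anomaly
```

On this reading, the argument is over where each feature is tapped (output layer vs. intermediate layer of the encoder) and how the threshold is derived from the reference images, which is why the rejection stacks Akcay and Khazai on top of Li.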

Prosecution Timeline

May 28, 2021
Application Filed
Jul 05, 2024
Non-Final Rejection — §101, §103
Oct 15, 2024
Response Filed
Oct 24, 2024
Final Rejection — §101, §103
Mar 31, 2025
Request for Continued Examination
Apr 03, 2025
Response after Non-Final Action
May 27, 2025
Non-Final Rejection — §101, §103
Dec 01, 2025
Response Filed
Jan 29, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602569
EXTRACTING ENTITY RELATIONSHIPS FROM DIGITAL DOCUMENTS UTILIZING MULTI-VIEW NEURAL NETWORKS
2y 5m to grant — Granted Apr 14, 2026
Patent 12602609
UPDATING MACHINE LEARNING TRAINING DATA USING GRAPHICAL INPUTS
2y 5m to grant — Granted Apr 14, 2026
Patent 12579436
Tensorized LSTM with Adaptive Shared Memory for Learning Trends in Multivariate Time Series
2y 5m to grant — Granted Mar 17, 2026
Patent 12572777
Policy-Based Control of Multimodal Machine Learning Model via Activation Analysis
2y 5m to grant — Granted Mar 10, 2026
Patent 12493772
LAYERED MULTI-PROMPT ENGINEERING FOR PRE-TRAINED LARGE LANGUAGE MODELS
2y 5m to grant — Granted Dec 09, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds
5-6
Grant Probability
38%
With Interview (+46.2%)
84%
Median Time to Grant
4y 3m
PTA Risk
High
Based on 68 resolved cases by this examiner. Grant probability is derived from the career allow rate.
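As a sanity check, the headline figures are consistent with a simple allow-rate calculation plus an additive interview lift in percentage points; the snippet below is an assumed reconstruction from the numbers shown on this page (the tool's exact model and rounding are not published here).

```python
# Assumed reconstruction of the headline projections from the figures above.
granted, resolved = 26, 68                           # examiner career data shown above
career_allow_rate = 100 * granted / resolved         # ~38.2% -> displayed as 38%
interview_lift = 46.2                                # percentage points ("Interview Lift")
with_interview = career_allow_rate + interview_lift  # ~84.4% -> displayed as 84%
print(round(career_allow_rate), round(with_interview))  # 38 84
```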
