Prosecution Insights
Last updated: April 19, 2026
Application No. 17/856,599

REDUCING FALSE ALARMS IN VIDEO SURVEILLANCE SYSTEMS

Non-Final OA §103
Filed: Jul 01, 2022
Examiner: GARCIA, PAULO ANDRES
Art Unit: 2669
Tech Center: 2600 — Communications
Assignee: Robert Bosch GmbH
OA Round: 5 (Non-Final)
Grant Probability: 83% (Favorable); 99% with interview
Expected OA Rounds: 5-6
Time to Grant: 3y 2m

Examiner Intelligence

Career Allow Rate: 83% (34 granted of 41 resolved cases), +20.9% vs Tech Center average (above average)
Interview Lift: +17.2% across resolved cases with an interview
Typical Timeline: 3y 2m average prosecution; 13 applications currently pending
Career History: 54 total applications across all art units

Statute-Specific Performance

§101: 16.7% (-23.3% vs TC avg)
§103: 54.3% (+14.3% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 10.4% (-29.6% vs TC avg)
Deltas are measured against the Tech Center average estimate. Based on career data from 41 resolved cases.

Office Action

§103
DETAILED ACTION Notice of Pre-AIA or AIA Status 1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Notice to Applicants 2. This communication is in response to the application filed on 01/27/2026. 3. Claims 1-24 are pending. 4. Limitations appearing inside {} are intended to indicate the limitations not taught by said prior art(s)/combinations. Information Disclosure Statement 5. The information disclosure statements (IDS) submitted on 07/01/2022, 10/23/2023, and 02/11/2026 have been considered by the examiner. Continued Examination Under 37 CFR 1.114 6. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/27/2026 has been entered. Response to Arguments 7. Applicant's arguments, see pg. 1-3, with regard to the §103 rejection of claims 1 and 11 have been fully considered but are not persuasive. Specifically, applicant argues that the categories of Heitz are not specifically associated with a "false alarm" or a "true alarm," that none of the categories associate a moving object with human activity, and that, therefore, Heitz cannot disclose a true alarm indicating the moving object is associated with human activity. The examiner disagrees. Specifically, with regard to Heitz failing to associate categories with a "false alarm" or a "true alarm," the examiner notes that the broadest reasonable interpretation (BRI) of false and true alarm is inclusive of the false and true positive determination of Heitz, and since a false positive in the case of Heitz is specifically the categorization of a detection into a category it does not belong to, all the categories/classes of Heitz are associated with false/true alarms in that they all undergo determination for false positives (for an illustrative example, [col. 60, ln. 50-54] "…FIG. 15C does not have a bounding box corresponding to bounding box 1504 as the second analysis determined that the jacket on the chair was not a person. Thus, the detected instance of the potential person within bounding box 1504 comprises a false positive."). With regard to the argument that Heitz does not associate a true alarm determination with human activity, the examiner specifically notes that Heitz discloses determining false positives, and thus true positives in the case that false positives are not detected, based on the presence of human activity within the video frame ([col. 60, ln. 50-54], [col. 65, ln. 24-33] "…after identifying the one or more potential instances of a person, the system analyzes the one or more potential instances to determine whether the one or more potential instances are false positives… the analyzing includes analyzing the dimensions of the potential instances (e.g., the height, width, and proportionality)… the analyzing is performed as part of the determination as to whether the first image includes the one or more potential instances of a person.").
The examiner further notes that the argument that Heitz suppresses the false positives without linking the suppression to a machine learning algorithm is not convincing. It is specifically noted in Heitz that the false positive suppression is also implementable as part of the determination of whether the first image includes one or more potential instances of a person ([col. 65, ln. 24-33]), and since the determination of whether an image includes a potential instance of a person is performed via machine learning ([col. 64, ln. 59 to col. 65, ln. 23] "…the system utilizes facial detection to determine whether the first image includes one or more potential instances of a person… utilizes historical information for the camera to determine whether the first image includes one or more potential instances of a person… utilizes heuristics to determine whether the first image includes one or more potential instances of a person… distinguishes the foreground of an image from the background and analyzes the foreground to determine whether the first image includes one or more potential instances of a person… distinguishes the foreground of the image from the background based on prior training and/or analysis of previous images captured by the camera… utilizes scalable object detection with a deep neural network to determine whether the first image includes one or more potential instances of a person. Scalable object detection using deep neural networks is described in detail in the following paper: Erhan, Dumitru et al., "Scalable Object Detection using Deep Neural Networks," 2013, which is hereby incorporated by reference in its entirety… utilizes a deep network-based object detector to determine whether the image includes one or more potential instances of a person… utilizes a single shot multibox detector to determine whether the image includes one or more potential instances of a person. A single shot multibox detector is described in detail in the following paper: Liu, Wei et al., "SSD: Single Shot MultiBox Detector," 2015, which is hereby incorporated by reference in its entirety."), Heitz teaches that the false alarm/true alarm determination is linked to a machine learning algorithm. With regard to the argument that the alert as disclosed in Heitz is not generated due to a "true alarm" but rather based on an event category, the examiner notes that in such a case, the object detection and false/true alarm determination have already been performed ([col. 65, ln. 24-33], [col. 64, ln. 59 to col. 65, ln. 23]), and thus the alert that is generated is only for true alarm instances of an object in said event category. Therefore, Heitz still discloses wherein, when the object is classified as a true alarm, to generate an alert. With regard to the arguments that Zhang does not suggest analyzing a score or feature over time associated with a moving object, the examiner likewise disagrees. Specifically, in response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Zhang was relied upon to teach analyzing a feature over time associated with a moving object as part of a machine learning algorithm; as such, arguments with regard to Zhang failing to teach true alarms are not persuasive.
With regard to the argument that Zhang does not make a "decision," the examiner notes that this is irrelevant, since it is not associated with the limitation Zhang was relied upon to teach, and that it is also incorrect, since traversing a random forest classifier results in a decision as to the classification of a given object when a leaf node is reached, the traversal itself comprising a decision at each branch of the tree with regard to the features of the object. Therefore, the arguments with regard to Heitz and Zhang are not persuasive, and the §103 rejections of claims 1 and 11 are maintained. Claim Rejections - 35 USC § 103 8. The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action. 9. Claims 1-20 and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 10,957,171 to Heitz et al. (hereinafter Heitz) in view of "Robust Visual Tracking Using Oblique Random Forests" to Zhang et al. (hereinafter Zhang). 10. Regarding Claim 1, Heitz discloses a video surveillance system for reducing false alarms, the video surveillance system comprising: a camera ([Figure 5, video source 522-1, camera 118], [Fig. 11A, video source 522, camera 118], [col. 15, ln. 38-43] "FIG. 5 illustrates a representative operating environment 500 in which a server system 508… provides data processing for monitoring and facilitating review of motion events in video streams captured by video cameras 118."); and an electronic processor configured to ([Figure 7A, CPU 702], [col. 19, ln. 61-64] "FIG. 7A is a block diagram illustrating the server system 508 in accordance with some implementations. The server system 508 typically includes one or more processing units (CPUs) 702…"), when a moving object is detected in a video captured by the camera, perform object detection on the video to determine a score associated with a class of objects ([Figure 7A, Object detection sub-module 7152], [col. 20, ln. 58-63] "…7152 for identifying objects and/or entities within an image and/or a video feed, including, but not limited to: Regioning sub-module 7154 for selecting and/or analyzing regions around potential instance(s) of objects and/or entities", [Figure 17A-C], [col. 63, ln. 48-53 and col. 64, ln. 1-2, 35-37, 41-58] "…(1704)… the system receives the event indicator from a camera 118… the camera 118 determines if sufficient motion is present in the video feed. If sufficient motion is detected, the camera sends the event indicator to the system… the system utilizes object detection sub-module 7152 to determine whether the first image includes one or more potential instances of a person… denotes a bounding box around each potential instance of a person… determining whether an image includes one or more potential instances of a person includes identifying one or more potential instances and assigning a confidence score to each of the potential instances… the determining includes analyzing the one or more potential instances to determine… false positives.", ["pre-event analysis", step 1708, col. 63, ln.
35-55] "…determines (1708) whether… pre-event images includes… a person… the system utilizes object detection sub-module 7152 to determine whether the first image includes… a person… determining includes… identifying one or more potential instances and assigning a confidence score to each… if the confidence score meets… criteria the system denotes the corresponding instance with a bounding box."), wherein the score represents a likelihood that the moving object detected in the video is associated with a class of objects ([Figure 7A and 7C, confidence criteria 7171], [col. 63, ln. 48-53 and col. 64, ln. 1-2, 35-37, 41-58], [col. 73, ln. 66 to col. 74, ln. 24] "…for each image… analyzes the image to determine whether the image includes a particular object… the system utilizes scalable object detection with a deep neural network to determine whether the first image includes the particular object… the particular object comprises a vehicle, such as a car, truck, boat, or airplane… a weapon… an entity such as an animal (e.g., a pet)."); using metadata associated with the video ([Figure 7A, see 7166, subset 7168], [Figure 7B, see 71681-71688], [col. 21, ln. 58 to col. 22, ln. 25] "Motion start data 71681 includes date and time information such as a timestamp… information regarding the amount of motion present and/or the motion start location… motion end data 71684 includes date and time information such as a timestamp… the amount of motion present and/or the motion end location… Event features data 71685 include… event features such as event categorizations/classifications, object masks, motion masks, identified/recognized/tracked motion objects (also sometimes called blobs), information regarding features of the motion objects (e.g., object color, object dimensions, velocity, size changes, etc.), information regarding activity in zones of interest, and the like. Scene features data 71686 includes information regarding the scene in which the event took place such as depth map information, information regarding the location of windows, televisions, fans, the ceiling/floor, etc., information regarding whether the scene is indoors or outdoors, information regarding zones of interest…"), determine a feature over time associated with the moving object detected in the video (["post-event analysis", step 1718, col. 66, ln. 56-67] "In accordance with a determination that the plurality of pre-event images does not include another image… obtains (1718)… post-event images…", ["post-event analysis", step 1718, col. 67, ln. 23-26] "…images corresponding to the start or stop of motion are selected… images corresponding to an end of a motion track (e.g., a motion stop or exit activity) are selected…", [col. 68, ln. 61 to col. 69, ln. 48, steps (1732), (1736), (1738)] "…in accordance with a determination that a match is not found… denotes (1732) the image as containing a person.
The system denotes the image as containing the included person by adding or updating metadata associated with the image… stores the information… in a database… 716… event information database 7166… determines (1736) whether… images include… a person… system determines whether… images… include… a person… by analyzing metadata for the plurality of post-event images… utilizing a database… 716… 7166… denotes (1738) the motion event corresponding to the event indicator as involving a person… by editing or adding metadata for the motion event… by storing information in a database… 7166 or event records 7168… in accordance with a determination that a person was in motion, the person was in a region in which motion occurred, and/or the person corresponds to a motion track."); execute a machine learning algorithm ([Figure 7A, see object detection sub-module 7152 for score associated with class of objects], [col. 65, ln. 7-33], [col. 73, ln. 66 to col. 74, ln. 24]) to analyze the scores associated with the class of objects determined by the object detection ([Figure 7A, Object detection sub-module 7152], [col. 20, ln. 58-63], [Figure 17A-C], [col. 63, ln. 48-53 and col. 64, ln. 1-2, 35-37, 41-58], ["pre-event analysis", step 1708, col. 63, ln. 35-55], [Figure 7A and 7C, confidence criteria 7171]) {analyze the feature over time associated with the moving object detected in the video}, and determine whether the moving object detected in the video is classified as a false alarm or as a true alarm ([Figure 7A and 7C, confidence criteria 7171, Alert sub-module 7151], [Figure 18], [col. 37, ln. 50 to col. 38, ln. 2] "…7148 aggregates all of the information and generates a categorization for the motion event candidate… false positive suppression is optionally performed to reject some motion event candidates… determining whether a motion event candidate is a false positive includes determining whether the motion event candidate occurred in a particular zone… analyzing an importance score for the motion event candidate… based on zones of interest involved with the motion event candidate, background features, motion vectors, scene features, entity features, motion features, motion tracks, and the like.", [col. 63, ln. 48-53 and col. 64, ln. 1-2, 35-37, 41-58]), wherein the true alarm indicates that the moving object is associated with human activity ([col. 60, ln. 50-54], [col. 65, ln. 24-33], [col. 64, ln. 59 to col. 65, ln. 23]); and when the moving object detected in the video is classified as the true alarm, generate an alert ([Figure 7C, Confidence criteria 7171], [col. 24, ln. 38-46] "…the confidence criteria 7171 include a plurality of thresholds… each threshold is associated with a particular type of alert… the system determines whether a confidence score exceeds a particular threshold, such as threshold 71716…", [col. 36, ln. 63 to col. 37, ln. 20] "…an alert is generated and sent to the client device 504 (11169)… an alert is generated based on the event category (1121)… the alert is based on the confidence level of the event category.").
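For orientation, the claim 1 pipeline being mapped above reduces to four stages: per-frame class scores from object detection, features over time derived from the video metadata, a machine-learning classifier producing a true/false-alarm decision, and alert generation only on a true alarm. The sketch below is illustrative only, with hypothetical function and type names; it is not code from Heitz, Zhang, or the application.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class Detection:
    timestamp: float                             # frame timestamp from the video metadata
    bbox: Tuple[float, float, float, float]      # (x, y, w, h) bounding box in the frame
    score: float                                 # likelihood the object belongs to the class (e.g., "person")

def is_true_alarm(track: List[Detection],
                  extract_features: Callable[[List[Detection]], Sequence[float]],
                  classify: Callable[[Sequence[float]], bool]) -> bool:
    """Claim 1 pipeline: per-frame class scores plus features over time feed a
    trained classifier that labels the track as a true alarm (human activity)
    or a false alarm. `extract_features` and `classify` are assumed, user-supplied."""
    scores = [d.score for d in track]            # scores from object detection
    features = list(extract_features(track))     # features over time from metadata
    return classify(scores + features)           # machine-learning true/false-alarm decision

def handle_motion_event(track, extract_features, classify, send_alert):
    # An alert is generated only when the object is classified as a true alarm.
    if is_true_alarm(track, extract_features, classify):
        send_alert("true alarm: moving object associated with human activity")
```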
Specifically, one of ordinary skill in the art, before the effective filing date of the claimed invention, would recognize that Heitz discloses using metadata associated with the video to determine a feature over time associated with the moving object detected in the video, and analyzing said features over time to determine a true/false alarm ([Figure 7A, event categorizer module 7148 for feature-over-time analysis and metadata (e.g., motion vectors), see object detection sub-module 7152 for score associated with class of objects], [col. 40, ln. 26-45], [Fig. 11F-G categorizers], [col. 75, ln. 11-13]), but does so as a separate analysis from the scores associated with the class of objects. However, Zhang teaches using metadata associated with the video to determine a feature over time associated with the moving object detected in the video ([pg. 5591, col. 2, 3. Particle filter tracking, par. 1, ln. 1 to pg. 5592, col. 1, par. 2, ln. 9] "Our tracking algorithm is formulated within a particle filtering framework [39]. Let $z_t$ and $X_t$ denote the state variable describing the parameters of the object and the observation, respectively, in the $t$-th frame. In the particle filter framework, the true posterior state distribution $p(z_t \mid X_{1:t})$ is approximated by a finite set of $P$ samples $\{z_t^i\}_{i=1}^{P}$ (called particles) with corresponding normalized weights $\{W_t^i\}_{i=1}^{P}$. The particles are drawn from the proposal density function $q(z_t \mid z_{1:t-1}, X_{1:t})$, which is set to the state transitional probability $p(z_t \mid z_{t-1})$. In practice, the weight $W^i$ of particle $i$ is given by the observation (likelihood) model $W_t^i = W_{t-1}^i \cdot p(X_t \mid z_t^i)$ (1). The observation likelihood is given by the classification result: $p(x_t \mid z_t) = \frac{1}{K}\sum_k I[g_k(x_t) > 0]$, where $I[\cdot]$ is the indicator function and $g_k(x)$ is the result of a binary classification from the $k$-th decision tree in the ensemble with $K$ decision trees. We model state parameters consisting of six parameters: translation in x-axis, translation in y-axis, scale variation, rotation angle, aspect ratio, and skew angle, each modeled using a Gaussian distribution assuming that the dimensions are independent.") and analyzing the features over time associated with the moving object detected in the video ([pg. 5591, col. 2, 3. Particle filter tracking, par. 1, ln. 1 to pg. 5592, col. 1, par. 2, ln. 9], [pg. 5592, col. 1, 4. Incremental Oblique Random Forests, par. 2, ln. 1 to col. 2, par. 1, ln. 4] "In our particle filter framework, we assume that we have access to $N$ training samples (particles), $X_t = \{x_t^1, \ldots, x_t^N\}$, at each instant $t$, obtained from region proposals. The samples are $M$-dimensional: $x_t^i \in \mathcal{X} \subseteq \mathbb{R}^M$, $i \in \{1, \ldots, N\}$. Our objective is to classify the samples in $X_t$ as belonging to class $y = +1$, indicating the object of interest, or to class $y = -1$, indicating background region. We achieve this by learning a mapping function $G: \mathcal{X} \to \mathcal{Y}$, where $\mathcal{Y} = \{-1, +1\}$. We denote a generic data point by $x$ and use $x_\diamond$, with $\diamond$ denoting a placeholder for the index wherever necessary. Our mapping function $G$ is a random forest [9] composed of $K$ base classifiers $G = \{g_k\}_{k=1}^{K}$, where the classifiers $g_k: \mathcal{X} \to \mathcal{Y}$, $k \in \{1, \ldots, K\}$, called decision trees, are combined using bagging [8]. Each decision tree $g_k(x)$ classifies a sample $x \in \mathcal{X}$ by routing it from the root to some leaf node, recursively, which provides a label for the instance. Specifically, each node $j$ in the tree is associated with a binary split function $f_j^k(x, \theta) \in \{-1, +1\}$, where $\theta$ is the parameter of the split function. Samples are sent to the right child node if $f_j^k(x, \theta) = +1$ and to the left if $f_j^k(x, \theta) = -1$, with the process terminating at a leaf (or pure) node. Given an input $x$, the output of the tree is the prediction stored at the leaf reached by $x$, which is a target label $y \in \mathcal{Y}$ in our case… For each $g_k$, we employ an oblique decision tree that results in a non-orthogonal hyperplane at each decision node. More specifically, a linear combination of the attributes is tested as follows: $f_j^k(x_\diamond) = +1$ if $\sum_{m=1}^{M} w_m x_{\diamond m} < b$, and $-1$ otherwise (2), where $w$ and $b$ are parameters of the hyperplane… we propose to learn the hyperplane from the data by recursively clustering the data samples in a supervised manner…", [pg. 5592, col. 2, 4.2. PSVM learning, par. 1, ln. 1-23] "PSVM classifies data points depending on proximity to either one of the two separation planes that are aimed to be pushed away as far apart as possible. The rationale behind PSVM is that the separation hyperplanes are not bounded planes anymore, as in conventional SVM [15], but are 'proximal' planes. An illustration of PSVM hyperplanes and their relation to SVM is shown in Fig. 4. Let $X = [x_1, \ldots, x_N]^\top \in \mathbb{R}^{N \times M}$ be an $N \times M$ matrix obtained by stacking the $N$ samples in $\mathcal{X}$. Let $Y \in \{-1, +1\}^N$ be a vector obtained by stacking the labels of the samples in $X$. Here we drop the time index $t$ in $X_t$, since the following formulation applies to any $t$. We define a diagonal matrix $D$ whose diagonal entries $D_{i,i} = +1$ if $x_i$ belongs to the positive class and $-1$ otherwise. Then, PSVM aims at solving the following problem: $\min_{(w,b,\xi)} \frac{1}{2}\|\xi\|^2 + \nu \cdot \frac{1}{2}(w^\top w + b^2)$ subject to $D(Xw - be) + \xi = e$ (3), where $\xi$ is the error vector, $\nu$ is the regularization parameter, $w$ and $b$ are the coefficients of the hyperplane, and $e$ is a vector of all ones. The parameters of PSVM, $\{w, b\}$, can be computed in closed form as $[w; b]^\top = (\nu I + H^\top H)^{-1} H^\top D e$, with $H = [X, -e]$ (4)…", [pg. 5593, col. 1, 4.3 Online updates, par. 1, ln. 1 to col. 2, par. 3, ln. 11] "Model update avoids target drift and thus plays an important role in tracking performance… we propose an efficient method to update the PSVM model parameters when necessary. Let $De = Y$ and suppose at time $t$ we have the solution $\beta_t = [w_t, b_t]^\top$. We can calculate $\beta_{t+1}$ at time $t+1$ with the newly available data recursively from $\beta_t$ without directly solving Eq. (4). The key problem in updating the parameters is the calculation of $(H^\top H + \nu I)^{-1}$. If the feature matrix corresponding to the newly available data at time $t+1$ is $H_{t+1}$, then the problem of estimating $\beta_{t+1}$ becomes Eq. (5)… This is a least squares minimization problem, which leads to the following online update of the parameters based on recursive least squares (RLS): Eq. (6)-(7)…"). Specifically, one of ordinary skill in the art, before the effective filing date of the claimed invention, would recognize Heitz and Zhang as within the same field of object tracking and classification for video, and as analogous to the claimed invention. The motivation to combine is disclosed in Zhang, wherein it offers a more flexible, less computationally expensive approach to classification ([pg. 5595, col. 2, Computational complexity, par. 1, ln. 1-15] "The overall complexity of our oblique decision tree is $O(N M_s^3)$, while the complexity of an orthogonal decision tree is $O(M_s N \log_2 N)$. However, only a few features ($M_s = \log M$ in this study) are sampled at each internal node for many tasks such as visual tracking, where a large number of training samples accumulate over time, resulting in $M_s \ll \log N$… Moreover, our tree induction method also results in shallow trees and is thus more efficient. On average, the proposed simple Obli-RaF tracker runs 3 times faster than the orthogonal one (more details on this and a sensitivity analysis are in the supplementary material).", [pg. 5595, col. 1, Obli-RaF vs. Orth-RaF, par. 1, ln. 1 to par. 2, ln. 7] "Table 1 compares the precision score (within 20 pixels) and Table 2 compares the success rate (AUC). Clearly, the proposed oblique random forest outperforms the orthogonal random forest in all cases. As noted earlier (section 1), there are two main reasons for the observed performance gap: (i) flexibility of Obli-RaF, which is not restricted to be axis-aligned to the coordinate system of the input features, and (ii) the efficient online update procedure of our method, which better captures variations in the target object."). One of ordinary skill in the art, before the effective filing date of the claimed invention, would specifically recognize that, in combining Heitz with Zhang, one would modify the random forest machine learning algorithm of Zhang to produce a true/false alarm detection analogous to the true/false alarm analysis based on features over time presented in Heitz ([Figure 7A, event categorizer module 7148 for feature-over-time analysis and metadata (e.g., motion vectors)], [col. 40, ln. 26-45], [Fig. 11F-G categorizers], [col. 75, ln. 11-13]). Specifically, this is further obvious in view of the analogous SVM and decision tree nature of the disclosed features-over-time true/false alarm analysis in Heitz ([col. 75, ln. 11-13]), which is directly related to Zhang, which uses both decision trees (as part of a random forest) and PSVM for learning the parameters in the nodes of the tree. Unlike Heitz, however, Zhang specifically teaches in a second implementation to use both the confidence scores (provided by deep learning-based object detection analogous to Heitz) and the features over time in the analysis to determine the final classifications ([pg. 5593, Fig. 3], [pg. 5594, col. 1, Obli-RaF with ConvNet, par. 1, ln. 1-15] "We also propose a second implementation of our tracker motivated by the recent successes of ConvNets… To realize this, we… train two tiny ConvNets on feature maps from the conv4-3 and conv5-3 layers of the VGG-16 [44] model. The two ConvNets are then used to estimate a heat map of the target object in a generative manner. Meanwhile, the proposed Obli-RaF works in a discriminative manner to predict whether one particle belongs to the object of interest or the background. The final confidence of the particle is obtained as the sum of the confidence from our Obli-RaF method and the confidence of the generative ConvNets. For the ConvNets part we adopt the same parameter setting as in [47]."). Specifically, one of ordinary skill in the art, before the effective filing date of the claimed invention, would recognize VGG-16 as a deep learning model, directly analogous to the object detection of Heitz, and thus that Zhang in the second implementation describes a combination analogous to that of Heitz with Zhang.
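The Zhang passages quoted above reduce to three formulas: the oblique node split of Eq. (2), the closed-form PSVM hyperplane of Eq. (4), and the forest-vote observation likelihood $p(x_t \mid z_t) = \frac{1}{K}\sum_k I[g_k(x_t) > 0]$. A minimal NumPy sketch of those formulas follows; it is an illustrative reconstruction of the quoted equations, not Zhang's released code, and it omits the recursive tree induction and the online RLS update.

```python
import numpy as np

def psvm_fit(X, y, nu=1.0):
    """Closed-form PSVM hyperplane, Eq. (3)-(4): [w; b] = (nu*I + H^T H)^{-1} H^T D e."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)                 # labels in {-1, +1}
    n, m = X.shape
    e = np.ones((n, 1))
    H = np.hstack([X, -e])                         # H = [X, -e]
    D = np.diag(y)                                 # diagonal label matrix, D_ii in {-1, +1}
    beta = np.linalg.solve(nu * np.eye(m + 1) + H.T @ H, H.T @ D @ e)
    return beta[:m].ravel(), float(beta[m])        # hyperplane parameters (w, b)

def oblique_split(x, w, b):
    """Node split of Eq. (2): +1 if w.x < b, else -1 (non-axis-aligned hyperplane)."""
    return 1 if float(np.dot(w, x)) < b else -1

def forest_likelihood(x, trees):
    """Observation likelihood p(x|z) = (1/K) * sum_k I[g_k(x) > 0]: the fraction
    of the K decision trees (callables returning +/-1) voting for the object."""
    return sum(1 for g in trees if g(x) > 0) / len(trees)
```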
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have combined the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time as disclosed in Zhang, through known means, with no change to their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 1. 11. Regarding Claim 2, a combination of Heitz and Zhang teaches the system of claim 1. Heitz discloses wherein the electronic processor is configured to perform object detection using a deep neural network ([col. 63, ln. 48-53 and col. 64, ln. 1-2, 35-37, 41-58], [col. 65, ln. 7-17] "…the system utilizes scalable object detection with a deep neural network to determine whether the first image includes one or more potential instances of a person… utilizes a deep network-based object detector to determine whether the image includes one or more potential instances of a person… utilizes a single shot multibox detector to determine whether the image includes one or more potential instances of a person."). Likewise, Zhang discloses performing object detection using a deep neural network in the second implementation ([pg. 5593, Fig. 3], [pg. 5594, col. 1, Obli-RaF with ConvNet, par. 1, ln. 1-15]). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 2. 12. Regarding Claim 3, a combination of Heitz and Zhang teaches the system of claim 1. Heitz further discloses wherein the machine learning algorithm is a decision tree classifier ([Heitz, col. 75, ln. 11-13]), which, as noted in the arguments presented for claim 1, is directly related to a random forest classifier. However, Heitz does not specifically disclose a random forest classifier. Zhang, however, discloses wherein the machine learning algorithm is a random forest classifier ([pg. 5592, col. 1, 4. Incremental Oblique Random Forests, par. 2, ln. 1 to col. 2, par. 1, ln. 4], [pg. 5593, Fig. 3]). The motivation to combine remains analogous to claim 1. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 3. 13. Regarding Claim 4, a combination of Heitz and Zhang teaches the system of claim 1. Heitz discloses wherein the feature over time associated with the moving object detected in the video is at least one selected from the group consisting of a displacement of the moving object over time, a change in bounding box height associated with the moving object over time, an average directional change of the moving object over time from a starting position of the moving object, a standard deviation of directional change of the moving object over time, an average distance traveled by the moving object over time, a standard deviation of a distance traveled by the moving object over time, a difference in bounding box width between frames, a difference in bounding box height between frames, a mean absolute percentage error associated with fitting a line to direction values associated with the moving object over time, a mean absolute percentage error associated with fitting a line to position values associated with the moving object over time, and a mean absolute percentage error associated with fitting a line to distance values associated with the moving object over time ([Figure 11C-(b)], [col. 36, ln. 34-63], [col. 38, ln. 7-10] "…the motion vector representing a motion event candidate is a simple two-dimensional linear vector defined by a start coordinate and an end coordinate of a motion entity", [col. 39, ln. 4-8, 12-15] "…the motion track is used to generate a two-dimensional linear motion vector which only takes into account the beginning and end locations of the motion track (e.g., as shown by the dotted arrow in FIG. 11C-(b))… the motion vector is a non-linear motion vector that traced the entire motion track from the first frame to the last frame of the frame sequence in which the motion entity has moved."). Vectors are understood to have directionality and displacement, and thus constitute both a displacement of the moving object over time and an average directional change of the moving object over time. It would also be clear to one of ordinary skill in the art, before the effective filing date of the claimed invention, that the features are over time, specifically in view of event information database 7166 and records 7168, and that the motion vectors are traced across motion tracks for the frame sequence, thus indicating the motion track is specifically over time ([Figure 7A, see 7166, subset 7168], [Figure 7B, see 71681-71688], [col. 21, ln. 58 to col. 22, ln. 25]). Furthermore, Zhang likewise discloses displacement of the moving object over time among the parameters for the tree ([pg. 5591, col. 2, 3. Particle filter tracking, par. 1, ln. 1 to pg. 5592, col. 1, par. 2, ln. 9] "We model state parameters consisting of six parameters: translation in x-axis, translation in y-axis, scale variation, rotation angle, aspect ratio, and skew angle, each modeled using a Gaussian distribution assuming that the dimensions are independent."). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 4.
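Claim 4 recites its features over time as a Markush group. As an illustration of how a few of the recited statistics might be computed from timestamped bounding boxes of the kind discussed above, consider the following sketch (the input layout and names are hypothetical, not drawn from Heitz or Zhang):

```python
import numpy as np

def track_features(centers, heights):
    """A few of the claim 4 statistics, computed from a tracked object's
    per-frame bounding-box centers and heights (assumes at least 3 frames).

    centers: sequence of (x, y) box centers, one per frame
    heights: sequence of box heights, one per frame
    """
    centers = np.asarray(centers, dtype=float)         # (T, 2)
    heights = np.asarray(heights, dtype=float)         # (T,)

    steps = np.diff(centers, axis=0)                   # per-frame displacement vectors
    distances = np.linalg.norm(steps, axis=1)          # distance traveled per frame
    directions = np.arctan2(steps[:, 1], steps[:, 0])  # heading per frame
    dir_change = np.diff(directions)                   # directional change between frames

    # MAPE of fitting a line to the y-position values over time
    t = np.arange(len(centers))
    slope, intercept = np.polyfit(t, centers[:, 1], 1)
    fit = slope * t + intercept
    denom = np.where(centers[:, 1] == 0, 1.0, centers[:, 1])  # guard divide-by-zero
    position_fit_mape = float(np.mean(np.abs((centers[:, 1] - fit) / denom)))

    return {
        "displacement_over_time": float(np.linalg.norm(centers[-1] - centers[0])),
        "bbox_height_change": float(heights[-1] - heights[0]),
        "avg_directional_change": float(dir_change.mean()),
        "std_directional_change": float(dir_change.std()),
        "avg_distance_traveled": float(distances.mean()),
        "std_distance_traveled": float(distances.std()),
        "position_fit_mape": position_fit_mape,
    }
```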
14. Regarding Claim 5, a combination of Heitz and Zhang teaches the system of claim 1. Heitz further discloses wherein the feature over time associated with the moving object detected in the video is relevant to determining whether the moving object is associated with human activity ([Figure 11A-B], [col. 36, ln. 34-63], [col. 46, ln. 47-54] "…the selectable filters also include a human filter, which can be one or more characteristics associated with events involving a human being… the one or more characteristics that can be used as a human filter include a characteristic shape (e.g., aspect ratio, size, shape, and the like) of the motion entity, audio comprising human speech, motion entities having human facial characteristics, etc."). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 5. 15. Regarding Claim 6, a combination of Heitz and Zhang teaches the system of claim 1. Heitz further discloses the video surveillance system further comprising a display device and wherein the electronic processor is further configured to send the alert and the video to the display device ([Figure 8], [col. 25, ln. 1-12] "…504, typically, includes one or more processing units (CPUs) 802, one or more network interfaces 804, memory 806, and one or more communication buses 808 for interconnecting these components (sometimes called a chipset). Optionally, the client device also includes a user interface 810 and one or more built-in sensors 890 (e.g., accelerometer and gyroscope). User interface 810 includes one or more output devices 812 that enable presentation of media content, including one or more speakers and/or one or more visual displays.", [Figure 11A-B, Alerts for motion events 1105, Client Device 504], [col. 31, ln. 5-9] "…504 receives the alerts 1105 and presents them to a user of the client device. ... the server system sends alert information to the client device 504 and the client device generates the alert based on the alert information.", [Figure 13A-C], [col. 53, ln. 45-49] "…504 is able to control, review, and monitor video feeds from the one or more cameras 118 with the user interfaces for the application displayed on the client device 504 shown in FIGS. 13A-13C."). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 6. 16. Regarding Claim 7, a combination of Heitz and Zhang teaches the system of claim 1. Heitz further discloses the video surveillance system further comprising an input device and wherein the electronic processor is further configured to receive, via the input device, feedback regarding the alert based on the video; and based on the feedback, adjust the machine learning algorithm ([Figure 13A-C, Touch Screen 1306], [col. 56, ln. 22-35] "…FIG. 13B, each of the representations 1336 corresponds to a motion event of a bird flying from left to right across the field of view of the respective camera. In FIG. 13B, each of the representations 1336 is associated with a checkbox 1341. ... when a respective checkbox 1341 is unchecked (e.g., with a tap gesture) the motion event corresponding to the respective checkbox 1341 is removed from the event category B and, in some circumstances, the event category B is re-computed based on the removed motion event.
For example, the checkboxes 1341 enable the user of the client device 504 to remove motion events incorrectly assigned to an event category so that similar motion events are not assigned to the event category in the future."). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 7. 17. Regarding Claim 8, a combination of Heitz and Zhang teaches the system of claim 1. Heitz further discloses the video surveillance system further comprising a second camera and wherein the electronic processor is further configured to, when a second moving object or the moving object is detected in a second video captured by the second camera within a predetermined amount of time after the moving object being detected in the video and the moving object detected in the video is classified as the true alarm, generate a second alarm ([Figure 5, 522-1, 522-n, cameras 118], [Figure 18], [col. 76, ln. 64 to col. 77, ln. 12] "…(1) obtaining a first category of a plurality of motion categories for a first motion event, the first motion event corresponding to a first plurality of video frames from a camera; (2) sending a first alert indicative of the first category to a user associated with the camera; (3) after sending the first alert, obtaining a second category of the plurality of motion categories for a second motion event, the second motion event corresponding to a second plurality of video frames from the camera; (4) in accordance with a determination that the second category is the same as (or substantially the same as) the first category, determining whether a predetermined amount of time has elapsed since the sending of the first alert; (5) in accordance with a determination that the predetermined amount of time has elapsed, sending a second alert indicative of the second category to the user…"). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 8. 18. Regarding Claim 9, a combination of Heitz and Zhang teaches the system of claim 1. Heitz further discloses wherein the electronic processor is further configured to, when a second moving object or the moving object is detected in a second video captured by the camera within a predetermined amount of time after the moving object being detected in the video and the moving object detected in the video is classified as the true alarm, generate a second alarm ([Figure 5, 522-1, 522-n, cameras 118], [Figure 18], [col. 76, ln. 64 to col. 77, ln. 12]). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 9. 19. Regarding Claim 10, a combination of Heitz and Zhang teaches the system of claim 1.
Heitz further discloses wherein the metadata includes at least one selected from the group consisting of timestamped positions of the moving object, bounding boxes around the moving object, and a trajectory of the moving object ([col. 36, ln. 34-63], [col. 37, ln. 44 to col. 38, ln. 2], [col. 21, ln. 62-65] "Motion start data 71681 includes date and time information such as a timestamp and optionally includes additional information such as information regarding the amount of motion present and/or the motion start location. Similarly, motion end data 71684 includes date and time information such as a timestamp and optionally includes additional information such as information regarding the amount of motion present and/or the motion end location.", [Figure 15A-I], [col. 64, ln. 42-45] "7152 to determine whether the first image includes one or more potential instances of a person. ... the system denotes a bounding box around each potential instance of a person."). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 10. 20. Regarding Claim 11, Heitz discloses a method for reducing false alarms in a video surveillance system ([col. 3, ln. 17-22] "Thus, devices, storage mediums, and computing systems are provided with methods for providing event alerts, thereby increasing the effectiveness, efficiency, and user satisfaction with such systems. Such methods may complement or replace conventional methods for providing event alerts."). For the remainder of the claim, arguments analogous to those made for claim 1 are applicable to claim 11. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the method of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 11. 21. Regarding Claims 12-20, a combination of Heitz and Zhang teaches all the limitations of Claim 11. Arguments analogous to those made for Claims 2-10 are applicable to Claims 12-20, respectively. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the method of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claims 12-20. 22. Regarding Claim 22, a combination of Heitz and Zhang teaches the system of claim 7. Heitz further discloses wherein adjusting the machine learning algorithm comprises adjusting which features over time are analyzed by the electronic processor ([col. 41, ln. 14-50] "…each time a new motion vector is to be categorized, the event categorizer places the new motion vector into the vector event space according to its value. If the new motion vector is sufficiently close to or falls within an existing dense cluster, the vector category associated with the dense cluster is assigned to the new motion vector. If the new motion vector is not sufficiently close to any existing cluster, the new motion vector forms its own cluster of one member, and is assigned to the category of unrecognized events. If the new motion vector is sufficiently close to or falls within an existing sparse cluster, the cluster is updated with the addition of the new motion vector. If the updated cluster is now a dense cluster, the updated cluster is promoted, and all motion vectors (including the new motion vector) in the updated cluster are assigned to a new vector category created for the updated cluster. If the updated cluster is still not sufficiently dense, no new category is created, and the new motion vector is assigned to the category of unrecognized events… clusters that have not been updated for at least a threshold expiration period are retired. The retirement of old static clusters helps to remove residual effects of motion events that are no longer valid, for example, due to relocation of the camera that resulted in a scene change. FIG. 11D illustrates an example process for the event categorizer of the server system 508 to (1) gradually learn new vector categories based on received motion events, (2) assign a newly received motion vector to recognized vector categories or an unrecognized vector category, and (3) gradually adapt the recognized vector categories to the more recent motion events by retiring old static clusters and associated vector categories, if any. The example process is provided in the context of a density-based clustering algorithm (e.g., sequential DBSCAN). However, a person skilled in the art will recognize that other clustering algorithms that allow growth of clusters based on new vector inputs can also be used in various implementations."). Likewise, Zhang discloses that adjusting the machine learning algorithm comprises adjusting which features over time are analyzed by the electronic processor ([pg. 5592, col. 2, 4.2. PSVM learning, par. 1, ln. 1-23], [pg. 5593, col. 1, 4.3 Online updates, par. 1, ln. 1 to col. 2, par. 3, ln. 11]). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 22.
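The Heitz passage quoted above describes a sequential, density-based assignment of new motion vectors to clusters, with sparse clusters promoted to new categories once they become dense. A minimal sketch of that assignment logic follows, under assumed data structures (Heitz cites sequential DBSCAN-style clustering; the dict layout, radius, and density threshold here are hypothetical):

```python
import numpy as np

def assign_motion_vector(v, clusters, radius=1.0, dense_min=5):
    """Assign a new motion vector per the quoted passage: vectors close to a
    dense cluster take its category; vectors close to a sparse cluster grow it
    (promoting it to a new category once dense); everything else starts a
    singleton cluster in the "unrecognized events" category.

    clusters: list of dicts {"center": ndarray, "members": list, "category": str or None}
    """
    v = np.asarray(v, dtype=float)
    for c in clusters:
        if np.linalg.norm(v - c["center"]) <= radius:
            c["members"].append(v)
            c["center"] = np.mean(c["members"], axis=0)      # re-center on the new member
            if c["category"] is None and len(c["members"]) >= dense_min:
                c["category"] = f"category-{len(clusters)}"  # promote sparse -> dense
            return c["category"] or "unrecognized"
    clusters.append({"center": v, "members": [v], "category": None})
    return "unrecognized"                                    # new singleton cluster
```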
23. Regarding Claim 23, a combination of Heitz and Zhang teaches the system of claim 1. Heitz does not specifically disclose that the system uses a majority voting mechanism for classification. However, Zhang discloses a machine learning algorithm configured to categorize the moving object detected in the video by implementing majority voting to categorize the results ([pg. 5593, Fig. 3, see description] "Those particles are fed into the oblique random forest classifier. Each tree in the forest recursively clusters the data samples. Each leaf node in the tree will vote for one of the two classes (target object or background). The one with the maximum vote will be considered the tracking result. The model is updated when the number of votes is less than a threshold η. Moreover, the model is retrained when the number of votes is less than µ (µ < η). It can be easily generated by combining with other particle filter based trackers such as the ConvNets models in [47]…"). Specifically, in combining Heitz with Zhang, analogous to the arguments provided for claim 1, it would have been obvious to have modified the machine learning algorithm of Zhang to operate on a true/false alarm basis as disclosed in Heitz. The examiner further notes this modification would have been relatively easy to make, since the classification into true/false alarm is a binary classification analogous to the target object/background determination disclosed in Zhang. The motivation to combine is analogous to that provided for claim 1. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang to obtain the invention as specified in claim 23. 24. Regarding Claim 24, a combination of Heitz and Zhang teaches the system of claim 1. Heitz discloses wherein performing the object detection further includes determining a position of the moving object over time ([Figure 7A, Object detection sub-module 7152], [col. 20, ln. 58-63], [Figure 17A-C], [col. 63, ln. 48-53 and col. 64, ln. 1-2, 35-37, 41-58], ["pre-event analysis", step 1708, col. 63, ln. 35-55], ["post-event analysis", step 1718, col. 66, ln. 56-67]), and wherein the electronic processor is further configured to generate an {overlap score} based on similarities between the position of the object determined from the object detection and the features determined from the metadata associated with the video ([col. 68, ln. 31-39] "The system determines (1728) whether a match is found between the information regarding the included person and the stored persons information… the system utilizes data processing module 7144 and/or object detection sub-module 7152 to determine whether the match is found… determining whether a match is found comprises determining whether the included person is in the same location as one of the stored persons within the image."). Specifically, one of ordinary skill in the art, before the effective filing date of the claimed invention, would recognize that the stored persons information is analogous to the information stored in database 7168 of Heitz ([col. 68, ln. 61 to col. 69, ln. 48, steps (1732), (1736), (1738)], [col. 21, ln. 58 to col. 22, ln. 25]), and therefore, since the matching is specifically performed by either the processing module 7144 or the object detection module 7152, the object detection module can be configured to generate a match based on similarities between the position of the object determined from the object detection and the features determined from the metadata associated with the video. While Heitz discloses an analogous matching, Heitz does not specifically disclose that an "overlap score" is used in this matching, though the examiner notes that overlap scoring is performed in Heitz with regard to motion events and a selected zone of interest ([col. 48, ln. 9-15] "…an overlap factor is determined for the event mask of each past motion event and a selected zone of interest, and if the overlapping factor exceeds a predetermined overlap threshold, the motion event is deemed to be a relevant motion event for the selected zone of interest."). One of ordinary skill in the art, before the effective filing date of the claimed invention, would specifically recognize that the "zone of interest" would be analogous to metadata, given the broadest reasonable interpretation of metadata. Furthermore, one of ordinary skill in the art, before the effective filing date of the claimed invention, would recognize that since objects detected in Heitz are denoted with a bounding box ([Fig. 15A-C]), the "match" as performed in Heitz, specifically with relation to the same-location determination, could be performed by the very common intersection over union (IoU) metric. However, Zhang specifically teaches determining an overlap score ([pg. 5594, Fig. 5, see (b) Overlap Success] "Following the work of [54], precision plots obtained with a threshold of 20 pixels are shown on the left. Success plots measured using AUC values are shown on the right."). Specifically, the "AUC" used in Zhang to determine model accuracy is understood as the area under the curve of the intersection over union (IoU) success plot across different thresholds, and is elaborated in [54] as cited in Zhang ([see PTO-892, "Online Object Tracking: A Benchmark" to Wu et al., pg. 2413, col. 2, Success plot, par. 1, ln. 1 to pg. 2414, col. 1, par. 1, ln. 12] "Another evaluation metric is the bounding box overlap. Given the tracked bounding box $r_t$ and the ground truth bounding box $r_a$, the overlap score is defined as $S = \frac{|r_t \cap r_a|}{|r_t \cup r_a|}$, where $\cap$ and $\cup$ represent the intersection and union of two regions, respectively, and $|\cdot|$ denotes the number of pixels in the region. To measure the performance on a sequence of frames, we count the number of successful frames whose overlap $S$ is larger than a given threshold $t_o$. The success plot shows the ratios of successful frames as the threshold varies from 0 to 1. Using one success rate value at a specific threshold (e.g., $t_o = 0.5$) for tracker evaluation may not be fair or representative. Instead, we use the area under curve (AUC) of each success plot to rank the tracking algorithms."). Therefore, one of ordinary skill in the art, before the effective filing date of the claimed invention, would recognize that Zhang discloses an "overlap score" (i.e., IoU). The motivation to combine the overlap score of Zhang with the system of Heitz would have been obvious to one of ordinary skill in the art, in that it effectively represents the similarity between two pixel regions (i.e., objects delineated with bounding boxes) with relation to the positions of the object. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have combined the overlap score of Zhang with the system of Heitz through known means, with no change to their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time and the overlap score of Zhang to obtain the invention as specified in claim 24.
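The overlap score $S = |r_t \cap r_a| / |r_t \cup r_a|$ quoted from Wu et al. is the standard intersection over union. A self-contained sketch for axis-aligned boxes:

```python
def overlap_score(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2):
    the overlap score S = |r_t intersect r_a| / |r_t union r_a| from Wu et al."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)          # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, overlap_score((0, 0, 10, 10), (5, 5, 15, 15)) returns 25/175 ≈ 0.14, well below the $t_o = 0.5$ threshold mentioned in the quoted passage.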
25. Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 10,957,171 to Heitz in view of "Robust Visual Tracking Using Oblique Random Forests" to Zhang, and further in view of U.S. Patent No. 10,867,217 to Madden (hereinafter Madden). 26. Regarding Claim 21, Heitz further discloses wherein the deep neural network is different than the machine learning algorithm ([Figure 7A, event categorizer module 7148 for feature-over-time analysis and metadata (e.g., motion vectors), see object detection sub-module 7152 for score associated with class of objects], see also [Fig. 11F-G categorizers], [col. 75, ln. 11-13], which describes 7148, vs. [col. 65, ln. 7-17], which describes 7152). Heitz does not specifically disclose retraining the neural network based on the feedback, though one of ordinary skill in the art, before the effective filing date of the claimed invention, would specifically recognize that Heitz does disclose analogous updating of the event categorizer module 7148 based on feedback ([col. 41, ln. 14-50], [col. 45, ln. 46-] "…a user may review past motion events and their categories on the event timeline… the user is allowed to edit the event category assignments 1109, for example, by removing one or more past motion events from a known event category. When the user has edited the event category composition of a particular event category by removing one or more past motion events from the event category, the user-facing frontend notifies the event categorizer of the edits… the event categorizer removes the motion vectors of the removed motion events from the cluster corresponding to the event category, and re-computes the cluster parameters (e.g., cluster weight, cluster center, and cluster radius)… the removal of motion events from a recognized cluster optionally causes other motion events that are similar to the removed motion events to be removed from the recognized cluster as well… manual removal of one or more motion events from a recognized category may cause one or more motion events to be added to the event category due to the change in cluster center and cluster radius… the event category models are stored in the event category models database 1108 (FIG. 11A), and is retrieved and updated in accordance with the user edits."). While Zhang likewise teaches that the deep neural network is different than the machine learning algorithm ([pg. 5594, col. 1, Obli-RaF with ConvNet, par. 1, ln. 1-15]), Zhang does not specifically disclose retraining the neural network based on the feedback, instead only adjusting the features analyzed by the machine learning algorithm based on the feedback ([pg. 5592, col. 2, 4.2. PSVM learning, par. 1, ln. 1-23], [pg. 5593, col. 1, 4.3 Online updates, par. 1, ln. 1 to col. 2, par. 3, ln. 11]). However, Madden teaches retraining a neural network based on the feedback ([col. 30, ln. 40-46] "The model trainer 144 may … continuously train the deep learning model … A deep learning model may be a neural network… For example… 144 may train a convolutional neural network model (CNN) or a recurrent neural network model (RNN).", [col. 12, ln. 4 to col. 13, ln. 2] "…a pre-trained visual classification model is preloaded onto the data collector system 132. The … model is stored in the sensor database 134… the … can be adaptively retrained with non-visual data 128 and visual data 130 collected from the monitored control unit 108… 128 can be used as ground truth and can retrain the visual classification model for improved performance. For retraining the visual classification model, can use the non-visual sensor data stream as a trigger for an event, and can evaluate how well the visual classification model performed (with the stored pre-trained model data) and provides the following modes of correction if the event was not detected by the visual classification model… 132 can collect training samples from the missed event, add them to a training set, and retrain the neural network model for improved classification… 132 may receive feedback from the monitor control unit, such as monitor control unit 108. The feedback may be similar to the data used to train the neural network model in …144. The data collection system 132 may provide the feedback to … 144 to tune the trained visual model 146. For example, the camera 122-A that includes the trained visual model 146 provides an indication of object identification or object movement to the client device 106 of user 104. After the user 104 reviews the indication of identification or movement from the camera 122-A, which includes the real-time video feed from the camera 122-A or the frame where the object or movement was detected, the user 104 determines the trained visual model 146 produced an error and no object exists in the frame or no movement actually occurred."). One of ordinary skill in the art, before the effective filing date of the claimed invention, would recognize Heitz and Madden as within the same field of image processing for surveillance systems, and as analogous to the claimed invention. The motivation for including the feedback retraining of Madden would have been obvious to one of ordinary skill in the art, and is taught in Madden, in that allowing for feedback can improve the model's accuracy by reducing errors ([col. 12, ln. 4 to col. 13, ln. 2]). One of ordinary skill in the art, before the effective filing date of the claimed invention, would have combined the feedback retraining of Madden with the system of Heitz and the machine learning algorithm analysis of confidence scores and features over time of Zhang through known means, with no change to their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of Heitz with the machine learning algorithm analysis of confidence scores and features over time of Zhang and the feedback retraining of the neural network of Madden to obtain the invention as specified in claim 21. Conclusion 27. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892. Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAULO ANDRES GARCIA, whose telephone number is (703) 756-5493. The examiner can normally be reached Mon-Fri, 8-4:30PM ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Chan Park, can be reached at (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PAULO ANDRES GARCIA/Examiner, Art Unit 2669 /CHAN S PARK/Supervisory Patent Examiner, Art Unit 2669

Prosecution Timeline

Jul 01, 2022 · Application Filed
Sep 20, 2024 · Non-Final Rejection — §103
Nov 22, 2024 · Applicant Interview (Telephonic)
Nov 22, 2024 · Examiner Interview Summary
Dec 13, 2024 · Response Filed
Jan 17, 2025 · Final Rejection — §103
Mar 03, 2025 · Request for Continued Examination
Mar 06, 2025 · Response after Non-Final Action
Apr 02, 2025 · Non-Final Rejection — §103
Jul 16, 2025 · Interview Requested
Jul 23, 2025 · Examiner Interview Summary
Aug 06, 2025 · Response Filed
Oct 23, 2025 · Final Rejection — §103
Jan 27, 2026 · Request for Continued Examination
Jan 30, 2026 · Response after Non-Final Action
Mar 02, 2026 · Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602823
RE-LOCALIZATION OF ROBOT
2y 5m to grant · Granted Apr 14, 2026
Patent 12597280
IMAGE PROCESSING SYSTEM, IMAGE PROCESSING METHOD, AND PROGRAM
2y 5m to grant · Granted Apr 07, 2026
Patent 12597161
SYSTEMS AND METHODS FOR OBJECT TRACKING AND LOCATION PREDICTION
2y 5m to grant · Granted Apr 07, 2026
Patent 12586400
IMAGE PROCESSING APPARATUS, CONTROL METHOD THEREOF, AND STORAGE MEDIUM
2y 5m to grant · Granted Mar 24, 2026
Patent 12586176
SYSTEMS AND METHODS FOR PREDICTING AN INCOMING ROTATIONAL BALANCE OF AN UNFINISHED WORKPIECE
2y 5m to grant · Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 83%; 99% with an interview (+17.2%)
Median Time to Grant: 3y 2m
PTA Risk: High
Based on 41 resolved cases by this examiner. Grant probability is derived from the career allow rate (34 of 41 ≈ 83%).
