Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-9 and 12-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The limitations, under their broadest reasonable interpretation, cover a mental process performed on image frame data (a concept performed in the human mind, including by observation, evaluation, judgment, opinion, and prediction) and mathematical calculations for solving a mathematical relationship. This judicial exception is not integrated into a practical application because the steps do not add meaningful limitations that would tie them to a particular technological problem to be solved. The claims do not include additional elements sufficient to amount to significantly more than the judicial exception because the steps of the claimed invention can be performed mentally, solving the mathematical problem from the collected data using human intelligence with paper and pencil, and no additional features in the claims would preclude them from being performed as such.
According to the USPTO guidelines, a claim is directed to non-statutory subject matter if:
STEP 1: the claim does not fall within one of the four statutory categories of invention (process, machine, manufacture or composition of matter), or
STEP 2: the claim recites a judicial exception, e.g., an abstract idea, without reciting additional elements that amount to significantly more than the judicial exception, as determined using the following analysis:
STEP 2A (PRONG 1): Does the claim recite an abstract idea, law of nature, or natural phenomenon?
STEP 2A (PRONG 2): Does the claim recite additional elements that integrate the judicial exception into a practical application?
STEP 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
Using the two-step inquiry, it is clear that claims 1, 13 and 17 are directed to an abstract idea, as shown below:
STEP 1: Do the claims fall within one of the statutory categories?
YES. Claims 1, 13 and 17 are directed to a method (i.e., a process), a system (i.e., a machine), and a non-transitory computer-readable medium comprising instructions (i.e., a manufacture).
STEP 2A (PRONG 1): Is the claim directed to a law of nature, a natural phenomenon or an abstract idea?
YES, the claims are directed toward a mathematical process/mental process (i.e., an abstract idea).
With regard to STEP 2A (PRONG 1), the guidelines provide three groupings of subject matter that are considered abstract ideas:
Mathematical concepts – mathematical relationships, mathematical formulas or equations, mathematical calculations;
Certain methods of organizing human activity – fundamental economic principles or practices (including hedging, insurance, mitigating risk); commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations); managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions); and
Mental processes – concepts that are practicably performed in the human mind (including an observation, evaluation, judgment, opinion).
Claims 1, 13 and 17 comprise a mental process that can be practicably performed in the human mind using paper and pencil to solve a mathematical relationship (or with generic computers or components to perform the process) and, therefore, recite an abstract idea.
Regarding Claim(s) 1, 13 and 17: (representative claim 1)
A method comprising:
obtaining, based on a first set of images depicting an environment, a first visual appearance descriptor associated with a first object included in the environment, wherein the first set of images is generated during a first time period, and wherein the first object is subsequently absent from the environment depicted in a second set of images generated during a second time period (visually observing image/video data of the environment, obtaining a visual feature array for the first object at a first time/instance, and arranging the feature array of the first object using a mathematical function or model on paper/pencil using a person's mental intelligence, i.e., solving a mathematical function using a mental process);
obtaining, based on a third set of images depicting the environment, a second visual appearance descriptor associated with a second object included in the environment, wherein the third set of images is generated during a third time period that is subsequent to the second time period (visually observing image/video data of the environment, obtaining a visual feature array for the second object in the image/video at a third time/instance that is subsequent to the second time/instance, and arranging the feature array of the second object using a mathematical function or model on paper/pencil using a person's mental intelligence, i.e., solving a mathematical function using a mental process or a person's intelligence);
obtaining a compound similarity metric between the first object and the second object based on at least one of a visual appearance similarity metric or a motion similarity metric, wherein the visual appearance similarity metric corresponds to a degree of similarity between the first visual appearance descriptor and the second visual appearance descriptor (comparing/matching the obtained feature array of the first object against the feature array of the second object, measuring the similarity of the two feature arrays from the differences of each feature using the video/image data collected at the first time/instance and the third time/instance, and determining a compound similarity from the per-feature differences, i.e., a mental process or a person's intelligence); and
responsive to determining that the compound similarity metric meets a threshold value, updating an identifier associated with the second object to correspond to an identifier associated with the first object (comparing/matching the differences (i.e., the metric) of each feature in the feature arrays of the objects against a preset value, i.e., a mental process of solving a mathematical problem using mental/person intelligence).
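Read as an algorithm, the recited steps reduce to: extract a descriptor for each object, compute a compound similarity, and propagate the identifier when the similarity meets a threshold. The following Python sketch is illustrative only; the cosine measure, the appearance-only compound metric, and the threshold value are assumptions for illustration, not limitations drawn from the claims:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Degree of similarity between two visual appearance descriptors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def reidentify(desc_first: np.ndarray, desc_second: np.ndarray,
               ids: dict, threshold: float = 0.8) -> dict:
    # Compound metric: here only the appearance term is used; a motion
    # term could be blended in the same way (the weighting is assumed).
    compound = cosine_similarity(desc_first, desc_second)
    # If the metric meets the threshold, the second object's identifier
    # is updated to correspond to the first object's identifier.
    if compound >= threshold:
        ids["second_object"] = ids["first_object"]
    return ids

# Toy usage with invented descriptors: the second track inherits "track-17".
ids = {"first_object": "track-17", "second_object": "track-42"}
d1 = np.array([0.9, 0.1, 0.3])
d2 = np.array([0.85, 0.15, 0.32])
print(reidentify(d1, d2, ids))
```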
The above-recited claim limitations, as drafted, describe a simple process that, under its broadest reasonable interpretation, covers performance of the limitations in the human mind together with the solving of a mathematical relationship. Furthermore, the limitations of obtaining the first visual appearance descriptor, obtaining the second visual appearance descriptor, obtaining the compound similarity metric, and, responsive to the threshold determination, updating the identifier (reproduced in full above, with the examiner's mapping) are insignificant.
The Examiner notes that under MPEP 2106.04(a)(2)(III), the courts consider a mental process (thinking, human intelligence) that can be performed in the mind, or with the aid of paper and pencil, to be an abstract idea. CyberSource Corp. v. Retail Decisions, Inc., 654 F.3d 1366, 1372, 99 USPQ2d 1690, 1695 (Fed. Cir. 2011). As the Federal Circuit explained, "methods which can be performed mentally, or which are the equivalent of human mental work, are unpatentable abstract ideas - the 'basic tools of scientific and technological work' that are open to all." 654 F.3d at 1371, 99 USPQ2d at 1694 (citing Gottschalk v. Benson, 409 U.S. 63, 175 USPQ 673 (1972)). See also Mayo Collaborative Servs. v. Prometheus Labs., Inc., 566 U.S. 66, 71, 101 USPQ2d 1961, 1965 ("'[M]ental processes[] and abstract intellectual concepts are not patentable, as they are the basic tools of scientific and technological work'" (quoting Benson, 409 U.S. at 67, 175 USPQ at 675)); Parker v. Flook, 437 U.S. 584, 589, 198 USPQ 193, 197 (1978).
Furthermore, the Examiner notes that even when the mathematical concepts are combined with the mental process, a combination of abstract ideas does not make a claim eligible. See MPEP 2106.04(II)(A)(2): because a judicial exception is not eligible subject matter, Bilski, 561 U.S. at 601, 95 USPQ2d at 1005-06 (quoting Chakrabarty, 447 U.S. at 309, 206 USPQ at 197 (1980)), if there are no additional claim elements besides the judicial exception, or if the additional claim elements merely recite another judicial exception, that is insufficient to integrate the judicial exception into a practical application. See, e.g., RecogniCorp, LLC v. Nintendo Co., 855 F.3d 1322, 1327, 122 USPQ2d 1377 (Fed. Cir. 2017) ("Adding one abstract idea (math) to another abstract idea (encoding and decoding) does not render the claim non-abstract").
Other than the generic and well-known computer hardware (i.e., processors and memory/program) used to solve the mathematical problem based on the collected data, as recited in independent claims 1, 13 and 17 and disclosed in the specification, nothing in the elements of claims 1, 13 and 17 precludes the processing from being performed as a mental process, or merely through observation, evaluation, judgment, and thought applied to the collected data to solve a mathematical relationship using paper and pencil. The generic computer hardware/software recited in the independent claims is a mere idea of a solution without details per MPEP 2106.05(f), or the idea of a technological environment without detail per MPEP 2106.05(h). The generic computing hardware/software is recited merely to automate the mental process (Step 2A, Prong 1: abstract idea = Yes).
STEP 2A (PRONG 2): Does the claim recite additional elements that integrate the judicial exception into a practical application? NO, the claims do not recite additional elements that integrate the judicial exception into a practical application.
With regard to STEP 2A (prong 2), whether the claim recites additional elements that integrate the judicial exception into a practical application, the guidelines provide the following exemplary considerations that are indicative that an additional element (or combination of elements) may have integrated the judicial exception into a practical application:
an additional element reflects an improvement in the functioning of a computer, or an improvement to other technology or technical field;
an additional element that applies or uses a judicial exception to affect a particular treatment or prophylaxis for a disease or medical condition;
an additional element implements a judicial exception with, or uses a judicial exception in conjunction with, a particular machine or manufacture that is integral to the claim;
an additional element effects a transformation or reduction of a particular article to a different state or thing; and
an additional element applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception.
While the guidelines further state that the exemplary considerations are not an exhaustive list and that there may be other examples of integrating the exception into a practical application, the guidelines also list examples in which a judicial exception has not been integrated into a practical application:
an additional element merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea;
an additional element adds insignificant extra-solution activity to the judicial exception; and
an additional element does no more than generally link the use of a judicial exception to a particular technological environment or field of use.
Claims 1, 13 and 17 do not recite any of the exemplary considerations that are indicative of an abstract idea having been integrated into a practical application.
Claims 1, 13 and 17 recite the limitations of obtaining a first visual appearance descriptor based on a first set of images, obtaining a second visual appearance descriptor based on a third set of images, obtaining a compound similarity metric based on at least one of a visual appearance similarity metric or a motion similarity metric, and, responsive to determining that the compound similarity metric meets a threshold value, updating an identifier associated with the second object (reproduced in full above, with the examiner's mapping). As stated above, these limitations are insignificant.
The above limitations are recited at a high level of generality (i.e., as a general action or calculation taken based on the results of the acquiring step) and amount to mere post-solution activity, which is a form of insignificant extra-solution activity without further detail. Furthermore, the additional elements are claimed generically and operate in their ordinary capacity, such that they do not use the judicial exception in a manner that imposes a meaningful limit on it. Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Other than the generic and well-known computer hardware (i.e., processors and memory/program) used to solve the mathematical problem based on the collected data, as recited in independent claims 1, 13 and 17 and disclosed in the specification, nothing in the elements of claims 1, 13 and 17 precludes the processing from being performed as a mental process, or merely through observation, evaluation, judgment, and thought to solve a mathematical relationship using paper and pencil. The generic computer hardware/software recited in independent claims 1, 13 and 17 is a mere idea of a solution without details per MPEP 2106.05(f), or the idea of a technological environment without detail per MPEP 2106.05(h). The generic computing hardware/software is recited merely to automate the mental process (Step 2A, Prong 2: integration into a practical application = No).
STEP 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
With regard to STEP 2B, whether the claims recite additional elements that provide significantly more than the recited judicial exception, the guidelines specify that the pre-guideline procedure is still in effect. Specifically, that examiners should continue to consider whether an additional element or combination of elements:
adds a specific limitation or combination of limitations that are not well-understood, routine, conventional activity in the field, which is indicative that an inventive concept may be present; or
simply appends well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception, which is indicative that an inventive concept may not be present.
With regard to STEP 2B, the Guidance provided the following examples of limitations that may be enough to qualify as “significantly more" when recited in a claim with a judicial exception:
Improvement to another technology or technical field
Improvement to functioning of computer itself and/or applying the judicial exception with, or by use of, a particular machine
Effecting a transformation or reduction of a particular article to a different state or thing.
Adding a specific limitation other than what is well-understood, routine and conventional in the field, or adding unconventional steps that confine the claim to a particular useful application
Meaningful limitation beyond generally linking the use of an abstract idea to a particular technological environment.
The Guidance further set forth limitations that were found not to be enough to qualify as “significantly more” when recited in a claim with a judicial exception, including:
Adding words to “apply it” (or an equivalent) with the judicial exception or mere instructions to implement abstract ideas on a computer
Simply appending well-understood, routine and conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception, e.g., a claim to an abstract idea requiring no more than a generic computer to perform generic computer functions that are well-understood, routine and conventional activities previously known to the industry.
Adding insignificant extra-solution activity to the judicial exception, e.g., mere data gathering in conjunction with a law of nature or abstract idea
Generally linking the use of the judicial exception to a particular technological environment or field of use.
Claims 1, 13 and 17 do not recite any additional elements that are not well-understood, routine or conventional.
Claims 1, 13 and 17 do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional computer components/processor/software identified above, using instructions to apply the judicial exception, are merely generic computer components that are well-known, routine, and conventional, as evidenced by Bancorp Services v. Sun Life (Fed. Cir. 2012) and Alice Corp. v. CLS Bank (2014).
Other than the generic and well-known computer hardware (i.e., processors and memory/program) used to solve the mathematical problem based on the collected data, as recited in independent claims 1, 13 and 17 and disclosed in the specification, nothing in the elements of claims 1, 13 and 17 precludes the processing from being performed as a mental process, or merely through observation, evaluation, judgment, and thought to solve a mathematical relationship using paper and pencil. The generic computer hardware/software recited in independent claims 1, 13 and 17 is a mere idea of a solution without details per MPEP 2106.05(f), or the idea of a technological environment without detail per MPEP 2106.05(h). The generic computing hardware/software is recited merely to automate the mental process.
Thus, since claims 1, 13 and 17 (a) are directed toward an abstract idea, (b) do not recite additional elements that integrate the judicial exception into a practical application, and (c) do not recite additional elements that amount to significantly more than the judicial exception, it is clear that claims 1, 13 and 17 are not eligible subject matter under 35 U.S.C. 101 (Step 2B: abstract idea = Yes).
Regarding dependent claims 2-9, 12, 14-16 and 18: the additional limitations do not integrate the mental process of solving a mathematical relationship (performable with paper and pencil) into a practical application, nor do they add significantly more to the mental process. Claims 2-9, 12, 14-16 and 18 further limit the abstract idea of independent claims 1, 13 and 17. The limitations of these dependent claims fall under (a) solving a mathematical relationship/function using a mental process, including by observation, evaluation and judgment, which can be done mentally in the human mind; (b) insignificant pre-/post-solution extra activity of generating/gathering data or performing a mathematical calculation; or (c) generic computer components, or generic machine learning, used to perform the process or solve the mathematical relationship/function. The generic computer hardware/software and generic machine learning model/neural network recited in dependent claims 2-9, 12, 14-16 and 18 are a mere idea of a solution without details per MPEP 2106.05(f), or the idea of a technological environment without detail per MPEP 2106.05(h). The generic computing hardware and generic machine learning model are recited merely to automate the mental process.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4-9, 12-14 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Qiu et al. ("Kestrel: Video Analytics for Augmented Multi-Camera Vehicle Tracking," 2018 IEEE, 0-7695-6387-2/18, DOI 10.1109/IoTDI.2018.00015, pages 48-59) in view of KOPERSKI et al. (US 2018/0286081).
Regarding claims 1, 13 and 17, Qiu discloses a method/system and a non-transitory computer-readable medium comprising instructions (Qiu Fig. 1; page 48, Abstract; page 48, section I, INTRODUCTION; page 56, section E, Object Detection Performance),
obtaining, based on a first set of images depicting an environment, a first visual appearance descriptor associated with a first object included in the environment, wherein the first set of images is generated during a first time period, and wherein the first object is subsequently absent from the environment depicted in a second set of images generated during a second time period (Qiu Fig. 1; page 49, right column, section II, KESTREL DESIGN, lines 1-10, states: "Figure 1 shows the system architecture of Kestrel, which consists of a mobile device pipeline and a cloud pipeline. Videos from fixed cameras are streamed and stored on a cloud. Mobile devices store videos locally and only send metadata to the cloud specifying when and where the video is taken. Given a vehicle to track through the camera network, the cloud detects cars in each frame (object detection), determines the direction of motion of each car (tracking) in each single camera view and extracts visual descriptors for each car (attribute extraction). Then, across nearby cameras, it tries to match cars (cross-camera association), and uses these to infer the path the car took (path inference)", operating only on videos from the fixed cameras; and page 52, right column, section Spatio-temporal Association, lines 1-12, Qiu states: "To associate between two object instances (o_x,i, o_y,j), Kestrel uses the timestamp of the object's first scene entry T^I and its last appearance exiting the scene T^O. Kestrel calculates the total travel time from Cam x to Cam y as ΔT(o_x,i, o_y,j) = T^I_y,j − T^O_x,i, and compares this to the estimated time ET(x, y) needed to travel from Cam x to Cam y. Taking the exit timestamp from the first camera and the entry timestamp in the second camera filters out any variance due to a temporary stop (e.g., at a stop sign) while in the camera view." It is obvious to obtain, based on a first set of images depicting an environment, a first visual appearance descriptor associated with a first object included in the environment, wherein the first set of images is generated during a first time period, and wherein the first object is subsequently absent from the environment depicted in a second set of images generated during a second time period.);
obtaining, based on a third set of images depicting the environment, a second visual appearance descriptor associated with a second object included in the environment, wherein the third set of images is generated during a third time period that is subsequent to the second time period (Qiu Fig. 1; page 49, right column, section II, KESTREL DESIGN, lines 1-10, quoted above; and page 52, right column, section Spatio-temporal Association, lines 1-12, quoted above. In the system of Qiu, videos from fixed cameras are streamed and stored on a cloud, and the cloud detects cars in each frame (object detection), determines the direction of motion of each car (tracking) in each single camera view, and extracts visual descriptors for each car (attribute extraction); then, across nearby cameras, it tries to match cars (cross-camera association). It is therefore obvious that the system of Qiu obtains, based on a third set of images depicting the environment, a second visual appearance descriptor associated with a second object included in the environment, wherein the third set of images is generated during a third time period that is subsequent to the second time period.);
obtaining a compound similarity metric between the first object and the second object based on at least one of a visual appearance similarity metric or a motion similarity metric, wherein the visual appearance similarity metric corresponds to a degree of similarity between the first visual appearance descriptor and the second visual appearance descriptor (page 52, right column, section Spatio-temporal Association, lines 1-12, quoted above; page 52, right column, section Pairwise Instance Association, Qiu states: "An instance corresponds to a specific object captured at a specific camera, together with the associated attributes (§II-B). Instance association determines if object instances at two cameras represent the same object or not. A camera network can be modeled as an arbitrary topology shown in Figure 10. Assume that each intersection has a camera (Cam c), and each camera extracts several instances. We define the nth object instance of Cam c as o_c,n. Kestrel infers the association between any pair of object instances (o_x,i, o_y,j), using three key techniques: visual features, travel time estimation and the direction of motion"; and page 54, right column, section Metrics, lines 1-8, Qiu states: "The primary performance metric for Kestrel is accuracy. We use recall and precision to evaluate the accuracy of Kestrel's association and path inference. Among all the given paths, recall (TP/(TP + FN)) measures the fraction of the paths for which Kestrel can make a successful inference." All of this corresponds to obtaining a compound similarity metric between the first object and the second object based on at least one of a visual appearance similarity metric or a motion similarity metric, wherein the visual appearance similarity metric corresponds to a degree of similarity between the first visual appearance descriptor and the second visual appearance descriptor, because Qiu makes inferences and measures accuracy based on visual features, direction of travel, and travel time estimation); and
responsive to determining that the compound similarity metric meets a threshold value, updating an identifier associated with the second object to correspond to an identifier associated with the first object (Qiu page 53, left column, section Visual Similarity, lines 1-12, states: "The visual similarity of object instances is also used in object association. Given two color histograms H1, H2 from two instances, Kestrel measures their similarity using a 'correlation offset,' which is 1 minus the correlation between the two histograms. The lower the offset is, the more likely the two instances are the same. We find that this metric can discriminate well between similar and dissimilar objects. From a sample of 150 images (figure omitted for space reasons), we find that over 95% of identical objects have an offset of less than 0.4, while over 90% of different objects have an offset larger than 0.4. In a more general setting, this threshold can be learned from data"; and page 53, right column, section Path Inference, lines 1-16, states: "those outside the temporal window and those whose visual correlation is too low. In this pruned network, the weight w(o_x, o_y) of a link l(o_x, o_y) is defined as the color histogram correlation offset (lower the offset, better is the correlation). A path from a source instance o_s to destination o_d in an instance network is defined as an instance path p(o_s, o_d) = o_s, o_1, o_2, ..., o_d. A physical route in the real world, termed a camera path, is defined by the sequence of cameras traversed by a vehicle: multiple instance paths can traverse the same camera path. Consider one object of interest o_x,i, captured by either a mobile user or a camera operator from Cam x. Kestrel seeks to infer the instance path it takes and answers this query in near-real time, generating the path and instances at each camera as a feedback to the user." This corresponds to updating an identifier responsive to the compound similarity metric meeting a threshold value, because two object instances that are similar based on visual features and motion (spatio-temporal) features, or dissimilar based on those features, would need to be updated with a matching or non-matching feature descriptor accordingly.).
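For clarity of the record, the two association cues quoted from Qiu (the spatio-temporal travel-time check and the color-histogram "correlation offset") can be rendered as a short Python sketch. The numeric values and the tolerance parameter below are invented for illustration and are not taken from Qiu:

```python
import numpy as np

def travel_time_plausible(t_exit_x: float, t_entry_y: float,
                          expected: float, tol: float) -> bool:
    """Spatio-temporal check: ΔT = T^I_y,j − T^O_x,i is compared with
    the estimated travel time ET(x, y) between Cam x and Cam y."""
    delta_t = t_entry_y - t_exit_x
    return abs(delta_t - expected) <= tol  # tolerance is an assumption

def correlation_offset(h1: np.ndarray, h2: np.ndarray) -> float:
    """Visual similarity: 1 minus the correlation of two color histograms;
    a lower offset means the two instances are more likely the same."""
    return 1.0 - float(np.corrcoef(h1, h2)[0, 1])

# Invented color histograms (8 bins) and timestamps (seconds).
h_a = np.array([0.30, 0.25, 0.15, 0.10, 0.08, 0.06, 0.04, 0.02])
h_b = np.array([0.28, 0.26, 0.16, 0.09, 0.09, 0.05, 0.05, 0.02])

time_ok = travel_time_plausible(t_exit_x=100.0, t_entry_y=160.0,
                                expected=55.0, tol=15.0)      # True
look_ok = correlation_offset(h_a, h_b) < 0.4  # 0.4 threshold per the quote
print(time_ok and look_ok)  # both cues agree -> associate the instances
```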
In the same field of endeavor, KOPERSKI discloses responsive to determining that the compound similarity metric meets a threshold value, updating an identifier associated with the second object to correspond to an identifier associated with the first object (KOPERSKI, paragraph 0032, states: "In one embodiment, the object re-identification component 120 estimates the parameters of the distribution heuristically using the distribution of typical travel speeds if the path distance between cameras A and B is known. When such information is not available, the re-identification component 120 could use a supervised learning algorithm to estimate the parameter values. For example, given an appearance dissimilarity function, the object re-identification component 120 could construct a binary threshold classification to predict whether observations are indeed the same object. The object re-identification component 120 could select the threshold that generates no false positives on the training data, and could use only these observation pairs, e.g., objects with distinctive appearances that can be re-identified reliably, to estimate the parameters (τ, σ^2) of the normal distribution"; paragraph 0045 states: "Once each patch descriptor has been mapped to the closest codebook entry, a histogram of code frequencies representing each image is generated at block 425. At block 430, the object re-identification component 120 computes a measure of visual dissimilarity (i.e., an appearance cost) between each of two images from the two cameras using the generated histograms. The object re-identification component 120 determines a confidence value for each of the measures of visual dissimilarity and, for measures having a confidence value above a predefined threshold amount of confidence, uses the measures to update the temporal context model"; and paragraph 0046 states: "[At] operation 440, the object re-identification component 120 uses the temporal context model to compute a time cost between each of the two images, using the corresponding timestamps of the two images. The computed time cost and appearance cost are combined into a combined signal cost function (block 445). Using the combined signal cost function at block 450, the object re-identification component 120 formulates simultaneous object re-identification queries as a linear assignment problem. Additionally, the method 400 can return [to] block 435 to refine the temporal context model. For instance, the object re-identification component 120 can continually refine the temporal context model. In one embodiment, the object re-identification component 120 iteratively refines the temporal context model until each iteration results in an insignificant amount of change to the temporal model (e.g., when the object re-identification component 120 determines that the amount of change is less than a predefined threshold amount of change)." It is obvious that KOPERSKI discloses, responsive to determining that the compound similarity metric meets a threshold value, updating an identifier associated with the second object to correspond to a feature identifier associated with the first object, because they are the same object; and, similarly, if the metric is below the threshold, updating the feature identifier associated with the objects because they are not the same object.).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to update an identifier associated with the second object to correspond to an identifier associated with the first object responsive to determining that the compound similarity metric meets a threshold value, as shown by the combination of Qiu and KOPERSKI, because such a system provides re-identification and tracking of objects in the larger environment of multiple disjoint, fixed, heterogeneous networks of cameras.
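KOPERSKI's quoted step of combining a time cost with an appearance cost and formulating simultaneous re-identification queries as a linear assignment problem can be sketched with a standard solver. The cost matrices below are invented, and the additive combination is an assumption for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Combined cost for each (camera-A object, camera-B object) pair.
appearance_cost = np.array([[0.2, 0.9, 0.8],
                            [0.7, 0.1, 0.6],
                            [0.9, 0.8, 0.3]])
time_cost = np.array([[0.1, 0.5, 0.6],
                      [0.4, 0.2, 0.5],
                      [0.7, 0.6, 0.1]])
combined = appearance_cost + time_cost

# Linear assignment: each object seen at camera A is matched to one
# object seen at camera B so that the total combined cost is minimized.
row_ind, col_ind = linear_sum_assignment(combined)
print([(int(r), int(c)) for r, c in zip(row_ind, col_ind)])  # [(0,0),(1,1),(2,2)]
```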
Regarding claims 2 and 14, Qiu discloses obtaining the first visual appearance descriptor associated with the first object comprises: providing a subset of image data of the first set of images as an input to a machine learning model, wherein the subset of image data is associated with the first object; and obtaining one or more outputs of the machine learning model comprising a representation of one or more visual appearance characteristics of the first object (Qiu Fig. 1 shows a machine-learning-model-based system including vehicle detection, tracking cameras, attribute extraction, path inference, query and association; page 50, right column, section Attribute Extraction; and page 52, right column, section Extracting Object Descriptors. This obviously corresponds to obtaining the first visual appearance descriptor associated with the first object by providing a subset of image data of the first set of images as an input to a machine learning model, wherein the subset of image data is associated with the first object, and obtaining one or more outputs of the machine learning model comprising a representation of one or more visual appearance characteristics of the first object and other objects).
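The mapped limitation (a cropped subset of image data provided as input to a machine learning model that outputs visual appearance characteristics) can be illustrated with a minimal stand-in; the random projection below is a placeholder for a trained attribute extractor, not Qiu's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32 * 32 * 3))  # placeholder "model" weights

def descriptor_from_crop(crop: np.ndarray) -> np.ndarray:
    """Maps a 32x32 RGB crop (the subset of image data associated with
    one object) to a 64-dimensional visual appearance descriptor."""
    x = crop.reshape(-1) / 255.0      # flatten and normalize the pixels
    d = W @ x                         # stand-in for a trained network
    return d / np.linalg.norm(d)      # unit-normalize the descriptor

crop = rng.integers(0, 256, size=(32, 32, 3))  # synthetic object crop
print(descriptor_from_crop(crop).shape)        # (64,)
```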
Regarding claim 4, Qiu discloses the motion similarity metric corresponds to a degree of similarity between a current spatio-temporal state associated with the second object and the third time period and a predicted future spatio-temporal state associated with the first object and the third time period (Qiu Fig. 1; page 49, right column, section II, KESTREL DESIGN, lines 1-10, quoted above; and page 52, right column, section Spatio-temporal Association, lines 1-12, quoted above. It is therefore obvious that in Qiu's system the motion similarity metric corresponds to a degree of similarity between a current spatio-temporal state associated with the second object and the third time period and a predicted future spatio-temporal state associated with the first object and the third time period, because Qiu's system detects objects and associates vehicles or objects based on spatio-temporal features; see also page 53, left column, paragraphs Visual Similarity, Direction Filter and Instance Association).
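As one hedged illustration of a motion similarity term comparing a predicted future spatio-temporal state against a currently observed state (the constant-velocity model and all values are invented; this is not Qiu's exact method):

```python
import numpy as np

# First object's last observed position and estimated velocity.
pos_last = np.array([10.0, 5.0])
vel = np.array([2.0, 0.5])          # units per second
dt = 3.0                            # elapsed time to the third time period

predicted = pos_last + vel * dt     # predicted future state: [16.0, 6.5]
observed = np.array([15.5, 6.8])    # second object's current state

# Similarity decays with distance between predicted and observed states.
motion_similarity = float(np.exp(-np.linalg.norm(predicted - observed)))
print(round(motion_similarity, 3))  # 0.558: closer states give values nearer 1
```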
Regarding claims 5 and 16, Qiu discloses obtaining the compound similarity metric between the first object and the second object in view of the visual appearance similarity metric and the motion similarity metric (Qiu Fig. 1; page 52, right column, section Spatio-temporal Association, lines 1-12, quoted above; page 52, right column, section Pairwise Instance Association, quoted above; and page 54, right column, section Metrics, lines 1-8, quoted above),
providing the visual appearance similarity metric and the motion similarity metric as inputs to a machine learning model, wherein the machine learning model is one of: a support vector machine or a neural network, and obtaining one or more outputs of the machine learning model (Qiu discloses a deep learning neural network (Abstract); note in Fig. 1 the Attribute Extraction, Path Inference and Association stages; page 52, right column, section Pairwise Instance Association, quoted above, where Kestrel infers the association between any pair of object instances (o_x,i, o_y,j) using three key techniques: visual features, travel time estimation and the direction of motion; and note page 54, right column, section Metrics. It is obvious that the system of Qiu includes providing the visual appearance similarity metric and the motion similarity metric as inputs to a machine learning model, wherein the machine learning model is one of: a support vector machine or a neural network, and obtaining one or more outputs of the machine learning model).
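Since the claim names a support vector machine or a neural network as the model receiving the two similarity metrics, the following scikit-learn sketch (with invented training pairs) shows the shape of such a classifier; it is an illustration, not the applicant's or Qiu's implementation:

```python
import numpy as np
from sklearn.svm import SVC

# Each sample: [visual appearance similarity, motion similarity];
# label 1 = same object, 0 = different object (training pairs invented).
X = np.array([[0.92, 0.88], [0.85, 0.90], [0.20, 0.30],
              [0.15, 0.40], [0.88, 0.35], [0.30, 0.85]])
y = np.array([1, 1, 0, 0, 1, 0])

clf = SVC(kernel="rbf").fit(X, y)

# Output of the model for a new pair of similarity metrics.
pair = np.array([[0.80, 0.75]])
print(clf.predict(pair))  # e.g., [1] -> treated as the same object
```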
Regarding claim 6, Qiu discloses the first visual appearance descriptor is obtained before the second set of images is generated (Qiu Fig. 1; page 49, right column, section II, KESTREL DESIGN, lines 1-10, quoted above: the cloud extracts visual descriptors for each car (attribute extraction) as videos from the fixed cameras are streamed and stored, before later videos are taken. It would be obvious that in the system of Qiu the first visual appearance descriptor is obtained before the second set of images is generated).
Regarding claim 7, Qiu discloses obtaining, based on a fourth set of images depicting the environment, a third visual appearance descriptor associated with the first object included in the environment, wherein the fourth set of images is generated during a fourth time period that is subsequent to the third time period, wherein the second visual appearance descriptor and the compound similarity metric are obtained before the fourth set of images is generated (Qiu Fig. 1; page 49, right column, section II, KESTREL DESIGN, lines 1-10, quoted above: the cloud detects cars in each frame (object detection), determines the direction of motion of each car (tracking) in each single camera view, extracts visual descriptors for each car (attribute extraction), and then, across nearby cameras, tries to match cars (cross-camera association) and uses these to infer the path the car took (path inference). It is therefore obvious that the system of Qiu obtains, based on a fourth set of images, a third visual appearance descriptor associated with the first object during a fourth time period subsequent to the third time period, with the second visual appearance descriptor and the compound similarity metric obtained before the fourth set of images is generated).