Prosecution Insights
Last updated: April 19, 2026
Application No. 18/647,042

SYSTEMS AND METHODS FOR MULTIPLE SENSOR OBJECT TRACKING

Non-Final OA: §102, §103
Filed: Apr 26, 2024
Examiner: MAIDEN, MICHAEL KIM
Art Unit: 2665
Tech Center: 2600 — Communications
Assignee: Palantir Technologies Inc.
OA Round: 1 (Non-Final)
Grant Probability: 93% (Favorable)
OA Rounds: 1-2
To Grant: 2y 11m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 93%, above average (67 granted / 72 resolved; +31.1% vs TC avg)
Interview Lift: +8.9% (moderate lift, measured across resolved cases with vs. without an interview)
Typical Timeline: 2y 11m average prosecution; 16 applications currently pending
Career History: 88 total applications across all art units
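The headline figures above appear to follow directly from the career counts. A minimal sketch of that arithmetic, assuming the "vs TC avg" delta is a simple percentage-point difference (the roughly 62% Tech Center average is implied by the delta, not published on this page):

```python
# Sketch of how the examiner's headline statistics appear to be derived.
# The 67/72 counts come from the page; the ~62% Tech Center average is an
# assumption implied by the "+31.1% vs TC avg" figure.

granted, resolved = 67, 72
career_allow_rate = granted / resolved           # ~0.931, shown as 93%

implied_tc_average = career_allow_rate - 0.311   # ~0.620 if the delta is in percentage points
print(f"Career allow rate: {career_allow_rate:.1%}")
print(f"Delta vs TC average: {career_allow_rate - implied_tc_average:+.1%}")
```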

Statute-Specific Performance

§101: 9.8% (-30.2% vs TC avg)
§103: 52.1% (+12.1% vs TC avg)
§102: 29.0% (-11.0% vs TC avg)
§112: 8.0% (-32.0% vs TC avg)
Tech Center averages are estimates. Based on career data from 72 resolved cases.

Office Action

Rejections: §102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 11/06/2025. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Status
Claim(s) 1-3, 11-13, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Weng (US 20200143545 A1).
Claim(s) 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Weng (US 20200143545 A1) in view of Ye (Ye, Y., Zhu, B., Tang, T., Yang, C., Xu, Q., & Zhang, G. (2022). A robust multimodal remote sensing image registration method and system using steerable filters with first- and second-order gradients. ISPRS Journal of Photogrammetry and Remote Sensing, 188, 331–350. https://doi.org/10.1016/j.isprsjprs.2022.04.011).
Claim(s) 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Weng (US 20200143545 A1) in view of Hassan (Hassan, M., Mehmood, A., & Khan, M. F. (2011). An Efficient Method of Tracking across Multiple Cameras. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, 1477–1481. https://doi.org/10.1109/TrustCom.2011.203).
Claim(s) 6-8 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Weng (US 20200143545 A1) in view of Hassan (Hassan, M., Mehmood, A., & Khan, M. F. (2011). An Efficient Method of Tracking across Multiple Cameras. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, 1477–1481. https://doi.org/10.1109/TrustCom.2011.203) and in further view of Karunasekera (Karunasekera, H., Wang, H., & Zhang, H. (2019). Multiple Object Tracking With Attention to Appearance, Structure, Motion and Size. IEEE Access, 7, 104423–104434. https://doi.org/10.1109/ACCESS.2019.2932301).
Claim(s) 9-10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Weng (US 20200143545 A1) in view of Hassan (Hassan, M., Mehmood, A., & Khan, M. F. (2011). An Efficient Method of Tracking across Multiple Cameras. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, 1477–1481. https://doi.org/10.1109/TrustCom.2011.203) and in further view of Peng (Chu, P., Wang, J., You, Q., Ling, H., & Liu, Z. (2023). TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking. Proceedings / IEEE Workshop on Applications of Computer Vision, 4859–4869. https://doi.org/10.1109/WACV56688.2023.00485).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-3, 11-13, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Weng (US 20200143545 A1).

Regarding claim 1, Weng discloses A method for multiple-sensor object tracking, the method comprising: (¶3 “a computer-implemented method is provided for tracking” ¶43 “Alternatively, the infrared image and the visible image can be captured by separate imaging devices.”)
receiving a first sensor feed and a second sensor feed from a plurality of sensors respectively, the first sensor feed comprising a set of first images, and the second sensor feed comprising a set of second images; (¶43 “The infrared image or the visible image can include an image frame in a series of image frames obtained by an infrared sensor or sensor array…Alternatively, the infrared image and the visible image can be captured by separate imaging devices.” Weng discloses a series of infrared images and visible images that may be captured by two different sensors)
generating an image transformation based on at least one first image in the set of first images and at least one second image in the set of second images; (¶61 “the infrared imaging module that produces the infrared image 402 and the visible imaging module that produces the visible image 404 may be disposed at a certain relative spatial configuration with each other (e.g., baseline displacement, relative rotation). In some embodiments, the calibration parameters describing such configuration can be predetermined, or determined during a calibration step discussed herein. The calibration parameters can be used to align the infrared image 402 and the visible image 404”)
applying the image transformation to the set of second images; (¶61 “For example, image data associated with the visible image 404 may be transformed to from a first coordinate system (e.g., associated with the visible image 404) to a second coordinate system (e.g., associated with the infrared image 402)”)
aggregating the set of first images and the set of transformed second images to generate a set of aggregated images; and (¶62 “the additional features extracted from the visible image 404, such as interior edges, textures, and patterns can be added to the infrared image. For example, some or all features of the target object 412 can be added to the infrared image to form the combined image 406.” Fig. 4 discloses a set of aggregated images)
applying a multiple object tracking model to the set of aggregated images to identify a plurality of objects, (¶72 “At block 702, a combined image based on an infrared image and a visible image is obtained.” ¶73 “At block 704, a target is identified in the combined image.” ¶124 “At block 1014, a tracking indicator is displayed in the combined image. A graphical target indicator (tracking indicator) may be displayed for each of one or more targets identified based on the tracking information.”)
wherein the method is performed using one or more processors. (¶4 “The UAV comprises a memory that stores one or more computer-executable instructions; and one or more processors configured to access the memory and execute the computer-executable instructions”)

Regarding claims 2 and 12, Weng discloses wherein the aggregating the set of transformed first images and the set of transformed second images comprises: (¶62 “the additional features extracted from the visible image 404, such as interior edges, textures, and patterns can be added to the infrared image. For example, some or all features of the target object 412 can be added to the infrared image to form the combined image 406.” Fig. 4 discloses a set of aggregated images)
arranging a first image in the set of first images captured at a first time and a transformed second image in the set of transformed second images captured at approximately the first time adjacent to each other. (¶31 “The infrared image may be captured or generated at approximately the same time as the corresponding visible image.” Fig. 4 discloses the first and second images are arranged adjacent to each other)

Regarding claims 3 and 13, Weng discloses wherein the generating an image transformation based at least in part on the first image and the second image comprises: (¶61 “the infrared image 402 and the…visible image 404 may be disposed at a certain relative spatial configuration with each other (e.g., baseline displacement, relative rotation). In some embodiments, the calibration parameters describing such configuration can be predetermined, or determined during a calibration step discussed herein. The calibration parameters can be used to align the infrared image 402 and the visible image 404”)
applying an image matching model to a first image in the set of first images and a second image in the set of second images to generate an image matching result; and (¶34 “the fusion module 112 can be configured to extract features from, or perform transformation of an infrared image and/or a visible image. The extracted features from the infrared image (infrared features) and the extracted features from the visible image (visible features) can be used to match or align the infrared image and the visible image”)
generating the image transformation based on the image matching result. (¶34 “The extracted features from the infrared image (infrared features) and the extracted features from the visible image (visible features) can be used to match or align the infrared image and the visible image”)

Regarding claim 11, Weng discloses A system for multiple-sensor object tracking, the system comprising: (¶3 “a computer-implemented method is provided for tracking” ¶43 “Alternatively, the infrared image and the visible image can be captured by separate imaging devices.”)
at least one processor; and (¶4 “The UAV comprises a memory that stores one or more computer-executable instructions; and one or more processors configured to access the memory and execute the computer-executable instructions”)
at least one memory storing instructions that, when executed by the at least one processor, causes the system to perform a set of operations, the set of operations comprising: (¶4 “The UAV comprises a memory that stores one or more computer-executable instructions; and one or more processors configured to access the memory and execute the computer-executable instructions”)
receiving a first sensor feed and a second sensor feed from a plurality of sensors respectively, the first sensor feed comprising a set of first images, and the second sensor feed comprising a set of second images; (¶43 “The infrared image or the visible image can include an image frame in a series of image frames obtained by an infrared sensor or sensor array…Alternatively, the infrared image and the visible image can be captured by separate imaging devices.” Weng discloses a series of infrared images and visible images that may be captured by two different sensors)
generating an image transformation based on at least one first image in the set of first images and at least one second image in the set of second images; (¶61 “the infrared imaging module that produces the infrared image 402 and the visible imaging module that produces the visible image 404 may be disposed at a certain relative spatial configuration with each other (e.g., baseline displacement, relative rotation). In some embodiments, the calibration parameters describing such configuration can be predetermined, or determined during a calibration step discussed herein. The calibration parameters can be used to align the infrared image 402 and the visible image 404”)
applying the image transformation to the set of second images; (¶61 “For example, image data associated with the visible image 404 may be transformed to from a first coordinate system (e.g., associated with the visible image 404) to a second coordinate system (e.g., associated with the infrared image 402)”)
aggregating the set of first images and the set of transformed second images to generate a set of aggregated images; and (¶62 “the additional features extracted from the visible image 404, such as interior edges, textures, and patterns can be added to the infrared image. For example, some or all features of the target object 412 can be added to the infrared image to form the combined image 406.” Fig. 4 discloses a set of aggregated images)
applying a multiple object tracking model to the set of aggregated images to identify a plurality of objects, (¶72 “At block 702, a combined image based on an infrared image and a visible image is obtained.” ¶73 “At block 704, a target is identified in the combined image.” ¶124 “At block 1014, a tracking indicator is displayed in the combined image. A graphical target indicator (tracking indicator) may be displayed for each of one or more targets identified based on the tracking information.”)

Regarding claim 20, Weng discloses A method for multiple-sensor object tracking, the method comprising: (¶3 “a computer-implemented method is provided for tracking” ¶43 “Alternatively, the infrared image and the visible image can be captured by separate imaging devices.”)
receiving a first sensor feed and a second sensor feed from a plurality of sensors respectively, the first sensor feed comprising a set of first images, and the second sensor feed comprising a set of second images; (¶43 “The infrared image or the visible image can include an image frame in a series of image frames obtained by an infrared sensor or sensor array…Alternatively, the infrared image and the visible image can be captured by separate imaging devices.” Weng discloses a series of infrared images and visible images that may be captured by two different sensors)
generating an image transformation based on at least one first image in the set of first images and at least one second image in the set of second images, (¶61 “the infrared imaging module that produces the infrared image 402 and the visible imaging module that produces the visible image 404 may be disposed at a certain relative spatial configuration with each other (e.g., baseline displacement, relative rotation). In some embodiments, the calibration parameters describing such configuration can be predetermined, or determined during a calibration step discussed herein. The calibration parameters can be used to align the infrared image 402 and the visible image 404”)
wherein the generating an image transformation based at least in part on the first image and the second image comprises: (¶61 “the infrared image 402 and the…visible image 404 may be disposed at a certain relative spatial configuration with each other (e.g., baseline displacement, relative rotation). In some embodiments, the calibration parameters describing such configuration can be predetermined, or determined during a calibration step discussed herein. The calibration parameters can be used to align the infrared image 402 and the visible image 404”)
applying an image matching model to a first image in the set of first images and a second image in the set of second images to generate an image matching result; and (¶34 “the fusion module 112 can be configured to extract features from, or perform transformation of an infrared image and/or a visible image. The extracted features from the infrared image (infrared features) and the extracted features from the visible image (visible features) can be used to match or align the infrared image and the visible image”)
generating the image transformation based on the image matching result. (¶34 “The extracted features from the infrared image (infrared features) and the extracted features from the visible image (visible features) can be used to match or align the infrared image and the visible image”)
applying the image transformation to the set of second images; (¶61 “For example, image data associated with the visible image 404 may be transformed to from a first coordinate system (e.g., associated with the visible image 404) to a second coordinate system (e.g., associated with the infrared image 402)”)
aggregating the set of first images and the set of transformed second images to generate a set of aggregated images (¶62 “the additional features extracted from the visible image 404, such as interior edges, textures, and patterns can be added to the infrared image. For example, some or all features of the target object 412 can be added to the infrared image to form the combined image 406.” Fig. 4 discloses a set of aggregated images)
wherein the aggregating the set of transformed first images and the set of transformed second images comprises: (¶62 “the additional features extracted from the visible image 404, such as interior edges, textures, and patterns can be added to the infrared image. For example, some or all features of the target object 412 can be added to the infrared image to form the combined image 406.” Fig. 4 discloses a set of aggregated images)
arranging a first image in the set of first images captured at a first time and a transformed second image in the set of transformed second images captured at approximately the first time adjacent to each other; and (¶31 “The infrared image may be captured or generated at approximately the same time as the corresponding visible image.” Fig. 4 discloses the first and second images are arranged adjacent to each other)
applying a multiple object tracking model to the set of aggregated images to identify a plurality of objects, (¶72 “At block 702, a combined image based on an infrared image and a visible image is obtained.” ¶73 “At block 704, a target is identified in the combined image.” ¶124 “At block 1014, a tracking indicator is displayed in the combined image. A graphical target indicator (tracking indicator) may be displayed for each of one or more targets identified based on the tracking information.”)
wherein the method is performed using one or more processors. (¶4 “The UAV comprises a memory that stores one or more computer-executable instructions; and one or more processors configured to access the memory and execute the computer-executable instructions”)
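For orientation, the claim 1 pipeline as mapped above (estimate a transformation between the two feeds, warp the second feed into the first feed's frame, fuse the aligned frames, then run a tracker) can be sketched as follows. This is an illustrative sketch using OpenCV-style calls with assumed ORB features and a homography; it is not the applicant's or Weng's implementation, and cross-modal (infrared/visible) registration in practice usually needs modality-robust descriptors such as the CFOG/AWOG features discussed for claims 4 and 14 below.

```python
import cv2
import numpy as np

def register_and_fuse(first_img, second_img):
    """Estimate a homography from matched features, warp the second image into
    the first image's coordinate frame, then stack the aligned pair (sketch)."""
    orb = cv2.ORB_create(1000)                      # assumed feature choice
    k1, d1 = orb.detectAndCompute(first_img, None)
    k2, d2 = orb.detectAndCompute(second_img, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d2, d1)
    src = np.float32([k2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    warped_second = cv2.warpPerspective(second_img, H, first_img.shape[1::-1])
    # "Aggregate" by stacking the temporally paired, spatially registered frames.
    return np.dstack([first_img, warped_second])

# A multiple-object-tracking model (detector plus tracker) would then be run
# over the sequence of fused frames; that step is elided in this sketch.
```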
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Weng (US 20200143545 A1) in view of Ye (Ye, Y., Zhu, B., Tang, T., Yang, C., Xu, Q., & Zhang, G. (2022). A robust multimodal remote sensing image registration method and system using steerable filters with first- and second-order gradients. ISPRS Journal of Photogrammetry and Remote Sensing, 188, 331–350. https://doi.org/10.1016/j.isprsjprs.2022.04.011).

Regarding claims 4 and 14, Weng discloses the claimed invention except for wherein the image matching model includes at least one selected from a group consisting of an angle-weighted oriented gradients (AWOGs) algorithm and a channel features of orientated gradients (CFOG) algorithm.
In related art, Ye discloses the image matching model includes at least one selected from a group consisting of an angle-weighted oriented gradients (AWOGs) algorithm and a channel features of orientated gradients (CFOG) algorithm. (Ye: Section 2.1 “Recently, a number of descriptors based on multi-orientated gradient information to depict structural features have also proved to be robust to NRD, among which the channel features of orientated gradients (CFOG) (Ye et al., 2019), the angle-weighted oriented gradient (AWOG) (Fan et al., 2021)…are the most representative ones.” Ye discloses AWOG and CFOG algorithms for depicting structural features to be used in image matching algorithms)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the use of AWOG and CFOG algorithms to depict structural features for image matching tasks disclosed by Ye into the method of object tracking disclosed by Weng to identify features across the visible image and infrared image to be used for matching.
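To make the claims 4/14 references concrete, here is a much-simplified sketch of the idea behind orientated-gradient channel features (a CFOG-style descriptor): soft-assign each pixel's gradient magnitude into orientation bins, smooth each channel spatially, and normalise, so per-pixel descriptors can be compared across modalities. The bin count, smoothing sigma, and matching strategy are assumptions for illustration, not Ye's published implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def orientated_gradient_channels(img, n_bins=9, sigma=1.5):
    """CFOG-flavoured sketch for a single-channel image: assign gradient
    magnitude to orientation bins per pixel, then smooth each channel so the
    descriptor tolerates modality differences (infrared vs. visible)."""
    img = img.astype(np.float64)
    gx = sobel(img, axis=1)
    gy = sobel(img, axis=0)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # orientation in [0, pi)
    channels = np.zeros(img.shape + (n_bins,))
    idx = np.minimum((ang / (np.pi / n_bins)).astype(int), n_bins - 1)
    np.put_along_axis(channels, idx[..., None], mag[..., None], axis=-1)
    for b in range(n_bins):                            # spatial smoothing per channel
        channels[..., b] = gaussian_filter(channels[..., b], sigma)
    norm = np.linalg.norm(channels, axis=-1, keepdims=True) + 1e-8
    return channels / norm                             # pixel-wise unit-normalised descriptor

# Matching sketch: slide a template's descriptor map over the other modality's
# descriptor map and keep the offset with the smallest sum of squared differences.
```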
Claim(s) 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Weng (US 20200143545 A1) in view of Hassan (Hassan, M., Mehmood, A., & Khan, M. F. (2011). An Efficient Method of Tracking across Multiple Cameras. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, 1477–1481. https://doi.org/10.1109/TrustCom.2011.203).

Regarding claims 5 and 15, Weng discloses the claimed invention except for wherein identifying a set of first detected objects from a first image in the set of first images; and identifying a set of second detected objects from a second image in the set of second images, wherein the applying a multiple object tracking model to the set of aggregated images to identify a plurality of objects comprises: associating the set of first detected objects and the set of second detected objects such that a specific object in the set of first detected objects is assigned to a specific tracking identifier and the specific object in the set of second detected objects is assigned to the specific tracking identifier.
In related art, Hassan discloses identifying a set of first detected objects from a first image in the set of first images; (Hassan: Section VI and Fig. 7 disclose tracking multiple objects across a first set of images in camera 1)
and identifying a set of second detected objects from a second image in the set of second images, wherein the applying a multiple object tracking model to the set of aggregated images to identify a plurality of objects comprises: (Hassan: Section VI and Fig. 7 disclose tracking multiple objects across a second set of images in camera 2)
associating the set of first detected objects and the set of second detected objects such that a specific object in the set of first detected objects is assigned to a specific tracking identifier and the specific object in the set of second detected objects is assigned to the specific tracking identifier. (Hassan: Fig. 7 and Section V “When a tracked object, having some specific label assigned in particular camera comes in field of view of another camera should get the same label and should be tracked there as well.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate assigning detected objects a specific tracking identifier across different images disclosed by Hassan into the method of object tracking across infrared and visible images disclosed by Weng to be able to track specific detected objects across images captured by different sensors.
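The Hassan mapping for claims 5/15 turns on re-using one tracking identifier when the same object appears in a different camera's view. A minimal sketch of that label hand-off, using an assumed appearance-histogram similarity and threshold (Hassan's actual method differs):

```python
import numpy as np

def histogram_similarity(h1, h2):
    """Histogram intersection between two normalised appearance histograms."""
    return float(np.minimum(h1, h2).sum())

def handoff_label(new_detection_hist, known_tracks, threshold=0.6):
    """Reuse an existing tracking identifier when an object detected in one
    camera reappears in another camera's view; otherwise mint a new label.
    known_tracks maps label -> representative appearance histogram (assumed)."""
    best_label, best_sim = None, threshold
    for label, hist in known_tracks.items():
        sim = histogram_similarity(new_detection_hist, hist)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_label is None:
        best_label = f"obj_{len(known_tracks) + 1}"
        known_tracks[best_label] = new_detection_hist
    return best_label
```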
Claim(s) 6-8 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Weng (US 20200143545 A1) in view of Hassan (Hassan, M., Mehmood, A., & Khan, M. F. (2011). An Efficient Method of Tracking across Multiple Cameras. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, 1477–1481. https://doi.org/10.1109/TrustCom.2011.203) and in further view of Karunasekera (Karunasekera, H., Wang, H., & Zhang, H. (2019). Multiple Object Tracking With Attention to Appearance, Structure, Motion and Size. IEEE Access, 7, 104423–104434. https://doi.org/10.1109/ACCESS.2019.2932301).

Regarding claims 6 and 16, Weng, as modified by Hassan, discloses wherein the applying a multiple object tracking model to the set of aggregated images to identify a plurality of objects comprises: (Weng: ¶72 “At block 702, a combined image based on an infrared image and a visible image is obtained.” ¶73 “At block 704, a target is identified in the combined image.” ¶124 “At block 1014, a tracking indicator is displayed in the combined image. A graphical target indicator (tracking indicator) may be displayed for each of one or more targets identified based on the tracking information.”)
Weng, as modified by Hassan, fails to specifically disclose applying a motion model to the set of aggregated images to identify the plurality of objects across at least two images in the set of aggregated images.
In related art, Karunasekera discloses applying a motion model to the set of aggregated images to identify the plurality of objects across at least two images in the set of aggregated images. (Karunasekera: Section III.A.3 “When tracking an object in adjacent frames, its motion helps to predict the position of the object in the next frame. We define the difference between the predicted position pred(x,y) and the measured position det(x,y) as the motion based distance measure as in (7). L2 norm is used. x and y are the center positions of the object bounding box expressed in 2D image coordinates. ndst is the value used to normalize the distance measure.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate using an object's motion to track an object across multiple frames disclosed by Karunasekera into the method of object tracking across infrared and visible images disclosed by Weng, as modified by Hassan, to monitor the motion of the target object across a series of images to track the target object.

Regarding claims 7 and 17, Weng, as modified by Hassan, discloses wherein the applying a multiple object tracking model to the set of aggregated images to identify a plurality of objects comprises: (Weng: ¶72 “At block 702, a combined image based on an infrared image and a visible image is obtained.” ¶73 “At block 704, a target is identified in the combined image.” ¶124 “At block 1014, a tracking indicator is displayed in the combined image. A graphical target indicator (tracking indicator) may be displayed for each of one or more targets identified based on the tracking information.”)
Weng, as modified by Hassan, fails to specifically disclose applying an appearance model to the set of aggregated images to identify the plurality of objects across at least two images in the set of aggregated images.
In related art, Karunasekera discloses applying an appearance model to the set of aggregated images to identify the plurality of objects across at least two images in the set of aggregated images. (Karunasekera: Section III.A.1 “Appearance of an object is an important clue when tracking, as it can be used to recognize the object in different frames as well as to differentiate with other objects… we propose a grid based multiple histogram matching method as an appearance feature.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate monitoring the appearance of objects across frames to aid in object tracking disclosed by Karunasekera into the method of object tracking across infrared and visible images disclosed by Weng, as modified by Hassan, to identify a specific target object to be tracked across multiple images from multiple sensors.

Regarding claims 8 and 18, Weng, as modified by Hassan, discloses wherein the applying a multiple object tracking model to the set of aggregated images to identify a plurality of objects comprises: (Weng: ¶72 “At block 702, a combined image based on an infrared image and a visible image is obtained.” ¶73 “At block 704, a target is identified in the combined image.” ¶124 “At block 1014, a tracking indicator is displayed in the combined image. A graphical target indicator (tracking indicator) may be displayed for each of one or more targets identified based on the tracking information.”)
Weng, as modified by Hassan, fails to specifically disclose applying a motion model to the set of aggregated images to generate a first result; applying an appearance model to the set of aggregated images to generate a second result; and applying an optimization algorithm to the first result and the second result to identify the plurality of objects across at least two images in the set of aggregated images.
In related art, Karunasekera discloses applying a motion model to the set of aggregated images to generate a first result; (Karunasekera: Section III.A.3 “When tracking an object in adjacent frames, its motion helps to predict the position of the object in the next frame. We define the difference between the predicted position pred(x,y) and the measured position det(x,y) as the motion based distance measure as in (7). L2 norm is used. x and y are the center positions of the object bounding box expressed in 2D image coordinates. ndst is the value used to normalize the distance measure.”)
applying an appearance model to the set of aggregated images to generate a second result; and (Karunasekera: Section III.A.1 “Appearance of an object is an important clue when tracking, as it can be used to recognize the object in different frames as well as to differentiate with other objects… we propose a grid based multiple histogram matching method as an appearance feature.”)
applying an optimization algorithm to the first result and the second result to identify the plurality of objects across at least two images in the set of aggregated images. (Karunasekera: Section III.B “Once all four dissimilarity measurements are calculated, overall dissimilarity values dis(i, j) is calculated as in (1) for each ith detection in the current frame against each jth track in the memory. Therefore, this dissimilarity matrix is used in the Hungarian algorithm [12] to assign each of the object detection in the current frame to the best matching track.” Motion and appearance are two of the four dissimilarity measurements used to perform object tracking)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate measuring the appearance and motion of target objects across frames to aid in object tracking disclosed by Karunasekera into the method of object tracking across infrared and visible images disclosed by Weng, as modified by Hassan, to utilize appearance and motion of target objects to aid in tracking objects across frames in a variety of imaging modalities such as infrared images and visible images.
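Claims 6-8 and 16-18 combine a motion cue, an appearance cue, and an assignment step. A compact sketch of that pattern, using SciPy's Hungarian solver; the dictionary fields (pred_xy, xy, hist), weights, and distance measures are assumptions standing in for, not reproducing, Karunasekera's measures:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def motion_cost(pred_xy, det_xy, ndst=100.0):
    """L2 distance between a track's predicted centre and a detection's centre,
    normalised by ndst (cf. the motion-based distance measure quoted above)."""
    return np.linalg.norm(np.asarray(pred_xy) - np.asarray(det_xy)) / ndst

def appearance_cost(track_hist, det_hist):
    """Simple dissimilarity between normalised appearance histograms, standing
    in for the grid-based histogram matching feature; 0 = identical."""
    return 1.0 - float(np.minimum(track_hist, det_hist).sum())

def associate(tracks, detections, w_motion=0.5, w_app=0.5):
    """Build a combined cost matrix and solve the track-to-detection assignment
    with the Hungarian algorithm (scipy.optimize.linear_sum_assignment)."""
    cost = np.zeros((len(tracks), len(detections)))
    for i, trk in enumerate(tracks):
        for j, det in enumerate(detections):
            cost[i, j] = (w_motion * motion_cost(trk["pred_xy"], det["xy"])
                          + w_app * appearance_cost(trk["hist"], det["hist"]))
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))   # (track index, detection index) pairs
```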
Claim(s) 9-10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Weng (US 20200143545 A1) in view of Hassan (Hassan, M., Mehmood, A., & Khan, M. F. (2011). An Efficient Method of Tracking across Multiple Cameras. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, 1477–1481. https://doi.org/10.1109/TrustCom.2011.203) and in further view of Peng (Chu, P., Wang, J., You, Q., Ling, H., & Liu, Z. (2023). TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking. Proceedings / IEEE Workshop on Applications of Computer Vision, 4859–4869. https://doi.org/10.1109/WACV56688.2023.00485).

Regarding claims 9 and 19, Weng, as modified by Hassan, discloses wherein the applying a multiple object tracking model to the set of aggregated images to identify a plurality of objects comprises: (Weng: ¶72 “At block 702, a combined image based on an infrared image and a visible image is obtained.” ¶73 “At block 704, a target is identified in the combined image.” ¶124 “At block 1014, a tracking indicator is displayed in the combined image. A graphical target indicator (tracking indicator) may be displayed for each of one or more targets identified based on the tracking information.”)
Weng, as modified by Hassan, fails to specifically disclose determining a spatial relationship between two objects in the set of first detected objects, associating the set of first detected objects and the set of second detected objects based at least in part on the spatial relationship.
In related art, Peng discloses determining a spatial relationship between two objects in the set of first detected objects, (Peng: Section 1 “We propose a spatial-temporal graph Transformer (TransMOT) to effectively model the spatial-temporal relationship of the objects for end-to-end learnable association in MOT.” Fig. 1 discloses TransMOT identifying multiple objects)
associating the set of first detected objects and the set of second detected objects based at least in part on the spatial relationship. (Peng: Section 3 “TransMOT maintains a set of Nt−1 tracklets, each of which represents a tracked object. Each tracklet lit−1 maintains a set of states, such as its past locations and appearance features on the previous T image frames. Given a new image frame It, the online tracking algorithm eliminates the tracklets whose tracked object exits the scene, determines whether any tracked objects are occluded, computes new locations for the existing tracklets, and generates new tracklets for new objects that enter the scene.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate determining a spatial relationship between tracked objects across multiple images disclosed by Peng into the method of object tracking across infrared and visible images disclosed by Weng, as modified by Hassan, to track target objects across a series of combined images captured from an unmanned aerial vehicle.

Regarding claim 10, Weng, as modified by Hassan and Peng, discloses the spatial relationship includes a spatial graph. (Peng: Section 1 “In TransMOT, objects are arranged as a temporal series of sparse weighted graphs that are constructed using their spatial relationships within each frame.”)
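Claims 9, 10, and 19 turn on a spatial relationship, and claim 10 on a spatial graph, between detected objects. A minimal sketch of one way to build a sparse weighted spatial graph over a frame's detections, with edge weights that decay with centre distance (an assumed hand-built weighting; TransMOT itself learns the association with a graph transformer):

```python
import numpy as np

def spatial_graph(centers, max_dist=150.0):
    """Build a sparse weighted adjacency matrix over detections in one frame:
    nodes are detections, edges connect nearby pairs, and each edge weight
    decays with the distance between the two detection centres."""
    centers = np.asarray(centers, dtype=float)        # shape (N, 2)
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = np.where(dist < max_dist, 1.0 / (1.0 + dist), 0.0)
    np.fill_diagonal(adj, 0.0)                        # no self-loops
    return adj

# Per-frame graphs like this, stacked over time, give the spatial-temporal
# structure that a graph-based tracker can use to associate detections.
```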
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Egnal (US 20080060034 A1) discloses methods and systems for combining multiple video streams. Video feeds are received from multiple optical sensors, and homography information and/or corner metadata is calculated for each frame from each video stream. This data is used to mosaic the separate frames into a single video frame. Local translation of each image may also be used to synchronize the video frames. The optical sensors can be provided by an airborne platform, such as a manned or unmanned surveillance vehicle. Image data can be requested from a ground operator, and transmitted from the airborne platform to the user in real time or at a later time. Various data arrangements may be used by an aggregation system to serialize and/or multiplex image data received from multiple sensor modules. Fixed-size record arrangement and variable-size record arrangement systems are provided.
Yeung (US 20180296281 A1) discloses systems and methods for automated steering control of a robotic endoscope, e.g., a colonoscope. The control system may comprise: a) a first image sensor configured to capture a first input data stream comprising a series of two or more images of a lumen; and b) one or more processors that are individually or collectively configured to generate a steering control output signal based on an analysis of data derived from the first input data stream using a machine learning architecture, wherein the steering control output signal adapts to changes in the data of the first input data stream in real time.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL KIM MAIDEN whose telephone number is (703) 756-1264. The examiner can normally be reached Monday - Friday, 7:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Koziol, can be reached at 408-918-7630. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL KIM MAIDEN/
Examiner, Art Unit 2665
/BOBBAK SAFAIPOUR/
Primary Examiner, Art Unit 2665

Prosecution Timeline

Apr 26, 2024: Application Filed
Mar 20, 2026: Non-Final Rejection under §102 and §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597290
THREE-DIMENSIONAL (3D) FACIAL FEATURE TRACKING FOR AUTOSTEREOSCOPIC TELEPRESENCE SYSTEMS
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12592058
DATA GENERATING METHOD, LEARNING METHOD, ESTIMATING METHOD, DATA GENERATING DEVICE, AND PROGRAM
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12579654
INTERFACE DETECTION IN RECIPROCAL SPACE
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12579830
COMBINING BRIGHTFIELD AND FLUORESCENT CHANNELS FOR CELL IMAGE SEGMENTATION AND MORPHOLOGICAL ANALYSIS IN IMAGES OBTAINED FROM AN IMAGING FLOW CYTOMETER
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12561944
POINT CLOUD DATA PROCESSING APPARATUS, POINT CLOUD DATA PROCESSING METHOD, AND PROGRAM
Granted Feb 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 93%
With Interview (+8.9%): 99%
Median Time to Grant: 2y 11m
PTA Risk: Low
Based on 72 resolved cases by this examiner. Grant probability derived from career allow rate.
