Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3, 8, and 10-12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by "ByteTrack: Multi-Object Tracking by Associating Every Detection Box" by Zhang et al., hereinafter "Zhang".
Claim 1. A data processing system comprising: [Abstract] To put forwards the state-of-the-art performance of MOT, we design a simple and strong tracker, named ByteTrack, Algorithm 1
a processor; [Abstract] For the first time, we achieve 80.3 MOTA, 77.3 IDF1 and 63.1 HOTA on the test set of MOT17 with 30 FPS running speed on a single V100 GPU.
[4.1. Setting – Implementation Details] The model is trained on 8 NVIDIA Tesla V100 GPU with batch size of 48.
and a memory storing executable instructions that, when executed, cause the processor alone or in combination with other processors to perform operations of: [Abstract] For the first time, we achieve 80.3 MOTA, 77.3 IDF1 and 63.1 HOTA on the test set of MOT17 with 30 FPS running speed on a single V100 GPU., Algorithm 1 (inherently, the GPU would require some type of memory to execute the instructions for carrying out the operations of the device)
obtaining a frame of video content at an object detection pipeline, [Figure 5], [4.1. Setting] Datasets (pipeline)
the video content comprising a plurality of frames; Algorithm 1, Input: a video sequence
analyzing the frame of video content using an object detection model to detect a plurality of objects in the frame of video content, [BDD100K] We utilize a simple ResNet-50 ImageNet classification model from UniTrack[68] to extract Re-ID features and compute appearance similarity.
the object detection model associating each object of the plurality of objects with a confidence score; [Figure 2] models prediction boxes with high and low scores
[BDD100K] We utilize a simple ResNet-50 ImageNet classification model from UniTrack[68] to extract Re-ID features and compute appearance similarity.
performing a primary matching operation on high confidence detection objects to determine first object tracks of the high confidence detection objects by associating the high confidence detection objects with an object track of a plurality of object tracks
[Abstract] Most methods obtain identities by associating detection boxes whose scores are higher than a threshold... [Introduction] We first match the high score detection boxes to the tracklets based on motion similarity or appearance similarity
the high confidence detection objects being objects from the plurality of objects associated with a confidence score that satisfies a confidence threshold, [Abstract] Most methods obtain identities by associating detection boxes whose scores are higher than a threshold
Algorithm 1, lines 3-13 [3. BYTE] For each frame in the video, we predict the detection boxes and scores using the detector Det. We separate all the detection boxes into two parts Dhigh and Dlow according to the detection score threshold τ. For the detection boxes whose scores are higher than τ, we put them into the high score detection boxes Dhigh.
the first object tracks tracking the high confidence detection objects across the plurality of frames; Algorithm 1, Input: a video sequence, [3. BYTE] For each frame in the video
performing a secondary matching operation on low confidence detection objects to determine second object tracks of the low confidence detection objects by associating the low confidence detection objects with an object track of the plurality of object tracks, [Introduction] Then, we perform the second matching between the unmatched tracklets, i.e. the tracklet in red box, and the low score detection boxes using the same motion similarity.
the low confidence detection objects being objects from the plurality of objects associated with a confidence score that does not satisfy the confidence threshold, Algorithm 1, lines 3-13 [3. BYTE] For each frame in the video, we predict the detection boxes and scores using the detector Det. We separate all the detection boxes into two parts Dhigh and Dlow according to the detection score threshold τ… For the detection boxes whose scores are lower than τ, we put them into the low score detection boxes Dlow.
the second object tracks tracking the low confidence detection objects across the plurality of frames; Zhang [3. BYTE] For each frame in the video
and outputting, from the object detection pipeline, the first object tracks and the second object tracks. [Abstract] To solve this problem, we present a simple, effective and generic association method, tracking by associating almost every detection box instead of only the high score ones.
Algorithm 1 Output: Tracks T of the video
[3. BYTE] The output of BYTE is the tracks T of the video and each track contains the bounding box and identity of the object in each frame.
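For clarity of the record, the two-stage association quoted above from Zhang's Algorithm 1 may be sketched as follows. This is an illustrative paraphrase only: the greedy IoU matcher, box format, and function names are the examiner's simplifications, not Zhang's implementation (Zhang uses Kalman-predicted boxes and Hungarian assignment, and retains unmatched tracks for a number of frames before deletion).

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def byte_step(tracks, detections, tau=0.5, match_thr=0.3):
    """One frame of the BYTE association: `tracks` maps track_id -> box;
    `detections` is a list of (box, score). Returns the updated tracks."""
    d_high = [d for d in detections if d[1] >= tau]   # D_high: scores >= tau
    d_low = [d for d in detections if d[1] < tau]     # D_low: scores < tau
    unmatched = dict(tracks)
    updated = {}
    next_id = max(tracks, default=0) + 1

    def associate(dets):
        leftovers = []
        for box, score in dets:
            best = max(unmatched, key=lambda t: iou(unmatched[t], box),
                       default=None)
            if best is not None and iou(unmatched[best], box) >= match_thr:
                updated[best] = box            # keep the existing identity
                del unmatched[best]
            else:
                leftovers.append((box, score))
        return leftovers

    d_remain = associate(d_high)  # first association: high score boxes
    associate(d_low)              # second association: low score boxes;
                                  # unmatched low score boxes are discarded
    for box, _ in d_remain:       # new tracks from unmatched high score boxes
        updated[next_id] = box
        next_id += 1
    # Remaining unmatched tracks are deleted (Zhang's 30-frame retention
    # of lost tracks is omitted in this sketch).
    return updated
```

A low-score detection thus recovers an existing track in the second association but never initializes a new one, which is the distinction relied upon in the claim mapping above.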
Claim 2. Zhang further teaches wherein performing the primary matching operation on high confidence detection objects further comprises: Zhang [Abstract] Most methods obtain identities by associating detection boxes whose scores are higher than a threshold... [Introduction] We first match the high score detection boxes to the tracklets based on motion similarity or appearance similarity
comparing attributes of a high confidence detection object with attributes of previously tracked objects from a previous frame of the video content to determine whether the high confidence detection objects correspond to the previously tracked objects, the previously tracked objects being associated with an object track; [Figure 2] Frames t1, t2, t3 (series of frames)
[Abstract] we utilize their similarities with tracklets to recover true objects
associating the high confidence detection object with the object track of a previously tracked object that corresponds with the high confidence detection object; Zhang [Abstract] Most methods obtain identities by associating detection boxes whose scores are higher than a threshold… Figure 2. (b) shows the tracklets obtained by previous methods which associates detection boxes whose scores are higher than a threshold, i.e. 0.5. The same box color represents the same identity.
and associating the high confidence detection object with a new object track responsive to the high confidence detection object not corresponding with any of the previously tracked objects. Algorithm 1, line 22, [3. BYTE] Finally, we initialize new tracks from the unmatched high score detection boxes Dremain after the first association.
Claim 3. Zhang further teaches wherein performing the secondary matching operation on low confidence detection objects further comprises: [Introduction] Then, we perform the second matching between the unmatched tracklets, i.e. the tracklet in red box, and the low score detection boxes using the same motion similarity.
comparing attributes of a low confidence detection object with attributes of previously tracked objects from a previous frame of the video content to determine whether the low confidence detection objects correspond to the previously tracked objects, the previously tracked objects being associated with an object track; [Figure 2] Frames t1, t2, t3 (series of frames)
[Abstract] we utilize their similarities with tracklets to recover true objects
associating the low confidence detection object with the object track of a previously tracked object that corresponds with the low confidence detection object; [Introduction] As shown in Figure 2 (c), two low score detection boxes are matched to the tracklets by the motion model’s predicted boxes, and thus the objects are correctly recovered.
and discarding the low confidence detection object responsive to the low confidence detection object not corresponding with any of the previously tracked objects.
[3. BYTE] After the association, the unmatched tracks will be deleted from the tracklets.
Claim 8. Zhang further teaches wherein the memory further includes instructions configured to cause the processor alone or in combination with other processors to perform operations of: performing a tuning operation to tune one or more hyperparameters of the object detection model to determine object-class specific hyperparameter values for the object detection model. [Robustness to detection score threshold] The detection score threshold τ_high is a sensitive hyper-parameter and needs to be carefully tuned in the task of multi-object tracking…
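The threshold tuning discussed in the cited portion of Zhang can be illustrated by a simple per-class sweep over candidate values. The `evaluate` callback, candidate grid, and function name below are hypothetical illustrations, not taken from Zhang:

```python
def tune_per_class(evaluate, classes, candidates=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """For each object class, select the detection score threshold that
    maximizes a validation metric (e.g., MOTA) returned by evaluate(cls, t).
    Yields object-class specific hyperparameter values."""
    return {c: max(candidates, key=lambda t: evaluate(c, t)) for c in classes}
```

Under this sketch, each class receives its own threshold, which corresponds to the claimed "object-class specific hyperparameter values".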
Claim 10. Reviewed and analyzed in the same way as claim 1. See the above analysis and rationale.
Claim 11. Reviewed and analyzed in the same way as claim 2. See the above analysis and rationale.
Claim 12. Reviewed and analyzed in the same way as claim 3. See the above analysis and rationale.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over "ByteTrack: Multi-Object Tracking by Associating Every Detection Box" by Zhang et al., hereinafter "Zhang", in view of US 10936902 B1 to Bagwell et al., hereinafter "Bagwell".
Claim 9. Zhang fails to explicitly teach causing the application to perform one or more actions based on the plurality of object tracks. Bagwell, in the field of using bounding boxes for object detection in image data, teaches wherein the memory further includes instructions configured to cause the processor alone or in combination with other processors to perform operations of: providing the plurality of object tracks to an application to cause the application to perform one or more actions based on the plurality of object tracks. [col. 1, line 67 - col. 2, line 2] The output bounding box may be used to track an object (e.g., update a state of an object tracker), generate a trajectory, or otherwise control the vehicle.
Zhang [3. BYTE] The output of each individual frame is the bounding boxes and identities of the tracks T in the current frame.
Thus, before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of Zhang with the teachings of Bagwell [col. 2, line 4] to more accurately identify a bounding box to use for an object.
Allowable Subject Matter
Claims 4 and 13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The innovation that makes claim 4 allowable is “determining that performing object detection on the frame would cause a frame rate of the object detection pipeline to fall below a threshold; and performing detection propagation to extrapolate object tracks for the plurality of objects from previously determined object tracks.”
Claims 5-7 and 14-16 are allowed because they depend from claims 4 and 13, respectively.
Claims 17-20 are allowed.
The innovation that makes claim 17 allowable is “responsive to determining that performing the object detection would not cause the frame rate to fall below the threshold, performing object detection and tracking comprising: performing a primary matching operation on high confidence detection objects to determine first object tracks of the high confidence detection objects by associating the high confidence detection objects with an object track of a plurality of object tracks, the high confidence detection objects being objects from the plurality of objects associated with a confidence score that satisfies a confidence threshold, the first object tracks tracking the high confidence detection objects across the plurality of frames, and performing a secondary matching operation on low confidence detection objects to determine second object tracks of the low confidence detection objects by associating the low confidence detection objects with an object track of the plurality of object tracks, the low confidence detection objects being objects from the plurality of objects associated with a confidence score that does not satisfy the confidence threshold, the second object tracks tracking the low confidence detection objects across the plurality of frames; responsive to determining that performing the object detection would cause the frame rate to fall below the threshold, performing detection propagation to extrapolate object tracks for the plurality of objects from previously determined object tracks to extend the first object tracks and the second object tracks; and outputting, from the object detection pipeline, the first object tracks and the second object tracks.”
Likewise, claims 18-20 are allowed because they depend from claim 17.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2024/0312027 A1 to Patsekin et al.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DELOMIA L GILLIARD whose telephone number is (571)272-1681. The examiner can normally be reached 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Villecco can be reached at (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DELOMIA L GILLIARD/Primary Examiner, Art Unit 2661