DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The claims will be read under the broadest reasonable interpretation standard outlined in MPEP § 2111.01.
In line with paragraph 73 of the claimed invention’s specification, the examiner interprets “geometry features” as recited by claim 3 to include sizes and locations of bounding boxes.
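For illustration only, the interpretation above can be made concrete as a simple data structure; the field names and units below are hypothetical and are not drawn from the specification or the record:

```python
from dataclasses import dataclass

# Illustrative sketch only: one plausible encoding of "geometry features"
# under the interpretation above -- the location and size of a detection's
# 3D bounding box. Field names and units are hypothetical.
@dataclass
class GeometryFeatures:
    center_x: float  # bounding-box location (e.g., meters)
    center_y: float
    center_z: float
    length: float    # bounding-box size
    width: float
    height: float
```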
Claim Objections
Claims 2 and 12 are objected to because of the following informalities: the phrase “3D objection detection model” should read “3D object detection model”.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1 and 3-9 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hung et al., “SoDA: Multi-Object Tracking with Soft Data Association” (hereinafter “Hung”; cited by Applicant).
As to claim 1, Hung discloses a method comprising:
[Reproduced excerpt from Hung: media_image1.png]
receiving, at a current time step, a set of new object detections, each new object detection being data characterizing features of a respective object that has been detected in an environment at the current time step ([3.1]):
[Reproduced excerpts from Hung: media_image2.png, media_image3.png]
maintaining data that identifies one or more object tracks ([3.2]):
[Reproduced excerpt from Hung: media_image4.png]
for each object track, selecting a subset of the new object detections as candidate object detections for the object track ([3.2]; [1]):
[Reproduced excerpts from Hung: media_image5.png, media_image6.png]
for each object track, processing an input derived from the candidate object detections for the object track and a track query feature representation for the object track using a track-detection interaction neural network to generate a respective association score for each candidate object detection; and ([3]; [A]):
[Reproduced excerpts from Hung: media_image7.png through media_image12.png]
determining, for each of the one or more object tracks, whether to associate any of the new object detections with the object track based on the respective association scores for the candidate object detections for the object tracks ([3]):
[Reproduced excerpts from Hung: media_image13.png, media_image12.png]
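For illustration only, the claimed data flow of claim 1 (per-candidate association scores generated from a track query feature and candidate embeddings, followed by an association decision) might be sketched as follows; the architecture, dimensions, and threshold are hypothetical and represent neither Hung’s nor Applicant’s actual implementation:

```python
import torch
import torch.nn as nn

class TrackDetectionInteraction(nn.Module):
    """Hypothetical track-detection interaction network: cross-attends the
    candidate detection embeddings against a track query feature and emits
    one association score per candidate."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, track_query, candidates):
        # track_query: (1, 1, dim); candidates: (1, K, dim)
        fused, _ = self.attn(candidates, track_query, track_query)
        return self.score(fused).squeeze(-1)  # (1, K) association scores

net = TrackDetectionInteraction()
scores = net(torch.randn(1, 1, 128), torch.randn(1, 5, 128))
associate = bool(scores.max() > 0.5)  # illustrative association decision
```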
As to claim 3, Hung discloses the method of claim 1, wherein the features comprise geometry and appearance features of the respective object ([B]; [3.1]):
[Reproduced excerpts from Hung: media_image14.png, media_image15.png]
As to claim 4, Hung discloses the method of claim 1, wherein determining, for each of the one or more object tracks, whether to associate any of the new object detections with the object track based on the respective association scores for the candidate object detections for the object tracks comprises:
[Reproduced excerpts from Hung: media_image16.png, media_image17.png]
applying a Hungarian algorithm to the respective association scores for the candidate object detections for the object tracks to assign each new object detection to one of the object tracks or to a new object track ([3.2]; [3.3]).
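For reference, the recited Hungarian assignment over association scores is conventionally computed with an off-the-shelf solver; a minimal sketch follows, with illustrative scores and an illustrative gating threshold:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# scores[i, j]: association score between track i and new detection j
scores = np.array([[0.9, 0.1, 0.3],
                   [0.2, 0.8, 0.1]])

# The Hungarian algorithm minimizes total cost, so negate the scores.
rows, cols = linear_sum_assignment(-scores)

MIN_SCORE = 0.5  # illustrative gating threshold
assigned = {j: i for i, j in zip(rows, cols) if scores[i, j] >= MIN_SCORE}
new_tracks = [j for j in range(scores.shape[1]) if j not in assigned]
print(assigned)    # {0: 0, 1: 1}: detections matched to existing tracks
print(new_tracks)  # [2]: unmatched detection seeds a new track
```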
As to claim 5, Hung discloses the method of claim 4, further comprising:
in response to determining to assign a given new object detection to a new object track, adding the new object track to the maintained data ([3.3]; [3.2]):
[Reproduced excerpts from Hung: media_image3.png, media_image18.png]
As to claim 6, Hung discloses the method of claim 1, further comprising:
processing each new object detection using a detection encoder to generate an embedding of the new object detection ([3.1]):
[Reproduced excerpts from Hung: media_image11.png, media_image19.png, media_image20.png, media_image21.png, media_image14.png, media_image22.png, media_image23.png]
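For illustration only, a detection encoder of the kind recited in claim 6 might be sketched as a small embedding network; the input layout, dimensions, and architecture below are hypothetical:

```python
import torch
import torch.nn as nn

class DetectionEncoder(nn.Module):
    """Hypothetical detection encoder: maps each raw detection feature
    vector (e.g., box geometry plus appearance features) to a fixed-size
    embedding usable by the downstream association step."""
    def __init__(self, in_dim: int = 10, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, detections: torch.Tensor) -> torch.Tensor:
        return self.net(detections)  # (N, in_dim) -> (N, embed_dim)
```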
As to claim 7, Hung discloses the method of claim 6, wherein the input derived from the candidate object detections for the object track and the track query feature representation for the object track comprises the embeddings of the candidate object detections for the object track and the track query feature representation for the object track (Fig. 6; [3]):
[Reproduced excerpts from Hung: media_image24.png, media_image25.png, media_image26.png]
(As stated in paragraph 56 of the claimed invention’s specification, the track query representation includes a numerical representation of the object’s position.)
As to claim 8, Hung discloses the method of claim 6, further comprising:
generating a new query feature for each object track by processing an input comprising respective embeddings of detections that have been associated with the object track using a temporal fusion neural network ([2]; [1]; [3]; Fig. 3; Fig. 6):
[Reproduced excerpts from Hung: media_image27.png, media_image28.png, media_image29.png, media_image30.png, media_image31.png]
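For illustration only, a temporal fusion network of the kind recited in claim 8 might be sketched as follows; the choice of a Transformer encoder and the pooling step are hypothetical assumptions, not Hung’s or Applicant’s actual design:

```python
import torch
import torch.nn as nn

class TemporalFusion(nn.Module):
    """Hypothetical temporal fusion network: pools the embeddings of the
    detections previously associated with a track into a new track query
    feature for the next time step."""
    def __init__(self, dim: int = 128):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (1, T, dim) embeddings of the track's past detections
        fused = self.encoder(history)
        return fused[:, -1]  # (1, dim): new track query feature
```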
As to claim 9, Hung discloses the method of claim 1, further comprising:
processing the feature representation of each object track using a track state decoder neural network to generate a predicted state of the object track at the current time point ([3.1]; [3.2]; Fig. 3):
[Reproduced excerpts from Hung: media_image23.png, media_image32.png, media_image33.png, media_image34.png]
(This claim is read in line with paragraph 61 of the claimed invention’s specification, which states that a state decoder can be a feed-forward neural network trained by optimizing a prediction loss for the initial state predictions.)
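Consistent with that reading, a feed-forward state decoder might be sketched as follows for illustration only; the state layout (an assumed 3D box center plus 3D velocity) and dimensions are hypothetical:

```python
import torch
import torch.nn as nn

class TrackStateDecoder(nn.Module):
    """Hypothetical feed-forward state decoder: predicts a track state
    (here, an assumed 3D box center plus 3D velocity) from the track's
    feature representation."""
    def __init__(self, dim: int = 128, state_dim: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, state_dim),
        )

    def forward(self, track_feature: torch.Tensor) -> torch.Tensor:
        return self.net(track_feature)  # (N, dim) -> (N, state_dim)
```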
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Hung in view of Caesar et al., “nuScenes: A multimodal dataset for autonomous driving” (hereinafter “Caesar”; cited by Applicant).
With respect to claim 2, Hung teaches the method of claim 1, upon which claim 2 depends. Hung does not explicitly teach the method of claim 1, further comprising: receiving a laser sensor spin at the current time step; and applying a 3D object detection model to the laser sensor spin to generate the set of one or more new object detections.
However, Caesar, in the same field of endeavor of object detection in autonomous vehicles, teaches the same ([2.1]; [3]; [4.1]):
[Reproduced excerpts from Caesar: media_image35.png, media_image36.png, media_image37.png, media_image38.png]
(Bottom: PointPillars model as incorporated by Caesar).
It would have been obvious to one of ordinary skill in the art, as of the effective filing date of the claimed invention, to modify Hung to include the lidar and 3D object detection modeling elements of Caesar. The motivation for doing so would be to implement a means to collect data in real time and construct a 3D model of surrounding objects in space for further analysis. Lidar and 3D object detection modeling are readily integrable into the system of Hung with predictable success, as Hung requires an undisclosed source of input data for its analysis.
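For illustration only, the combined teaching (applying a 3D object detection model to a lidar spin to produce new detections) reduces to a simple pipeline; the wrapper function and the model’s call interface below are hypothetical assumptions and do not reflect Caesar’s actual code:

```python
import numpy as np

def detect_objects(lidar_spin: np.ndarray, model):
    """Hypothetical wrapper only: apply a trained 3D object detection model
    (e.g., a PointPillars-style network, as used by Caesar) to one lidar
    spin. `model` and its call interface are assumptions, not Caesar's API."""
    points = lidar_spin.reshape(-1, 4)   # one (x, y, z, intensity) row per return
    return model(points)                 # expected: list of 3D boxes with scores
```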
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Hung in view of Sun et al., “TransTrack: Multiple Object Tracking with Transformer” (hereinafter “Sun”; cited by Applicant).
With respect to claim 10, Hung teaches the elements of claims 1 and 9 upon which claim 10 depends.
It is the examiner’s position that Hung also discloses the additional elements of claim 10, namely the method of claim 9, further comprising:
for each object track, using the predicted state of the object track to select the candidate detections for the object track ([3.1]):
[Reproduced excerpts from Hung: media_image39.png, media_image40.png]
Notwithstanding the above, Sun, in the same field of endeavor of object tracking, also teaches the further limitation of claim 10 ([3.1]):
[Reproduced excerpt from Sun: media_image41.png]
It would have been obvious to one of ordinary skill in the art, as of the effective filing date of the claimed invention, to modify Hung to include the object prediction elements taught by Sun. Such a combination would allow for predictive updating of the detection boxes using information already collected by Hung, with the advantage of stabilizing output results across frames. Like Hung, Sun features a similar system of detection boxing and encoding/decoding, further facilitating the transfer of teachings.
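For illustration only, using a predicted track state to gate candidate detections might be sketched as a simple distance check; the data layout and radius are hypothetical and do not appear in the references:

```python
import numpy as np

def select_candidates(predicted_center, detections, radius=2.0):
    """Hypothetical gating step: keep only new detections whose box centers
    lie within `radius` meters of a track's predicted position."""
    centers = np.array([d["center"] for d in detections])
    dists = np.linalg.norm(centers - predicted_center, axis=1)
    return [d for d, dist in zip(detections, dists) if dist <= radius]

predicted = np.array([1.0, 2.0, 0.0])            # predicted (x, y, z)
dets = [{"center": [1.2, 2.1, 0.0]}, {"center": [9.0, 9.0, 0.0]}]
print(select_candidates(predicted, dets))        # keeps only the nearby detection
```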
Claims 11-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hung in view of Narayanan et al. (US 20200086879 A1) (hereinafter “Narayanan”).
With respect to claim 11, it is functionally identical to claim 1, with the exception that the method is claimed as “a system comprising:
one or more computers; and
one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising:”
These limitations are not expressly taught by Hung. However, Narayanan, in the same field of endeavor of object detection and autonomous driving, provides for the same ([0181]; [0043]).
It would have been obvious to one of ordinary skill in the art, as of the effective filing date of the claimed invention, to modify Hung to include the elements of computer implementation as taught by Narayanan. While not explicitly disclosed, Hung implies that a computer is used in the implementation of its methods ([4.1.1]):
[Reproduced excerpt from Hung: media_image42.png]
A person of ordinary skill in the art would understand that computers and code are capable of performing the method of Hung, and would be motivated to combine Hung with Narayanan for increased execution efficiency.
With respect to claims 12-19, they are functionally identical to claims 2-9, with the exception of the underlying computer system of claim 11 (from which they depend). For the reasons discussed in the rejection of claim 11, it would have been obvious to combine Narayanan with Hung. The additional limitations do not counsel against such a combination.
With respect to claim 20, it is functionally identical to claim 1, with the exception that the method is claimed as “one or more computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:”
These limitations are not expressly taught by Hung. However, Narayanan, in the same field of endeavor of object detection and autonomous driving, provides for the same ([0043]).
It would have been obvious to one of ordinary skill in the art, as of the effective filing date of the claimed invention, to modify Hung to include the elements of code taught by Narayanan. While not explicitly disclosed, Hung implies that a computer and code are used in the implementation of its methods ([4.1.1]).
A person of ordinary skill in the art would understand that computers and code are capable of performing the method of Hung, and would be motivated to combine Hung with Narayanan for increased execution efficiency.
Additional References
Additionally cited references (see attached PTO-892) otherwise not relied upon above have been made of record in view of the manner in which they evidence the general state of the art.
Inquiry
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NOAH WILLIAM BOYAR whose telephone number is (571)272-8392. The examiner can normally be reached 8:30 – 5:00 EST, Monday – Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park can be reached at 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NOAH W BOYAR/Examiner, Art Unit 2669
/CHAN S PARK/Supervisory Patent Examiner, Art Unit 2669