Prosecution Insights
Last updated: April 19, 2026
Application No. 18/428,475

SYSTEMS AND METHODS FOR DETECTING AND TRACKING OBJECTS INCORPORATING LEARNED SIMILARITY

Status: Final Rejection (§103)

Filed: Jan 31, 2024
Examiner: ISLAM, MEHRAZUL
Art Unit: 2662
Tech Center: 2600 (Communications)
Assignee: Toyota Jidosha Kabushiki Kaisha
OA Round: 2 (Final)

Grant Probability: 58% (Moderate)
OA Rounds: 3-4
To Grant: 3y 4m
With Interview: 86%

Examiner Intelligence

Career Allow Rate: 58% of resolved cases (29 granted / 50 resolved; -4.0% vs TC avg)
Interview Lift: +28.3% (strong; allowance in resolved cases with an interview vs. without)
Avg Prosecution: 3y 4m typical timeline (46 applications currently pending)
Total Applications: 96 across all art units (career history)

Statute-Specific Performance

§101: 9.2% (-30.8% vs TC avg)
§103: 68.6% (+28.6% vs TC avg)
§102: 4.1% (-35.9% vs TC avg)
§112: 15.2% (-24.8% vs TC avg)

Tech Center averages are estimates. Based on career data from 50 resolved cases.

Office Action (§103)

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Applicant's response to the Non-final Office Action dated 12/31/2025, filed with the office on 01/30/2026, has been entered and made of record.

Status of Claims

Claims 1-20 are pending.

Response to Arguments

Applicant's arguments filed on January 30, 2026 with respect to the rejection of claims under 35 U.S.C. 103 have been fully considered, but they are not found persuasive.

Specifically, on page 9 of its reply, Applicant argues in the second paragraph that independent claim 14 recites two distinct, tandem stages of feature extraction while the cited prior art reference Chen discloses only a single feature-extraction operation. Examiner respectfully disagrees. The plain interpretation of the two extraction stages in independent claim 14 includes: a. extracting first features captured from a scene to generate a first set of BEV images; and b. extracting second features from the first set of BEV images using a 3D backbone to generate a second set of BEV images. The cited prior art of record Chen discloses these two stages in ¶0007: "environmental state is detected based on a bird's eye view (BEV) map" and ¶0055: "extract the spatio-temporal features from the BEV maps". Therefore, the first stage uses sensor-captured environmental state data to generate a BEV map and the second stage analyzes the BEV map to extract features using a 3D backbone; see ¶0017: "The MotionNet system includes three parts: (1) data representation from raw 3D point clouds to BEV maps; (2) spatio-temporal pyramid network as a backbone; and (3) task-specific heads for grid cell classification and motion prediction". Therefore, Applicant's arguments are not found persuasive.

Applicant further argues on page 9, sixth paragraph, that the cited combination of prior art references does not disclose training a 3D detection head with a similarity objective (e.g., a cosine-similarity loss) for object tracking. Examiner respectfully disagrees. Training with a similarity objective (e.g., a cosine-similarity loss) is recited in claims 5, 12 and 18, and mapped to the disclosure in the cited prior art reference Ji. Ji et al. (US 2025/0200751 A1, filed on 2023-12-15) teaches in ¶0065: "training system trains the cloud point processing neural network to optimize a loss. The loss can be a cosine similarity loss that measures the similarity between the target pointwise features 318 and the pointwise features 322". Therefore, Applicant's arguments are not found persuasive.

Consequently, THIS ACTION IS MADE FINAL.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 2, 4, 6-9, 11, 13-15, 17, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 2021/0302992 A1) in view of Lee et al. (US 2021/0358296 A1).

Regarding claim 1, Chen teaches: A system for (Chen, ¶0019: "a control system for controlling a motion of a vehicle") detecting and tracking objects, (Chen, ¶0062: "detected bounding boxes are then fed into an object tracker to identify the one or more objects") the system comprising: a processor; (Chen, ¶0039: "control system 100 includes an image processor") and a memory storing machine-readable instructions (Chen, ¶0042: "The control system 100 includes a memory 108 that stores instructions") that, when executed by the processor, cause the processor to: (Chen, ¶0042: "controller 104 may be configured to execute the stored instructions in order to control operations") extract first features from time-sequential perceptual sensor data to generate (Chen, ¶0094: "multi-head neural network 110 executes feature extraction operation") a first set of bird's-eye-view (BEV) feature images; (Chen, ¶0007: "environmental state is detected based on a bird's eye view (BEV) map") extract second features from the first set of BEV feature images (Chen, ¶0055: "extract the spatio-temporal features from the BEV maps") using a three-dimensional (3D) detection backbone (Chen, ¶0017: "system includes three parts: (1) data representation from raw 3D point clouds to BEV maps; (2) spatio-temporal pyramid network as a backbone") to generate a second set of BEV feature images, (Chen, ¶0104: "generates the extended BEV image") wherein each BEV feature image in the second set of BEV feature images corresponds to a distinct time step in the time-sequential perceptual sensor data; (Chen, ¶0104: "extended BEV image includes… a position of the pixel in the extended BEV image at a current time step… a time sequence of future positions of the pixel in subsequent time steps") and consume the second set of BEV feature images (Chen, ¶0018: "the outputs of the three heads are provided to a motion planner") using a neural-network 3D detection head that is trained (Chen, ¶0013: "The entire multi-head neural network is trained in an end-to-end manner") (Chen, ¶0018: "vehicle receives the motion trajectory and controls the motion of the vehicle"; ¶0038: "The vehicle 116 may be an autonomous vehicle or a semi-autonomous vehicle").
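As context for the two-stage mapping above, the claimed flow (sensor data to a first set of BEV feature images, then a 3D backbone producing a second set that a detection head consumes) can be sketched in PyTorch-style code. This is an editorial illustration of the claim language only, not code from Chen, Lee, or the application; every module name, layer choice, and tensor shape here is hypothetical.

    # Hypothetical sketch of the claimed two-stage BEV pipeline (illustrative only).
    import torch
    import torch.nn as nn

    class BEVEncoder(nn.Module):
        """Stage 1: per-time-step sensor grids -> first set of BEV feature images."""
        def __init__(self, in_ch=16, ch=64):
            super().__init__()
            self.net = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU())

        def forward(self, x):                      # x: (B, T, C, H, W) time-sequential sensor data
            B, T, C, H, W = x.shape
            f = self.net(x.reshape(B * T, C, H, W))
            return f.reshape(B, T, -1, H, W)       # one BEV feature image per time step

    class Backbone3D(nn.Module):
        """Stage 2: 3D (space-time) backbone over the first BEV set -> second BEV set."""
        def __init__(self, ch=64):
            super().__init__()
            self.net = nn.Sequential(nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU())

        def forward(self, bev):                    # bev: (B, T, C, H, W)
            out = self.net(bev.permute(0, 2, 1, 3, 4))   # convolve jointly over time and space
            return out.permute(0, 2, 1, 3, 4)      # still one refined BEV image per time step

    class DetectionHead(nn.Module):
        """Consumes the second BEV set; emits per-cell detection/embedding maps."""
        def __init__(self, ch=64, out_ch=8):
            super().__init__()
            self.head = nn.Conv2d(ch, out_ch, 1)

        def forward(self, bev):
            B, T, C, H, W = bev.shape
            return self.head(bev.reshape(B * T, C, H, W)).reshape(B, T, -1, H, W)

    # Usage with random stand-in data: 5 time steps of a 16-channel BEV grid.
    x = torch.randn(1, 5, 16, 128, 128)
    second_bev = Backbone3D()(BEVEncoder()(x))
    dets = DetectionHead()(second_bev)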
However, Chen does not explicitly teach: trained with a similarity objective and generating automatically labeled perception data to train one or more of an online perception model, an online prediction model, and an online planning model used to control an autonomous robot.

In an analogous field of endeavor, Lee teaches: trained with a similarity objective (Lee, ¶0107: "two birds-eye view images 531, 532 are aggregated to train classifiers for the features. Aggregator 552 may be configured to group similar features") and generating automatically labeled perception data to train one or more of an online perception model, (Lee, ¶0102: "The training data for the model may be autonomously labelled") an online prediction model, and an online planning model (Lee, ¶0008: "Online versions typically model this state… with a recurrent network trained by self-supervised labeling to predict future states") used to control an autonomous robot (Lee, ¶0040: "The systems and methods disclosed herein may be implemented for use in scene flow estimation for robotics, autonomous vehicles and other automated technologies").

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen using the teachings of Lee to introduce generating automatic labels for training a classification/detection model. A person skilled in the art would be motivated to combine the known elements as described above and achieve the predictable result of automatically classifying/detecting an object across frames based on learned similarity. Therefore, it would have been obvious to combine the analogous arts Chen and Lee to obtain the invention in claim 1.

Regarding claim 2, Chen in view of Lee teaches: The system of claim 1, wherein, in connection with generating the automatically labeled perception data, the 3D detection backbone, in processing the first set of BEV feature images in an offline processing environment, (Chen, ¶0053: "image processor 106 generates the BEV maps from the 3D point cloud frames by executing conventional image processing operations such as PointNet, and the like") performs feature-level temporal aggregation (Chen, ¶0054: "points for static background are aggregated at a time of determining clues on motions of moving objects in the environment") that includes both forward recurrence and backward recurrence to generate the second set of BEV feature images (Lee, ¶0119: "system can be configured to perform a check of the consistency between the forward and backward flows") and each BEV feature image in the second set of BEV feature images incorporates information (Lee, ¶0109: "aggregator 552 may include some or all features directly from two birds-eye view images 531, 532 having BeV embeddings such that pillar features from the two (or more) BeV images are aggregated") from all time steps in the time-sequential perceptual sensor data (Lee, ¶0105: "two birds-eye view images 531, 532… one representing the first point cloud (e.g., the point cloud at time t−1) and one representing the second point cloud (e.g., the point cloud at time t").

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen in view of Lee using the additional teachings of Lee to introduce feature-level temporal aggregation with both forward and backward recurrence. A person skilled in the art would be motivated to combine the known elements as described above and achieve the predictable result of tracking through a sequence of images. Therefore, it would have been obvious to combine the analogous arts Chen and Lee to obtain the invention in claim 2.

Regarding claim 4, Chen in view of Lee teaches: The system of claim 1, wherein, in connection with controlling the autonomous robot in an online processing environment of the autonomous robot, (Chen, ¶0038: "Through the network 126, either wirelessly or through wires, the control system 100 may receive the input data") the 3D detection backbone, in processing the first set of BEV feature images, performs feature-level temporal aggregation that includes forward recurrence to generate the second set of BEV feature images (Chen, ¶0040: "a pixel is associated with a time sequence of future positions of the pixel in subsequent time steps representing a prediction of a future motion of the object. The image processor 106 determines the time sequence of future positions of at least some pixel based on the outputs from the motion prediction head").

Regarding claim 6, Chen in view of Lee teaches: The system of claim 1, wherein the time-sequential perceptual sensor data includes one or more of camera images, Light Detection and Ranging (LIDAR) data, radar data, sonar data, map data, and audio data (Chen, ¶0050: "a plurality of sensors on the vehicle 116 such as a light detection and ranging (LiDAR) sensor, a radio detection and ranging (RADAR) sensor, a camera, and the like").

Regarding claim 7, Chen in view of Lee teaches: The system of claim 1, wherein the autonomous robot is one of an autonomous vehicle, a search and rescue robot, a delivery robot, an aerial drone, and an indoor robot (Chen, ¶0097: "the vehicle 500 can be an autonomous vehicle or a semi-autonomous vehicle").

Regarding claim 8, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 1. Therefore, the recited instructions of the computer-readable medium of claim 8 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 1. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim. In addition, Chen teaches: A non-transitory computer-readable medium for detecting and tracking objects and storing instructions that, when executed by a processor, cause the processor to: (Chen, ¶0109: "the program code or code segments to perform the necessary tasks may be stored in a machine readable medium. A processor(s) may perform the necessary tasks").

Regarding claim 9, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 2. Therefore, the recited instructions of the computer-readable medium of claim 9 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 2. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 2 apply to this claim.

Regarding claim 11, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 4. Therefore, the recited instructions of the computer-readable medium of claim 11 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 4. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim.

Regarding claim 13, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 7. Therefore, the recited instructions of the computer-readable medium of claim 13 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 7. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim.

Regarding claim 14, it recites a method with steps corresponding to the elements of the system recited in claim 1. Therefore, the recited steps of method claim 14 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 1. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim. In addition, Chen teaches: A method (Chen, ¶0111: "Embodiments of the present disclosure may be embodied as a method").

Regarding claim 15, it recites a method with steps corresponding to the elements of the system recited in claim 2. Therefore, the recited steps of method claim 15 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 2. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 2 apply to this claim.

Regarding claim 17, it recites a method with steps corresponding to the elements of the system recited in claim 4. Therefore, the recited steps of method claim 17 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 4. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim.

Regarding claim 19, it recites a method with steps corresponding to the elements of the system recited in claim 6. Therefore, the recited steps of method claim 19 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 6. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim.

Regarding claim 20, it recites a method with steps corresponding to the elements of the system recited in claim 7. Therefore, the recited steps of method claim 20 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 7. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim.

Claims 3, 10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 2021/0302992 A1), in view of Lee et al. (US 2021/0358296 A1), and in further view of Park et al. (US 2024/0020953 A1).

Regarding claim 3, Chen in view of Lee teaches: The system of claim 2, wherein the machine-readable instructions include further instructions that, when executed by the processor, cause the processor to. However, the combination of Chen and Lee does not explicitly teach: improve robustness of the object tracker by applying global association to object comparisons output by the 3D detector head. In an analogous field of endeavor, Park teaches: improve robustness of the object tracker by applying global association to object comparisons output by the 3D detector head (Park, ¶0021: "The MLP network encodes global contextual information with respect to the region—providing for accurate transformation when objects appear at different heights in the view"). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen in view of Lee using the teachings of Park to introduce global contextual information. A person skilled in the art would be motivated to combine the known elements as described above and achieve the predictable result of improving the robustness of the object tracker. Therefore, it would have been obvious to combine the analogous arts Chen, Lee and Park to obtain the invention in claim 3.

Regarding claim 10, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 3. Therefore, the recited instructions of the computer-readable medium of claim 10 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 3. Additionally, the rationale and motivation to combine Chen, Lee and Park presented in the rejection of claim 3 apply to this claim.

Regarding claim 16, it recites a method with steps corresponding to the elements of the system recited in claim 3. Therefore, the recited steps of method claim 16 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 3. Additionally, the rationale and motivation to combine Chen, Lee and Park presented in the rejection of claim 3 apply to this claim.

Claims 5, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 2021/0302992 A1), in view of Lee et al. (US 2021/0358296 A1), and in further view of Ji et al. (US 2025/0200751 A1).

Regarding claim 5, Chen in view of Lee teaches: The system of claim 1. However, the combination of Chen and Lee does not explicitly teach: wherein the similarity objective includes a cosine-similarity loss. In an analogous field of endeavor, Ji teaches: wherein the similarity objective includes a cosine-similarity loss (Ji, ¶0065: "training system trains the cloud point processing neural network to optimize a loss. The loss can be a cosine similarity loss that measures the similarity between the target pointwise features 318 and the pointwise features 322"). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen in view of Lee using the teachings of Ji to introduce a cosine-similarity loss. A person skilled in the art would be motivated to combine the known elements as described above and achieve the predictable result of improving the detection accuracy of the object tracker. Therefore, it would have been obvious to combine the analogous arts Chen, Lee and Ji to obtain the invention in claim 5.

Regarding claim 12, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 5. Therefore, the recited instructions of the computer-readable medium of claim 12 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 5. Additionally, the rationale and motivation to combine Chen, Lee and Ji presented in the rejection of claim 5 apply to this claim.

Regarding claim 18, it recites a method with steps corresponding to the elements of the system recited in claim 5. Therefore, the recited steps of method claim 18 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 5. Additionally, the rationale and motivation to combine Chen, Lee and Ji presented in the rejection of claim 5 apply to this claim.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MEHRAZUL ISLAM, whose telephone number is (571) 270-0489. The examiner can normally be reached Monday-Friday, 8am-5pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Amandeep Saini, can be reached at (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MEHRAZUL ISLAM/
Examiner, Art Unit 2662

/AMANDEEP SAINI/
Supervisory Patent Examiner, Art Unit 2662
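The similarity objective at the center of the claim 5/12/18 dispute is, in its generic form, a cosine-similarity loss over feature embeddings. A minimal sketch of that construct, assuming predicted and target embeddings are compared pairwise; this is an illustrative reconstruction, not Ji's or the application's actual training code:

    import torch
    import torch.nn.functional as F

    def cosine_similarity_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Generic similarity objective: penalize 1 - cos(pred, target), averaged over samples.
        cos = F.cosine_similarity(pred, target, dim=-1)   # values in [-1, 1]
        return (1.0 - cos).mean()

    # Illustrative usage with stand-in embeddings (e.g., per-object features from a 3D detection head).
    pred = torch.randn(32, 128, requires_grad=True)
    target = torch.randn(32, 128)
    loss = cosine_similarity_loss(pred, target)
    loss.backward()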

Prosecution Timeline

Jan 31, 2024: Application Filed
Dec 23, 2025: Non-Final Rejection (§103)
Dec 29, 2025: Applicant Interview (Telephonic)
Jan 13, 2026: Interview Requested
Jan 29, 2026: Examiner Interview Summary
Jan 30, 2026: Response Filed
Feb 28, 2026: Final Rejection (§103)
Mar 18, 2026: Interview Requested
Apr 15, 2026: Applicant Interview (Telephonic)
Apr 15, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this examiner with similar technology, based on the 5 most recent grants:

Patent 12602808: METHOD FOR INSPECTING AN OBJECT (granted Apr 14, 2026; 2y 5m to grant)
Patent 12592075: REMOTE SENSING FOR INTELLIGENT VEGETATION TRIM PREDICTION (granted Mar 31, 2026; 2y 5m to grant)
Patent 12579695: Method of Generating Target Image Data, Electrical Device and Non-Transitory Computer Readable Medium (granted Mar 17, 2026; 2y 5m to grant)
Patent 12524900: METHOD FOR IMPROVING ESTIMATION OF LEAF AREA INDEX IN EARLY GROWTH STAGE OF WHEAT BASED ON RED-EDGE BAND OF SENTINEL-2 SATELLITE IMAGE (granted Jan 13, 2026; 2y 5m to grant)
Patent 12489964: PATH PLANNING (granted Dec 02, 2025; 2y 5m to grant)

Study what changed in these cases to get past this examiner.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 58%
With Interview: 86% (+28.3%)
Median Time to Grant: 3y 4m
PTA Risk: Moderate

Based on 50 resolved cases by this examiner. Grant probability is derived from the career allow rate.
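The headline projections above can be reproduced with simple arithmetic, assuming the grant probability is the raw career allow rate and the interview lift is additive in percentage points; the report does not disclose its actual model, so this is only an illustrative reconstruction:

    granted, resolved = 29, 50             # examiner's career record shown above
    interview_lift_pp = 28.3               # reported interview lift, in percentage points

    allow_rate = granted / resolved                        # 0.58 -> "Grant Probability: 58%"
    with_interview = allow_rate + interview_lift_pp / 100  # 0.863 -> "With Interview: 86%"

    print(f"Grant probability: {allow_rate:.0%}")
    print(f"With interview:    {with_interview:.0%}")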
