DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant’s response to the Non-final Office Action dated 12/31/2025, filed with the office on 01/30/2026, has been entered and made of record.
Status of Claims
Claims 1-20 are pending.
Response to Arguments
Applicant's arguments filed on January 30, 2026 with respect to the rejection of claims under 35 U.S.C. 103 have been fully considered, but they are not persuasive. Specifically, on page 9 of the reply, second paragraph, Applicant argues that independent claim 14 recites two distinct, tandem stages of feature extraction, whereas the cited prior art reference Chen discloses only a single feature-extraction operation. Examiner respectfully disagrees. The plain interpretation of the two extraction stages in independent claim 14 includes: a. extracting first features captured from a scene to generate a first set of BEV images; and b. extracting second features from the first set of BEV images using a 3D backbone to generate a second set of BEV images. The cited prior art of record Chen discloses both stages, in ¶0007: “environmental state is detected based on a bird's eye view (BEV) map” and ¶0055: “extract the spatio-temporal features from the BEV maps”. Thus, the first stage uses sensor-captured environmental state data to generate a BEV map, and the second stage analyzes the BEV map to extract features using a 3D backbone—¶0017: “The MotionNet system includes three parts: (1) data representation from raw 3D point clouds to BEV maps; (2) spatio-temporal pyramid network as a backbone; and (3) task-specific heads for grid cell classification and motion prediction”. Therefore, Applicant’s arguments are not found persuasive.
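For illustration only, and forming no part of the prior art record: the two tandem extraction stages as interpreted above can be sketched schematically as follows. Every function name, grid shape, and operation below is a hypothetical placeholder, not drawn from Chen or from the claims.

```python
# Illustrative sketch only: two tandem feature-extraction stages producing
# BEV "feature images". All names, shapes, and operations are hypothetical
# placeholders, not taken from Chen or from the claims.
from typing import List

Grid = List[List[float]]  # a BEV feature image modeled as a 2D grid


def stage1_extract(frames: List[List[float]]) -> List[Grid]:
    """Stage 1: project each raw sensor frame into a first-set BEV image."""
    bev_images = []
    for frame in frames:
        cell = sum(frame) / len(frame)  # placeholder "feature extraction"
        bev_images.append([[cell, cell], [cell, cell]])
    return bev_images


def stage2_backbone(bev_images: List[Grid]) -> List[Grid]:
    """Stage 2: a stand-in '3D backbone' that extracts second features from
    the stage-1 BEV images, emitting one output image per time step."""
    return [[[2.0 * v for v in row] for row in img] for img in bev_images]


frames = [[1.0, 3.0], [2.0, 4.0]]        # two time steps of raw sensor data
first_set = stage1_extract(frames)        # first set of BEV feature images
second_set = stage2_backbone(first_set)   # second set, one image per step
```

The point of the sketch is structural: the second stage consumes the output of the first, so each image in the second set still corresponds to a distinct time step of the input sequence.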
Applicant further argues on page 9, sixth paragraph, that the cited combination of prior art references does not disclose training a 3D detection head with a similarity objective (e.g., a cosine-similarity loss) for object tracking. Examiner respectfully disagrees. Training with a similarity objective (e.g., a cosine-similarity loss) is recited in claims 5, 12 and 18 and is mapped to the disclosure of the cited prior art reference Ji. Ji et al. (US 2025/0200751 A1, filed on 2023-12-15) teaches in ¶0065: “training system trains the cloud point processing neural network to optimize a loss. The loss can be a cosine similarity loss that measures the similarity between the target pointwise features 318 and the pointwise features 322”. Therefore, Applicant’s arguments are not found persuasive.
Consequently, THIS ACTION IS MADE FINAL.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 4, 6-9, 11, 13-15, 17, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 2021/0302992 A1), in view of Lee et al. (US 2021/0358296 A1).
Regarding claim 1, Chen teaches, A system for (Chen, ¶0019: “a control system for controlling a motion of a vehicle”) detecting and tracking objects, (Chen, ¶0062: “detected bounding boxes are then fed into an object tracker to identify the one or more objects”) the system comprising: a processor; (Chen, ¶0039: “control system 100 includes an image processor”) and a memory storing machine-readable instructions (Chen, ¶0042: “The control system 100 includes a memory 108 that stores instructions”) that, when executed by the processor, cause the processor to: (Chen, ¶0042: “controller 104 may be configured to execute the stored instructions in order to control operations”) extract first features from time-sequential perceptual sensor data to generate (Chen, ¶0094: “multi-head neural network 110 executes feature extraction operation”) a first set of bird’s-eye-view (BEV) feature images; (Chen, ¶0007: “environmental state is detected based on a bird's eye view (BEV) map”) extract second features from the first set of BEV feature images (Chen, ¶0055: “extract the spatio-temporal features from the BEV maps”) using a three-dimensional (3D) detection backbone (Chen, ¶0017: “system includes three parts: (1) data representation from raw 3D point clouds to BEV maps; (2) spatio-temporal pyramid network as a backbone”) to generate a second set of BEV feature images, (Chen, ¶0104: “generates the extended BEV image”) wherein each BEV feature image in the second set of BEV feature images corresponds to a distinct time step in the time-sequential perceptual sensor data; (Chen, ¶0104: “extended BEV image includes… a position of the pixel in the extended BEV image at a current time step… a time sequence of future positions of the pixel in subsequent time steps”) and consume the second set of BEV feature images (Chen, ¶0018: “the outputs of the three heads are provided to a motion planner”) using a neural-network 3D detection head that is trained (Chen, ¶0013: “The entire multi-head neural network is trained in an end-to-end manner”) (Chen, ¶0018: “vehicle receives the motion trajectory and controls the motion of the vehicle”; ¶0038: “The vehicle 116 may be an autonomous vehicle or a semi-autonomous vehicle”). However, Chen does not explicitly teach, trained with a similarity objective and generating automatically labeled perception data to train one or more of an online perception model, an online prediction model, and an online planning model used to control an autonomous robot.
In an analogous field of endeavor, Lee teaches, trained with a similarity objective (Lee, ¶0107: “two birds-eye view images 531, 532 are aggregated to train classifiers for the features. Aggregator 552 may be configured to group similar features”) and generating automatically labeled perception data to train one or more of an online perception model, (Lee, ¶0102: “The training data for the model may be autonomously labelled”) an online prediction model, and an online planning model (Lee, ¶0008: “Online versions typically model this state…. with a recurrent network trained by self-supervised labeling to predict future states”) used to control an autonomous robot. (Lee, ¶0040: “The systems and methods disclosed herein may be implemented for use in scene flow estimation for robotics, autonomous vehicles and other automated technologies”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen using the teachings of Lee to introduce the generation of automatic labels for training a classification/detection model. A person skilled in the art would have been motivated to combine the known elements as described above to achieve the predictable result of automatically classifying/detecting an object across frames based on learned similarity. Therefore, it would have been obvious to combine the analogous arts of Chen and Lee to obtain the invention of claim 1.
Regarding claim 2, Chen in view of Lee teaches, The system of claim 1, wherein, in connection with generating the automatically labeled perception data, the 3D detection backbone, in processing the first set of BEV feature images in an offline processing environment, (Chen, ¶0053: “image processor 106 generates the BEV maps from the 3D point cloud frames by executing conventional image processing operations such as PointNet, and the like”) performs feature-level temporal aggregation (Chen, ¶0054: “points for static background are aggregated at a time of determining clues on motions of moving objects in the environment”) that includes both forward recurrence and backward recurrence to generate the second set of BEV feature images (Lee, ¶0119: “system can be configured to perform a check of the consistency between the forward and backward flows”) and each BEV feature image in the second set of BEV feature images incorporates information (Lee, ¶0109: “aggregator 552 may include some or all features directly from two birds-eye view images 531, 532 having BeV embeddings such that pillar features from the two (or more) BeV images are aggregated”) from all time steps in the time-sequential perceptual sensor data. (Lee, ¶0105: “two birds-eye view images 531, 532… one representing the first point cloud (e.g., the point cloud at time t−1) and one representing the second point cloud (e.g., the point cloud at time t”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen in view of Lee using the additional teachings of Lee to introduce feature-level temporal aggregation with both forward and backward recurrence. A person skilled in the art would have been motivated to combine the known elements as described above to achieve the predictable result of tracking objects through a sequence of images. Therefore, it would have been obvious to combine the analogous arts of Chen and Lee to obtain the invention of claim 2.
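For illustration only, and forming no part of the prior art record: temporal aggregation with both forward and backward recurrence, of the kind addressed in the rejection of claim 2 above, can be sketched as below. The scalar per-time-step "features" and the blend factor are hypothetical placeholders.

```python
# Illustrative sketch only: feature-level temporal aggregation with both a
# forward and a backward recurrence, so every output incorporates information
# from all time steps. Scalar "features" and alpha are hypothetical.

def forward_recurrence(feats, alpha=0.5):
    """Carry a running state forward: h[t] = alpha*x[t] + (1-alpha)*h[t-1]."""
    out, h = [], feats[0]
    for x in feats:
        h = alpha * x + (1 - alpha) * h
        out.append(h)
    return out


def aggregate_bidirectional(feats):
    """Run the recurrence in both directions and average the two passes."""
    fwd = forward_recurrence(feats)
    bwd = forward_recurrence(feats[::-1])[::-1]
    return [(f + b) / 2.0 for f, b in zip(fwd, bwd)]


aggregated = aggregate_bidirectional([1.0, 3.0])
```

Running only the forward pass, as in the online setting of claim 4, would leave each output dependent only on past time steps; the backward pass is what folds in future steps.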
Regarding claim 4, Chen in view of Lee teaches, The system of claim 1, wherein, in connection with controlling the autonomous robot in an online processing environment of the autonomous robot, (Chen, ¶0038: “Through the network 126, either wirelessly or through wires, the control system 100 may receive the input data”) the 3D detection backbone, in processing the first set of BEV feature images, performs feature-level temporal aggregation that includes forward recurrence to generate the second set of BEV feature images. (Chen, ¶0040: “a pixel is associated with a time sequence of future positions of the pixel in subsequent time steps representing a prediction of a future motion of the object. The image processor 106 determines the time sequence of future positions of at least some pixel based on the outputs from the motion prediction head”).
Regarding claim 6, Chen in view of Lee teaches, The system of claim 1, wherein the time-sequential perceptual sensor data includes one or more of camera images, Light Detection and Ranging (LIDAR) data, radar data, sonar data, map data, and audio data. (Chen, ¶0050: “a plurality of sensors on the vehicle 116 such as a light detection and ranging (LiDAR) sensor, a radio detection and ranging (RADAR) sensor, a camera, and the like”).
Regarding claim 7, Chen in view of Lee teaches, The system of claim 1, wherein the autonomous robot is one of an autonomous vehicle, a search and rescue robot, a delivery robot, an aerial drone, and an indoor robot. (Chen, ¶0097: “the vehicle 500 can be an autonomous vehicle or a semi-autonomous vehicle”).
Regarding claim 8, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 1. Therefore, the recited instructions of the computer-readable medium of claim 8 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 1. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim. In addition, Chen teaches, A non-transitory computer-readable medium for detecting and tracking objects and storing instructions that, when executed by a processor, cause the processor to: (Chen, ¶0109: “the program code or code segments to perform the necessary tasks may be stored in a machine readable medium. A processor(s) may perform the necessary tasks”).
Regarding claim 9, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 2. Therefore, the recited instructions of the computer-readable medium of claim 9 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 2. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 2 apply to this claim.
Regarding claim 11, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 4. Therefore, the recited instructions of the computer-readable medium of claim 11 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 4. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim.
Regarding claim 13, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 7. Therefore, the recited instructions of the computer-readable medium of claim 13 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 7. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim.
Regarding claim 14, it recites a method with steps corresponding to the elements of the system recited in claim 1. Therefore, the recited steps of method claim 14 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 1. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim. In addition, Chen teaches, A method (Chen, ¶0111: “Embodiments of the present disclosure may be embodied as a method”).
Regarding claim 15, it recites a method with steps corresponding to the elements of the system recited in claim 2. Therefore, the recited steps of method claim 15 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 2. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 2 apply to this claim.
Regarding claim 17, it recites a method with steps corresponding to the elements of the system recited in claim 4. Therefore, the recited steps of method claim 17 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 4. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim.
Regarding claim 19, it recites a method with steps corresponding to the elements of the system recited in claim 6. Therefore, the recited steps of method claim 19 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 6. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim.
Regarding claim 20, it recites a method with steps corresponding to the elements of the system recited in claim 7. Therefore, the recited steps of method claim 20 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 7. Additionally, the rationale and motivation to combine Chen and Lee presented in the rejection of claim 1 apply to this claim.
Claims 3, 10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 2021/0302992 A1), in view of Lee et al. (US 2021/0358296 A1) and in further view of Park et al. (US 2024/0020953 A1).
Regarding claim 3, Chen in view of Lee teaches, The system of claim 2, wherein the machine-readable instructions include further instructions that, when executed by the processor, cause the processor to. However, the combination of Chen and Lee does not explicitly teach, improve robustness of the object tracker by applying global association to object comparisons output by the 3D detector head.
In an analogous field of endeavor, Park teaches, improve robustness of the object tracker by applying global association to object comparisons output by the 3D detector head. (Park, ¶0021: “The MLP network encodes global contextual information with respect to the region—providing for accurate transformation when objects appear at different heights in the view”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen in view of Lee using the teachings of Park to introduce global contextual information. A person skilled in the art would have been motivated to combine the known elements as described above to achieve the predictable result of improving the robustness of the object tracker. Therefore, it would have been obvious to combine the analogous arts of Chen, Lee and Park to obtain the invention of claim 3.
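For illustration only, and forming no part of the prior art record: "global association," as discussed for claim 3, contrasts with per-track greedy matching by choosing a jointly optimal assignment over the whole cost matrix. The brute-force search below is a hypothetical stand-in for a real assignment solver, and the cost values are invented.

```python
# Illustrative sketch only: global association picks the detection-to-track
# assignment minimizing the TOTAL cost over the whole matrix, rather than
# matching each track greedily in isolation. Brute force stands in for a
# real assignment solver; the cost values are hypothetical.
from itertools import permutations


def global_association(cost):
    """Return, for each row (track), the chosen column (detection) index."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)


# Greedy matching would give row 0 its cheapest column (0, cost 1) and force
# row 1 into column 1 (cost 10); the global optimum pays 2 + 1 = 3 instead.
assignment = global_association([[1, 2], [1, 10]])
```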
Regarding claim 10, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 3. Therefore, the recited instructions of the computer-readable medium of claim 10 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 3. Additionally, the rationale and motivation to combine Chen, Lee and Park presented in the rejection of claim 3 apply to this claim.
Regarding claim 16, it recites a method with steps corresponding to the elements of the system recited in claim 3. Therefore, the recited steps of method claim 16 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 3. Additionally, the rationale and motivation to combine Chen, Lee and Park presented in the rejection of claim 3 apply to this claim.
Claims 5, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 2021/0302992 A1), in view of Lee et al. (US 2021/0358296 A1) and in further view of Ji et al. (US 2025/0200751 A1).
Regarding claim 5, Chen in view of Lee teaches, The system of claim 1. However, the combination of Chen and Lee does not explicitly teach, wherein the similarity objective includes a cosine-similarity loss.
In an analogous field of endeavor, Ji teaches, wherein the similarity objective includes a cosine-similarity loss. (Ji, ¶0065: “training system trains the cloud point processing neural network to optimize a loss. The loss can be a cosine similarity loss that measures the similarity between the target pointwise features 318 and the pointwise features 322”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Chen in view of Lee using the teachings of Ji to introduce a cosine-similarity loss. A person skilled in the art would have been motivated to combine the known elements as described above to achieve the predictable result of improving the detection accuracy of the object tracker. Therefore, it would have been obvious to combine the analogous arts of Chen, Lee and Ji to obtain the invention of claim 5.
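For illustration only, and forming no part of the prior art record: a cosine-similarity loss of the general kind quoted from Ji ¶0065 can be sketched as below. The feature vectors are invented values; only the 1 − cos(a, b) form itself is being illustrated.

```python
# Illustrative sketch only: 1 - cosine similarity, zero when two feature
# vectors point the same way and growing as they diverge. Vector values
# are hypothetical.
import math


def cosine_similarity_loss(a, b):
    """Return 1 - cosine similarity between feature vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


aligned_loss = cosine_similarity_loss([1.0, 0.0], [2.0, 0.0])     # near 0
orthogonal_loss = cosine_similarity_loss([1.0, 0.0], [0.0, 1.0])  # near 1
```

Minimizing such a loss drives the features of the same object across frames toward the same direction in embedding space, which is what makes it usable as a similarity objective for tracking.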
Regarding claim 12, it recites a computer-readable medium including instructions corresponding to the elements of the system recited in claim 5. Therefore, the recited instructions of the computer-readable medium of claim 12 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 5. Additionally, the rationale and motivation to combine Chen, Lee and Ji presented in the rejection of claim 5 apply to this claim.
Regarding claim 18, it recites a method with steps corresponding to the elements of the system recited in claim 5. Therefore, the recited steps of method claim 18 are mapped to the proposed combination in the same manner as the corresponding elements of system claim 5. Additionally, the rationale and motivation to combine Chen, Lee and Ji presented in the rejection of claim 5 apply to this claim.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MEHRAZUL ISLAM whose telephone number is (571)270-0489. The examiner can normally be reached Monday-Friday: 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Saini Amandeep can be reached on (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MEHRAZUL ISLAM/Examiner, Art Unit 2662
/AMANDEEP SAINI/Supervisory Patent Examiner, Art Unit 2662